# Git Project In-depth Analysis Skill
Conduct in-depth analyses of open source projects and generate professional architecture reports. Each report is a piece of technical research with deep insight: after reading it, readers should understand the business problem, grasp the architecture design, and form their own thinking.
## When to Use
- Analyze the architecture and design of open source projects
- Compare design differences between two similar projects
- Deeply research the implementation ideas of a framework or library
## When NOT to Use
- Simple code problems or debugging
- Single file analysis or code review
- Code modifications that do not involve the architecture level
## Output Language
Default to Chinese. If the user asks in another language, follow the user's language.
## Core Principles
### 1. Business Perspective First
Start from "what problem does this project solve", not "what functions are in this file".
| Don't | Do |
|---|---|
| The function receives a Context parameter... | After a request enters the system, it goes through three stages: authentication, rate limiting, and routing distribution... |
| `interface MessageQueue { push(); pop() }` | Modules are decoupled through message queues. Producers only need to deliver messages, and consumers pull messages by priority |
### 2. Control the Abstraction Level: Explain the Design Instead of Pasting Code
Describe at the design-pattern and architecture level by default; do not paste original code unless necessary. Focus on processes, logic, and design ideas, expressing them with architecture diagrams (Mermaid), flowcharts, and tables instead of code snippets. Only show code when the design is particularly elegant, the project introduces unique concepts of its own, or the implementation is the core selling point, and always explain it in natural language first.
### 3. Global Correlation
Each local analysis must be connected to the project's overall design philosophy; this is the key difference between a "code manual" and an "architecture analysis". See the global correlation chapter in analysis-guide.md for details.
### 4. Heuristic Writing
The goal is for readers to learn something and start thinking of their own, not to receive a code manual. Write like a senior engineer explaining at a whiteboard: with opinions, reasoning, and comparisons. See the heuristic writing chapter in analysis-guide.md for details.
### 5. Deep Insight: Why > What (Mandatory)
Every design decision must explain its motivation, its trade-offs, and the cost of the alternatives. Describing "what it is" is only the starting point; explaining "why" is where the analysis delivers value. For each core module and for the overall architecture, answer:
- Why is it designed this way? Not just "what pattern is used", but "why it is suitable for this scenario"
- What if it's not designed this way? The cost of alternative solutions
- What is the gap with industry best practices? Advantages and room for improvement
- If you were to redesign it, what would you change? This demonstrates deeper understanding
- What is the systematic design philosophy? The style throughout the project (such as "Convention over Configuration", "Zero-cost Abstraction")
Example:
❌ The routing system adopts the middleware pattern and supports chained calls.
✅ The routing system chooses the onion model instead of the linear pipeline. The linear pipeline is simpler to implement, but the onion model allows each middleware to handle both request and response stages at the same time - which is crucial for logging, timing, and error recovery. Express chose the linear model back then, and later had to use various hacks to handle post-response logic. Koa learned the lesson and switched to the onion model. If I were to redesign it, I would consider adding middleware dependency declarations to let the framework automatically sort them - this is the approach of Fastify, which can avoid hidden bugs caused by order.
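The onion-versus-linear difference in this example can be made concrete with a few lines of code. This is a minimal illustration of the pattern only, with hypothetical names; it is not Koa's or Express's actual implementation:

```python
# Minimal onion-model middleware: each layer sees both the request phase
# (before calling `nxt`) and the response phase (after `nxt` returns).
def compose(middlewares):
    def run(ctx):
        def dispatch(i):
            if i == len(middlewares):
                return
            # Each middleware receives the context and a callback into the next layer.
            middlewares[i](ctx, lambda: dispatch(i + 1))
        dispatch(0)
    return run

def logger(ctx, nxt):
    ctx["log"].append("log:before")
    nxt()                            # inner layers and the handler run here
    ctx["log"].append("log:after")   # response phase: unreachable in a linear pipeline

def handler(ctx, nxt):
    ctx["log"].append("handle")

app = compose([logger, handler])
ctx = {"log": []}
app(ctx)
print(ctx["log"])  # ['log:before', 'handle', 'log:after']
```

The "log:after" entry is exactly the post-response hook that a linear pipeline cannot express without workarounds.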
## Additional Requirements
- Code as the sole basis - All conclusions must have a code basis; cite the file path and line number range. Vague expressions are prohibited
- Approachable tone - Like a senior engineer doing onboarding for new colleagues, add subjective evaluations and suggestions, avoid AI-style clichés
- Deep dive on key points, brief on secondary content - Conduct in-depth analysis of core innovation points, and mention general utility functions in one sentence
- Critical thinking - Compare with industry practices, point out real problems, do not avoid defects. Refer to analysis-guide.md
- Fluent and easy-to-understand writing - The writing should be fluent and natural so that entry-level engineers can understand and learn from it. Avoid being overly academic or piling up jargon
- Avoid chronological listing without depth - Each module should show deep detail; it must not be dismissed in one sentence or glossed over. Add a Mermaid architecture diagram to each module where appropriate, so that readers come away inspired and grasp the essence of the design
## Analysis Workflow
Flexibility principle: all the stages and chapters below are suggested guidelines, not a checklist to follow rigidly. The agent should decide dynamically based on the characteristics of the project being analyzed; if a stage or step is meaningless for the current project, skip or simplify it. The quality of the final report is the only criterion.
### Phase 1: Project Acquisition and Initialization
- Parse user input (supports GitHub/GitLab/Gitee URLs, local paths, and project names)
- Create workspace: create a repo-analyses/${REPO_NAME}-{YYYYMMDD} directory under the user's home directory as the workspace $WORK_DIR (cross-platform: resolve the home directory with the appropriate mechanism on macOS/Linux and on Windows)
- Skip cloning if the user provides a local path; otherwise run git clone to clone the repository
- Obtain basic metadata (Star, Fork, contributors, code statistics)
### Phase 2: Project Scale Evaluation and Analysis Mode Selection
- Count effective lines of code (exclude skippable code), list the distribution by module
- Definition of skippable code: test code, build/deployment configuration (Dockerfile, CI yaml, etc.), automatically generated code (protobuf generated, lock files, etc.), example/documentation code
- Use a line-counting tool for the statistics, grouped by top-level directory
- Report code scale to the user, use AskUserQuestion to let the user choose the analysis mode:
| Mode | Core module coverage | Secondary module coverage | Applicable scenarios |
|---|---|---|---|
| Quick analysis | ≥30% | ≥10% | Quickly understand the overall picture of the project |
| Standard analysis (recommended) | ≥60% | ≥30% | Regular architecture analysis |
| In-depth analysis | ≥90% | ≥60% | In-depth study of each design decision |
- Write the code scale statistics and the user's chosen analysis mode into a draft file, and control the analysis depth accordingly in subsequent stages
Coverage calculation rules:
- Coverage = Union of line ranges actually requested through the Read tool / Total lines of the file
- For large files (>500 lines), they must be read in segments to ensure that the following key paragraphs are covered:
- Type definitions and imports at the head of the file (first 100 lines)
- Core business logic functions (located through directory structure or function names)
- Test code at the end of the file (if any)
- If only a small part of a file (<30%) is read, it is not counted toward coverage and the file is treated as "unread"
- Automatically generated code (proto generated, lock files, etc.) can reduce coverage requirements: just scan the structure, no need to read line by line
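The coverage rule above amounts to a standard interval-union computation. A minimal sketch, assuming inclusive (start, end) line ranges as requested through the Read tool (function name is illustrative):

```python
# Coverage = union of read line ranges / total lines of the file.
# Ranges are inclusive (start, end) tuples; overlapping reads are merged.
def coverage(ranges, total_lines):
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end + 1:
            # Gap before this range: close out the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / total_lines

# Three reads of a 500-line file: head, core logic, tail.
print(round(coverage([(1, 100), (80, 250), (400, 500)], 500), 2))  # 0.7
```

Merging (1, 100) and (80, 250) yields 250 covered lines, plus 101 from (400, 500): 351/500 = 70.2%.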
### Phase 3: External Research + Project Document Study (Search First, Then Read)
- Use WebSearch to search for project evaluations, comparisons, and architecture discussions (at least 3-5 searches)
- Traverse the project official website (if exists):
- Extract the official website URL from README or GitHub page
- Use WebFetch/tavily_crawl to traverse key pages of the official website (homepage, Features, Use Cases, Comparison, Blog, etc.)
- Focus on extracting: product positioning (tagline), typical usage scenarios, official competitor comparisons, user cases/testimonials
- The official website content is often the best source to understand "why this product is needed", which is more direct than code and technical documents
- Read through the project's own documents:
- Architecture documents (directories such as architecture/, docs/, etc.)
- Developer guidelines such as CONTRIBUTING.md, AGENTS.md
- RFC, ADR (Architecture Decision Records), design proposals, etc.
- These documents often contain the developer's design ideas, trade-offs, and historical decision background, which are first-hand information to understand "why it is designed this way"
- Extract key design decisions and ideas from the documents into research notes
- Organize the research findings and write them into a research-notes draft, which must include the following structured sections (mark "not found" where information is insufficient):
- Core problems solved by the project: Describe pain points with 1-3 specific scenarios (who, under what circumstances, what problems encountered, why existing solutions are not sufficient)
- Competitor/similar project comparison: List 3-5 most relevant competitors, explain their respective positioning differences and technical route differences
- Why this project needs to be built separately: Can't it be solved by combining existing solutions? What is the unique value proposition of this project?
- Organizational motivation behind the project (if applicable): Strategic considerations of commercial companies, ecological positioning of open source communities
- Generate the analysis plan and write it into a draft file
### Phase 4: Project Feature Identification + Adaptive Questioning
This is the core phase. Do not use a fixed list of questions, but generate targeted questions based on project characteristics.
Steps:
1. Quick scan: scan entry files, directory structure, dependency declarations, project documents, and the README
2. Identify core project characteristics:
- Project type and positioning (library/framework/application/tool)
- Scale and maturity
- Design style signals (type gymnastics, minimalist API, configuration-driven, etc.)
- Technology stack characteristics (emerging technologies, multi-language, specific runtime)
- Community positioning (core infrastructure, application layer tools, teaching projects, etc.)
3. Extract questions from characteristics: generate targeted questions based on the observed project characteristics. Questions should help focus the analysis direction, not just follow a process
Thinking process - each observation may imply a question worth asking the user:
- Observed technology selection → Ask about motivation (uncommon technology combination? Self-implemented functions that are usually solved by third-party libraries?)
- Observed architecture characteristics → Ask about priority (performance optimization traces? Complex plugin/extension system?)
- Observed design tension → Ask about trade-offs (simplicity vs flexibility? Burden of backward compatibility?)
- Observed project positioning → Ask about audience (who is the target user? Is it an alternative or a gap filler in the ecosystem?)
Dimension inspiration - what project characteristics imply what analysis angles:
- Small and sophisticated library → API design philosophy, boundary delineation; Large framework → Modular strategy, backward compatibility, ecological governance
- Use of emerging technologies → Why choose it, migration cost; Multi-language/multi-paradigm → Language boundary design
- A large number of generics/type gymnastics → Type safety vs complexity trade-off; Minimalist API → How simplicity is achieved, what is sacrificed
Characteristics of good questions: specific (based on phenomena observed in the code), valuable for the analysis (the answers will affect the analysis direction), answerable by the user (ask about concerns and preferences, not technical details that require deep code reading to answer), and non-repetitive (do not ask questions that the code itself can answer)
4. Ask the user: use the AskUserQuestion tool to ask questions, no more than 3 at a time
- One of the questions should confirm the level of detail at the beginning of the report: For well-known projects, users may not need lengthy product introductions and competitor comparisons, and just want to go directly to code analysis. Ask the user whether they need scenario-based introduction and competitor positioning chapters, or directly start with project overview and code analysis
5. Unlimited rounds: multiple rounds of questions may be asked until the direction is clear, and new key points of divergence discovered during the analysis can prompt follow-up questions
Key principle: Questions are completely driven by project characteristics, no preset categories. Different projects should generate completely different questions.
### Phase 5: Dynamic Report Structure Design
Design the chapter structure of this report based on user answers + project characteristics.
Steps:
- Synthesize information: Combine the research of Phase 3, project characteristics of Phase 4 and user answers
- Design chapter structure: Do not use fixed templates, but must meet the skeleton constraints (see below)
- Output report outline: Output the designed report outline to the user for confirmation before continuing
- Identify modules: Track core data flow, identify N logical modules (divided by business function), divided into core modules and secondary modules
- Design module narrative line: Determine the presentation order and transition logic of modules in the report, not arranged by directory structure, but organized according to the best path for readers to understand:
- Choose the main narrative line: data flow driven (which modules the request goes through from entry to exit), layer driven (from bottom to top), or problem driven (expand layer by layer from core problems to solutions)
- Write the transition logic between every two adjacent modules: the output/problem/limitation of the previous module → leads to the necessity of the next module
- Write the narrative line into drafts/05-modules-plan.md, format example: Module A →[A's output needs to be consumed by B]→ Module B →[B solves X but introduces problem Y]→ Module C
- Write into plan: output the module list and report outline and write them into drafts/05-modules-plan.md
Skeleton constraints (the report has no mandated chapter list, but it must satisfy the following):
- There is scenario-based problem introduction (use specific scenarios to clearly explain what problems the project solves, the shortcomings of existing solutions, why this project is needed - materials come from Phase 3 research notes). Note: If the user indicates in Phase 4 that lengthy introduction is not needed (e.g. the project is already very well-known), this chapter can be simplified or skipped, and start directly with the project overview
- There is competitor positioning (key differences from similar projects, not a feature list comparison, but differences in design philosophy and technical route). Note: Same as above, users can choose to skip
- There is project overview (let readers quickly understand what the project is and what problems it solves)
- There is in-depth analysis (Why of core design, trade-offs, comparison with the industry)
- There is evaluation and inspiration (honest advantages and disadvantages, what readers can learn from it)
- There is architecture visualization (Mermaid charts)
- All conclusions have code basis
### Phase 6: Parallel In-depth Analysis (Subagent Team)
The Agent tool must be used to start subagents in parallel. Refer to the prompt template and collaboration specifications in module-analysis-guide.md.
The prompt of each subagent must include the overall design philosophy of the project and global perspective requirements to ensure that module analysis is not isolated.
The prompt of each subagent must also include the narrative context of the module (from the narrative line design of Phase 5): what was covered in the previous module, what questions readers have when entering this module, what this module needs to pave the way for the next module. The subagent should connect the previous module with 1-2 sentences at the beginning of the draft, and pave the way for the next module with 1 sentence at the end of the draft.
The end of each subagent's prompt must be appended with coverage requirements (refer to the coverage requirements paragraph in module-analysis-guide.md), inform the current analysis mode and minimum coverage target, and require a coverage detail table to be attached at the end of the draft.
Subagent writing strategy:
For large modules (files totaling more than 5,000 lines), the subagent prompt must require the draft to be written incrementally:
- After completing the analysis of each subsystem/submodule, immediately write this part to the draft file
- The first subsystem uses Write to create the file, and subsequent subsystems use Edit to append
- Do not wait until all files are read before writing at one time
- Append the coverage detail table at the end
Main agent waiting discipline:
- After the subagents are started, the main agent must not read the source files that a subagent is responsible for
- During the waiting period, the main agent should focus on: reading project documents (architecture/, docs/), external research, designing the report skeleton, and preparing the fusion framework for Phase 8
- The standard for judging whether a subagent is stuck: no new lines have been added to its output file for more than 5 minutes. Only after confirming it is stuck may the main agent take over that module's analysis
- Early merging is strictly prohibited: wait for all subagents to complete before starting the merging work of Phases 7 and 8. Do not start writing the final report while some subagents are still running
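The stuck-detection rule can be sketched as a small polling check. The function name and polling interface here are hypothetical; only the rule itself (no new lines for more than 5 minutes, i.e. 300 seconds) comes from the text above:

```python
import time

# A subagent counts as stuck only if its draft file has gained no new
# lines for longer than the timeout (300 s = the 5-minute rule).
def is_stuck(prev_lines, curr_lines, last_change_ts, now=None, timeout=300):
    """prev_lines/curr_lines: line counts from two successive polls.
    last_change_ts: epoch seconds when the count last increased.
    Returns (stuck?, updated last_change_ts)."""
    now = time.time() if now is None else now
    if curr_lines > prev_lines:
        return False, now  # progress observed: reset the change timestamp
    return (now - last_change_ts) > timeout, last_change_ts

# 301 s with no growth: past the 5-minute threshold.
stuck, _ = is_stuck(prev_lines=120, curr_lines=120, last_change_ts=0, now=301)
print(stuck)  # True
```

In practice the main agent would call this once per poll, carrying the returned timestamp forward between polls.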
### Phase 7: Cross Validation + Quality Control (Main Agent)
7.1 Coverage Gating:
- Read the coverage detail table at the end of each module draft
- Quick check: Whether there is a coverage table at the end of each draft, whether the total line is marked as meeting the standard (✅/❌)
- Only modules marked ❌ or missing coverage tables need in-depth inspection
- Non-compliant modules → The main agent automatically supplements reading of uncovered key files, and appends supplementary findings to the corresponding draft
- Still not compliant after supplementation → Report to the user which modules are not compliant and the reasons (e.g. too large files, binary files, etc.)
7.2 Spot Check Verification:
- Select 2-3 key conclusions from each core module draft
- Go back to the source code to verify the accuracy of the conclusions line by line
- Correct the corresponding content in the draft if deviations are found
7.3 Cross Validation:
- Cross-verify cross-module conclusions marked [To be verified by main agent]
- Comprehensively answer exploration questions, identify cross-module design patterns
- Verify global correlation: Whether the analysis of each module is connected to the overall design philosophy of the project
- Write the results into drafts/07-cross-validation.md
### Phase 8: Multi-source Fusion and Final Report (Main Agent)
- Refine architecture insights and systematic design philosophy
- Deepen competitor comparison based on Phase 3 research results (supplement search only when Phase 3 information is insufficient)
- Propose "if redesign" improvement suggestions
- Write the above insights into a draft file
- Multi-source fusion: Take the report chapter structure designed in Phase 5 as the skeleton, extract content from each draft to fill. When the same concept appears in multiple drafts, take the most detailed version and supplement the unique information of other versions. Eliminate all jump instructions such as "see draft X", "see appendix" after fusion
- Narrative coherence: Organize module chapters according to the narrative line designed in Phase 5. The beginning of each module chapter must have 1-2 transitional sentences connecting the conclusions or questions of the previous module. Avoid abrupt transitions such as "Next we analyze X module", use natural transitions instead (e.g. "Gateway completes the authentication and routing of requests, but it is only responsible for 'who can come in', not 'what can be done after coming in'. This behavior control responsibility is undertaken by the policy engine.")
- Write in segments: the final report is usually more than 500 lines; first Write the opening chapters (200-300 lines), then append with Edit, and Read to confirm the end position before each append
- Coverage summary: summarize the coverage data and write it into a separate coverage file (not included in the final report)
- Data is directly extracted from the coverage detail table at the end of each subagent draft, no need for the main agent to recalculate
- If the main agent supplemented reading in Phase 7, add the supplemented lines to the "read lines" of the corresponding module
- Summary table format:
| Module | Type | Number of files | Effective lines of code | Read lines | Coverage | Meets standard |
|---|---|---|---|---|---|---|
| ... | Core/Secondary | ... | ... | ... | ...% | ✅/❌ |
- Summarize and generate the final report (excluding coverage chapters)
## Draft File List
All intermediate artifacts are saved to the drafts/ directory:
| Phase | Files |
|---|---|
| 3 | research notes, analysis plan |
| 5 | drafts/05-modules-plan.md |
| 6 | module drafts (generated by subagents) |
| 7 | drafts/07-cross-validation.md |
| 8 | fusion insights, coverage summary |
Files are written in blocks of no more than 300 lines or 15 KB per write.
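The block-size limit above can be enforced with a simple splitter before each write. A sketch assuming UTF-8 byte counting; the function name is hypothetical:

```python
# Split draft content into chunks of at most 300 lines and 15 KB each,
# mirroring the Write-then-Edit-append pattern described above.
def split_blocks(text, max_lines=300, max_bytes=15 * 1024):
    blocks, cur, cur_bytes = [], [], 0
    for line in text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        # Flush the current chunk if adding this line would break either limit.
        if cur and (len(cur) >= max_lines or cur_bytes + size > max_bytes):
            blocks.append("".join(cur))
            cur, cur_bytes = [], 0
        cur.append(line)
        cur_bytes += size
    if cur:
        blocks.append("".join(cur))
    return blocks

blocks = split_blocks("x\n" * 1000)
print(len(blocks))  # 4  (300 + 300 + 300 + 100 lines)
```

The first chunk goes to a Write call creating the file; each later chunk goes to an Edit append. Note that a single line exceeding 15 KB is kept whole, since lines are the atomic unit here.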
## Special Scenarios
- Extra large projects (>50000 lines): Prioritize analysis of core modules, use Agent to analyze in parallel
- Comparative analysis mode: Complete Phase 1-4 for the two projects respectively, then design a comparative report structure in Phase 5, add "design decision comparison" and "selection suggestion" to the skeleton constraints
## Output Requirements
- The final report is a single Markdown file: $WORK_DIR/ANALYSIS_REPORT.md
- Use a large number of Mermaid charts to show architecture, processes, and data flows
- Targeted at developers who need to understand business architecture
- The evaluation thinking framework for highlights and problems refers to analysis-guide.md
- The analysis philosophy and depth standards refer to analysis-guide.md