analyze-project
/analyze-project — Root Cause Analyst Workflow
Analyze AI-assisted coding sessions in `~/.gemini/antigravity/brain/` and produce a report that explains not just what happened, but why it happened, who or what caused it, and what should change next time.
Goal
For each session, determine:
- What changed from the initial ask to the final executed work
- Whether the main cause was:
  - user/spec
  - agent
  - repo/codebase
  - validation/testing
  - legitimate task complexity
- Whether the opening prompt was sufficient
- Which files/subsystems repeatedly correlate with struggle
- What changes would most improve future sessions
When to Use
- You need a postmortem on AI-assisted coding sessions, especially when scope drift or repeated rework occurred.
- You want root-cause analysis that separates user/spec issues from agent mistakes, repo friction, or validation gaps.
- You need evidence-backed recommendations for improving future prompts, repo health, or delivery workflows.
Global Rules
- Treat `.resolved.N` counts as iteration signals, not proof of failure
- Separate human-added scope, necessary discovered scope, and agent-introduced scope
- Separate agent error from repo friction
- Every diagnosis must include evidence and confidence
- Confidence levels:
  - High = direct artifact/timestamp evidence
  - Medium = multiple supporting signals
  - Low = plausible inference, not directly proven
- Evidence precedence:
  - artifact contents > timestamps > metadata summaries > inference
- If evidence is weak, say so
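The confidence levels and evidence precedence above can be sketched as a small helper. This is an illustrative sketch only: the names `EvidenceKind`, `confidence_for`, and `strongest` are hypothetical, and treating any two or more indirect signals as Medium is an assumption, not something the workflow prescribes.

```python
from enum import IntEnum


class EvidenceKind(IntEnum):
    """Higher value = stronger evidence, following the stated precedence."""
    INFERENCE = 1
    METADATA_SUMMARY = 2
    TIMESTAMP = 3
    ARTIFACT_CONTENT = 4


def confidence_for(evidence: list[EvidenceKind]) -> str:
    """Map the evidence behind a diagnosis to High / Medium / Low."""
    if any(e >= EvidenceKind.TIMESTAMP for e in evidence):
        return "High"    # direct artifact/timestamp evidence
    if len(evidence) >= 2:
        return "Medium"  # multiple supporting signals (assumed: any two or more)
    return "Low"         # plausible inference only, or no evidence at all


def strongest(evidence: list[EvidenceKind]) -> EvidenceKind:
    """Pick the evidence item that wins under the precedence rule."""
    return max(evidence)
```

A diagnosis backed only by a metadata summary and an inference would come out as Medium here; a report should still say so explicitly when the evidence is weak.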
Step 0.5: Session Intent Classification
Classify the primary session intent from objective + artifacts:
- `DELIVERY`
- `DEBUGGING`
- `REFACTOR`
- `RESEARCH`
- `EXPLORATION`
- `AUDIT_ANALYSIS`
Record:
- `session_intent`
- `session_intent_confidence`
Use intent to contextualize severity and rework shape.
Do not judge exploratory or research sessions by the same standards as narrow delivery sessions.
Step 1: Discover Conversations
- Read available conversation summaries from system context
- List conversation folders in the user's Antigravity `brain/` directory
- Build a conversation index with:
  - `conversation_id`
  - `title`
  - `objective`
  - `created`
  - `last_modified`
- If the user supplied a keyword/path, filter to matching conversations; otherwise analyze all
Output: indexed list of conversations to analyze.
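One way to sketch the discovery step in Python is shown here. It assumes each conversation is a subfolder of the `brain/` directory and that file modification times are an acceptable stand-in for `created` and `last_modified`. The names `ConversationEntry` and `build_index` are hypothetical, and matching the keyword against the folder name only is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class ConversationEntry:
    conversation_id: str
    title: Optional[str]      # to be filled from conversation summaries, if available
    objective: Optional[str]  # to be filled from conversation summaries, if available
    created: datetime
    last_modified: datetime


def build_index(brain_dir: Path, keyword: Optional[str] = None) -> list[ConversationEntry]:
    """Index every conversation folder under brain_dir, optionally filtered by keyword."""
    entries = []
    for folder in sorted(p for p in brain_dir.iterdir() if p.is_dir()):
        if keyword and keyword.lower() not in folder.name.lower():
            continue
        # Assumption: file mtimes approximate the conversation's time span.
        mtimes = [f.stat().st_mtime for f in folder.iterdir() if f.is_file()]
        if not mtimes:
            mtimes = [folder.stat().st_mtime]
        entries.append(ConversationEntry(
            conversation_id=folder.name,
            title=None,
            objective=None,
            created=datetime.fromtimestamp(min(mtimes), tz=timezone.utc),
            last_modified=datetime.fromtimestamp(max(mtimes), tz=timezone.utc),
        ))
    return entries
```

Calling `build_index(Path.home() / ".gemini" / "antigravity" / "brain")` would index everything; passing `keyword=` narrows the list.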
Step 2: Extract Session Evidence
For each conversation, read if present:
Core artifacts
- `task.md`
- `implementation_plan.md`
- `walkthrough.md`
Metadata
- `*.metadata.json`
Version snapshots
- `task.md.resolved.0 ... N`
- `implementation_plan.md.resolved.0 ... N`
- `walkthrough.md.resolved.0 ... N`
Additional signals
- other `.md` artifacts
- timestamps across artifact updates
- file/folder/subsystem names mentioned in plans/walkthroughs
- validation/testing language
- explicit acceptance criteria, constraints, non-goals, and file targets
Record per conversation:
Lifecycle
- `has_task`
- `has_plan`
- `has_walkthrough`
- `is_completed`
- `is_abandoned_candidate` = task exists but no walkthrough
Revision / change volume
- `task_versions`
- `plan_versions`
- `walkthrough_versions`
- `extra_artifacts`
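The lifecycle flags and revision counts could be derived from the folder contents roughly as follows. This sketch assumes snapshots are named `<artifact>.resolved.<N>` and that `extra_artifacts` counts `.md` files other than the three core artifacts. The helper names are hypothetical, and `is_completed` is left out because the workflow does not define how to compute it.

```python
import re
from pathlib import Path

CORE_ARTIFACTS = ("task.md", "implementation_plan.md", "walkthrough.md")


def count_versions(folder: Path, artifact: str) -> int:
    """Count snapshot files named like '<artifact>.resolved.<N>'."""
    pattern = re.compile(re.escape(artifact) + r"\.resolved\.\d+")
    return sum(1 for f in folder.iterdir() if pattern.fullmatch(f.name))


def lifecycle_and_versions(folder: Path) -> dict:
    """Collect lifecycle flags and revision counts for one conversation folder."""
    has_task = (folder / "task.md").exists()
    has_plan = (folder / "implementation_plan.md").exists()
    has_walkthrough = (folder / "walkthrough.md").exists()
    return {
        "has_task": has_task,
        "has_plan": has_plan,
        "has_walkthrough": has_walkthrough,
        # Task exists but no walkthrough: a candidate, not a verdict.
        "is_abandoned_candidate": has_task and not has_walkthrough,
        "task_versions": count_versions(folder, "task.md"),
        "plan_versions": count_versions(folder, "implementation_plan.md"),
        "walkthrough_versions": count_versions(folder, "walkthrough.md"),
        # Assumption: extra artifacts are the non-core .md files.
        "extra_artifacts": sum(
            1 for f in folder.iterdir()
            if f.suffix == ".md" and f.name not in CORE_ARTIFACTS
        ),
    }
```

High version counts from this helper are iteration signals only; they still need to be read against the artifact contents before drawing conclusions.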
Scope
- `task_items_initial`
- `task_items_final`
- `task_completed_pct`
- `scope_delta_raw`
- `scope_creep_pct_raw`
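The workflow does not spell out formulas for the scope fields, so the following is only one plausible reading. It assumes task items are markdown checkboxes (`- [ ]`, `- [x]`), that `scope_delta_raw` is final minus initial item count, and that `scope_creep_pct_raw` is that delta relative to the initial count. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: task items look like "- [ ] item" or "- [x] item".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[(.)\]", re.MULTILINE)


def task_stats(markdown: str) -> tuple[int, int]:
    """Return (total_items, completed_items) for a task list."""
    marks = CHECKBOX.findall(markdown)
    done = sum(1 for m in marks if m.lower() == "x")
    return len(marks), done


def pct(part: int, whole: int) -> Optional[float]:
    """Percentage rounded to one decimal, or None when the base is zero."""
    return round(100 * part / whole, 1) if whole else None


def scope_metrics(initial_md: str, final_md: str) -> dict:
    """Compare the first and last task snapshots."""
    initial, _ = task_stats(initial_md)
    final, done = task_stats(final_md)
    delta = final - initial
    return {
        "task_items_initial": initial,
        "task_items_final": final,
        "task_completed_pct": pct(done, final),
        "scope_delta_raw": delta,
        "scope_creep_pct_raw": pct(delta, initial),
    }
```

The `_raw` suffix matters: these numbers say how much the task list grew, not whether the growth was human-added, necessary, or agent-introduced.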
Timing
- `created_at`
- `completed_at`
- `duration_minutes`
Content / quality
- `objective_text`
- `initial_plan_summary`
- `final_plan_summary`
- `initial_task_excerpt`
- `final_task_excerpt`
- `walkthrough_summary`
- `mentioned_files_or_subsystems`
- `validation_requirements_present`
- `acceptance_criteria_present`
- `non_goals_present`
- `scope_boundaries_present`
- `file_targets_present`
- `constraints_present`
Step 3: Prompt Sufficiency
Score the opening request on a 0–2 scale for:
- Clarity
- Boundedness
- Testability
- Architectural specificity
- Constraint awareness
- Dependency awareness
Create:
- `prompt_sufficiency_score`
- `prompt_sufficiency_band` = High / Medium / Low
Then note which missing prompt ingredients likely contributed to later friction.
Do not punish short prompts by default; a narrow, obvious task can still have high sufficiency.
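The scoring in this step can be sketched as a sum over the six dimensions. The band cutoffs used here (High at 9 or more out of 12, Medium at 5 or more) are assumptions for illustration; the workflow names the bands but not their thresholds. The names `DIMENSIONS` and `prompt_sufficiency` are hypothetical.

```python
DIMENSIONS = (
    "clarity",
    "boundedness",
    "testability",
    "architectural_specificity",
    "constraint_awareness",
    "dependency_awareness",
)


def prompt_sufficiency(scores: dict[str, int]) -> dict:
    """Sum six 0-2 scores and derive a band plus the weakest ingredients."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2")
    total = sum(scores[name] for name in DIMENSIONS)
    # Assumed cutoffs on the 0-12 total; adjust to taste.
    band = "High" if total >= 9 else "Medium" if total >= 5 else "Low"
    return {
        "prompt_sufficiency_score": total,
        "prompt_sufficiency_band": band,
        "missing_ingredients": [d for d in DIMENSIONS if scores[d] == 0],
    }
```

Because length is not one of the inputs, a short prompt for a narrow task can still score High on every dimension.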
Step 4: Scope Change Classification
Classify scope change into:
- Human-added scope — new asks beyond the original task
- Necessary discovered scope — work required to complete the original task correctly
- Agent-introduced scope — likely unnecessary work introduced by the agent
Record:
- `scope_change_type_primary`
- `scope_change_type_secondary` (optional)
- `scope_change_confidence`
- evidence
Keep one short example of each type in mind for calibration:
- Human-added: “also refactor nearby code while you’re here”
- Necessary discovered: hidden dependency must be fixed for original task to work
- Agent-introduced: extra cleanup or redesign not requested and not required
Step 5: Rework Shape
Classify each session into one primary pattern:
- Clean execution
- Early replan then stable finish
- Progressive scope expansion
- Reopen/reclose churn
- Late-stage verification churn
- Abandoned mid-flight
- Exploratory / research session
Record:
- `rework_shape`
- `rework_shape_confidence`
- evidence
Step 6: Root Cause Analysis
For every non-clean session, assign:
Primary root cause
One of:
- `SPEC_AMBIGUITY`
- `HUMAN_SCOPE_CHANGE`
- `REPO_FRAGILITY`
- `AGENT_ARCHITECTURAL_ERROR`
- `VERIFICATION_CHURN`
- `LEGITIMATE_TASK_COMPLEXITY`
Secondary root cause
Optional; assign one only if it is materially relevant.
Root-cause guidance
- SPEC_AMBIGUITY: opening ask lacked boundaries, targets, criteria, or constraints
- HUMAN_SCOPE_CHANGE: scope expanded because the user broadened the task
- REPO_FRAGILITY: hidden coupling, brittle files, unclear architecture, or environment issues forced extra work
- AGENT_ARCHITECTURAL_ERROR: wrong files, wrong assumptions, wrong approach, hallucinated structure
- VERIFICATION_CHURN: implementation mostly worked, but testing/validation caused loops
- LEGITIMATE_TASK_COMPLEXITY: revisions were expected for the difficulty and not clearly avoidable
Every root-cause assignment must include:
- evidence
- why stronger alternative causes were rejected
- confidence
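The three required parts of a root-cause assignment can be enforced with a small record type. This is a sketch under assumptions: `RootCauseAssignment` and its field names are hypothetical, and rejecting empty evidence at construction time is one possible way to apply the rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RootCause(Enum):
    SPEC_AMBIGUITY = "SPEC_AMBIGUITY"
    HUMAN_SCOPE_CHANGE = "HUMAN_SCOPE_CHANGE"
    REPO_FRAGILITY = "REPO_FRAGILITY"
    AGENT_ARCHITECTURAL_ERROR = "AGENT_ARCHITECTURAL_ERROR"
    VERIFICATION_CHURN = "VERIFICATION_CHURN"
    LEGITIMATE_TASK_COMPLEXITY = "LEGITIMATE_TASK_COMPLEXITY"


@dataclass
class RootCauseAssignment:
    primary: RootCause
    evidence: list[str]
    rejected_alternatives: dict[RootCause, str]  # cause -> why it was rejected
    confidence: str                              # "High", "Medium", or "Low"
    secondary: Optional[RootCause] = None

    def __post_init__(self) -> None:
        if not self.evidence:
            raise ValueError("a root-cause assignment needs evidence")
        if not self.rejected_alternatives:
            raise ValueError("explain why alternative causes were rejected")
        if self.primary in self.rejected_alternatives:
            raise ValueError("the primary cause cannot also be rejected")
        if self.confidence not in ("High", "Medium", "Low"):
            raise ValueError("confidence must be High, Medium, or Low")
```

Making the rejected alternatives a required field keeps the analysis honest about why, say, `REPO_FRAGILITY` was chosen over `AGENT_ARCHITECTURAL_ERROR`.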
Step 6.5: Session Severity Scoring (0–100)
Assign each session a severity score to prioritize attention.
Components (sum, clamp 0–100):
- Completion failure: 0–25 (abandoned = 25)
- Replanning intensity: 0–15
- Scope instability: 0–15
- Rework shape severity: 0–15
- Prompt sufficiency deficit: 0–10 (low = 10)
- Root cause impact: 0–10 (`REPO_FRAGILITY` / `AGENT_ARCHITECTURAL_ERROR` highest)
- Hotspot recurrence: 0–10
Bands:
- 0–19 Low
- 20–39 Moderate
- 40–59 Significant
- 60–79 High
- 80–100 Critical
Record:
- `session_severity_score`
- `severity_band`
- `severity_drivers` = top 2–4 contributors
- `severity_confidence`
Use severity as a prioritization signal, not a verdict. Always explain the drivers.
Contextualize severity using session intent so research/exploration sessions are not over-penalized.
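The sum-and-clamp scoring and the band lookup can be sketched as follows. The component caps and band floors come from the lists above. The key names, the per-component clamping, and returning up to four non-zero drivers are assumptions, and how each component value is derived is left to the analyst.

```python
# Maximum points per component, as listed in this step.
CAPS = {
    "completion_failure": 25,
    "replanning_intensity": 15,
    "scope_instability": 15,
    "rework_shape_severity": 15,
    "prompt_sufficiency_deficit": 10,
    "root_cause_impact": 10,
    "hotspot_recurrence": 10,
}

# (floor, label) pairs, checked from the highest band down.
BANDS = [(80, "Critical"), (60, "High"), (40, "Significant"), (20, "Moderate"), (0, "Low")]


def severity(components: dict[str, float]) -> dict:
    """Sum capped components, clamp to 0-100, and report band and drivers."""
    capped = {
        name: min(max(components.get(name, 0), 0), cap)
        for name, cap in CAPS.items()
    }
    score = min(max(sum(capped.values()), 0), 100)
    band = next(label for floor, label in BANDS if score >= floor)
    ranked = sorted(capped.items(), key=lambda kv: kv[1], reverse=True)
    drivers = [name for name, value in ranked if value > 0][:4]
    return {
        "session_severity_score": score,
        "severity_band": band,
        "severity_drivers": drivers,
    }
```

For example, an abandoned session (25) with moderate replanning (10), mild scope instability (5), and a low-sufficiency prompt (10) totals 50, which falls in the Significant band. The drivers list is what makes that number explainable.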
Step 7: Subsystem / File Clustering
Across all conversations, cluster repeated struggle by file, folder, or subsystem.
For each cluster, calculate:
- number of conversations touching it
- average revisions
- completion rate
- abandonment rate
- common root causes
- average severity
Goal: identify whether friction is mostly prompt-driven, agent-driven, or concentrated in specific repo areas.
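A minimal sketch of the clustering, assuming each session has already been reduced to a dictionary. The keys `subsystems`, `revisions`, `completed`, `abandoned`, `root_cause`, and `severity` are hypothetical, as is the function name.

```python
from collections import Counter, defaultdict
from statistics import mean


def cluster_by_subsystem(sessions: list[dict]) -> dict[str, dict]:
    """Group sessions by each subsystem they mention and summarize each group."""
    groups = defaultdict(list)
    for session in sessions:
        for name in set(session["subsystems"]):  # count a session once per subsystem
            groups[name].append(session)

    report = {}
    for name, members in groups.items():
        n = len(members)
        causes = Counter(m["root_cause"] for m in members if m["root_cause"])
        report[name] = {
            "conversations": n,
            "avg_revisions": mean(m["revisions"] for m in members),
            "completion_rate": sum(m["completed"] for m in members) / n,
            "abandonment_rate": sum(m["abandoned"] for m in members) / n,
            "common_root_causes": causes.most_common(2),
            "avg_severity": mean(m["severity"] for m in members),
        }
    return report
```

Clusters with only one or two conversations deserve low confidence; a single bad session touching a folder does not make that folder a hotspot.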
Step 8: Comparative Cohorts
Compare:
- first-shot successes vs re-planned sessions
- completed vs abandoned
- high prompt sufficiency vs low prompt sufficiency
- narrow-scope vs high-scope-growth
- short sessions vs long sessions
- low-friction subsystems vs high-friction subsystems
For each comparison, identify:
- what differs materially
- which prompt traits correlate with smoother execution
- which repo traits correlate with repeated struggle
Do not just restate averages; extract cautious evidence-backed patterns.
Step 9: Non-Obvious Findings
Generate 3–7 findings that are not simple metric restatements.
Each finding must include:
- observation
- why it matters
- evidence
- confidence
Examples of strong findings:
- replans cluster around weak file targeting rather than weak acceptance criteria
- scope growth often begins after initial success, suggesting post-success human expansion
- auth-related struggle is driven more by repo fragility than agent hallucination
Step 10: Report Generation
Create `session_analysis_report.md` with this structure:
📊 Session Analysis Report — [Project Name]
Generated: [timestamp]
Conversations Analyzed: [N]
Date Range: [earliest] → [latest]
Executive Summary
| Metric | Value | Rating |
|---|---|---|
| First-Shot Success Rate | X% | 🟢/🟡/🔴 |
| Completion Rate | X% | 🟢/🟡/🔴 |
| Avg Scope Growth | X% | 🟢/🟡/🔴 |
| Replan Rate | X% | 🟢/🟡/🔴 |
| Median Duration | Xm | — |
| Avg Session Severity | X | 🟢/🟡/🔴 |
| High-Severity Sessions | X / N | 🟢/🟡/🔴 |
Thresholds:
- First-shot: 🟢 >70 / 🟡 40–70 / 🔴 <40
- Scope growth: 🟢 <15 / 🟡 15–40 / 🔴 >40
- Replan rate: 🟢 <20 / 🟡 20–50 / 🔴 >50
Avg severity guidance:
- 🟢 <25
- 🟡 25–50
- 🔴 >50
Note: avg severity is an aggregate health signal, not the same as per-session severity bands.
Then add a short narrative summary of what is going well, what is breaking down, and whether the main issue is prompt quality, repo fragility, workflow discipline, or validation churn.
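The rating thresholds for the summary table can be applied mechanically, as in this sketch. It covers only the four metrics for which thresholds are given; the function names are hypothetical, and values exactly on a boundary (for example 70 for first-shot success) fall into the 🟡 band as the stated ranges imply.

```python
def rate_higher_is_better(value: float, green_above: float, red_below: float) -> str:
    """Rate a metric where larger values are healthier."""
    if value > green_above:
        return "🟢"
    if value < red_below:
        return "🔴"
    return "🟡"


def rate_lower_is_better(value: float, green_below: float, red_above: float) -> str:
    """Rate a metric where smaller values are healthier."""
    if value < green_below:
        return "🟢"
    if value > red_above:
        return "🔴"
    return "🟡"


def rate_summary(first_shot_pct: float, scope_growth_pct: float,
                 replan_pct: float, avg_severity: float) -> dict[str, str]:
    """Apply the stated thresholds to the four rated summary metrics."""
    return {
        "first_shot_success_rate": rate_higher_is_better(first_shot_pct, 70, 40),
        "avg_scope_growth": rate_lower_is_better(scope_growth_pct, 15, 40),
        "replan_rate": rate_lower_is_better(replan_pct, 20, 50),
        "avg_session_severity": rate_lower_is_better(avg_severity, 25, 50),
    }
```

The ratings are a compact health signal for the table; the narrative summary is still where the reasons behind a 🔴 get explained.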
Root Cause Breakdown
| Root Cause | Count | % | Notes |
|---|---|---|---|
Prompt Sufficiency Analysis
- common traits of high-sufficiency prompts
- common missing inputs in low-sufficiency prompts
- which missing prompt ingredients correlate most with replanning or abandonment
Scope Change Analysis
Separate:
- Human-added scope
- Necessary discovered scope
- Agent-introduced scope
Rework Shape Analysis
Summarize the main failure patterns across sessions.
Friction Hotspots
Show the files/folders/subsystems most associated with replanning, abandonment, verification churn, and high severity.
First-Shot Successes
List the cleanest sessions and extract what made them work.
Non-Obvious Findings
List 3–7 evidence-backed findings with confidence.
Severity Triage
List the highest-severity sessions and say whether the best intervention is:
- prompt improvement
- scope discipline
- targeted skill/workflow
- repo refactor / architecture cleanup
- validation/test harness improvement
Recommendations
For each recommendation, use:
- Observed pattern
- Likely cause
- Evidence
- Change to make
- Expected benefit
- Confidence
Per-Conversation Breakdown
| # | Title | Intent | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Severity | Complete? |
|---|---|---|---|---|---|---|---|---|---|---|
Step 11: Optional Post-Analysis Improvements
If appropriate, also:
- update any local project-health or memory artifact (if present) with recurring failure modes and fragile subsystems
- generate `prompt_improvement_tips.md` from high-sufficiency / first-shot-success sessions
- suggest missing skills or workflows when the same subsystem or task sequence repeatedly causes struggle
Only recommend workflows/skills when the pattern appears repeatedly.
Final Output Standard
The workflow must produce:
- metrics summary
- root-cause diagnosis
- prompt-sufficiency assessment
- subsystem/friction map
- severity triage and prioritization
- evidence-backed recommendations
- non-obvious findings
Prefer explicit uncertainty over fake precision.