analyze-project


/analyze-project — Root Cause Analyst Workflow


Analyze AI-assisted coding sessions in ~/.gemini/antigravity/brain/ and produce a report that explains not just what happened, but why it happened, who/what caused it, and what should change next time.

Goal


For each session, determine:
  1. What changed from the initial ask to the final executed work
  2. Whether the main cause was:
    • user/spec
    • agent
    • repo/codebase
    • validation/testing
    • legitimate task complexity
  3. Whether the opening prompt was sufficient
  4. Which files/subsystems repeatedly correlate with struggle
  5. What changes would most improve future sessions

When to Use


  • You need a postmortem on AI-assisted coding sessions, especially when scope drift or repeated rework occurred.
  • You want root-cause analysis that separates user/spec issues from agent mistakes, repo friction, or validation gaps.
  • You need evidence-backed recommendations for improving future prompts, repo health, or delivery workflows.

Global Rules


  • Treat .resolved.N counts as iteration signals, not proof of failure
  • Separate human-added scope, necessary discovered scope, and agent-introduced scope
  • Separate agent error from repo friction
  • Every diagnosis must include evidence and confidence
  • Confidence levels:
    • High = direct artifact/timestamp evidence
    • Medium = multiple supporting signals
    • Low = plausible inference, not directly proven
  • Evidence precedence: artifact contents > timestamps > metadata summaries > inference
  • If evidence is weak, say so


Step 0.5: Session Intent Classification


Classify the primary session intent from objective + artifacts:
  • DELIVERY
  • DEBUGGING
  • REFACTOR
  • RESEARCH
  • EXPLORATION
  • AUDIT_ANALYSIS
Record:
  • session_intent
  • session_intent_confidence
Use intent to contextualize severity and rework shape. Do not judge exploratory or research sessions by the same standards as narrow delivery sessions.


Step 1: Discover Conversations


  1. Read available conversation summaries from system context
  2. List conversation folders in the user’s Antigravity brain/ directory
  3. Build a conversation index with:
    • conversation_id
    • title
    • objective
    • created
    • last_modified
  4. If the user supplied a keyword/path, filter to matching conversations; otherwise analyze all
Output: indexed list of conversations to analyze.


Step 2: Extract Session Evidence


For each conversation, read the following if present:

Core artifacts


  • task.md
  • implementation_plan.md
  • walkthrough.md

Metadata


  • *.metadata.json

Version snapshots


  • task.md.resolved.0 ... N
  • implementation_plan.md.resolved.0 ... N
  • walkthrough.md.resolved.0 ... N
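Counting these snapshots can be sketched as follows; count_versions is a hypothetical helper, and the assumption that snapshots are numbered .resolved.0 through .resolved.N is taken from the listing above:

```python
import re
from pathlib import Path

RESOLVED = re.compile(r"\.resolved\.(\d+)$")

def count_versions(conv_dir: Path, artifact: str) -> int:
    """Count .resolved.N snapshots of an artifact.

    Per the global rules, treat the result as an iteration signal,
    not proof of failure.
    """
    versions = [
        int(m.group(1))
        for p in conv_dir.glob(f"{artifact}.resolved.*")
        if (m := RESOLVED.search(p.name))
    ]
    # Snapshots run 0..N, so N+1 versions exist; 0 means no snapshots.
    return max(versions) + 1 if versions else 0
```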

Additional signals


  • other .md artifacts
  • timestamps across artifact updates
  • file/folder/subsystem names mentioned in plans/walkthroughs
  • validation/testing language
  • explicit acceptance criteria, constraints, non-goals, and file targets
Record per conversation:

Lifecycle


  • has_task
  • has_plan
  • has_walkthrough
  • is_completed
  • is_abandoned_candidate = task exists but no walkthrough

Revision / change volume


  • task_versions
  • plan_versions
  • walkthrough_versions
  • extra_artifacts

Scope


  • task_items_initial
  • task_items_final
  • task_completed_pct
  • scope_delta_raw
  • scope_creep_pct_raw
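A minimal sketch of how these scope fields could be derived from task checklist counts; the function name and the convention that percentages are rounded to one decimal are illustrative:

```python
def scope_metrics(task_items_initial: int, task_items_final: int, items_completed: int) -> dict:
    """Derive raw scope deltas from checklist item counts.

    'Raw' because item granularity varies between sessions, so these
    numbers are signals for comparison, not precise measurements.
    """
    delta = task_items_final - task_items_initial
    creep_pct = 100.0 * delta / task_items_initial if task_items_initial else 0.0
    completed_pct = 100.0 * items_completed / task_items_final if task_items_final else 0.0
    return {
        "scope_delta_raw": delta,
        "scope_creep_pct_raw": round(creep_pct, 1),
        "task_completed_pct": round(completed_pct, 1),
    }
```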

Timing


  • created_at
  • completed_at
  • duration_minutes

Content / quality


  • objective_text
  • initial_plan_summary
  • final_plan_summary
  • initial_task_excerpt
  • final_task_excerpt
  • walkthrough_summary
  • mentioned_files_or_subsystems
  • validation_requirements_present
  • acceptance_criteria_present
  • non_goals_present
  • scope_boundaries_present
  • file_targets_present
  • constraints_present


Step 3: Prompt Sufficiency


Score the opening request on a 0–2 scale for:
  • Clarity
  • Boundedness
  • Testability
  • Architectural specificity
  • Constraint awareness
  • Dependency awareness
Create:
  • prompt_sufficiency_score
  • prompt_sufficiency_band = High / Medium / Low
Then note which missing prompt ingredients likely contributed to later friction.
Do not punish short prompts by default; a narrow, obvious task can still have high sufficiency.

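The scoring above can be sketched as a small function. The 0–2 dimensions come from the step; the band cutoffs (High ≥ 9, Medium ≥ 5 out of a 12-point maximum) are illustrative assumptions, not part of the spec:

```python
DIMENSIONS = (
    "clarity", "boundedness", "testability",
    "architectural_specificity", "constraint_awareness", "dependency_awareness",
)

def prompt_sufficiency(scores: dict[str, int]) -> tuple[int, str]:
    """Sum six 0-2 dimension scores (max 12) and map to a band.

    Missing dimensions default to 0; cutoffs below are assumed, and a
    short prompt can still score High if the task is narrow and obvious.
    """
    for dim in DIMENSIONS:
        if not 0 <= scores.get(dim, 0) <= 2:
            raise ValueError(f"{dim} must be scored 0-2")
    total = sum(scores.get(dim, 0) for dim in DIMENSIONS)
    band = "High" if total >= 9 else "Medium" if total >= 5 else "Low"
    return total, band
```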

Step 4: Scope Change Classification


Classify scope change into:
  • Human-added scope — new asks beyond the original task
  • Necessary discovered scope — work required to complete the original task correctly
  • Agent-introduced scope — likely unnecessary work introduced by the agent
Record:
  • scope_change_type_primary
  • scope_change_type_secondary (optional)
  • scope_change_confidence
  • evidence
Keep one short example in mind for calibration:
  • Human-added: “also refactor nearby code while you’re here”
  • Necessary discovered: hidden dependency must be fixed for original task to work
  • Agent-introduced: extra cleanup or redesign not requested and not required


Step 5: Rework Shape


Classify each session into one primary pattern:
  • Clean execution
  • Early replan then stable finish
  • Progressive scope expansion
  • Reopen/reclose churn
  • Late-stage verification churn
  • Abandoned mid-flight
  • Exploratory / research session
Record:
  • rework_shape
  • rework_shape_confidence
  • evidence


Step 6: Root Cause Analysis


For every non-clean session, assign:

Primary root cause


One of:
  • SPEC_AMBIGUITY
  • HUMAN_SCOPE_CHANGE
  • REPO_FRAGILITY
  • AGENT_ARCHITECTURAL_ERROR
  • VERIFICATION_CHURN
  • LEGITIMATE_TASK_COMPLEXITY

Secondary root cause


Optional if materially relevant

Root-cause guidance


  • SPEC_AMBIGUITY: opening ask lacked boundaries, targets, criteria, or constraints
  • HUMAN_SCOPE_CHANGE: scope expanded because the user broadened the task
  • REPO_FRAGILITY: hidden coupling, brittle files, unclear architecture, or environment issues forced extra work
  • AGENT_ARCHITECTURAL_ERROR: wrong files, wrong assumptions, wrong approach, hallucinated structure
  • VERIFICATION_CHURN: implementation mostly worked, but testing/validation caused loops
  • LEGITIMATE_TASK_COMPLEXITY: revisions were expected for the difficulty and not clearly avoidable
Every root-cause assignment must include:
  • evidence
  • why stronger alternative causes were rejected
  • confidence


Step 6.5: Session Severity Scoring (0–100)


Assign each session a severity score to prioritize attention.
Components (sum, clamp 0–100):
  • Completion failure: 0–25 (abandoned = 25)
  • Replanning intensity: 0–15
  • Scope instability: 0–15
  • Rework shape severity: 0–15
  • Prompt sufficiency deficit: 0–10 (low = 10)
  • Root cause impact: 0–10 (REPO_FRAGILITY / AGENT_ARCHITECTURAL_ERROR highest)
  • Hotspot recurrence: 0–10
Bands:
  • 0–19 Low
  • 20–39 Moderate
  • 40–59 Significant
  • 60–79 High
  • 80–100 Critical
Record:
  • session_severity_score
  • severity_band
  • severity_drivers = top 2–4 contributors
  • severity_confidence
Use severity as a prioritization signal, not a verdict. Always explain the drivers. Contextualize severity using session intent so research/exploration sessions are not over-penalized.

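The sum-and-clamp arithmetic above can be sketched directly; the component caps and bands mirror the step, while the dictionary key names are illustrative:

```python
def severity_score(components: dict[str, int]) -> tuple[int, str]:
    """Sum the seven components, clamp to 0-100, and band the result.

    Caps mirror the step's point ranges; missing components count as 0,
    and each component is clamped to its own cap before summing.
    """
    caps = {
        "completion_failure": 25,
        "replanning_intensity": 15,
        "scope_instability": 15,
        "rework_shape_severity": 15,
        "prompt_sufficiency_deficit": 10,
        "root_cause_impact": 10,
        "hotspot_recurrence": 10,
    }
    total = sum(max(0, min(components.get(name, 0), cap)) for name, cap in caps.items())
    total = max(0, min(total, 100))  # final clamp to 0-100
    bands = [(19, "Low"), (39, "Moderate"), (59, "Significant"), (79, "High"), (100, "Critical")]
    band = next(label for upper, label in bands if total <= upper)
    return total, band
```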

Step 7: Subsystem / File Clustering


Across all conversations, cluster repeated struggle by file, folder, or subsystem.
For each cluster, calculate:
  • number of conversations touching it
  • average revisions
  • completion rate
  • abandonment rate
  • common root causes
  • average severity
Goal: identify whether friction is mostly prompt-driven, agent-driven, or concentrated in specific repo areas.

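A sketch of the clustering pass, assuming each session dict carries the fields extracted in Step 2 (mentioned_files_or_subsystems, plan_versions, is_completed, and so on); the aggregation choices, such as summing plan and task revisions, are illustrative:

```python
from collections import defaultdict
from statistics import mean

def cluster_friction(sessions: list[dict]) -> dict[str, dict]:
    """Group sessions by mentioned file/folder/subsystem and aggregate struggle signals."""
    clusters: dict[str, list[dict]] = defaultdict(list)
    for s in sessions:
        for name in s.get("mentioned_files_or_subsystems", []):
            clusters[name].append(s)
    report = {}
    for name, group in clusters.items():
        report[name] = {
            "conversations": len(group),
            # Revisions = plan + task snapshot counts, averaged per session.
            "avg_revisions": round(mean(s.get("plan_versions", 0) + s.get("task_versions", 0) for s in group), 1),
            "completion_rate": round(mean(1.0 if s.get("is_completed") else 0.0 for s in group), 2),
            "abandonment_rate": round(mean(1.0 if s.get("is_abandoned_candidate") else 0.0 for s in group), 2),
            "avg_severity": round(mean(s.get("session_severity_score", 0) for s in group), 1),
        }
    return report
```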

Step 8: Comparative Cohorts


Compare:
  • first-shot successes vs re-planned sessions
  • completed vs abandoned
  • high prompt sufficiency vs low prompt sufficiency
  • narrow-scope vs high-scope-growth
  • short sessions vs long sessions
  • low-friction subsystems vs high-friction subsystems
For each comparison, identify:
  • what differs materially
  • which prompt traits correlate with smoother execution
  • which repo traits correlate with repeated struggle
Do not just restate averages; extract cautious evidence-backed patterns.


Step 9: Non-Obvious Findings


Generate 3–7 findings that are not simple metric restatements.
Each finding must include:
  • observation
  • why it matters
  • evidence
  • confidence
Examples of strong findings:
  • replans cluster around weak file targeting rather than weak acceptance criteria
  • scope growth often begins after initial success, suggesting post-success human expansion
  • auth-related struggle is driven more by repo fragility than agent hallucination


Step 10: Report Generation


Create session_analysis_report.md with this structure:

📊 Session Analysis Report — [Project Name]


Generated: [timestamp]
Conversations Analyzed: [N]
Date Range: [earliest] → [latest]

Executive Summary


| Metric | Value | Rating |
|---|---|---|
| First-Shot Success Rate | X% | 🟢/🟡/🔴 |
| Completion Rate | X% | 🟢/🟡/🔴 |
| Avg Scope Growth | X% | 🟢/🟡/🔴 |
| Replan Rate | X% | 🟢/🟡/🔴 |
| Median Duration | Xm | |
| Avg Session Severity | X | 🟢/🟡/🔴 |
| High-Severity Sessions | X / N | 🟢/🟡/🔴 |
Thresholds:
  • First-shot: 🟢 >70 / 🟡 40–70 / 🔴 <40
  • Scope growth: 🟢 <15 / 🟡 15–40 / 🔴 >40
  • Replan rate: 🟢 <20 / 🟡 20–50 / 🔴 >50
Avg severity guidance:
  • 🟢 <25
  • 🟡 25–50
  • 🔴 >50
Note: avg severity is an aggregate health signal, not the same as per-session severity bands.
Then add a short narrative summary of what is going well, what is breaking down, and whether the main issue is prompt quality, repo fragility, workflow discipline, or validation churn.
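The threshold rules above can be sketched as one rating helper; the function name and the higher_is_better flag are illustrative conventions:

```python
def rate(value: float, green: float, red: float, higher_is_better: bool = True) -> str:
    """Map a metric to a traffic-light rating using the report thresholds.

    For higher-is-better metrics (first-shot success): green above `green`,
    red below `red`. For lower-is-better metrics (scope growth, replan rate),
    pass higher_is_better=False and the comparison flips.
    """
    if higher_is_better:
        return "🟢" if value > green else "🔴" if value < red else "🟡"
    return "🟢" if value < green else "🔴" if value > red else "🟡"
```

For example, rate(75, 70, 40) applies the first-shot rule, while rate(10, 15, 40, higher_is_better=False) applies the scope-growth rule.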

Root Cause Breakdown


| Root Cause | Count | % | Notes |
|---|---|---|---|

Prompt Sufficiency Analysis


  • common traits of high-sufficiency prompts
  • common missing inputs in low-sufficiency prompts
  • which missing prompt ingredients correlate most with replanning or abandonment

Scope Change Analysis


Separate:
  • Human-added scope
  • Necessary discovered scope
  • Agent-introduced scope

Rework Shape Analysis


Summarize the main failure patterns across sessions.

Friction Hotspots


Show the files/folders/subsystems most associated with replanning, abandonment, verification churn, and high severity.

First-Shot Successes


List the cleanest sessions and extract what made them work.

Non-Obvious Findings


List 3–7 evidence-backed findings with confidence.

Severity Triage


List the highest-severity sessions and say whether the best intervention is:
  • prompt improvement
  • scope discipline
  • targeted skill/workflow
  • repo refactor / architecture cleanup
  • validation/test harness improvement

Recommendations


For each recommendation, use:
  • Observed pattern
  • Likely cause
  • Evidence
  • Change to make
  • Expected benefit
  • Confidence

Per-Conversation Breakdown


| # | Title | Intent | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Severity | Complete? |
|---|---|---|---|---|---|---|---|---|---|---|


Step 11: Optional Post-Analysis Improvements


If appropriate, also:
  • update any local project-health or memory artifact (if present) with recurring failure modes and fragile subsystems
  • generate prompt_improvement_tips.md from high-sufficiency / first-shot-success sessions
  • suggest missing skills or workflows when the same subsystem or task sequence repeatedly causes struggle
Only recommend workflows/skills when the pattern appears repeatedly.


Final Output Standard


The workflow must produce:
  1. metrics summary
  2. root-cause diagnosis
  3. prompt-sufficiency assessment
  4. subsystem/friction map
  5. severity triage and prioritization
  6. evidence-backed recommendations
  7. non-obvious findings
Prefer explicit uncertainty over fake precision.