ralph-review-deep
<!-- markdownlint-disable-file MD041 -->
STARTER_CHARACTER = 🔬
Ralph Review (Deep)
Validate the beads graph and launch three independent reviews of the ralph artifacts in parallel using different models via `opencode run`. Each model produces one review file evaluating goal clarity, task decomposition, sequencing, plan completeness, risk and gaps, and feasibility.
For a faster single-model review using an internal subagent, use the ralph-review skill (/ralph-review, $ralph-review) instead. This deep variant trades wall time and cost for cross-model consensus.
Models
| Label | Model ID |
|---|---|
| openai | openai/gpt-5.3-codex |
| gemini | google/gemini-3.1-pro-preview |
| claude | az-anthropic/claude-opus-4-6 |
Procedure
Step 1: Validate Artifacts Exist
```bash
PROJECT_ROOT=$(git rev-parse --show-toplevel)
```
Verify all of these exist:
- `$PROJECT_ROOT/.llmtmp/ralph-plan.md`
- `$PROJECT_ROOT/.llmtmp/ralph-prompt-insession.md`
- `$PROJECT_ROOT/.llmtmp/ralph-prompt-external.md`
- `$PROJECT_ROOT/.beads/` (directory; bd-aware probe via `bd list >/dev/null 2>&1`)

Abort if any are missing.
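The existence check could be scripted along these lines. This is a minimal sketch, not part of the skill itself: the function name `check_ralph_artifacts` is hypothetical, and it only tests that the `.beads/` directory exists; the `bd list` probe is left to the caller.

```bash
# Hypothetical helper (an assumption, not part of the skill).
# Prints one line per missing artifact and returns non-zero if any is absent.
check_ralph_artifacts() {
  local root="$1" missing=0 path
  for path in \
    "$root/.llmtmp/ralph-plan.md" \
    "$root/.llmtmp/ralph-prompt-insession.md" \
    "$root/.llmtmp/ralph-prompt-external.md"; do
    if [[ ! -f "$path" ]]; then
      echo "MISSING: $path"
      missing=1
    fi
  done
  # .beads/ is a directory, so it is tested with -d rather than -f.
  if [[ ! -d "$root/.beads" ]]; then
    echo "MISSING: $root/.beads/"
    missing=1
  fi
  return "$missing"
}
```

A caller would typically run `check_ralph_artifacts "$PROJECT_ROOT" || exit 1` before moving on to Step 2.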
Step 2: Validate the bd Graph
Run these checks before launching opencode. Failures here are HIGH-severity findings and should abort or be flagged loudly.
The bd graph is organized under a per-run epic. The slug is derived from the current branch; the epic is found via the `ralph:<slug>` label. All scoped checks filter by `--parent ${EPIC}`.
```bash
cd "$PROJECT_ROOT"
SLUG=$(git branch --show-current | tr '/' '-')

# 1. Single open epic for this run
EPIC_COUNT=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq '[.[] | select(.status=="open")] | length')
if [[ "$EPIC_COUNT" -ne 1 ]]; then
  echo "HIGH: expected exactly one open epic with label ralph:${SLUG}, found ${EPIC_COUNT}"
fi
EPIC=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq -r '[.[] | select(.status=="open")] | .[0].id // empty')

# 2. Cycles
if ! bd dep cycles >/dev/null 2>&1; then
  echo "HIGH: bd graph has dependency cycles"
  bd dep cycles
fi

# 3. Minimum child count under the run epic
COUNT=$(bd list --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$COUNT" -eq 0 ]]; then
  echo "HIGH: run epic ${EPIC} has no child tasks"
fi

# 4. At least one ready task under the run epic
READY_COUNT=$(bd ready --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$READY_COUNT" -eq 0 && "$COUNT" -gt 0 ]]; then
  echo "HIGH: run epic ${EPIC} has tasks but no ready tasks (deadlock or all closed unexpectedly)"
fi

# 5. Isolated nodes within the run epic
if [[ "$COUNT" -gt 1 ]]; then
  bd list --parent "$EPIC" --json 2>/dev/null \
    | jq -r '.[] | select(.dependency_count == 0 and .dependent_count == 0) | "MEDIUM: isolated task " + .id + " " + .title'
fi

# 6. Description coverage: every child carries the worker's instructions
bd list --parent "$EPIC" --json 2>/dev/null \
  | jq -r '.[] | select(((.description // "") | length) == 0) | "HIGH: task " + .id + " has empty description: " + .title'

# 7. Parent set equals label set (defense-in-depth orphan detection)
PARENT_IDS=$(bd list --parent "$EPIC" --json 2>/dev/null | jq -r '.[].id' | sort)
LABEL_IDS=$(bd list --label "ralph:${SLUG}" --json 2>/dev/null | jq -r '.[] | select(.issue_type != "epic") | .id' | sort)
if [[ "$PARENT_IDS" != "$LABEL_IDS" ]]; then
  echo "HIGH: parent set (--parent ${EPIC}) and label set (--label ralph:${SLUG} excluding epic) differ"
  echo "Parent-only (label missing): $(comm -23 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
  echo "Label-only (orphan, parent missing): $(comm -13 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
fi
```
Persist these findings; the reviewers reference them in their final reports.
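The Step 2 checks only echo their findings. One way to persist them for Step 4 is to append each line to a file and count the HIGH entries afterwards. A minimal sketch under assumptions: the findings file location and the helper names `record_finding` and `count_high` are hypothetical, not defined by the skill.

```bash
# Hypothetical helpers (assumptions, not part of the skill).
# FINDINGS_FILE is an assumed location; set it before calling the helpers.
FINDINGS_FILE="${FINDINGS_FILE:-${STATE_DIR:-/tmp}/bd-findings.txt}"

# Append one finding and echo it, e.g.
#   record_finding HIGH "bd graph has dependency cycles"
record_finding() {
  local severity="$1" message="$2"
  printf '%s: %s\n' "$severity" "$message" | tee -a "$FINDINGS_FILE"
}

# Print the number of HIGH findings recorded so far (0 if the file is absent).
count_high() {
  local n
  # grep -c exits non-zero when nothing matches, hence the "|| true".
  n=$(grep -c '^HIGH: ' "$FINDINGS_FILE" 2>/dev/null) || true
  echo "${n:-0}"
}
```

With helpers like these, each `echo "HIGH: ..."` in the checks becomes a `record_finding HIGH "..."` call, and the launcher can decide whether to abort by testing `count_high` once all seven checks have run.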
Step 3: Clean Up Previous Reviews
```bash
rm -f "$PROJECT_ROOT"/.llmtmp/ralph-review-openai.md \
  "$PROJECT_ROOT"/.llmtmp/ralph-review-gemini.md \
  "$PROJECT_ROOT"/.llmtmp/ralph-review-claude.md
```
Step 4: Read Files and Build Prompt Context
Concatenate the contents of these files into a single `CONTEXT` string:
- `.llmtmp/ralph-plan.md`
- `.llmtmp/ralph-prompt-insession.md`
- `.llmtmp/ralph-prompt-external.md`
- The slug and epic ID resolved in Step 2 (literal `SLUG=...` label and `EPIC=...`)
- Output of `bd list --parent "$EPIC" --json`
- Output of `bd ready --parent "$EPIC" --json`
- Output of `bd list --label "ralph:${SLUG}" --json`
- The bd graph validation findings from Step 2
- Plan documents listed in the plan's `plan documents:` line. Parse the line, split on commas, trim whitespace, read each path. Wrap each as `=== plan-documents/<path> ===` so the reviewer can distinguish them from baseline. Empty value means no plan documents to load.
- `CLAUDE.md` and `README.md` from project root (if they exist)
- All `.md` files in `.llmdocs/`

Wrap each as:
```text
=== <relative-path or label> ===
<contents>
```
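The block wrapping and the `plan documents:` parsing from Step 4 can be sketched as follows. This is an illustration under assumptions: the helper names `wrap_block` and `plan_document_paths` are hypothetical, the key is assumed to sit at the start of a line exactly as `plan documents:`, and paths are assumed to contain no commas.

```bash
# Hypothetical helpers (assumptions, not part of the skill).

# Emit one CONTEXT block: a "=== label ===" header, the file contents,
# and a blank separator line.
wrap_block() {
  local label="$1" file="$2"
  printf '=== %s ===\n' "$label"
  cat "$file"
  printf '\n'
}

# Print each path from the plan's "plan documents:" line, one per line.
# Splits on commas and trims whitespace; an empty value prints nothing.
plan_document_paths() {
  local plan="$1" line value path
  local -a parts
  line=$(grep -m1 '^plan documents:' "$plan") || return 0
  value=${line#*:}
  IFS=',' read -r -a parts <<< "$value"
  for path in "${parts[@]}"; do
    # Strip leading, then trailing, whitespace via parameter expansion.
    path="${path#"${path%%[![:space:]]*}"}"
    path="${path%"${path##*[![:space:]]}"}"
    if [[ -n "$path" ]]; then
      printf '%s\n' "$path"
    fi
  done
  return 0
}
```

Each path printed by `plan_document_paths` would then be passed to `wrap_block "plan-documents/$path" "$path"` so that plan documents stay distinguishable from baseline docs.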
Step 5: Construct the Review Prompt
Build the prompt that each opencode process executes. Replace `<PROJECT_ROOT>` and `<CONTEXT>`:
PROMPT="You are a plan reviewer running headless in a non-interactive session. There is no user present. Do not ask questions. Do not prompt for confirmation. Do not stop to wait for input.
YOUR PRIMARY OBJECTIVE IS TO WRITE A FILE. Everything else is preparation. After completing your review, your VERY NEXT action must be a tool call that writes the output file. Do not summarize, do not reflect, do not plan — write the file immediately.
OUTPUT RULES: You are running headless. Nobody reads your text responses. ALL review content goes exclusively into the output file via tool calls. Any text you generate outside of a tool call is wasted tokens and risks hitting the output token limit before you can write the file. Keep text responses to one short sentence at most.
PROJECT_ROOT: <PROJECT_ROOT>
Context
<CONTEXT>
How Ralph Loops Work
A ralph loop is an autonomous, iterative execution mechanism. Two execution modes share the same artifacts:
- In-session (ralph-loop plugin): a Stop hook re-feeds the prompt to the same Claude session. Cheap on short runs; expensive on long runs because context accumulates, so cumulative cost grows roughly quadratically.
- External (scripts/ralph.sh): a bash while-loop spawns a fresh `claude --print` per iteration. Higher fixed overhead per iteration; stays in the cheap part of the cost curve indefinitely.

Authority sources in CONTEXT:
The CONTEXT contains two distinct authority categories. Treat them differently.
- Plan documents (under `=== plan-documents/...` blocks): forward-looking intent the planner consulted, e.g., a PRD, a Jira ticket, a design note. The plan must faithfully render these. Flag the plan if it omits, contradicts, or stale-references a requirement from a plan document.
- Baseline docs (`CLAUDE.md`, `README.md`, `.llmdocs/*`): the system's current state. The plan's job is to change this state. Do NOT flag the plan for deviating from baseline. Flag only if the plan changes baseline without including a corresponding docs-update bead (typically delegated to the docs skill).

Mechanics that apply to both modes:
- The bd graph is the state machine. The graph is shared across worktrees of the same repo. Each ralph run is organized under a per-run epic so concurrent runs in different worktrees stay isolated by `--parent` filtering. `bd ready --parent <epic> --json` finds the next unblocked task within this run.
- The run epic carries the label `ralph:<slug>` where `<slug>` is derived from the current branch. Every child task carries both `--parent <epic>` and the same `ralph:<slug>` label. The two signals are independent so the reviewer can detect orphans.
- The plan's "Run Identity" section records the literal slug and epic ID. The agent reads them directly from the plan each iteration, with no derivation.
- One task per iteration. The agent reads `bd ready --parent <epic> --json`, takes the first result, runs the per-task workflow, then closes the task.
- When the ready queue is empty, the agent closes the epic and emits the sentinel as the terminal action. The epic is not closed by `bd` automatically.
- Resumable. A partially-completed graph resumes cleanly. Partial completion is normal; iterations are cheap.
- Every action is a bd task, including ralph lifecycle scaffolding. The branch creation, BEGIN tag, code work, docs, and END tag are all first-class beads, all parented to the run epic.
- Each bead carries its full per-task instructions in its description: files to touch, expected behavior, verification command, commit message. The worker reads `bd show <id>` and executes the description exactly.
- Run-wide conventions (commit style, language, build command, test command) live in `ralph-plan.md`, not in every description.
- Environment is a precondition. Credentials, network, toolchain, and runtime dependencies are the operator's responsibility. Do not flag missing environment setup as a gap.

Do NOT recommend: collapsing tasks into bundles, adding environment preflight checks, treating partial completion as a risk, modifying the prompt body, replacing the bd graph with a markdown checklist, or reintroducing a workflow-label taxonomy. The `ralph:<slug>` label is run-identity for partition and orphan detection, not workflow dispatch; do not flag it as a label taxonomy.
Flag as HIGH severity if: zero or more than one open epic exists for `ralph:<slug>`; any open child has an empty description; the bd graph contains cycles; the run epic has no children; the run epic has children but no ready tasks; the parent set under the epic differs from the label set for `ralph:<slug>` (orphans or planning typos); the per-task workflow does not explicitly stop after `bd close` (the agent will do multiple beads in one context, defeating the Ralph design); the per-task workflow does not close the epic before emitting the sentinel; the plan lacks a "Run Identity" section recording the literal slug and epic ID; the plan lacks an "Inputs" section with a `plan documents:` line; the plan omits or contradicts a material requirement explicit in any plan document; the plan's Approach prose declares an ordering ("X first", "after Y") that the bd dependency edges do not enforce. Flag as MEDIUM if: a description omits the verification command or commit message; an isolated task exists in a multi-task graph; the plan duplicates information that already lives in bead descriptions; a bead that touches README.md/CLAUDE.md/.llmdocs/ authors edits inline instead of running the docs skill.
Your Task
Review the Ralph artifacts:
- ralph-plan.md (mode-neutral execution plan)
- ralph-prompt-insession.md (slash-command wrapper for the in-session driver)
- ralph-prompt-external.md (body for scripts/ralph.sh)
- The bd graph (issues, dependencies, ready/closed state)
Evaluate across these areas:
1. Goal Clarity
- Is the objective well-defined and unambiguous?
- Would a fresh-context iteration understand what success looks like from the plan alone?
2. Task Decomposition
- Are tasks the right granularity (one focused deliverable each)?
- Is each task independently completable?
- Are there missing tasks needed to achieve the goal?
- Are there unnecessary or redundant tasks?
- Do titles name deliverables rather than techniques?
- Do descriptions carry enough detail (files, verification, commit message) for a cold-start worker?
3. Sequencing and Dependencies
- Are dependencies correct in the bd graph?
- Would executing in `bd ready --json` order produce correct results?
- Are there implicit dependencies that should be explicit edges?
- Are there cycles or isolated nodes?
4. Plan Completeness
- Do the instructions cover repo-specific conventions, tools, and commands?
- Are build, test, and lint commands accurate?
- Is the per-task workflow clear (read description, execute it, close)?
- Does the plan have a "Run Identity" section that records the literal branch slug, the literal epic ID, and the epic label `ralph:<slug>`?
- Does the Per-Task Workflow read the epic ID from the "Run Identity" section and scope `bd ready` with `--parent <epic-id>`?
- Does the Per-Task Workflow close the epic via `bd close <epic-id>` before emitting the sentinel when the ready queue is empty?
- Does the Per-Task Workflow explicitly say STOP after `bd close` so the agent does not loop within a single context?
- Does the Git Workflow section require a `ralph/` feature branch and BEGIN/END tags?
- Is the Per-Task Workflow explicit that `bd close` for the current task runs BEFORE the sentinel?
- Do beads that touch `README.md`, `CLAUDE.md`, or `.llmdocs/` invoke the docs skill (/docs, $docs) rather than authoring edits inline?
5. Risk and Gaps
- What could go wrong during autonomous execution?
- Are there tasks that require human judgment or external access?
- Could any task leave the repo in a broken state?
6. Feasibility
- Can a single iteration realistically complete each task?
- Are there tasks that exceed what a fresh context can do autonomously?
For each area, list specific findings with severity (high/medium/low) and actionable recommendations.
Output
Your VERY NEXT action after the review must be a tool call that writes the file. No intermediate steps.
Determine your model label:
- Claude variants: claude
- GPT variants: openai
- Gemini variants: gemini

Write to `<PROJECT_ROOT>/.llmtmp/ralph-review-$MODEL_LABEL.md` using this template:
```markdown
Ralph Loop Review
Model: <MODEL_LABEL>
Goal Clarity
- [medium] Short title. Explanation...
Task Decomposition
- [high] Short title. Explanation...
Sequencing and Dependencies
- [medium] Short title. Explanation...
Plan Completeness
- [low] Short title. Explanation...
Risk and Gaps
- [high] Short title. Explanation...
Feasibility
- [medium] Short title. Explanation...
Summary
<overall assessment and top 3 recommendations>
```
Finding format: `- [severity] Title. Description.`
Every section must be present. If no findings for an area, write 'No findings.' under its heading.
Then verify: `ls -la '<PROJECT_ROOT>/.llmtmp/ralph-review-<MODEL_LABEL>.md'`
If the file does not exist, write it again. Do not exit without the file."
Step 6: Write Prompt to File
Avoid shell interpolation issues with large prompts:
```bash
STATE_DIR="$PROJECT_ROOT/.llmtmp/ralph_review_state"
mkdir -p "$STATE_DIR"
OPENAI_DIR=$(mktemp -d)
GEMINI_DIR=$(mktemp -d)
CLAUDE_DIR=$(mktemp -d)
# The quoted delimiter ('PROMPT_EOF') disables expansion inside the heredoc.
cat > /tmp/ralph-review-prompt.txt <<'PROMPT_EOF'
<the prompt from Step 5>
PROMPT_EOF
```
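Step 5 asks for `<PROJECT_ROOT>` and `<CONTEXT>` to be replaced before the prompt is written. A minimal sketch of that substitution using bash parameter expansion follows; the function name `fill_prompt` is hypothetical, and the template is assumed to arrive on stdin. The replacement strings are quoted so that bash 5.2 and later treat any `&` in the context literally.

```bash
# Hypothetical helper (an assumption, not part of the skill).
# Reads a prompt template on stdin and prints it with both placeholders filled.
fill_prompt() {
  local project_root="$1" context="$2" template
  template=$(cat)
  # Quoted pattern and replacement: both are taken literally, and "&" in the
  # replacement is not expanded to the matched text on bash >= 5.2.
  # <PROJECT_ROOT> is filled first so that a literal "<PROJECT_ROOT>" inside
  # the context itself is left untouched.
  template=${template//"<PROJECT_ROOT>"/"$project_root"}
  template=${template//"<CONTEXT>"/"$context"}
  printf '%s\n' "$template"
}
```

For example, `fill_prompt "$PROJECT_ROOT" "$CONTEXT" < prompt-template.txt > /tmp/ralph-review-prompt.txt`, where `prompt-template.txt` is an assumed file holding the Step 5 prompt.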
Step 7: Launch 3 Separate Background Bash Tasks
Each opencode process must run as its own background Bash task. Do NOT chain them in a single shell with `&`; child processes get killed when the parent exits.
OpenAI:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m openai/gpt-5.3-codex \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<OPENAI_DIR>" \
  --title "Ralph Review - OpenAI" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/openai.ndjson" 2>"$STATE_DIR/openai.log"
```
Gemini:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m google/gemini-3.1-pro-preview \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<GEMINI_DIR>" \
  --title "Ralph Review - Gemini" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/gemini.ndjson" 2>"$STATE_DIR/gemini.log"
```
Claude:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m az-anthropic/claude-opus-4-6 \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<CLAUDE_DIR>" \
  --title "Ralph Review - Claude" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/claude.ndjson" 2>"$STATE_DIR/claude.log"
```
All 3 launch as separate background tasks. Do NOT use `&` in a single shell.
Output Streams
Each opencode process produces two output files:
- `<label>.ndjson` (stdout): Structured NDJSON events from `--format json`. Use for programmatic progress tracking.
- `<label>.log` (stderr): Plain-text info-level logs. Use for diagnosing startup failures, permission issues, MCP server errors, plugin loading.

Log lines are structured text, one per line:
```text
INFO 2026-03-13T00:54:25 +4ms service=default directory=/private/tmp creating instance
```
Step 8: Monitor and Wait
Poll each background task until all 3 complete.
NDJSON progress (replace `$NDJSON` with the actual path):
```bash
# Count completed steps. grep -c already prints 0 when nothing matches,
# so the fallback only covers a missing file.
STEPS=$(grep -c '"type":"step_finish"' "$NDJSON" 2>/dev/null)
echo "${STEPS:-0}"
# Check if done (last step_finish has reason "stop")
tail -1 "$NDJSON" 2>/dev/null | jq -r '.part.reason // empty'
# Check for errors
grep '"type":"error"' "$NDJSON" 2>/dev/null
```
Text logs (replace `$LOGFILE` with the actual path):
```bash
# Errors or warnings
grep -E "^(ERROR|WARN)" "$LOGFILE" 2>/dev/null
# Recent activity
tail -5 "$LOGFILE" 2>/dev/null
```
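The Step 8 polling could be wrapped in a loop like the following. This is a sketch under assumptions: the helper names, the interval, and the timeout are hypothetical, and completion is detected with a plain string match on the last NDJSON line (it assumes the compact `"reason":"stop"` form shown in the sample events) rather than with jq.

```bash
# Hypothetical helpers (assumptions, not part of the skill).

# Succeed when the last line of an NDJSON file is a step_finish event
# whose reason is "stop".
review_is_done() {
  local ndjson="$1" last
  [[ -f "$ndjson" ]] || return 1
  last=$(tail -n 1 "$ndjson")
  [[ "$last" == *'"type":"step_finish"'* && "$last" == *'"reason":"stop"'* ]]
}

# Poll until every given NDJSON file is done (returns 0) or the timeout in
# seconds has elapsed (returns 1).
# Usage: wait_for_reviews TIMEOUT INTERVAL FILE...
wait_for_reviews() {
  local timeout="$1" interval="$2" waited=0 f pending
  shift 2
  while :; do
    pending=0
    for f in "$@"; do
      review_is_done "$f" || pending=$((pending + 1))
    done
    if (( pending == 0 )); then return 0; fi
    if (( waited >= timeout )); then return 1; fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A call such as `wait_for_reviews 1800 10 "$STATE_DIR"/openai.ndjson "$STATE_DIR"/gemini.ndjson "$STATE_DIR"/claude.ndjson` would wait up to 30 minutes. The timeout matters because a model that fails never emits a `stop` event, and the remaining reviews still need to be reported.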
Step 9: Report Results
For each model, check whether `.llmtmp/ralph-review-<label>.md` exists.
Report per model:
- Success or failure (file present or not)
- Total cost: sum of `cost` from all `step_finish` events
- Whether any errors occurred

Surface the bd graph validation findings from Step 2 separately at the top of the report.
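A per-model status line can be assembled from the files already on disk. The sketch below covers only file presence and error count; the function name `report_model` is hypothetical, and the error count is a plain `grep` for `"type":"error"` lines, mirroring the check in Step 8. Cost is left out here because summing it needs the jq query from the NDJSON reference.

```bash
# Hypothetical helper (an assumption, not part of the skill).
# Prints "<label>: success|failure, errors=<n>" for one model.
report_model() {
  local project_root="$1" label="$2" status errors
  local review="$project_root/.llmtmp/ralph-review-$label.md"
  local ndjson="$project_root/.llmtmp/ralph_review_state/$label.ndjson"
  if [[ -f "$review" ]]; then
    status=success
  else
    status=failure
  fi
  # grep -c prints 0 on no match; the default covers a missing NDJSON file.
  errors=$(grep -c '"type":"error"' "$ndjson" 2>/dev/null) || true
  printf '%s: %s, errors=%s\n' "$label" "$status" "${errors:-0}"
}
```

Looping over the three labels, for instance `for label in openai gemini claude; do report_model "$PROJECT_ROOT" "$label"; done`, yields one line per model even when one of them failed.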
Step 10: Cleanup
The state directory `$STATE_DIR` (`.llmtmp/ralph_review_state/`) is preserved for post-run inspection (per-model NDJSON event streams and text logs are useful for diagnosing review failures, calibration regressions, and cost trends).
```bash
rm -f /tmp/ralph-review-prompt.txt
```
Expected Output Files
3 files total, one per model:
- `.llmtmp/ralph-review-openai.md`
- `.llmtmp/ralph-review-gemini.md`
- `.llmtmp/ralph-review-claude.md`
NDJSON Log Format Reference
Each opencode process writes NDJSON to `$STATE_DIR/<label>.ndjson`. One JSON object per line. Skip lines that fail to parse (partial writes).
Key event types:
- step_start - New LLM turn begins.
- text - Model emitted text: `{"type":"text","part":{"text":"..."}}`
- tool_use - Tool call: `{"type":"tool_use","part":{"tool":"bash","state":{"status":"completed","metadata":{"exit":0}}}}`
- step_finish - Turn completed: `{"type":"step_finish","part":{"reason":"stop","cost":0,"tokens":{"total":13494}}}`. `reason: "stop"` = done, `reason: "tool-calls"` = continuing.
- error - Session error: `{"type":"error","error":{"data":{"message":"..."}}}`

Useful jq queries (replace `$NDJSON` with the actual path):
```bash
NDJSON="<PROJECT_ROOT>/.llmtmp/ralph_review_state/openai.ndjson"
# Is done?
tail -1 "$NDJSON" | jq -r 'select(.type=="step_finish") | .part.reason'
# Total cost
jq -s '[.[] | select(.type=="step_finish") | .part.cost] | add' "$NDJSON"
# All errors
jq -r 'select(.type=="error") | .error.data.message' "$NDJSON"
```
Replace `$LOGFILE` with the text log path:
```bash
grep -E "^(ERROR|WARN)" "$LOGFILE"
```
Rules
- The invoking agent is ONLY a launcher. All review work happens inside the opencode processes.
- Do NOT perform any review analysis directly. The opencode processes handle reviewing.
- Do NOT ask questions during execution. This is non-interactive.
- Launch all 3 processes in parallel. Do not wait for one to finish before starting another.
- Use plain message invocation, not `--command`. The `--command` flag has a known issue with the c7 MCP server.
- Preserve the state directory after reporting results. NDJSON and log files stay for post-run inspection.
- If a model fails, still wait for and report the others.
- The bd graph validation in Step 2 runs in the invoking agent, not in opencode. Findings are passed to opencode reviewers as context.