ralph-review-deep

<!-- markdownlint-disable-file MD041 -->
STARTER_CHARACTER = 🔬

Ralph Review (Deep)

Validate the beads graph and launch three independent reviews of the ralph artifacts in parallel using different models via `opencode run`. Each model produces one review file evaluating goal clarity, task decomposition, sequencing, plan completeness, risk and gaps, and feasibility.
For a faster single-model review using an internal subagent, use the ralph-review skill (/ralph-review, $ralph-review) instead. This deep variant trades wall time and cost for cross-model consensus.

Models

| Label | Model ID |
| --- | --- |
| openai | openai/gpt-5.3-codex |
| gemini | google/gemini-3.1-pro-preview |
| claude | az-anthropic/claude-opus-4-6 |

Procedure

Step 1: Validate Artifacts Exist

```bash
PROJECT_ROOT=$(git rev-parse --show-toplevel)
```
Verify all of these exist:
  • `$PROJECT_ROOT/.llmtmp/ralph-plan.md`
  • `$PROJECT_ROOT/.llmtmp/ralph-prompt-insession.md`
  • `$PROJECT_ROOT/.llmtmp/ralph-prompt-external.md`
  • `$PROJECT_ROOT/.beads/` (directory; bd-aware probe via `bd list >/dev/null 2>&1`)
Abort if any are missing.
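
The existence checks above could be scripted roughly as follows. This is a minimal sketch rather than part of the skill: the function name `check_ralph_artifacts` is hypothetical, and it only tests that `.beads/` is a directory instead of running the `bd list` probe.

```bash
# Hypothetical helper: verify the ralph artifacts listed in Step 1.
# Prints each missing path and returns non-zero if anything is absent.
check_ralph_artifacts() {
  # $1 = project root
  local root="$1" missing=0 f
  for f in ralph-plan.md ralph-prompt-insession.md ralph-prompt-external.md; do
    if [ ! -f "$root/.llmtmp/$f" ]; then
      echo "MISSING: $root/.llmtmp/$f"
      missing=1
    fi
  done
  if [ ! -d "$root/.beads" ]; then
    echo "MISSING: $root/.beads/"
    missing=1
  fi
  return "$missing"
}
```

A caller would typically abort on failure, for example `check_ralph_artifacts "$PROJECT_ROOT" || exit 1`.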

Step 2: Validate the bd Graph

Run these checks before launching opencode. Failures here are HIGH-severity findings and should abort or be flagged loudly.
The bd graph is organized under a per-run epic. The slug is derived from the current branch; the epic is found via the `ralph:<slug>` label. All scoped checks filter by `--parent ${EPIC}`.
```bash
cd "$PROJECT_ROOT"

SLUG=$(git branch --show-current | tr '/' '-')

# 1. Single open epic for this run
EPIC_COUNT=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq '[.[] | select(.status=="open")] | length')
if [[ "$EPIC_COUNT" -ne 1 ]]; then
  echo "HIGH: expected exactly one open epic with label ralph:${SLUG}, found ${EPIC_COUNT}"
fi
EPIC=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq -r '[.[] | select(.status=="open")] | .[0].id // empty')

# 2. Cycles
if ! bd dep cycles >/dev/null 2>&1; then
  echo "HIGH: bd graph has dependency cycles"
  bd dep cycles
fi

# 3. Minimum child count under the run epic
COUNT=$(bd list --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$COUNT" -eq 0 ]]; then
  echo "HIGH: run epic ${EPIC} has no child tasks"
fi

# 4. At least one ready task under the run epic
READY_COUNT=$(bd ready --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$READY_COUNT" -eq 0 && "$COUNT" -gt 0 ]]; then
  echo "HIGH: run epic ${EPIC} has tasks but no ready tasks (deadlock or all closed unexpectedly)"
fi

# 5. Isolated nodes within the run epic
if [[ "$COUNT" -gt 1 ]]; then
  bd list --parent "$EPIC" --json 2>/dev/null \
    | jq -r '.[] | select(.dependency_count == 0 and .dependent_count == 0) | "MEDIUM: isolated task " + .id + " " + .title'
fi

# 6. Description coverage: every child carries the worker's instructions
bd list --parent "$EPIC" --json 2>/dev/null \
  | jq -r '.[] | select(((.description // "") | length) == 0) | "HIGH: task " + .id + " has empty description: " + .title'

# 7. Parent set equals label set (defense-in-depth orphan detection)
PARENT_IDS=$(bd list --parent "$EPIC" --json 2>/dev/null | jq -r '.[].id' | sort)
LABEL_IDS=$(bd list --label "ralph:${SLUG}" --json 2>/dev/null | jq -r '.[] | select(.issue_type != "epic") | .id' | sort)
if [[ "$PARENT_IDS" != "$LABEL_IDS" ]]; then
  echo "HIGH: parent set (--parent ${EPIC}) and label set (--label ralph:${SLUG} excluding epic) differ"
  echo "Parent-only (label missing): $(comm -23 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
  echo "Label-only (orphan, parent missing): $(comm -13 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
fi
```

Persist these findings; the reviewers reference them in their final reports.
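
One way to persist the findings is to route every check's message through a small helper that prints the line and also appends it to a file. The sketch below is an assumption, not the skill's prescribed mechanism: the helper names `record_finding` and `count_findings` and the findings path are hypothetical.

```bash
# Hypothetical helpers for persisting Step 2 findings.
# FINDINGS_FILE is an assumed location; any path under .llmtmp/ would do.
FINDINGS_FILE="${FINDINGS_FILE:-${PROJECT_ROOT:-.}/.llmtmp/ralph-review-findings.txt}"

# Print a finding and append it to FINDINGS_FILE.
record_finding() {
  # $1 = severity (HIGH, MEDIUM, ...), remaining arguments = message
  local severity="$1"
  shift
  printf '%s: %s\n' "$severity" "$*" | tee -a "$FINDINGS_FILE"
}

# Print how many persisted findings carry the given severity.
count_findings() {
  # $1 = severity; prints 0 when the file is missing or has no match
  local n
  n=$(grep -c "^$1: " "$FINDINGS_FILE" 2>/dev/null || true)
  echo "${n:-0}"
}
```

With this in place, a check would call `record_finding HIGH "run epic ${EPIC} has no child tasks"` instead of a bare `echo`, and the file can later be wrapped into CONTEXT and surfaced in the final report.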

Step 3: Clean Up Previous Reviews

```bash
rm -f "$PROJECT_ROOT"/.llmtmp/ralph-review-openai.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-gemini.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-claude.md
```

Step 4: Read Files and Build Prompt Context

Concatenate the contents of these files into a single `CONTEXT` string:
  1. `.llmtmp/ralph-plan.md`
  2. `.llmtmp/ralph-prompt-insession.md`
  3. `.llmtmp/ralph-prompt-external.md`
  4. The slug and epic ID resolved in Step 2 (literal labels `SLUG=...` and `EPIC=...`)
  5. Output of `bd list --parent "$EPIC" --json`
  6. Output of `bd ready --parent "$EPIC" --json`
  7. Output of `bd list --label "ralph:${SLUG}" --json`
  8. The bd graph validation findings from Step 2
  9. Plan documents listed in the plan's `plan documents:` line. Parse the line, split on commas, trim whitespace, and read each path. Wrap each as `=== plan-documents/<path> ===` so the reviewer can distinguish them from baseline docs. An empty value means there are no plan documents to load.
  10. `CLAUDE.md` and `README.md` from the project root (if they exist)
  11. All `.md` files in `.llmdocs/`
Wrap each as:
```text
=== <relative-path or label> ===
<contents>
```
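
The wrapping and the `plan documents:` parsing could look like the sketch below. It assumes the key starts a line in `ralph-plan.md`, as in `plan documents: a.md, b.md` (the exact formatting is not specified here), and both function names are hypothetical.

```bash
# Hypothetical helpers for assembling CONTEXT.

# Emit one wrapped block: a "=== label ===" header, the file body, and a
# trailing blank line to separate it from the next block.
wrap_block() {
  # $1 = label shown in the header, $2 = file to read
  printf '=== %s ===\n' "$1"
  cat "$2"
  printf '\n'
}

# Print each path from the plan's "plan documents:" line, one per line:
# split on commas, trim surrounding whitespace, drop empty entries.
# Prints nothing when the value is empty.
plan_documents() {
  # $1 = path to ralph-plan.md
  sed -n 's/^plan documents:[[:space:]]*//p' "$1" \
    | head -n 1 \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Example use (paths relative to the project root are an assumption):
#   plan_documents "$PROJECT_ROOT/.llmtmp/ralph-plan.md" |
#     while IFS= read -r doc; do
#       wrap_block "plan-documents/$doc" "$PROJECT_ROOT/$doc"
#     done
```

Reading the paths line by line rather than word-splitting a variable keeps paths that contain spaces intact.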

Step 5: Construct the Review Prompt

Build the prompt that each opencode process executes. Replace `<PROJECT_ROOT>` and `<CONTEXT>`:
text
PROMPT="You are a plan reviewer running headless in a non-interactive session. There is no user present. Do not ask questions. Do not prompt for confirmation. Do not stop to wait for input.

YOUR PRIMARY OBJECTIVE IS TO WRITE A FILE. Everything else is preparation. After completing your review, your VERY NEXT action must be a tool call that writes the output file. Do not summarize, do not reflect, do not plan — write the file immediately.

OUTPUT RULES: You are running headless. Nobody reads your text responses. ALL review content goes exclusively into the output file via tool calls. Any text you generate outside of a tool call is wasted tokens and risks hitting the output token limit before you can write the file. Keep text responses to one short sentence at most.

PROJECT_ROOT: <PROJECT_ROOT>

Context

<CONTEXT>

How Ralph Loops Work

A ralph loop is an autonomous, iterative execution mechanism. Two execution modes share the same artifacts:
  • In-session (ralph-loop plugin): a Stop hook re-feeds the prompt to the same Claude session. Cheap on short runs; expensive on long runs because context grows quadratically.
  • External (scripts/ralph.sh): a bash while-loop spawns a fresh `claude --print` per iteration. Higher fixed overhead per iteration; stays in the cheap part of the cost curve indefinitely.
Authority sources in CONTEXT:
The CONTEXT contains two distinct authority categories. Treat them differently.
  • Plan documents (under `=== plan-documents/...` blocks): forward-looking intent the planner consulted, e.g., a PRD, a Jira ticket, a design note. The plan must faithfully render these. Flag the plan if it omits, contradicts, or stale-references a requirement from a plan document.
  • Baseline docs (`CLAUDE.md`, `README.md`, `.llmdocs/*`): the system's current state. The plan's job is to change this state. Do NOT flag the plan for deviating from baseline. Flag only if the plan changes baseline without including a corresponding docs-update bead (typically delegated to the docs skill).
Mechanics that apply to both modes:
  • The bd graph is the state machine. The graph is shared across worktrees of the same repo. Each ralph run is organized under a per-run epic so concurrent runs in different worktrees stay isolated by `--parent` filtering. `bd ready --parent <epic> --json` finds the next unblocked task within this run.
  • The run epic carries the label `ralph:<slug>` where `<slug>` is derived from the current branch. Every child task carries both `--parent <epic>` and the same `ralph:<slug>` label. The two signals are independent so the reviewer can detect orphans.
  • The plan's "Run Identity" section records the literal slug and epic ID. The agent reads them directly from the plan each iteration, no derivation.
  • One task per iteration. The agent reads `bd ready --parent <epic> --json`, takes the first result, runs the per-task workflow, then closes the task.
  • When the ready queue is empty, the agent closes the epic and emits the sentinel as the terminal action. The epic is not closed by `bd` automatically.
  • Resumable. A partially-completed graph resumes cleanly. Partial completion is normal; iterations are cheap.
  • Every action is a bd task, including ralph lifecycle scaffolding. The branch creation, BEGIN tag, code work, docs, and END tag are all first-class beads, all parented to the run epic.
  • Each bead carries its full per-task instructions in its description: files to touch, expected behavior, verification command, commit message. The worker reads `bd show <id>` and executes the description exactly.
  • Run-wide conventions (commit style, language, build command, test command) live in `ralph-plan.md`, not in every description.
  • Environment is a precondition. Credentials, network, toolchain, and runtime dependencies are the operator's responsibility. Do not flag missing environment setup as a gap.
Do NOT recommend: collapsing tasks into bundles, adding environment preflight checks, treating partial completion as a risk, modifying the prompt body, replacing the bd graph with a markdown checklist, or reintroducing a workflow-label taxonomy. The `ralph:<slug>` label is run-identity for partition and orphan detection, not workflow dispatch; do not flag it as a label taxonomy.
Flag as HIGH severity if: zero or more than one open epic exists for `ralph:<slug>`; any open child has an empty description; the bd graph contains cycles; the run epic has no children; the run epic has children but no ready tasks; the parent set under the epic differs from the label set for `ralph:<slug>` (orphans or planning typos); the per-task workflow does not explicitly stop after `bd close` (the agent will do multiple beads in one context, defeating the Ralph design); the per-task workflow does not close the epic before emitting the sentinel; the plan lacks a "Run Identity" section recording the literal slug and epic ID; the plan lacks an "Inputs" section with a `plan documents:` line; the plan omits or contradicts a material requirement explicit in any plan document; the plan's Approach prose declares an ordering ("X first", "after Y") that the bd dependency edges do not enforce.
Flag as MEDIUM if: a description omits the verification command or commit message; an isolated task exists in a multi-task graph; the plan duplicates information that already lives in bead descriptions; a bead that touches README.md/CLAUDE.md/.llmdocs/ authors edits inline instead of running the docs skill.

Your Task

Review the Ralph artifacts:
  • ralph-plan.md (mode-neutral execution plan)
  • ralph-prompt-insession.md (slash-command wrapper for the in-session driver)
  • ralph-prompt-external.md (body for scripts/ralph.sh)
  • The bd graph (issues, dependencies, ready/closed state)
Evaluate across these areas:

1. Goal Clarity

  • Is the objective well-defined and unambiguous?
  • Would a fresh-context iteration understand what success looks like from the plan alone?

2. Task Decomposition

  • Are tasks the right granularity (one focused deliverable each)?
  • Is each task independently completable?
  • Are there missing tasks needed to achieve the goal?
  • Are there unnecessary or redundant tasks?
  • Do titles name deliverables rather than techniques?
  • Do descriptions carry enough detail (files, verification, commit message) for a cold-start worker?

3. Sequencing and Dependencies

  • Are dependencies correct in the bd graph?
  • Would executing in `bd ready --json` order produce correct results?
  • Are there implicit dependencies that should be explicit edges?
  • Are there cycles or isolated nodes?

4. Plan Completeness

  • Do the instructions cover repo-specific conventions, tools, and commands?
  • Are build, test, and lint commands accurate?
  • Is the per-task workflow clear (read description, execute it, close)?
  • Does the plan have a "Run Identity" section that records the literal branch slug, the literal epic ID, and the epic label `ralph:<slug>`?
  • Does the Per-Task Workflow read the epic ID from the "Run Identity" section and scope `bd ready` with `--parent <epic-id>`?
  • Does the Per-Task Workflow close the epic via `bd close <epic-id>` before emitting the sentinel when the ready queue is empty?
  • Does the Per-Task Workflow explicitly say STOP after `bd close` so the agent does not loop within a single context?
  • Does the Git Workflow section require a `ralph/` feature branch and BEGIN/END tags?
  • Is the Per-Task Workflow explicit that `bd close` for the current task runs BEFORE the sentinel?
  • Do beads that touch `README.md`, `CLAUDE.md`, or `.llmdocs/` invoke the docs skill (/docs, $docs) rather than authoring edits inline?

5. Risk and Gaps

  • What could go wrong during autonomous execution?
  • Are there tasks that require human judgment or external access?
  • Could any task leave the repo in a broken state?

6. Feasibility

  • Can a single iteration realistically complete each task?
  • Are there tasks that exceed what a fresh context can do autonomously?
For each area, list specific findings with severity (high/medium/low) and actionable recommendations.

Output

Your VERY NEXT action after the review must be a tool call that writes the file. No intermediate steps.
Determine your model label:
  • Claude variants: claude
  • GPT variants: openai
  • Gemini variants: gemini
Write to `<PROJECT_ROOT>/.llmtmp/ralph-review-$MODEL_LABEL.md` using this template:
```markdown
# Ralph Loop Review

Model: <MODEL_LABEL>

## Goal Clarity

- [medium] Short title. Explanation...

## Task Decomposition

- [high] Short title. Explanation...

## Sequencing and Dependencies

- [medium] Short title. Explanation...

## Plan Completeness

- [low] Short title. Explanation...

## Risk and Gaps

- [high] Short title. Explanation...

## Feasibility

- [medium] Short title. Explanation...

## Summary

<overall assessment and top 3 recommendations>
```
Finding format: `- [severity] Title. Description.`
Every section must be present. If no findings for an area, write 'No findings.' under its heading.
Then verify: `ls -la '<PROJECT_ROOT>/.llmtmp/ralph-review-<MODEL_LABEL>.md'`
If the file does not exist, write it again. Do not exit without the file."

Step 6: Write Prompt to File

Avoid shell interpolation issues with large prompts:
```bash
STATE_DIR="$PROJECT_ROOT/.llmtmp/ralph_review_state"
mkdir -p "$STATE_DIR"
OPENAI_DIR=$(mktemp -d)
GEMINI_DIR=$(mktemp -d)
CLAUDE_DIR=$(mktemp -d)
cat > /tmp/ralph-review-prompt.txt <<'PROMPT_EOF'
<the prompt from Step 5>
PROMPT_EOF
```
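
The quoted delimiter (`'PROMPT_EOF'`) is what keeps the shell from expanding `$`, backticks, and backslashes inside the prompt body. The self-contained sketch below contrasts the quoted and unquoted forms; the function names and sample text are illustrative only.

```bash
# Illustration only: a quoted heredoc delimiter keeps the body literal.
write_literal() {
  # $1 = output file. 'EOF' is quoted, so nothing in the body is expanded.
  cat > "$1" <<'EOF'
Write to $MODEL_LABEL and run `ls -la`.
EOF
}

write_expanded() {
  # $1 = output file. EOF is unquoted, so $MODEL_LABEL is substituted.
  # Backticks are left out here because they would run as a command.
  cat > "$1" <<EOF
Write to $MODEL_LABEL.
EOF
}
```

Because the review prompt contains both `$MODEL_LABEL` and backtick-quoted commands, the quoted form is the one that preserves it byte for byte.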

Step 7: Launch 3 Separate Background Bash Tasks

Each opencode process must run as its own background Bash task. Do NOT chain them in a single shell with `&`; child processes get killed when the parent exits.
OpenAI:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m openai/gpt-5.3-codex \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<OPENAI_DIR>" \
  --title "Ralph Review - OpenAI" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/openai.ndjson" 2>"$STATE_DIR/openai.log"
```
Gemini:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m google/gemini-3.1-pro-preview \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<GEMINI_DIR>" \
  --title "Ralph Review - Gemini" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/gemini.ndjson" 2>"$STATE_DIR/gemini.log"
```
Claude:
```bash
STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m az-anthropic/claude-opus-4-6 \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<CLAUDE_DIR>" \
  --title "Ralph Review - Claude" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/claude.ndjson" 2>"$STATE_DIR/claude.log"
```
All 3 launch as separate background tasks. Do NOT use `&` in a single shell.

Output Streams

Each opencode process produces two output files:
  • `<label>.ndjson` (stdout): Structured NDJSON events from `--format json`. Use for programmatic progress tracking.
  • `<label>.log` (stderr): Plain-text info-level logs. Use for diagnosing startup failures, permission issues, MCP server errors, and plugin loading.
Log lines are structured text, one per line:
```text
INFO  2026-03-13T00:54:25 +4ms service=default directory=/private/tmp creating instance
```
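
Since the level is the first field of each log line, a quick per-level tally can show at a glance whether a run logged any errors. The helper below is a sketch under that assumption: the name `log_level_counts` is hypothetical, and `DEBUG` is included only as a guess at what other levels might appear.

```bash
# Hypothetical helper: count log lines per level in a <label>.log file.
# Assumes the level is the first whitespace-separated field, as in the
# sample line above. Lines starting with anything else are ignored.
log_level_counts() {
  # $1 = path to <label>.log
  awk '$1 ~ /^(DEBUG|INFO|WARN|ERROR)$/ { n[$1]++ }
       END { for (lvl in n) printf "%s %d\n", lvl, n[lvl] }' "$1" | sort
}
```

The output is one `LEVEL count` pair per line, sorted by level name.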

Step 8: Monitor and Wait

步骤8:监控并等待

Poll each background task until all 3 complete.
NDJSON progress (replace `$NDJSON` with the actual path):
```bash
# Count completed steps. grep -c already prints 0 when nothing matches,
# so fall back to 0 only when the file is missing or unreadable.
COUNT=$(grep -c '"type":"step_finish"' "$NDJSON" 2>/dev/null || true)
echo "${COUNT:-0}"

# Check if done (last step_finish has reason "stop")
tail -1 "$NDJSON" 2>/dev/null | jq -r '.part.reason // empty'

# Check for errors
grep '"type":"error"' "$NDJSON" 2>/dev/null
```
Text logs (replace `$LOGFILE` with the actual path):
```bash
# Errors or warnings
grep -E "^(ERROR|WARN)" "$LOGFILE" 2>/dev/null

# Recent activity
tail -5 "$LOGFILE" 2>/dev/null
```
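The checks above can be combined into a polling loop. The following is a minimal sketch, not the skill's own implementation: `review_status` and `wait_for_reviews` are hypothetical helper names, the interval and poll limit are arbitrary defaults, and treating an `error` event as terminal is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify one reviewer's NDJSON stream as
# "error", "done", or "running".
review_status() {
  local ndjson="$1" reason
  if grep -q '"type":"error"' "$ndjson" 2>/dev/null; then
    echo "error"   # assumption: an error event means the run is over
    return 0
  fi
  # A missing file or a partially written last line leaves reason empty.
  reason=$(tail -n 1 "$ndjson" 2>/dev/null | jq -r '.part.reason // empty' 2>/dev/null || true)
  if [ "$reason" = "stop" ]; then
    echo "done"
  else
    echo "running"
  fi
}

# Hypothetical polling loop: returns 0 once no stream is still "running",
# or 1 after max_polls attempts (defaults: 10 s interval, 360 polls).
wait_for_reviews() {
  local state_dir="$1" interval="${2:-10}" max_polls="${3:-360}"
  local label pending polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    pending=0
    for label in openai gemini claude; do
      if [ "$(review_status "$state_dir/$label.ndjson")" = "running" ]; then
        pending=1
      fi
    done
    [ "$pending" -eq 0 ] && return 0
    polls=$((polls + 1))
    sleep "$interval"
  done
  return 1
}
```

The poll limit guards against a process that dies without writing either a final `step_finish` or an `error` event; in that case the text log is the place to look.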

Step 9: Report Results

For each model, check whether `.llmtmp/ralph-review-<label>.md` exists.
Report per model:
  • Success or failure (file present or not)
  • Total cost: sum of `cost` from all `step_finish` events
  • Whether any errors occurred
Surface the bd graph validation findings from Step 2 separately at the top of the report.
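One way to assemble the per-model part of the report is sketched below. `report_reviews` is a hypothetical helper and the one-line output format is an illustrative choice; only the three checks themselves come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one summary line per model label.
# Arguments: project root, then the state directory holding <label>.ndjson.
report_reviews() {
  local project_root="$1" state_dir="$2"
  local label ndjson status cost errors
  for label in openai gemini claude; do
    ndjson="$state_dir/$label.ndjson"
    # Success means the review file exists.
    if [ -f "$project_root/.llmtmp/ralph-review-$label.md" ]; then
      status="success"
    else
      status="failure"
    fi
    # Sum cost over step_finish events; "// 0" covers a stream with none.
    # A missing or unparseable stream is reported as "unknown".
    cost=$(jq -s '[.[] | select(.type=="step_finish") | .part.cost] | add // 0' \
      "$ndjson" 2>/dev/null) || cost="unknown"
    errors=$(grep -c '"type":"error"' "$ndjson" 2>/dev/null || true)
    echo "$label: $status, cost=${cost:-unknown}, errors=${errors:-0}"
  done
}

# Example: report_reviews "$PROJECT_ROOT" "$PROJECT_ROOT/.llmtmp/ralph_review_state"
```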

Step 10: Cleanup

The state directory is preserved for post-run inspection (per-model NDJSON event streams and text logs are useful for diagnosing review failures, calibration regressions, and cost trends).
```bash
rm -f /tmp/ralph-review-prompt.txt
```
`$STATE_DIR` (`.llmtmp/ralph_review_state/`) is intentionally NOT removed. Each subsequent reviewer run overwrites the per-model files in place; if historical retention is needed, archive the directory before re-running.
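If historical retention matters, the archive step could look like the sketch below. `archive_state_dir` and the timestamped naming scheme are assumptions made for illustration; the skill itself does not prescribe how to archive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the state directory to a timestamped sibling
# before a re-run overwrites the per-model files. Prints the archive path.
archive_state_dir() {
  local state_dir="${1%/}" archive
  # Nothing to archive on a first run.
  [ -d "$state_dir" ] || return 0
  archive="${state_dir}.$(date +%Y%m%dT%H%M%S)"
  cp -R "$state_dir" "$archive" && echo "$archive"
}

# Example: archive_state_dir "$PROJECT_ROOT/.llmtmp/ralph_review_state"
```

Copying rather than moving keeps the live directory in place, so the next run still finds the location it expects.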

Expected Output Files

3 files total, one per model:
  • .llmtmp/ralph-review-openai.md
  • .llmtmp/ralph-review-gemini.md
  • .llmtmp/ralph-review-claude.md

NDJSON Log Format Reference

Each opencode process writes NDJSON to `$STATE_DIR/<label>.ndjson`. One JSON object per line. Skip lines that fail to parse (partial writes).
Key event types:
  • `step_start` - New LLM turn begins.
  • `text` - Model emitted text: `{"type":"text","part":{"text":"..."}}`
  • `tool_use` - Tool call: `{"type":"tool_use","part":{"tool":"bash","state":{"status":"completed","metadata":{"exit":0}}}}`
  • `step_finish` - Turn completed: `{"type":"step_finish","part":{"reason":"stop","cost":0,"tokens":{"total":13494}}}`. `reason: "stop"` = done, `reason: "tool-calls"` = continuing.
  • `error` - Session error: `{"type":"error","error":{"data":{"message":"..."}}}`
Useful jq queries (replace `$NDJSON` with the actual path):
```bash
NDJSON="<PROJECT_ROOT>/.llmtmp/ralph_review_state/openai.ndjson"

# Is done?
tail -1 "$NDJSON" | jq -r 'select(.type=="step_finish") | .part.reason'

# Total cost
jq -s '[.[] | select(.type=="step_finish") | .part.cost] | add' "$NDJSON"

# All errors
jq -r 'select(.type=="error") | .error.data.message' "$NDJSON"
```
Replace `$LOGFILE` with the text log path:
```bash
grep -E "^(ERROR|WARN)" "$LOGFILE"
```
Rules

  • The invoking agent is ONLY a launcher. All review work happens inside the opencode processes.
  • Do NOT perform any review analysis directly. The opencode processes handle reviewing.
  • Do NOT ask questions during execution. This is non-interactive.
  • Launch all 3 processes in parallel. Do not wait for one to finish before starting another.
  • Use plain message invocation, not `--command`. The `--command` flag has a known issue with the c7 MCP server.
  • Preserve the state directory after reporting results. NDJSON and log files stay for post-run inspection.
  • If a model fails, still wait for and report the others.
  • The bd graph validation in Step 2 runs in the invoking agent, not in opencode. Findings are passed to opencode reviewers as context.