MassGen

Invoke MassGen for multi-agent iteration on any task — general-purpose work, evaluation, planning, or spec writing. Multiple AI agents independently work on the problem and converge on the strongest result through MassGen's checklist-gated voting system.

When to Use

General (default) — get multi-agent results on any task:

When you want multiple AI agents to independently tackle a problem
When the task doesn't fit neatly into evaluate, plan, or spec
Writing, research, code generation, design, analysis, or any open-ended task

Evaluate — get diverse, critical feedback on existing work:

After iterating and stalling — need outside perspective
Before submitting PRs or delivering artifacts
When wanting diverse AI perspectives on implementation quality

Plan — create or refine a structured project plan:

When starting a complex feature or project that needs task decomposition
When an existing plan has gaps, is too vague, or needs restructuring
When you need a valid task DAG with verification criteria

Spec — create or refine a requirements specification:

When starting a feature that needs precise requirements before implementation
When an existing spec has ambiguities, missing edge cases, or gaps
When you need EARS-formatted requirements with acceptance criteria

Mode Selection

Mode	Purpose	Input	Output	Default Criteria Preset
general	Any task	Task description + context	Winner's deliverables in `result.md` + workspace files	Auto-generated
evaluate	Critique existing work	Artifacts to evaluate	`critique_packet.md` (with `approach_assessment`), `verdict.json`, `next_tasks.json` (with `fix_tasks`, `evolution_tasks`)	`"evaluation"`
plan	Create or refine a plan	Goal + constraints (+ existing plan)	`project_plan.json` (typed tasks, chunks, deps, prototypes)	`"planning"`
spec	Create or refine a spec	Problem + needs (+ existing spec)	`project_spec.json` (EARS requirements, chunks, rationale)	`"spec"`

FIRST: Confirm Config (do this before anything else)

Always ask the user which config to use. The config controls which models run and how many agents are spawned — this directly affects quality and cost. Never silently pick a config. The user must confirm the choice every time.

Step A: Check what the user already specified

Scan the user's message for any config signal before searching:

Signal in message	What to do
Explicit file path (e.g. `--config foo.yaml`, `configs/team.yaml`)	Go to Step D: verify it exists, then confirm
Provider/model name (e.g. "use Claude", "GPT-4 agents", "Gemini")	Note the preference; use it to rank options in Step B
Named config (e.g. "the teams config", "my 3-agent setup")	Search for a match in Step B, confirm before using
"Same as last time" / "use recent"	Find the last-used config (see Step B), confirm before using
Nothing about config	Proceed to Step B

Step B: Discover available configs and models

Run these checks and collect all found paths:

```bash
# Standard locations
ls .massgen/config.yaml 2>/dev/null && echo "PROJECT: .massgen/config.yaml"
ls ~/.config/massgen/config.yaml 2>/dev/null && echo "GLOBAL: ~/.config/massgen/config.yaml"

# Recently used (from past skill runs in this project)
ls -t .massgen/*/run_summary.json 2>/dev/null | head -5
```
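As a minimal sketch, the standard-location checks could be wrapped in one function that prints a labeled line per config it finds. The name `find_massgen_configs` is hypothetical and not part of MassGen; the function looks only at the two standard paths named in this step.

```bash
# Hypothetical helper (not part of MassGen): report configs at the two
# standard locations only, one "LABEL: path" line per file that exists.
find_massgen_configs() {
  if [ -f ".massgen/config.yaml" ]; then
    echo "PROJECT: .massgen/config.yaml"
  fi
  if [ -f "$HOME/.config/massgen/config.yaml" ]; then
    echo "GLOBAL: $HOME/.config/massgen/config.yaml"
  fi
}

# Usage: empty output means no config was found at either location
configs=$(find_massgen_configs)
if [ -z "$configs" ]; then
  echo "No configs found at the standard locations"
else
  echo "$configs"
fi
```

Empty output corresponds to the "no configs found" case, where the headless quickstart applies.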


If the user said "same as last time", check the most recent `run_summary.json` for
a `"config"` field — that's the last-used path.

If the user mentioned a provider or model name but you need to verify what's
available, run:

```bash
uv run massgen --list-backends
```

This prints all supported backends with their models, capabilities, and required API keys — useful for matching a user's stated preference to a real backend name.

If no configs are found at all, do NOT create a YAML file yourself. Instead, use the headless quickstart, which auto-detects available API keys and generates a config without requiring a browser:

```bash
uv run massgen --quickstart --headless
```

This writes a config to `.massgen/config.yaml` and exits. If you need a specific backend, add `--quickstart-agent backend=claude,model=claude-opus-4-6` (repeat for multiple agents). Only fall back to `--web-quickstart` if the user explicitly wants the browser-based setup wizard.

Step C: Ask the user to confirm

Use AskUserQuestion to present the options. Format the question clearly:

I found these MassGen configs:
1. `.massgen/config.yaml` (project config)
2. `~/.config/massgen/config.yaml` (global config)
Which would you like to use? You can also paste a path to a different config, say "create new" to generate one, or tell me which provider/model you prefer.

Rules for presenting options:

List every found config with its location label (project / global / path)
If the user expressed a preference (provider name, agent count), note which option best matches and say why
Always include "create new" as an option
If only one config exists, still ask, but make it easy: "I found one config at `.massgen/config.yaml`. Use it, or would you prefer a different one?"

Step D: Resolve the user's answer

User response	Resolution
Picks a number from the list	Use that config path
Pastes or types a file path	Verify it exists; if not, report error and stop
Describes preference (e.g. "the Claude one", "use Gemini")	Match to discovered list or run `--list-backends` to find it; confirm
Says "default" or presses enter with one option	Use the single discovered config
Says "create new" / "generate one"	Run `uv run massgen --quickstart --headless` from cwd, wait for exit
Specifies backend+model (e.g. "3 Claude agents")	Run headless quickstart with explicit `--quickstart-agent` flags

Once resolved, pass the path via `--config <path>` in the `massgen_run.sh` invocation (Step 4). If the user confirmed `.massgen/config.yaml` (the implicit default), you may omit `--config`.

STOP here until you have a confirmed config. Do NOT proceed to Scope or Workflow until the user has explicitly chosen a config. Do NOT write config YAML files yourself — use the headless quickstart to generate them. Do NOT search for configs in subdirectories, parent directories, or anywhere else beyond the standard locations above.
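The last part of the resolution could be sketched as a small shell function that turns a confirmed path into launcher arguments. `config_args` is a hypothetical name, not something the skill provides. It takes two behaviors from the text: a missing file is an error, and `--config` may be omitted for the implicit default `.massgen/config.yaml`.

```bash
# Hypothetical helper: print the launcher arguments for a confirmed config path.
# Prints nothing for the implicit default; returns 1 if the file does not exist.
config_args() {
  if [ ! -f "$1" ]; then
    echo "error: config not found: $1" >&2
    return 1
  fi
  case "$1" in
    .massgen/config.yaml|./.massgen/config.yaml)
      ;;                          # implicit default: --config can be omitted
    *)
      echo "--config $1"          # note: paths with spaces need extra quoting by the caller
      ;;
  esac
}
```

A caller would run this only after the user has confirmed the path, and stop on a non-zero return instead of guessing another location.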

Scope

Before starting, determine what the MassGen invocation covers. Focused invocations produce far better results than unscoped "do everything" runs.

General: the task to accomplish, relevant context, quality expectations
Evaluate: which files/artifacts to evaluate, what to ignore, evaluation focus
Plan: the goal/objective, constraints, what context to include
Spec: the problem to specify, user needs, system boundaries

If the user doesn't specify scope, ask them.

Workflow

Step 0: Create Working Directory

Create a timestamped subdirectory so parallel invocations don't conflict:

```bash
MODE="general"  # or "evaluate", "plan", or "spec"
WORK_DIR=".massgen/$MODE/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"
```

All artifacts (context, criteria, prompt, output, logs) go in this directory.

Step 1: Clarify & Write Context File

Read `references/<mode>/workflow.md` (relative to this skill) for the context file template specific to your mode.

Write `$WORK_DIR/context.md` using the template from the workflow file.

Key principle for all modes: provide factual context that orients the MassGen agents. Do NOT bias them with your opinions about quality — let them discover issues independently. That's the whole point of multi-agent evaluation.

General: describe the task, relevant context, quality expectations
Evaluate: describe what was built, scope, git info, verification evidence
Plan: describe the goal, constraints, existing context, success criteria
Spec: describe the problem, user needs, system boundaries, constraints

Step 2: Generate Criteria

Each mode has a recommended criteria preset:

Mode	Preset	How to Apply
general	Auto-generated	Omit `--criteria-file` and `--criteria-preset`
evaluate	Custom or `"evaluation"`	`--criteria-file` with JSON, or `--criteria-preset evaluation`
plan	`"planning"`	`--criteria-preset planning`
spec	`"spec"`	`--criteria-preset spec`

For general mode, criteria are auto-generated from the task content — omit both flags unless you have specific quality axes to enforce.
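The table can be read as a lookup from mode to launcher flags. The sketch below encodes it as a shell function. `criteria_flags` is a hypothetical name, and letting a custom criteria file take precedence in every mode is an assumption of this sketch, not a rule stated here.

```bash
# Hypothetical helper: print the criteria flags for a mode, following the table.
# $1 = mode, $2 = optional path to a custom criteria JSON file.
criteria_flags() {
  if [ -n "${2:-}" ]; then
    echo "--criteria-file $2"     # assumption: a custom file wins over any preset
    return 0
  fi
  case "$1" in
    general)  ;;                                      # auto-generated: pass neither flag
    evaluate) echo "--criteria-preset evaluation" ;;
    plan)     echo "--criteria-preset planning" ;;
    spec)     echo "--criteria-preset spec" ;;
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, `criteria_flags plan` prints `--criteria-preset planning`, while `criteria_flags general` prints nothing.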

To use custom criteria: read `references/criteria_guide.md` for the format and writing guide, then write criteria JSON to `$WORK_DIR/criteria.json`.

If there's a specific focus area, weight your criteria toward that focus. In Claude Code: use AskUserQuestion to ask the user for focus preference. In Codex or non-interactive: default to general coverage.

```bash
cat > "$WORK_DIR/criteria.json" << 'EOF'
[
  {"text": "...", "category": "must"},
  {"text": "...", "category": "must"},
  {"text": "...", "category": "should"},
  {"text": "...", "category": "could"}
]
EOF
```

Step 3: Construct the Prompt

1. Read the prompt template from `references/<mode>/prompt_template.md` (relative to this skill)
2. Read the context file you wrote in Step 1
3. Replace `{{CONTEXT_FILE_CONTENT}}` with the context file contents
4. Replace `{{CUSTOM_FOCUS}}` with the focus directive (or empty string if none)
5. Write the final prompt to `$WORK_DIR/prompt.md`
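One way to do the placeholder substitution from the shell is sketched below. `build_prompt` is a hypothetical helper, not part of the skill. It uses awk with literal (non-regex) matching so that characters such as `&` or `/` in the context file pass through unchanged.

```bash
# Hypothetical helper: fill the two placeholders in a prompt template.
# $1 = template file, $2 = context file, $3 = focus directive (may be empty)
build_prompt() {
  FOCUS="${3:-}" awk -v ctx_file="$2" '
    # Replace every literal occurrence of needle in s with value (no regex involved)
    function fill(s, needle, value,    out, i) {
      out = ""
      while ((i = index(s, needle)) > 0) {
        out = out substr(s, 1, i - 1) value
        s = substr(s, i + length(needle))
      }
      return out s
    }
    BEGIN {
      # Read the whole context file, keeping its line breaks
      while ((getline line < ctx_file) > 0) ctx = ctx (n++ ? "\n" : "") line
      focus = ENVIRON["FOCUS"]
    }
    { print fill(fill($0, "{{CONTEXT_FILE_CONTENT}}", ctx), "{{CUSTOM_FOCUS}}", focus) }
  ' "$1"
}

# Usage (paths as described in the steps; FOCUS_TEXT is an illustrative variable):
# build_prompt "$SKILL_DIR/references/$MODE/prompt_template.md" \
#   "$WORK_DIR/context.md" "$FOCUS_TEXT" > "$WORK_DIR/prompt.md"
```

The focus directive is passed through the environment rather than `awk -v` so that backslashes in it are not interpreted as escape sequences.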

Step 4: Run MassGen

Use the launcher script (`scripts/massgen_run.sh`, relative to this skill) to run MassGen, launch the web viewer, and wait for completion in a single atomic command. This avoids the double-backgrounding issues that cause agents to lose track of running processes.

Run in the background using your agent's native mechanism (e.g., `run_in_background` in Claude Code):

```bash
# SKILL_DIR is the directory containing this SKILL.md file
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-file "$WORK_DIR/criteria.json" \
  --viewer
```


If using default criteria (no custom criteria file), omit `--criteria-file`.
For planning/spec modes, use `--criteria-preset planning` or `--criteria-preset spec` instead.
If you resolved a custom config in Step D, add `--config <path>`.

The script handles everything atomically:
1. Launches MassGen in `--automation` mode
2. Waits for the log directory to appear (with 30s timeout)
3. Starts the web viewer at `http://localhost:8000`
4. Waits for MassGen to complete
5. Writes `$WORK_DIR/run_summary.json` with exit code, duration, log dir

**After the background task completes**, read the summary:

```bash
cat "$WORK_DIR/run_summary.json"
```

Script options:

- `--viewer`: launch web viewer (opens `http://localhost:8000`)
- `--viewer-port PORT`: use a different port
- `--config FILE`: custom MassGen config YAML
- `--output-file FILE`: override result path (default: `$WORK_DIR/result.md`)
- `--no-cwd-context`: disable read-only CWD access
- `--extra-args "..."`: pass additional massgen CLI flags

Timing: expect 15-45 minutes. Do not assume something is stuck — MassGen runs multiple agents through several rounds of iteration.

Step 5: Parse the Output

The output depends on the mode. The winner's workspace path is shown in `$WORK_DIR/result.md` (look for "Workspace cwd" or check `status.json` in the log directory for `workspace_paths`).

General mode: the winner's answer is in `$WORK_DIR/result.md`. Any files the agents created are in the winner's workspace (path shown in `result.md`). Copy or reference the workspace files as needed.

Evaluate mode: three files, `verdict.json`, `next_tasks.json`, and `critique_packet.md`. Read `verdict.json` first to determine iterate vs converged. See `references/evaluate/workflow.md` for full output structure.

Plan mode: `project_plan.json`, a structured task list with chunks, dependencies, and verification. May include auxiliary files in `research/`, `framework/`, and `risks/` subdirectories. See `references/plan/workflow.md` for full output structure.

Spec mode: `project_spec.json`, EARS requirements with chunks, rationale, and verification. May include auxiliary files in `research/`, `design/`, and `decisions/` subdirectories. See `references/spec/workflow.md` for full output structure.

Step 6: Ground in Your Native Task System

This is the most critical step for evaluate, plan, and spec modes. MassGen produced a structured result — now you must internalize it by entering your native task/plan mode and enumerating every task or requirement as a tracked item. Without this, the plan is just text that fades from context as you work.

For general mode, grounding is optional — it applies when the output contains a structured task list or action items, but many general tasks produce artifacts (code, documents, designs) rather than task lists.

Why this matters: agents that skip this step tend to execute the first few tasks, then drift — forgetting verification steps, skipping later tasks, or losing track of dependencies. Grounding forces you to commit to the full scope before executing anything.

For all modes:

1. Enter your task planning mode (e.g., TodoWrite in Claude Code, task tracking in Codex, or whatever native tracking your environment provides)
2. Create one tracked task per item from the MassGen output:
   - Evaluate: each task from `next_tasks.json` becomes a tracked task, preserving `implementation_guidance`, `depends_on`, and `verification`
   - Plan: each task from `project_plan.json` becomes a tracked task, preserving chunk ordering, dependencies, and verification criteria
   - Spec: each requirement from `project_spec.json` becomes a tracked task (implement + verify), preserving priority, dependencies, and acceptance criteria
3. Preserve the dependency order: don't flatten the DAG. Tasks in chunk C01 must complete before C02 tasks begin
4. Include verification as explicit tasks: don't just track "implement X", also track "verify X meets [criteria]". Verification that isn't tracked doesn't happen
5. Mark each task's status as you work: pending → in_progress → completed

Then execute in order, updating status as you go. When you complete a task, check it off and move to the next one. This creates an execution trace that keeps you honest about what's done and what remains.

Step 7: Execute and Iterate

General: read `result.md` for the winning answer. Copy deliverable files from the winner's workspace if applicable.

Evaluate: read `verdict.json`. If `"iterate"`, check `approach_assessment.ceiling_status` in `next_tasks.json` first:

- `ceiling_not_reached` → execute `fix_tasks`, then `evolution_tasks` as stretch
- `ceiling_approaching` → execute `fix_tasks`, then `evolution_tasks`
- `ceiling_reached` → consider re-invoking plan mode with evaluation findings (see Plan-Evaluate Loop below)

If `"converged"`, proceed to delivery.
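The evaluate-mode branching can be summarized as a lookup from `ceiling_status` to the next action. The sketch below uses a hypothetical function name, `next_action`. It takes the status value as an argument and does not extract it from the JSON, because the file layout beyond the field names is not shown here.

```bash
# Hypothetical helper: map a ceiling_status value to the next action.
next_action() {
  case "$1" in
    ceiling_not_reached) echo "execute fix_tasks, then evolution_tasks as stretch" ;;
    ceiling_approaching) echo "execute fix_tasks, then evolution_tasks" ;;
    ceiling_reached)     echo "consider re-invoking plan mode with evaluation findings" ;;
    *)                   echo "unknown ceiling_status: $1" >&2; return 1 ;;
  esac
}
```

An unrecognized value returns a non-zero status, so a caller can stop and ask rather than pick a branch.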

Plan / Spec: store the result as a living document (see below), then execute the grounded tasks chunk by chunk. At tasks marked with `eval_checkpoint`, invoke evaluate mode to assess approach viability before continuing (see Plan-Evaluate Loop below).

Living Document Protocol (Plan & Spec Modes)

This is the most important section for plan/spec modes — it defines how the output is used after MassGen produces it.

Store

Adopt the MassGen output into `.massgen/plans/` using the existing `PlanStorage` infrastructure:

```
.massgen/plans/plan_<timestamp>/
├── workspace/          # Mutable working copy
│   ├── plan.json       # (renamed from project_plan.json) or spec.json
│   └── research/       # Auxiliary files from MassGen output
├── frozen/             # Immutable snapshot (identical to workspace/ at creation)
│   ├── plan.json       # or spec.json
│   └── research/
└── plan_metadata.json  # artifact_type, status, chunk_order, context_paths
```

Copy `project_plan.json` → `workspace/plan.json` (or `project_spec.json` → `workspace/spec.json`). Copy any auxiliary directories. Create `frozen/` as an identical snapshot.
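Done by hand, the copy steps might look like the sketch below. It is not the `PlanStorage` implementation: `adopt_output` is a hypothetical name, it treats every subdirectory of the output as auxiliary (an assumption), and it does not write `plan_metadata.json`, whose format is not given here.

```bash
# Hypothetical helper: copy a MassGen plan/spec output into the layout shown above.
# $1 = directory holding project_plan.json or project_spec.json
# $2 = target plan directory (e.g. .massgen/plans/plan_<timestamp>)
adopt_output() {
  src=$1
  dest=$2
  if [ -e "$dest/frozen" ]; then
    echo "error: $dest/frozen already exists; the snapshot is immutable" >&2
    return 1
  fi
  mkdir -p "$dest/workspace" || return 1
  if [ -f "$src/project_plan.json" ]; then
    cp "$src/project_plan.json" "$dest/workspace/plan.json" || return 1
  elif [ -f "$src/project_spec.json" ]; then
    cp "$src/project_spec.json" "$dest/workspace/spec.json" || return 1
  else
    echo "error: no project_plan.json or project_spec.json in $src" >&2
    return 1
  fi
  # Copy auxiliary directories (assumption: every subdirectory counts as auxiliary)
  for d in "$src"/*/; do
    if [ -d "$d" ]; then
      cp -R "${d%/}" "$dest/workspace/" || return 1
    fi
  done
  # frozen/ starts out as an identical copy of workspace/
  cp -R "$dest/workspace" "$dest/frozen"
}
```

Refusing to run when `frozen/` already exists keeps the snapshot from being overwritten by a second adoption into the same directory.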

Read on Restart

FIRST ACTION in every new session: read `workspace/plan.json` (or `workspace/spec.json`). This is the source of truth for what's done and what's next.

Update Continuously

As tasks complete (plan) or requirements are implemented (spec), update the workspace copy. Mark status, add notes, record discoveries. The workspace copy is a living document.

Check Drift

Periodically compare `workspace/` against `frozen/`. The existing `PlanSession.compute_plan_diff()` returns a `divergence_score` (0.0 = no drift, 1.0 = complete rewrite). A high score is a cue to re-evaluate whether the plan/spec is still valid.
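For a quick manual check from the shell, a recursive `diff` can tell whether the working copy has changed at all since the snapshot. This is only a rough yes/no sketch with a hypothetical function name, `has_drifted`. It does not compute a `divergence_score` and is not a substitute for `PlanSession.compute_plan_diff()`.

```bash
# Hypothetical helper: succeed (exit 0) if workspace/ differs from frozen/.
# $1 = plan directory containing both subdirectories
has_drifted() {
  # diff -r exits 0 when the two trees are identical, non-zero otherwise
  if diff -r "$1/workspace" "$1/frozen" >/dev/null 2>&1; then
    return 1   # identical: no drift
  fi
  return 0     # trees differ (or one of them is missing)
}
```

Usage: `if has_drifted "$PLAN_DIR"; then echo "workspace has changed since the snapshot"; fi`, where `PLAN_DIR` is an illustrative variable for the plan directory.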

Refine When Stuck

If the plan/spec proves wrong or incomplete, re-invoke this skill with the workspace copy as "What Already Exists" to get multi-agent refinement. This creates a new plan directory with a fresh `frozen/` snapshot.

Don't Drift Silently

If you deviate from the plan/spec, update the workspace copy first. An outdated plan is worse than no plan.

Mode Overviews

General

Run any task through MassGen's multi-agent system. Agents independently produce solutions and converge through checklist-gated voting. Use this for tasks that don't fit neatly into evaluate, plan, or spec — writing, code generation, research, analysis, design, or anything where multiple perspectives improve the result.

See `references/general/workflow.md` for the context template and output handling.

Evaluate

Critique existing work artifacts. Evaluator agents inspect your code, documents, or deliverables and produce a structured critique with machine-readable verdict, per-criterion scores, and actionable improvement tasks. The checklist-gated voting system ensures agents converge on the strongest critique.

See `references/evaluate/workflow.md` for the full context template, output structure, and examples.

Plan

Create or refine a structured project plan. Planning agents decompose the goal into a task DAG with chunks, dependencies, verification criteria, and technology choices. Each round of MassGen iteration improves task quality — descriptions get more actionable, verification gets more specific, sequencing gets tighter.

See `references/plan/workflow.md` for the full context template, output format, and lifecycle.

Spec

Create or refine a requirements specification. Spec agents produce EARS-formatted requirements with acceptance criteria, rationale, and verification. Iteration focuses on precision — each round eliminates ambiguities, fills gaps, and strengthens edge case coverage.

See `references/spec/workflow.md` for the full context template, output format, and lifecycle.

Plan-Evaluate Loop

For complex or creative projects, plan and evaluate modes work together in a feedback loop:

Plan → Execute → Evaluate → (fix OR re-plan) → Execute → Evaluate → ...

When to Use the Loop

The task has exploratory components (visual design, creative writing, UX)
The project is complex enough that the initial plan is partly speculative
Quality expectations are high and "correct but adequate" isn't enough
Prior iterations show diminishing returns

Loop Protocol

1. Plan: invoke plan mode. Agents classify tasks as `deterministic` or `exploratory` and create prototypes to validate assumptions
2. Execute: implement the plan chunk by chunk
3. Evaluate: at `eval_checkpoint` tasks (or after any exploratory chunk), invoke evaluate mode
4. Decide: read `approach_assessment` in the evaluation output:
   - `ceiling_not_reached` → execute fix_tasks, continue
   - `ceiling_approaching` → execute fix_tasks + evolution_tasks, continue
   - `ceiling_reached` → re-invoke plan mode with evaluation discoveries
5. Evolve: if re-planning, pass `approach_assessment` and `breakthroughs` as context. The new plan amplifies what worked and avoids approaches that hit their ceiling
6. Repeat until evaluation returns "converged"

What Makes This Different from Just Re-Running Eval

Eval assesses whether the APPROACH has room to grow, not just whether the OUTPUT has defects
When the approach is limited, the loop goes back to PLANNING, not just more implementation
Breakthroughs discovered during execution feed FORWARD into new plans, not just into preserve lists
The plan evolves based on evidence from execution, not speculation

Loop Termination

Max 3 plan mutations per chunk — if still not converging, escalate to user
If evaluation returns "converged" with `ceiling_not_reached`, the loop is complete
If the user provides explicit direction, follow it regardless of ceiling status

Condensed Examples

General: Multi-Agent Task Execution

```bash
WORK_DIR=".massgen/general/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## Task
Build a responsive landing page for a developer tool that converts CSV files to JSON. Single page with hero, features, and CTA sections.

## Context
- Target audience: developers and data engineers
- Tech stack: HTML, CSS, vanilla JS (no frameworks)
- Must work on mobile and desktop

## Quality Expectations
- Visually polished, not template-looking
- Fast load time, no external dependencies
EOF

# Build prompt from references/general/prompt_template.md, then run
# No --criteria-file: criteria are auto-generated from the task
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --viewer
```

```bash
WORK_DIR=".massgen/general/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## 任务
为一个将CSV文件转换为JSON的开发者工具构建响应式着陆页。单页包含Hero区、功能区和CTA区。

## 上下文
- 目标受众：开发者和数据工程师
- 技术栈：HTML、CSS、原生JS（无框架）
- 必须支持移动端和桌面端

## 质量期望
- 视觉效果精致，避免模板化
- 加载速度快，无外部依赖
EOF

# 从references/general/prompt_template.md构建提示词，然后运行
# 无需--criteria-file：标准会从任务中自动生成
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --viewer
```
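
The "Build prompt from references/general/prompt_template.md" step is not spelled out in this example. One way it might look is sketched below, assuming the template marks the insertion point with a line containing only `{{CONTEXT}}`. That placeholder name, the `build_prompt` function, and the location of `references/` under `$SKILL_DIR` are assumptions for illustration; check the actual template for its real placeholders.

```shell
# Hypothetical helper: copy a template to an output file, replacing the
# line "{{CONTEXT}}" (an assumed placeholder) with the context file's contents.
# Args: <template_file> <context_file> <output_file>
build_prompt() {
  awk -v ctx="$2" '
    $0 == "{{CONTEXT}}" {
      # Stream the context file in place of the placeholder line.
      while ((getline line < ctx) > 0) print line
      close(ctx)
      next
    }
    { print }
  ' "$1" > "$3"
}

# Example call (paths are assumptions):
# build_prompt "$SKILL_DIR/references/general/prompt_template.md" \
#   "$WORK_DIR/context.md" "$WORK_DIR/prompt.md"
```

Every other line of the template is copied through unchanged, so a template without the placeholder produces an identical copy.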

Evaluate: Pre-PR Review

评估模式：PR前评审

```bash
WORK_DIR=".massgen/evaluate/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

# Write context (scope: specific deliverables, no quality opinions)
cat > "$WORK_DIR/context.md" << 'EOF'
## Deliverables in Scope
- `src/api/handler.ts`: API request handler
- `src/hooks/useAuth.ts`: authentication hook

## Out of Scope
Test files, CI config

## Original Task
Add JWT authentication to the API layer

## What Was Done
Implemented JWT validation in handler and auth hook for React components.

## Verification Evidence
pytest: 24 passed, 0 failed
EOF

# Write criteria (or omit --eval-criteria to use default preset)
cat > "$WORK_DIR/criteria.json" << 'EOF'
[
  {"text": "Auth security: JWT validation covers expiration, signature, and audience checks.", "category": "must"},
  {"text": "Error handling: invalid/expired tokens produce clear error responses.", "category": "must"},
  {"text": "Code quality: clean separation between auth logic and business logic.", "category": "should"}
]
EOF

# Build prompt from template, then run
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-file "$WORK_DIR/criteria.json" \
  --viewer
```

```bash
WORK_DIR=".massgen/evaluate/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

# 编写上下文（范围：特定交付成果，无质量评价）
cat > "$WORK_DIR/context.md" << 'EOF'
## 范围内的交付成果
- `src/api/handler.ts`：API请求处理器
- `src/hooks/useAuth.ts`：认证Hook

## 范围外内容
测试文件、CI配置

## 原始任务
为API层添加JWT认证

## 已完成工作
在处理器和React组件的认证Hook中实现了JWT验证。

## 验证证据
pytest：24个测试通过，0个失败
EOF

# 编写标准（或省略--eval-criteria使用默认预设）
cat > "$WORK_DIR/criteria.json" << 'EOF'
[
  {"text": "认证安全性：JWT验证包含过期时间、签名和受众检查。", "category": "must"},
  {"text": "错误处理：无效/过期令牌会返回清晰的错误响应。", "category": "must"},
  {"text": "代码质量：认证逻辑与业务逻辑清晰分离。", "category": "should"}
]
EOF

# 从模板构建提示词，然后运行
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-file "$WORK_DIR/criteria.json" \
  --viewer
```

Plan: New Feature Planning

规划模式：新功能规划

```bash
WORK_DIR=".massgen/plan/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## Goal
Add real-time collaboration to the document editor: multiple users editing the same document simultaneously with cursor presence.

## Constraints
- Must work with existing PostgreSQL database
- Timeline: 2 weeks
- Team: 2 engineers

## Existing Context
Express.js backend, React frontend, WebSocket already used for notifications.

## Success Criteria
Two users can edit the same document with <500ms sync latency and no data loss.
EOF

# Build prompt from references/plan/prompt_template.md, then run
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-preset planning \
  --viewer
```

```bash
WORK_DIR=".massgen/plan/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## 目标
为文档编辑器添加实时协作功能——多个用户可同时编辑同一文档，显示光标位置。

## 约束条件
- 必须与现有PostgreSQL数据库兼容
- 时间线：2周
- 团队：2名工程师

## 现有上下文
Express.js后端、React前端、已使用WebSocket实现通知功能。

## 成功标准
两名用户可同时编辑同一文档，同步延迟<500ms且无数据丢失。
EOF

# 从references/plan/prompt_template.md构建提示词，然后运行
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-preset planning \
  --viewer
```

Spec: Feature Specification

规格模式：功能规格制定

```bash
WORK_DIR=".massgen/spec/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## Problem Statement
Users cannot recover deleted items; deletion is permanent and irreversible.

## User Needs / Personas
- End users: accidentally delete items, need easy recovery
- Admins: need to purge items for compliance after retention period

## Constraints
- PostgreSQL database, soft-delete pattern preferred
- 30-day retention before permanent purge
- Must not break existing API consumers
EOF

# Build prompt from references/spec/prompt_template.md, then run
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-preset spec \
  --viewer
```

```bash
WORK_DIR=".massgen/spec/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$WORK_DIR"

cat > "$WORK_DIR/context.md" << 'EOF'
## 问题描述
用户无法恢复已删除的项目——删除操作是永久且不可逆的。

## 用户需求 / 用户角色
- 终端用户：不小心删除项目，需要简单的恢复方式
- 管理员：需要在保留期后为合规目的清除项目

## 约束条件
- PostgreSQL数据库，优先使用软删除模式
- 30天保留期后永久清除
- 不得影响现有API消费者
EOF

# 从references/spec/prompt_template.md构建提示词，然后运行
bash "$SKILL_DIR/scripts/massgen_run.sh" \
  --work-dir "$WORK_DIR" \
  --prompt-file "$WORK_DIR/prompt.md" \
  --criteria-preset spec \
  --viewer
```

Reference Files

参考文件

`references/general/workflow.md`: general mode context template and output handling
`references/general/prompt_template.md`: general prompt template with placeholders
`references/criteria_guide.md`: how to write quality criteria (format, tiers, examples)
`references/evaluate/workflow.md`: evaluate mode context template, output structure, examples
`references/evaluate/prompt_template.md`: evaluation prompt template with placeholders
`references/plan/workflow.md`: plan mode context template, output format, lifecycle
`references/plan/prompt_template.md`: planning prompt template with placeholders
`references/spec/workflow.md`: spec mode context template, output format, lifecycle
`references/spec/prompt_template.md`: spec prompt template with placeholders
`massgen/subagent_types/round_evaluator/SUBAGENT.md`: source methodology for evaluation
`massgen/skills/massgen-develops-massgen/SKILL.md`: reference pattern for `--automation`

`references/general/workflow.md`：通用模式上下文模板和输出处理指南
`references/general/prompt_template.md`：通用提示词模板（包含占位符）
`references/criteria_guide.md`：质量标准编写指南（格式、层级、示例）
`references/evaluate/workflow.md`：评估模式上下文模板、输出结构、示例
`references/evaluate/prompt_template.md`：评估提示词模板（包含占位符）
`references/plan/workflow.md`：规划模式上下文模板、输出格式、生命周期指南
`references/plan/prompt_template.md`：规划提示词模板（包含占位符）
`references/spec/workflow.md`：规格模式上下文模板、输出格式、生命周期指南
`references/spec/prompt_template.md`：规格提示词模板（包含占位符）
`massgen/subagent_types/round_evaluator/SUBAGENT.md`：评估方法的原始文档
`massgen/skills/massgen-develops-massgen/SKILL.md`：`--automation`模式的参考示例
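
Since every mode depends on its workflow and prompt template files, a quick existence check before a run can catch an incomplete install. The sketch below is illustrative only: the `check_references` function is hypothetical, and it assumes the `references/` tree sits directly under the directory passed in (for instance `$SKILL_DIR`), which this page does not confirm.

```shell
# Hypothetical helper: report which of the listed reference files are missing.
# Arg: <skill_root>, assumed to contain the references/ tree.
# Prints one "missing: <path>" line per absent file; returns 1 if any are missing.
check_references() {
  cr_missing=0
  for cr_rel in \
    references/general/workflow.md \
    references/general/prompt_template.md \
    references/criteria_guide.md \
    references/evaluate/workflow.md \
    references/evaluate/prompt_template.md \
    references/plan/workflow.md \
    references/plan/prompt_template.md \
    references/spec/workflow.md \
    references/spec/prompt_template.md
  do
    if [ ! -f "$1/$cr_rel" ]; then
      echo "missing: $cr_rel"
      cr_missing=1
    fi
  done
  return "$cr_missing"
}

# Example call (path is an assumption):
# check_references "$SKILL_DIR" || echo "reference files incomplete"
```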