ablation-planner

Ablation Planner

Systematically design ablation studies that answer the questions reviewers will ask. Codex leads the design from a reviewer's perspective; CC reviews feasibility and implements the plan.

Context: $ARGUMENTS

When to Use

Main results pass `/result-to-claim` with claim_supported = yes or partial
User explicitly requests ablation planning
`/auto-review-loop` reviewer identifies missing ablations

Workflow

Step 1: Prepare Context

CC reads available project files to build the full picture:

Method description and components (from docs/research_contract.md or project CLAUDE.md)
Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
Confirmed and intended claims (from result-to-claim output or project notes)
Available compute resources (from CLAUDE.md server config, if present)
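
The file lookup above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `CONTEXT_SOURCES` mapping, its category keys, and the `gather_context` helper are hypothetical names, and only the file paths come from the list above. W&B and result-to-claim output are left out because their locations are not specified here.

```python
from pathlib import Path

# Candidate files per context category, taken from the list above.
# The category keys are illustrative names, not part of the skill itself.
CONTEXT_SOURCES = {
    "method": ["docs/research_contract.md", "CLAUDE.md"],
    "results": ["EXPERIMENT_LOG.md", "EXPERIMENT_TRACKER.md"],
    "compute": ["CLAUDE.md"],
}


def gather_context(project_root: str) -> dict[str, str]:
    """Return the text of the first existing file for each category."""
    root = Path(project_root)
    context = {}
    for category, candidates in CONTEXT_SOURCES.items():
        for rel_path in candidates:
            path = root / rel_path
            if path.is_file():
                context[category] = path.read_text(encoding="utf-8")
                break  # first match wins; later candidates are fallbacks
    return context
```

Categories with no matching file are simply absent from the returned dictionary, which makes it easy to see what context is still missing before calling Codex.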

Step 2: Codex Designs Ablations

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:

    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable/replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (e.g., "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but won't add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate
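
One way to hold the per-ablation fields requested in the prompt is a small record type with a sanity check. The sketch below is an assumption about how a caller might represent the response; the `AblationSpec` class and its `problems` method are hypothetical, while the field names and the 1 to 5 priority range come from the prompt above.

```python
from dataclasses import dataclass


@dataclass
class AblationSpec:
    name: str
    what_it_tests: str
    expected_if_component_matters: str
    priority: int  # 1 (must-run) to 5 (nice-to-have)

    def problems(self) -> list[str]:
        """List reasons this spec is not ready to run (empty if fine)."""
        issues = []
        for field_name in ("name", "what_it_tests", "expected_if_component_matters"):
            if not getattr(self, field_name).strip():
                issues.append(f"{field_name} is empty")
        if not 1 <= self.priority <= 5:
            issues.append(f"priority {self.priority} is outside 1-5")
        return issues
```

A spec with a non-empty `problems()` list is a candidate to send back to Codex for clarification rather than to run as-is.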

Step 3: Parse Ablation Plan

Normalize the Codex response into a structured format:

Ablation Plan

Component Ablations (highest priority)

| # | Name | What It Tests | Expected If Matters | Priority |
| --- | --- | --- | --- | --- |
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |

Hyperparameter Sensitivity

| # | Parameter | Values to Test | What It Tests | Priority |
| --- | --- | --- | --- | --- |
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |

Design Choice Comparisons

| # | Name | What It Tests | Priority |
| --- | --- | --- | --- |
| 4 | joint vs separate matching | whether joint adds value | 4 |

Coverage Assessment

[What reviewer questions these ablations answer]

Unnecessary Ablations

[Experiments that seem useful but won't add insight — skip these]

Run Order

[Optimized for maximum early information]

Estimated Compute

[Total GPU-hours]
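
As a rough illustration of the normalization step, the sketch below renders a list of parsed component ablations into the first section of the template. It assumes the Codex response has already been parsed into dictionaries keyed by the field names from Step 2; `render_table` and `render_component_section` are hypothetical helpers, and the pipe-delimited table syntax is an assumed output format.

```python
def render_table(headers: list[str], rows: list[list[object]]) -> str:
    """Render rows as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def render_component_section(ablations: list[dict]) -> str:
    """Number ablations by ascending priority and render the section."""
    ordered = sorted(ablations, key=lambda a: a["priority"])
    rows = [
        [i, a["name"], a["what_it_tests"], a["expected_if_component_matters"], a["priority"]]
        for i, a in enumerate(ordered, start=1)
    ]
    headers = ["#", "Name", "What It Tests", "Expected If Matters", "Priority"]
    return "Component Ablations (highest priority)\n\n" + render_table(headers, rows)
```

The other sections of the plan could reuse `render_table` with their own headers.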

Step 4: CC Reviews Feasibility

Before running anything, CC checks:

Compute budget: can we afford all ablations with available GPUs?
Code changes: which ablations need code modifications vs config-only changes?
Dependencies: which ablations can run in parallel?
Cuts: if budget is tight, propose removing lower-priority ablations and ask Codex to confirm
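
The budget check and cut proposal might look like the sketch below. It is only one possible policy: it walks the ablations in priority order and keeps each one that still fits in the remaining budget. The `propose_cuts` function and the per-ablation `gpu_hours` field are hypothetical; the workflow above only asks Codex for a total estimate, so per-ablation costs would have to be estimated separately.

```python
def propose_cuts(ablations: list[dict], budget_gpu_hours: float) -> dict:
    """Keep ablations in priority order while they fit in the budget.

    Each ablation dict needs "name", "priority" (1 = must-run) and
    "gpu_hours" (a hypothetical per-ablation estimate).
    """
    kept, cut, used = [], [], 0.0
    for ablation in sorted(ablations, key=lambda a: a["priority"]):
        if used + ablation["gpu_hours"] <= budget_gpu_hours:
            kept.append(ablation["name"])
            used += ablation["gpu_hours"]
        else:
            cut.append(ablation["name"])
    return {"kept": kept, "proposed_cuts": cut, "gpu_hours_used": used}
```

The `proposed_cuts` list is a proposal only. It is meant to be sent to Codex for confirmation, not applied directly.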

Step 5: Implement and Run

Create configs/scripts for each ablation (config-only changes first)
Smoke test each ablation before full run
Run in suggested order, using descriptive names (e.g., `ablation-no-module-X`)
Track results in EXPERIMENT_LOG.md
After all ablations complete, update findings.md with insights
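
Two of the steps above, descriptive run names and preparing config-only ablations first, can be sketched as below. The `run_name` and `preparation_order` helpers and the `needs_code_change` flag are hypothetical. The slug rule is an assumption chosen to resemble the `ablation-no-module-X` example, and the ordering applies to creating configs, not to the run order suggested by Codex.

```python
import re


def run_name(ablation_name: str) -> str:
    """Turn a free-text ablation name into a run name such as 'ablation-remove-module-X'."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ablation_name).strip("-")
    return f"ablation-{slug}"


def preparation_order(ablations: list[dict]) -> list[str]:
    """Config-only ablations first, then by priority (1 = must-run)."""
    # False sorts before True, so config-only entries come first.
    ordered = sorted(ablations, key=lambda a: (a["needs_code_change"], a["priority"]))
    return [run_name(a["name"]) for a in ordered]
```

Keeping the original letter case in the slug preserves component names such as `X`, at the cost of case-sensitive run names.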

Rules

Codex leads the design. CC does not pre-filter or bias the ablation list before Codex sees it. Codex thinks like a reviewer; CC thinks like an engineer.
Every ablation must have a clear `what_it_tests` and `expected_if_component_matters`. No "just try it" experiments.
Config-only ablations take priority over those needing code changes (faster, less error-prone).
If total compute exceeds budget, CC proposes cuts and asks Codex to re-prioritize; CC must not silently drop ablations.
Component ablations (remove/replace) take priority over hyperparameter sweeps.
Do not generate ablations for components identical to the baseline (no-op ablations).
Record all ablation results in EXPERIMENT_LOG.md, including negative results (a component removal that has no effect is itself an important finding).
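
The last rule can be illustrated with a small logging helper that records every result, including those where the ablation changed nothing. This is a sketch under assumptions: the `log_ablation` function, the one-line entry format, and the `tolerance` threshold are all hypothetical, since the actual layout of EXPERIMENT_LOG.md is not specified here.

```python
from pathlib import Path


def log_ablation(log_path: str, run: str, metric: str, baseline: float,
                 ablated: float, tolerance: float) -> str:
    """Append one result line to the log and return it.

    A change within `tolerance` is still logged, marked as "no effect".
    """
    delta = ablated - baseline
    verdict = "no effect" if abs(delta) <= tolerance else "effect"
    line = f"- {run}: {metric} {baseline:.3f} -> {ablated:.3f} (delta {delta:+.3f}, {verdict})"
    # Append mode keeps earlier entries intact.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Because the "no effect" case is written to the log like any other entry, a null result cannot be lost by omission.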
