ablation-planner

Ablation Planner

Systematically design ablation studies that answer the questions reviewers will ask. Codex leads the design from a reviewer's perspective; CC reviews feasibility and implements the ablations.

Context: $ARGUMENTS

When to Use

  • Main results pass /result-to-claim with claim_supported = yes or partial
  • User explicitly requests ablation planning
  • /auto-review-loop reviewer identifies missing ablations

Workflow

Step 1: Prepare Context

CC reads available project files to build the full picture:
  • Method description and components (from docs/research_contract.md or project CLAUDE.md)
  • Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
  • Confirmed and intended claims (from result-to-claim output or project notes)
  • Available compute resources (from CLAUDE.md server config, if present)
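As a rough illustration of this step, the sketch below looks for the files named above and keeps the first one found for each category. The function name `collect_context`, the `CONTEXT_SOURCES` mapping, and the category keys are hypothetical; W&B and the claims sources are left out because the text does not pin them to a fixed file path.

```python
from pathlib import Path

# Candidate files per category, in the order the step lists them.
# The category keys are illustrative, not part of the skill itself.
CONTEXT_SOURCES = {
    "method": ["docs/research_contract.md", "CLAUDE.md"],
    "results": ["EXPERIMENT_LOG.md", "EXPERIMENT_TRACKER.md"],
    "compute": ["CLAUDE.md"],
}


def collect_context(project_root: str) -> dict[str, str]:
    """Return the text of the first existing file for each category."""
    root = Path(project_root)
    context: dict[str, str] = {}
    for category, candidates in CONTEXT_SOURCES.items():
        for rel_path in candidates:
            path = root / rel_path
            if path.is_file():
                context[category] = path.read_text(encoding="utf-8")
                break  # first match wins; later candidates are fallbacks
    return context
```

Categories with no matching file are simply absent from the result, which makes missing context easy to spot before Codex is called.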

Step 2: Codex Designs Ablations

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:

    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable/replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (e.g., "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but won't add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate

Step 3: Parse Ablation Plan

Normalize the Codex response into a structured format:

Ablation Plan

Component Ablations (highest priority)

| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |

Hyperparameter Sensitivity

| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|----------------|---------------|----------|
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |

Design Choice Comparisons

| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
| 4 | joint vs separate matching | whether joint adds value | 4 |

Coverage Assessment

[What reviewer questions these ablations answer]

Unnecessary Ablations

[Experiments that seem useful but won't add insight; skip these]

Run Order

[Optimized for maximum early information]

Estimated Compute

[Total GPU-hours]
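One way to produce the component table of this plan is sketched below. It assumes the Codex response has already been parsed into a list of dicts keyed by the field names from the Step 2 prompt; the function name `render_component_table` and that input shape are assumptions, and the other sections of the plan would need similar handling.

```python
def render_component_table(ablations: list[dict]) -> str:
    """Render component ablations as a Markdown table, must-run items first."""
    rows = [
        "| # | Name | What It Tests | Expected If Matters | Priority |",
        "|---|------|---------------|---------------------|----------|",
    ]
    # Priority 1 is must-run, so an ascending sort puts it at the top.
    ordered = sorted(ablations, key=lambda a: a["priority"])
    for index, a in enumerate(ordered, start=1):
        rows.append(
            f"| {index} | {a['name']} | {a['what_it_tests']} "
            f"| {a['expected_if_component_matters']} | {a['priority']} |"
        )
    return "\n".join(rows)
```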

Step 4: CC Reviews Feasibility

Before running anything, CC checks:
  • Compute budget: can we afford all ablations with available GPUs?
  • Code changes: which ablations need code modifications vs config-only changes?
  • Dependencies: which ablations can run in parallel?
  • Cuts: if budget is tight, propose removing lower-priority ablations and ask Codex to confirm
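The budget check and the proposed cuts could look roughly like the sketch below. It assumes each ablation carries its own `gpu_hours` estimate, which is a hypothetical field (the plan above only records a total), and it uses a simple greedy pass in priority order rather than an optimal packing.

```python
def propose_cuts(ablations: list[dict], budget_gpu_hours: float) -> tuple[list[dict], list[dict]]:
    """Split ablations into (keep, cut), filling the budget by priority."""
    keep: list[dict] = []
    cut: list[dict] = []
    used = 0.0
    # Priority 1 is must-run, so lower numbers get first claim on the budget.
    for ablation in sorted(ablations, key=lambda a: a["priority"]):
        if used + ablation["gpu_hours"] <= budget_gpu_hours:
            keep.append(ablation)
            used += ablation["gpu_hours"]
        else:
            cut.append(ablation)
    return keep, cut
```

The `cut` list is only a proposal: it is what CC would send back to Codex for confirmation, not a list to drop on its own authority.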

Step 5: Implement and Run

  1. Create configs/scripts for each ablation (config-only changes first)
  2. Smoke test each ablation before full run
  3. Run in suggested order, using descriptive names (e.g., ablation-no-module-X)
  4. Track results in EXPERIMENT_LOG.md
  5. After all ablations complete → update findings.md with insights
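A small sketch of the naming and "config-only first" conventions from the steps above. Both helper names and the `needs_code_change` flag are hypothetical; the run order itself should still follow Codex's suggested order, so `creation_order` only covers the order in which configs and scripts are prepared.

```python
import re


def run_name(ablation_name: str) -> str:
    """Turn a free-text ablation name into a descriptive run name."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ablation_name).strip("-")
    return f"ablation-{slug}"


def creation_order(ablations: list[dict]) -> list[dict]:
    """Config-only ablations first, then by priority within each group."""
    # False sorts before True, so needs_code_change=False comes first.
    return sorted(ablations, key=lambda a: (a["needs_code_change"], a["priority"]))
```

For example, `run_name("no module X")` gives `ablation-no-module-X`, matching the naming example in step 3.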

Rules

  • Codex leads the design. CC does not pre-filter or bias the ablation list before Codex sees it. Codex thinks like a reviewer; CC thinks like an engineer.
  • Every ablation must have a clear what_it_tests and expected_if_component_matters. No "just try it" experiments.
  • Config-only ablations take priority over those needing code changes (faster, less error-prone).
  • If total compute exceeds budget, CC proposes cuts and asks Codex to re-prioritize — don't silently drop ablations.
  • Component ablations (remove/replace) take priority over hyperparameter sweeps.
  • Do not generate ablations for components identical to the baseline (no-op ablations).
  • Record all ablation results in EXPERIMENT_LOG.md, including negative results: a component whose removal has no effect is an important finding.
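The rule that every ablation needs a stated purpose and prediction can be checked mechanically. The sketch below is one possible check, assuming ablations are dicts keyed by the field names from the Step 2 prompt; `validate_ablation` and `REQUIRED_FIELDS` are illustrative names, not part of the skill.

```python
REQUIRED_FIELDS = ("name", "what_it_tests", "expected_if_component_matters", "priority")


def validate_ablation(ablation: dict) -> list[str]:
    """Return a list of problems; an empty list means the ablation is well-formed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = ablation.get(field)
        # Treat absent values and blank strings the same way.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing {field}")
    priority = ablation.get("priority")
    if priority is not None and (not isinstance(priority, int) or not 1 <= priority <= 5):
        problems.append("priority must be an integer from 1 to 5")
    return problems
```

An ablation that returns any problems is a candidate "just try it" experiment and should go back to Codex for a clearer rationale rather than being run.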