experiment-design-planner

Experiment Design Planner

Purpose

Help the user plan experiments that can actually answer a research question. This skill is based on the handbook's experiment design principles: start simple, begin with baselines, change one variable at a time, state hypotheses before running, and document negative results.
The output is an experiment plan that can be run, logged, and later explained in a paper or advisor meeting.

When to Use

  • User wants to run a new experiment or ablation
  • User has unclear or noisy experimental results
  • User is preparing baselines and metrics
  • User is changing several model or data choices at once
  • User needs a reproducible experiment plan before using cluster time

Workflow

Stage 1: State the Research Question

Ask:
  • What claim should this experiment support or refute?
  • What is the smallest result that would be meaningful?
  • What existing baseline should it beat, match, or clarify?
If the question is vague, rewrite it into a testable form.

Stage 2: Write Hypotheses Before Running

Capture:
  • Primary hypothesis
  • Alternative explanations
  • Expected direction of change
  • Expected metric movement
  • Failure mode that would falsify the hypothesis
Do not let the user run first and rationalize later.
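One way to keep the hypothesis from drifting after the fact is to record it as an immutable object before the first run. The sketch below is only an illustration: the `Hypothesis` class, its field names, and the `min_effect` threshold are assumptions introduced here, not part of the handbook.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Hypothesis:
    """A hypothesis written down before any run starts (hypothetical field names)."""
    primary: str
    alternatives: tuple
    metric: str
    expected_direction: str  # "increase" or "decrease"
    min_effect: float        # smallest change that would count as meaningful
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if self.expected_direction not in ("increase", "decrease"):
            raise ValueError("expected_direction must be 'increase' or 'decrease'")

    def is_falsified(self, baseline_value: float, observed_value: float) -> bool:
        """Return True if the observed change misses the pre-stated direction or size."""
        delta = observed_value - baseline_value
        if self.expected_direction == "decrease":
            delta = -delta  # a drop counts as positive movement
        return delta < self.min_effect


# Example values are made up for illustration.
h = Hypothesis(
    primary="Adding dropout improves validation accuracy",
    alternatives=("The gain comes from longer training",),
    metric="val_accuracy",
    expected_direction="increase",
    min_effect=0.01,
)
print(h.is_falsified(baseline_value=0.80, observed_value=0.805))
```

Because the dataclass is frozen, the expected direction and threshold cannot be quietly edited once results arrive; a changed hypothesis has to be recorded as a new object with a new timestamp.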

Stage 3: Define the Experimental Unit

Specify:
  • Dataset and split
  • Preprocessing
  • Model or method
  • Baselines
  • Metrics
  • Random seeds
  • Compute budget
  • Number of repeats
  • Hardware/environment
If the user lacks a baseline, start there.
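The fields above could be captured in a single config object so that a missing baseline is caught before any compute is spent. This is a minimal sketch under stated assumptions: `ExperimentUnit`, its field names, the rule of one repeat per seed, and the example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ExperimentUnit:
    """One fully specified experimental unit (hypothetical field names)."""
    dataset: str
    split: str
    preprocessing: str
    method: str
    baselines: List[str]
    metrics: List[str]
    seeds: List[int]
    compute_budget: str
    environment: str

    @property
    def repeats(self) -> int:
        # Assumption: one repeat per random seed.
        return len(self.seeds)

    def problems(self) -> List[str]:
        """List the gaps that should be fixed before running anything."""
        issues = []
        if not self.baselines:
            issues.append("no baseline: build one before anything else")
        if not self.metrics:
            issues.append("no metric defined")
        if not self.seeds:
            issues.append("no random seeds listed")
        elif len(set(self.seeds)) != len(self.seeds):
            issues.append("duplicate seeds: repeats would not be independent")
        return issues


# Example values are placeholders, not recommendations.
unit = ExperimentUnit(
    dataset="toy-dataset-v1",
    split="80/10/10 train/val/test",
    preprocessing="lowercase, deduplicate",
    method="model-with-dropout",
    baselines=[],
    metrics=["val_accuracy"],
    seeds=[0, 1, 2],
    compute_budget="4 GPU-hours",
    environment="1x GPU, Python 3.11",
)
print(unit.problems())
print(json.dumps(asdict(unit), indent=2))
```

Serializing the unit with `asdict` gives a record that can be stored next to the results.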

Stage 4: One-Variable Discipline

List variables:
  • Independent variable: what changes
  • Controlled variables: what must stay fixed
  • Nuisance variables: what could confound results
If the plan changes multiple variables, split it into an ordered ablation table.
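Splitting a multi-variable plan into an ordered ablation table can be sketched as below. The helper names `ablation_runs` and `changed_variables` are hypothetical, and the sketch assumes each run changes one variable relative to the baseline rather than stacking changes cumulatively.

```python
def ablation_runs(baseline: dict, changes: list) -> list:
    """Build an ordered run table where each run differs from the baseline in one variable.

    `changes` is a list of (variable_name, new_value) pairs, in the order to run them.
    """
    unknown = [name for name, _ in changes if name not in baseline]
    if unknown:
        raise KeyError(f"variables not in baseline config: {unknown}")

    runs = [{"run": 0, "change": "baseline", "config": dict(baseline)}]
    for i, (name, value) in enumerate(changes, start=1):
        config = dict(baseline)  # copy so controlled variables stay fixed
        config[name] = value
        runs.append({
            "run": i,
            "change": f"{name}: {baseline[name]} -> {value}",
            "config": config,
        })
    return runs


def changed_variables(baseline: dict, config: dict) -> list:
    """Return the names of variables whose values differ between two configs."""
    keys = baseline.keys() | config.keys()
    return sorted(k for k in keys if baseline.get(k) != config.get(k))


# Example config values are made up for illustration.
baseline = {"lr": 0.1, "dropout": 0.0, "optimizer": "sgd"}
table = ablation_runs(baseline, [("dropout", 0.1), ("optimizer", "adam")])
for row in table:
    print(row["run"], row["change"], changed_variables(baseline, row["config"]))
```

`changed_variables` also works as a check on hand-written configs: any run for which it returns more than one name is changing several things at once.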

Stage 5: Logging and Negative Results

Define the required log fields:
  • Config path or commit hash
  • Dataset version
  • Seed
  • Hyperparameters
  • Metrics
  • Runtime
  • Failure notes
  • Plot/table output path
Make negative results first-class. A failed run should still record what was tried and what was learned.
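The log fields above can be enforced with a small append-only logger, so that a failed run is written in the same shape as a successful one. This is a sketch, assuming a JSON Lines file; the field keys in `REQUIRED_FIELDS`, the `log_run` helper, and the example paths are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical keys mirroring the required log fields listed above.
REQUIRED_FIELDS = (
    "config",            # config path or commit hash
    "dataset_version",
    "seed",
    "hyperparameters",
    "metrics",
    "runtime_s",
    "failure_notes",
    "output_path",       # plot/table output path
)


def log_run(log_path, record: dict) -> None:
    """Append one run record as a JSON line, refusing records with missing fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


# A failed run still gets a full record; the values are illustrative only.
failed_run = {
    "config": "configs/dropout.yaml",
    "dataset_version": "v1",
    "seed": 0,
    "hyperparameters": {"lr": 0.1, "dropout": 0.5},
    "metrics": {},
    "runtime_s": 312.0,
    "failure_notes": "Loss diverged early; lr 0.1 looks too high with dropout 0.5.",
    "output_path": None,
}
```

Calling `log_run("runs.jsonl", failed_run)` would append the record. Requiring `failure_notes` on every run, even as an empty string, keeps the "what was tried, what was learned" question from being skipped.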

Stage 6: Produce the Artifact

Save to ~/phd-log/experiments/YYYY-MM-DD-[short-name].md using the following Markdown template:

Experiment Plan — [Short Name]

Research question

[Question]

Hypotheses

  • Primary:
  • Alternatives:
  • Falsification condition:

Setup

  • Dataset:
  • Split:
  • Baseline:
  • Method:
  • Metrics:
  • Seeds / repeats:
  • Compute:
  • Environment:

Variables

| Type | Variable | Value(s) | Notes |
|---|---|---|---|
| Independent | | | |
| Controlled | | | |
| Nuisance | | | |

Run table

| Run | Change | Expected result | Status | Notes |
|---|---|---|---|---|

Logging checklist

  • Config saved
  • Code commit recorded
  • Dataset version recorded
  • Seed recorded
  • Metrics saved
  • Failure notes saved
  • Plot/table path saved

Decision rule

If [condition], then [next step]. If not, [fallback].
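Once the template is filled in, the dated file name from Stage 6 can be generated rather than typed by hand. The helper below is a sketch only: `plan_path` and its slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) are assumptions, since the source specifies only the `YYYY-MM-DD-[short-name].md` pattern.

```python
import re
from datetime import date
from pathlib import Path


def plan_path(short_name: str, day: date, root=None) -> Path:
    """Build the plan file path in the form <root>/YYYY-MM-DD-<short-name>.md.

    Assumption: the short name is normalized to a lowercase, hyphenated slug.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain at least one letter or digit")
    if root is None:
        base = Path.home() / "phd-log" / "experiments"
    else:
        base = Path(root)
    return base / f"{day.isoformat()}-{slug}.md"


# Example name and date are illustrative.
example = plan_path("Dropout Ablation", date(2024, 1, 15))
print(example.name)  # 2024-01-15-dropout-ablation.md
```

Passing the date in explicitly, instead of calling `date.today()` inside the function, keeps the helper deterministic and easy to test.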

Tone

Be concrete and conservative. The best experiment plan is usually smaller than the user's first instinct.

What Not to Do

  • Do not accept experiments without a hypothesis.
  • Do not let the user report a comparison without a baseline.
  • Do not bury changed variables in prose.
  • Do not treat negative results as wasted time.