experiment-design-planner

Experiment Design Planner

Purpose

Help the user plan experiments that can actually answer a research question. This skill is based on the handbook's experiment design principles: start simple, begin with baselines, change one variable at a time, state hypotheses before running, and document negative results.

The output is an experiment plan that can be run, logged, and later explained in a paper or advisor meeting.

When to Use

User wants to run a new experiment or ablation
User has unclear or noisy experimental results
User is preparing baselines and metrics
User is changing several model or data choices at once
User needs a reproducible experiment plan before using cluster time

Workflow

Stage 1: State the Research Question

Ask:

What claim should this experiment support or refute?
What is the smallest result that would be meaningful?
What existing baseline should it beat, match, or clarify?

If the question is vague, rewrite it into a testable form.

Stage 2: Write Hypotheses Before Running

Capture:

Primary hypothesis
Alternative explanations
Expected direction of change
Expected metric movement
Failure mode that would falsify the hypothesis

Do not let the user run first and rationalize later.
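
One way to enforce this is to refuse a run until every hypothesis field is filled in. The sketch below is a minimal illustration, not part of the handbook: the `Hypothesis` class, its field names, and `ready_to_run` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Hypothesis:
    """Pre-run hypothesis record. All field names are illustrative."""

    primary: str
    alternatives: tuple[str, ...]
    expected_direction: str      # "increase" or "decrease"
    expected_metric_change: str  # free text, e.g. a range on the primary metric
    falsified_if: str            # the failure mode that would falsify `primary`
    # Timestamp taken when the record is created, i.e. before the run.
    written_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "primary": self.primary,
            "alternatives": self.alternatives,
            "expected_direction": self.expected_direction,
            "expected_metric_change": self.expected_metric_change,
            "falsified_if": self.falsified_if,
        }
        return [name for name, value in required.items() if not value]


def ready_to_run(h: Hypothesis) -> bool:
    """Allow a run only when every field is filled and the direction is valid."""
    return not h.missing_fields() and h.expected_direction in {"increase", "decrease"}
```

Freezing the dataclass means the record cannot be quietly edited after the results come in; a changed hypothesis has to be written as a new record with a new timestamp.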

Stage 3: Define the Experimental Unit

Specify:

Dataset and split
Preprocessing
Model or method
Baselines
Metrics
Random seeds
Compute budget
Number of repeats
Hardware/environment

If the user lacks a baseline, start there.
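
The fields above can be captured as a single immutable record, so that two runs can be checked for having the same setup. This is a sketch under assumptions: `ExperimentUnit`, `fingerprint`, and `problems` are hypothetical names, and the checks shown are only the ones this section implies.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentUnit:
    """One fully specified experimental setup. Field names are illustrative."""

    dataset: str
    split: str
    preprocessing: str
    method: str
    baselines: tuple[str, ...]
    metrics: tuple[str, ...]
    seeds: tuple[int, ...]
    compute_budget: str
    environment: str

    @property
    def repeats(self) -> int:
        """Number of repeats, taken here to be one run per seed."""
        return len(self.seeds)

    def fingerprint(self) -> str:
        """Short stable hash of the setup, useful as a key in run logs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def problems(self) -> list[str]:
        """List gaps that should be fixed before any cluster time is spent."""
        issues = []
        if not self.baselines:
            issues.append("no baseline defined: start there")
        if not self.metrics:
            issues.append("no metric defined")
        if not self.seeds:
            issues.append("no seeds defined")
        elif len(set(self.seeds)) != len(self.seeds):
            issues.append("duplicate seeds")
        return issues
```

An empty list from `problems()` does not mean the design is good, only that nothing on this checklist is missing.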

Stage 4: One-Variable Discipline

List variables:

Independent variable: what changes
Controlled variables: what must stay fixed
Nuisance variables: what could confound results

If the plan changes multiple variables, split it into an ordered ablation table.
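
Splitting a multi-variable plan can be done mechanically. The sketch below assumes the setup is a flat dictionary and builds one run per change, each differing from the baseline in exactly one key; `ablation_runs` and `changed_keys` are hypothetical helper names. Comparing every run against the baseline is one possible ordering; a cumulative ordering, where each run builds on the previous one, is another.

```python
def changed_keys(baseline: dict, config: dict) -> list[str]:
    """Return the keys whose values differ from the baseline."""
    return [key for key in baseline if baseline[key] != config[key]]


def ablation_runs(baseline: dict, changes: list[tuple[str, object]]) -> list[dict]:
    """Build an ordered run table: run 0 is the baseline, then one change per run."""
    runs = [{"run": 0, "change": "baseline", "config": dict(baseline)}]
    for index, (key, value) in enumerate(changes, start=1):
        if key not in baseline:
            raise KeyError(f"{key!r} is not a variable in the baseline")
        if baseline[key] == value:
            raise ValueError(f"{key!r} is unchanged from the baseline")
        config = dict(baseline)  # copy, so all other variables stay controlled
        config[key] = value
        runs.append(
            {
                "run": index,
                "change": f"{key}: {baseline[key]} -> {value}",
                "config": config,
            }
        )
    return runs
```

A quick sanity check on any run table is that `changed_keys(baseline, run["config"])` has length one for every run after the first.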

Stage 5: Logging and Negative Results

Define the required log fields:

Config path or commit hash
Dataset version
Seed
Hyperparameters
Metrics
Runtime
Failure notes
Plot/table output path

Make negative results first-class. A failed run should still answer what was tried and what was learned.
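
A minimal sketch of such a log, assuming one JSON object per line in an append-only file. The field names in `REQUIRED_FIELDS` mirror the list above but are otherwise hypothetical, as are `append_run` and `read_runs`. A failed run is written with the same fields as a successful one, with empty metrics and a filled-in failure note.

```python
import json
from pathlib import Path

# Hypothetical field names, one per required log field listed above.
REQUIRED_FIELDS = (
    "config_ref",       # config path or commit hash
    "dataset_version",
    "seed",
    "hyperparameters",
    "metrics",          # may be empty for a failed run
    "runtime_s",
    "failure_notes",    # empty string when nothing went wrong
    "output_path",      # plot/table output path
)


def append_run(log_path: str, record: dict) -> None:
    """Append one run record, rejecting records with missing fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_runs(log_path: str) -> list[dict]:
    """Read every logged run, successful or not."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because failed runs live in the same file as successful ones, a later summary can answer what was tried and what was learned without searching through separate notes.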

Stage 6: Produce the Artifact

Save to ~/phd-log/experiments/YYYY-MM-DD-[short-name].md using the following Markdown template:


Experiment Plan — [Short Name]

Research question

[Question]

Hypotheses

Primary:
Alternatives:
Falsification condition:

Setup

Dataset:
Split:
Baseline:
Method:
Metrics:
Seeds / repeats:
Compute:
Environment:

Variables

| Type | Variable | Value(s) | Notes |
| --- | --- | --- | --- |
| Independent | | | |
| Controlled | | | |
| Nuisance | | | |

Run table

| Run | Change | Expected result | Status | Notes |
| --- | --- | --- | --- | --- |

Logging checklist

Decision rule

If [condition], then [next step]. If not, [fallback].
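
Writing the decision rule as code before the run makes it harder to bend afterwards. The sketch below is one illustrative way to do that, assuming a metric where higher is better and at least two seeds per arm. The `decide` function and its threshold logic are hypothetical, and the noise check is a crude heuristic, not a significance test.

```python
from statistics import mean, stdev


def decide(baseline_scores: list[float], method_scores: list[float],
           min_effect: float) -> str:
    """Return "next step" or "fallback" from per-seed scores.

    `min_effect` is the smallest improvement that would be meaningful,
    fixed before the run.
    """
    if len(baseline_scores) < 2 or len(method_scores) < 2:
        raise ValueError("need at least two repeats per arm to estimate noise")
    delta = mean(method_scores) - mean(baseline_scores)
    # Seed-to-seed spread, used as a rough yardstick for noise.
    noise = max(stdev(baseline_scores), stdev(method_scores))
    if delta >= min_effect and delta > noise:
        return "next step"
    return "fallback"
```

For a claim that will appear in a paper, this heuristic would normally be replaced by a proper statistical test chosen to fit the metric and the number of repeats.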

Tone

Be concrete and conservative. The best experiment plan is usually smaller than the user's first instinct.

What Not to Do

Do not accept experiments without a hypothesis.
Do not let the user report a comparison that has no baseline.
Do not bury changed variables in prose.
Do not treat negative results as wasted time.
