experiment-design

Experiment Design

Design structured, progressive experiment plans for research papers.

Input

  • $0
    — Research idea, plan, or method description

References

  • 4-stage progressive experiment prompts:
    ~/.claude/skills/experiment-design/references/stage-prompts.md

Scripts

Generate experiment design

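```bash
# From a research-plan JSON file, write the design to experiment_design.json
python ~/.claude/skills/experiment-design/scripts/design_experiments.py --plan research_plan.json --output experiment_design.json
# From a method name and task type, produce the design in Markdown
python ~/.claude/skills/experiment-design/scripts/design_experiments.py --method "contrastive learning" --task classification --format markdown
```
Generates baselines, an ablation matrix, a hyperparameter grid, and a metric selection. Uses only the Python standard library.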
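The hyperparameter grid maps each parameter name to its candidate values (see the hyperparameter_grid field under Output Format). A minimal sketch of turning it into concrete runs, assuming every combination is tried once per seed. `expand_runs` is a hypothetical helper for illustration, not a function exported by design_experiments.py:

```python
import itertools
from typing import Any


def expand_runs(grid: dict[str, list[Any]], num_seeds: int) -> list[dict[str, Any]]:
    """Expand a hyperparameter grid into one run config per (combination, seed)."""
    names = sorted(grid)  # fixed key order so the run list is deterministic
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        for seed in range(num_seeds):
            config = dict(zip(names, values))
            config["seed"] = seed
            runs.append(config)
    return runs


grid = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128]}
runs = expand_runs(grid, num_seeds=3)
print(len(runs))  # 3 lr values x 3 batch sizes x 3 seeds = 27
```

Sorting the parameter names keeps the run order identical regardless of key order in the JSON file, so reruns enumerate the same configurations in the same sequence.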

4-Stage Progressive Framework (from AI-Scientist-v2)

Stage 1: Initial Implementation

  • Focus on getting a basic working implementation
  • Use a simple dataset
  • Aim for basic functional correctness
  • Completion: at least one working (non-buggy) implementation
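Stage 1's completion rule can be read as a bounded retry loop. In the sketch below, `attempt` is a hypothetical callable standing in for whatever builds and evaluates one implementation, and `max_iterations` mirrors the field of the same name in the output format. A run counts as working if it raises no exception and reports non-zero accuracy, following the example completion_criteria:

```python
from typing import Callable, Optional


def run_initial_stage(
    attempt: Callable[[int], dict[str, float]],
    max_iterations: int = 5,
) -> Optional[dict[str, float]]:
    """Return metrics from the first working attempt, or None if every attempt fails."""
    for iteration in range(max_iterations):
        try:
            metrics = attempt(iteration)
        except Exception as err:  # a crash means this attempt is buggy
            print(f"iteration {iteration}: failed ({err!r})")
            continue
        if metrics.get("accuracy", 0.0) > 0.0:
            return metrics  # Stage 1 complete: one working implementation
        print(f"iteration {iteration}: ran but accuracy was zero")
    return None


# Toy attempt: crashes on the first try, works on the second.
def toy_attempt(iteration: int) -> dict[str, float]:
    if iteration == 0:
        raise ValueError("shape mismatch")
    return {"accuracy": 0.62}


result = run_initial_stage(toy_attempt)
```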

Stage 2: Baseline Tuning

  • Tune hyperparameters (learning rate, epochs, batch size)
  • Do NOT change model architecture
  • Test on at least TWO datasets
  • Completion: stable training curves, improvement over Stage 1
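A minimal sketch of Stage 2 as a grid search, assuming a hypothetical `evaluate(config, dataset)` callable that trains the fixed Stage 1 architecture with the given hyperparameters and returns a validation score (higher is better). Only hyperparameters vary, the best configuration is chosen by its mean score across datasets, and the stage fails unless that mean beats the Stage 1 score. The sketch checks only the dataset-count and improvement criteria; training-curve stability still has to be judged from the logged curves.

```python
import itertools
import statistics
from typing import Any, Callable


def tune_baseline(
    evaluate: Callable[[dict[str, Any], str], float],
    grid: dict[str, list[Any]],
    datasets: list[str],
    stage1_score: float,
) -> tuple[dict[str, Any], float]:
    """Grid-search hyperparameters of a fixed model; return the best config and its mean score."""
    if len(datasets) < 2:
        raise ValueError("Stage 2 requires at least two datasets")
    names = sorted(grid)
    best_config: dict[str, Any] = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        # Only hyperparameters vary; the Stage 1 architecture stays fixed.
        config = dict(zip(names, values))
        score = statistics.mean(evaluate(config, dataset) for dataset in datasets)
        if score > best_score:
            best_config, best_score = config, score
    if best_score <= stage1_score:
        raise RuntimeError("no configuration improved on the Stage 1 score")
    return best_config, best_score


# Toy evaluator: pretend lr=1e-3 works best on every dataset.
def toy_evaluate(config: dict[str, Any], dataset: str) -> float:
    return 0.9 if config["lr"] == 1e-3 else 0.7


best, score = tune_baseline(
    toy_evaluate, {"lr": [1e-4, 1e-3, 1e-2]}, ["Dataset1", "Dataset2"], stage1_score=0.62
)
```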

Stage 3: Creative Research

  • Explore novel improvements and insights
  • Be creative and think outside the box
  • Test on at least THREE datasets
  • Completion: a demonstrated novel improvement over the best Stage 2 result

Stage 4: Ablation Studies

  • Systematic component analysis
  • Each ablation tests a different aspect
  • Use same datasets as Stage 3
  • Completion: all planned ablations done
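One common way to make each ablation test a different aspect is leave-one-out: a full-model reference row plus one row per removed component. The sketch below assumes that scheme (design_experiments.py may build its ablation matrix differently); `ablation_matrix` is a hypothetical helper, the component names follow the ablation_components field, and every variant is paired with the same datasets used in Stage 3:

```python
from itertools import product


def ablation_matrix(components: list[str], datasets: list[str]) -> list[dict]:
    """Leave-one-out ablation rows: the full model plus one variant per removed component."""
    variants = [("full", set(components))]
    for removed in components:
        variants.append((f"without_{removed}", set(components) - {removed}))
    # Every variant runs on every Stage 3 dataset, so results stay comparable.
    return [
        {"variant": name, "enabled": sorted(enabled), "dataset": dataset}
        for (name, enabled), dataset in product(variants, datasets)
    ]


rows = ablation_matrix(["component_A", "component_B"], ["Dataset1", "Dataset2", "Dataset3"])
print(len(rows))  # 3 variants x 3 datasets = 9
```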

Output Format

```json
{
  "stages": [
    {
      "name": "initial_implementation",
      "goals": ["Basic working baseline", "Simple dataset"],
      "max_iterations": 5,
      "completion_criteria": "Working implementation with non-zero accuracy"
    }
  ],
  "baselines": ["Method A", "Method B"],
  "datasets": ["Dataset1", "Dataset2", "Dataset3"],
  "metrics": ["accuracy", "F1", "inference_time"],
  "ablation_components": ["component_A", "component_B"],
  "hyperparameter_grid": {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128]
  },
  "num_seeds": 3
}
```
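Before a design file is passed to downstream skills, it can help to sanity-check it against the shape above. `validate_design` is a hypothetical helper, not part of the skill. It checks only the fields shown in the example, the three-dataset minimum implied by Stages 3 and 4, and, as an assumption, treats fewer than two seeds as a problem for multi-seed evaluation:

```python
import json

LIST_FIELDS = ("stages", "baselines", "datasets", "metrics", "ablation_components")
STAGE_KEYS = ("name", "goals", "max_iterations", "completion_criteria")


def validate_design(text: str) -> list[str]:
    """Return human-readable problems found in an experiment-design JSON string."""
    design = json.loads(text)
    if not isinstance(design, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for field in LIST_FIELDS:
        if not isinstance(design.get(field), list):
            problems.append(f"'{field}' must be a list")
    stages = design.get("stages")
    for i, stage in enumerate(stages if isinstance(stages, list) else []):
        if not isinstance(stage, dict):
            problems.append(f"stage {i} must be an object")
            continue
        for key in STAGE_KEYS:
            if key not in stage:
                problems.append(f"stage {i} is missing '{key}'")
    grid = design.get("hyperparameter_grid")
    if not isinstance(grid, dict) or not grid or not all(
        isinstance(values, list) and values for values in grid.values()
    ):
        problems.append("'hyperparameter_grid' must map names to non-empty lists")
    datasets = design.get("datasets")
    if isinstance(datasets, list) and len(datasets) < 3:
        problems.append("Stages 3 and 4 need at least three datasets")
    seeds = design.get("num_seeds")
    if not isinstance(seeds, int) or seeds < 2:
        problems.append("'num_seeds' should be an integer of at least 2")
    return problems


print(validate_design('{"stages": [{"name": "initial_implementation"}], "datasets": ["Dataset1"]}'))
```

An empty list means the file matches the expected shape; it says nothing about whether the chosen baselines or metrics are appropriate.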

Rules

  • Always start simple (Stage 1) before moving on to complex experiments
  • Each stage builds on the best result from the previous stage
  • Evaluate every configuration with multiple random seeds (num_seeds) so that differences can be tested for statistical significance
  • Document every experiment run in notes.txt
  • Generate figures for training curves and comparisons
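The multi-seed and logging rules can be combined in one small helper. This sketch assumes each configuration yields one scalar metric per seed; it reports the mean and sample standard deviation using the standard-library `statistics` module and appends a tab-separated line to notes.txt. The helper name and the log-line format are hypothetical. It does not compute a p-value; a significance test such as Welch's t-test (for example `scipy.stats.ttest_ind` with `equal_var=False`) would be a separate step.

```python
import statistics
from datetime import datetime, timezone
from pathlib import Path


def summarize_seeds(
    name: str, scores: list[float], notes_path: str = "notes.txt"
) -> tuple[float, float]:
    """Return (mean, sample std) over per-seed scores and append a summary line to notes_path."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp}\t{name}\tseeds={len(scores)}\tmean={mean:.4f}\tstd={std:.4f}\n"
    with Path(notes_path).open("a", encoding="utf-8") as notes:
        notes.write(line)
    return mean, std
```

For example, calling it with the per-seed scores 0.81, 0.83, and 0.82 returns a mean of about 0.82 and a standard deviation of about 0.01, and appends one line to notes.txt.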

Related Skills

  • Upstream: research-planning, idea-generation
  • Downstream: experiment-code, data-analysis
  • See also: paper-assembly