reproducibility-audit

Reproducibility Audit

Purpose

Help the user make a research project reproducible enough that another person, or their future self, can rerun it. This skill follows the handbook's reproducibility section: exact environments, data and code versioning, documentation, random seeds, parameter logging, and backups.
The output is an audit report plus a prioritized fix list.

When to Use

  • User is starting or cleaning up a research codebase
  • User needs collaborator handoff
  • User cannot reproduce an old experiment
  • User is preparing for paper submission or public release
  • User is moving between local, server, HPC, Docker, or conda environments

Workflow

Stage 1: Identify the Project Context

Ask:
  • What repository or folder is being audited?
  • Is this local, remote server, HPC, or containerized?
  • Is the goal internal reproducibility, collaborator onboarding, or public release?
  • What result must be reproducible?
If the user grants filesystem access, inspect the repo structure before advising.
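When filesystem access is available, a first pass over the repository can be scripted. This is a minimal sketch that assumes Python 3.9+. The helper inventory_repo and the file names in KNOWN_FILES are hypothetical choices based on common layouts, not a required standard, so adjust them to the project being audited.

```python
from pathlib import Path

# Hypothetical list of files that usually matter for reproducibility;
# edit it to match the project being audited.
KNOWN_FILES = [
    "README.md",
    "environment.yml",
    "requirements.txt",
    "pyproject.toml",
    "uv.lock",
    "Dockerfile",
    ".gitignore",
]


def inventory_repo(root: str) -> dict:
    """Return top-level entries and which known reproducibility files exist."""
    base = Path(root)
    # Directories get a trailing slash so they stand out in the listing.
    entries = sorted(p.name + ("/" if p.is_dir() else "") for p in base.iterdir())
    present = {name: (base / name).exists() for name in KNOWN_FILES}
    return {"entries": entries, "present": present}


if __name__ == "__main__":
    report = inventory_repo(".")
    for name, found in report["present"].items():
        print(f"{'found  ' if found else 'missing'}  {name}")
```

A missing file is a prompt for a question, not an automatic finding. For example, a project may keep its environment spec under a different name.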

Stage 2: Audit the Three Pillars

Check:
Environment management
  • Conda/uv/pip environment file exists
  • Python/CUDA/framework versions are pinned where needed
  • Setup instructions are current
  • Dockerfile or container instructions exist if portability matters
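"Pinned where needed" can be checked mechanically for pip-style requirements. This is a minimal sketch that assumes a requirements.txt file, and unpinned_requirements is a hypothetical helper. It reports any line that is not pinned to an exact == version. Ranges such as >=, bare package names, and VCS URLs all count as unpinned. It does not inspect environment.yml, uv.lock, or CUDA versions, which need separate checks.

```python
import re

# An exact pin such as "torch==2.3.1" or "pkg[extra] == 1.0",
# optionally followed by an environment marker. Ranges like ">=" do not match.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\],]+\s*==\s*[^=\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r / -e
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged
```

For example, unpinned_requirements(Path("requirements.txt").read_text()) lists the candidates to discuss. Whether each one needs an exact pin depends on the collaboration context.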
Data and code versioning
  • Git tracks source and config, not generated clutter
  • Dataset source/version/preprocessing are documented
  • Large files have an intentional storage path
  • Checkpoints and outputs are named consistently
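For dataset versioning, one lightweight option is a checksum manifest. It records each data file's size and SHA-256 digest so that a collaborator can confirm they hold the same bytes. This sketch uses only the standard library. The name build_manifest and the practice of committing its output next to the code are assumptions, not a prescribed tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: str) -> dict:
    """Map each file under data_dir (relative POSIX path) to its size and SHA-256."""
    root = Path(data_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return manifest
```

You could then save json.dumps(build_manifest("data/raw"), indent=2) to a small file that Git does track (the data/raw path is hypothetical). The raw data can stay out of Git while its exact version is still recorded.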
Documentation
  • README explains installation, data setup, and train/eval commands
  • Commands are copy-pasteable
  • Expected outputs and runtime are stated
  • Known caveats and hardware assumptions are explicit
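A keyword heuristic can spot-check whether a README covers these items before you read it properly. In this sketch, the topic names and keywords in REQUIRED_TOPICS are hypothetical. A match only means that a word appears. It does not show that the instructions are correct or copy-pasteable.

```python
import re

# Hypothetical topics a README should cover, each with keywords that
# usually signal the topic is addressed. A heuristic, not a grader.
REQUIRED_TOPICS = {
    "install": ["install", "setup", "environment"],
    "data": ["data", "dataset", "download"],
    "train": ["train"],
    "eval": ["eval", "evaluate", "test"],
    "expected output": ["expected", "output", "result"],
    "runtime/hardware": ["runtime", "gpu", "hours", "minutes", "hardware"],
}


def missing_readme_topics(readme_text: str) -> list[str]:
    """Return topics whose keywords never appear at a word start in the README."""
    text = readme_text.lower()
    return [
        topic
        for topic, keywords in REQUIRED_TOPICS.items()
        if not any(re.search(rf"\b{re.escape(k)}", text) for k in keywords)
    ]
```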

Stage 3: Audit Experiment Hygiene

Check:
  • Random seeds
  • Config files
  • Hyperparameter logging
  • Commit hash logging
  • Metric files
  • Plot/table generation scripts
  • Failure logs
  • Result-to-paper traceability
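One small run-record helper, called at the start of each experiment, can cover several of these checks: random seeds, config and hyperparameter logging, and commit hash logging. This is a sketch that assumes Python 3.9+. The names set_seed and write_run_record and the file name run_record.json are hypothetical. NumPy and PyTorch are seeded only if they are installed. Seeding alone does not guarantee bit-identical GPU results, because some CUDA kernels are nondeterministic.

```python
import json
import platform
import random
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def set_seed(seed: int) -> None:
    """Seed Python's RNG and, if installed, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    except ImportError:
        pass


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the Git HEAD commit hash, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed
        return None
    return out.stdout.strip() if out.returncode == 0 else None


def write_run_record(out_dir: str, config: dict, seed: int, repo_dir: str = ".") -> Path:
    """Save config, seed, commit hash, and interpreter details as JSON."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "seed": seed,
        "config": config,
        "git_commit": current_commit(repo_dir),
        "python": sys.version,
        "platform": platform.platform(),
    }
    path = Path(out_dir) / "run_record.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

A git_commit of None in the record is itself an audit finding: that run cannot be traced back to the code that produced it.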

Stage 4: Classify Issues

Assign each issue a severity:
  • Blocker: current results cannot be rerun or verified
  • High: collaborators will likely fail to run it
  • Medium: results can run but are hard to inspect or compare
  • Low: polish or future-proofing
Prioritize fixes that protect irreplaceable results first.
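Writing the ordering rule down as code keeps the action plan consistent from one audit to the next. In this sketch, Severity, Issue, and prioritize are hypothetical names. Issues that protect irreplaceable results come first, and severity orders the rest. This is one reading of the rule above. The default cap of five mirrors the three-to-five-fix guidance under Tone.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Lower value means more urgent, so ascending sort puts blockers first."""
    BLOCKER = 0  # current results cannot be rerun or verified
    HIGH = 1     # collaborators will likely fail to run it
    MEDIUM = 2   # results can run but are hard to inspect or compare
    LOW = 3      # polish or future-proofing


@dataclass
class Issue:
    severity: Severity
    description: str
    fix: str
    protects_irreplaceable: bool = False  # e.g. results that cannot be regenerated


def prioritize(issues: list[Issue], limit: int = 5) -> list[Issue]:
    """Irreplaceable-result fixes first, then by severity; keep the top `limit`."""
    ranked = sorted(issues, key=lambda i: (not i.protects_irreplaceable, i.severity))
    return ranked[:limit]
```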

Stage 5: Produce the Artifact

Save to ~/phd-log/reproducibility/YYYY-MM-DD-[project].md:

```markdown
# Reproducibility Audit — [Project]

## Goal
[Internal / collaborator / release] reproducibility for [target result]

## Summary
- Overall status:
- Biggest risk:
- First fix:

## Environment
| Check | Status | Notes |
|---|---|---|

## Data and code versioning
| Check | Status | Notes |
|---|---|---|

## Documentation
| Check | Status | Notes |
|---|---|---|

## Experiment hygiene
| Check | Status | Notes |
|---|---|---|

## Issues
| Severity | Issue | Fix |
|---|---|---|

## Prioritized action plan
- Blocker:
- High:
- Medium:
- Low:
```
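You can also automate creating the report file. This sketch follows the ~/phd-log/reproducibility/YYYY-MM-DD-[project].md path convention and writes only the section headings as a starting skeleton. The names audit_path and create_audit_file are hypothetical, as is the slug rule that turns spaces and other unusual characters into hyphens. The function never overwrites an existing report.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Section headings taken from the report template.
SECTIONS = [
    "Goal", "Summary", "Environment", "Data and code versioning",
    "Documentation", "Experiment hygiene", "Issues", "Prioritized action plan",
]


def audit_path(project: str, day: Optional[date] = None, base: Optional[Path] = None) -> Path:
    """Build <base>/YYYY-MM-DD-<project>.md, defaulting to ~/phd-log/reproducibility."""
    day = day or date.today()
    base = base or Path.home() / "phd-log" / "reproducibility"
    # Hypothetical slug rule: collapse anything unusual into hyphens.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", project.strip()).strip("-") or "project"
    return base / f"{day:%Y-%m-%d}-{slug}.md"


def create_audit_file(project: str, day: Optional[date] = None, base: Optional[Path] = None) -> Path:
    """Create the report skeleton unless a report for that day already exists."""
    path = audit_path(project, day, base)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():  # keep an earlier audit from the same day intact
        return path
    lines = [f"# Reproducibility Audit: {project}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", ""]
    path.write_text("\n".join(lines))
    return path
```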

Tone

Be practical. A reproducibility audit should leave the user with 3-5 high-impact fixes, not a guilt-inducing wall of best practices.

What Not to Do

  • Do not demand Docker if conda or uv is enough for the collaboration context.
  • Do not recommend version pinning without explaining what needs to be pinned.
  • Do not ignore data provenance.
  • Do not treat README polish as more urgent than rerunnable experiments.