minimal-run-and-audit

Use this as the Rigor Run skill. The installed slug remains minimal-run-and-audit for compatibility.
Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

  • After a reproduction target and setup plan exist.
  • When the main skill needs execution evidence and normalized outputs.
  • When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
  • When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

  • During initial repo scanning.
  • When the environment or assets are still too undefined for execution to be meaningful.
  • When the task is a literature lookup rather than repository execution.
  • When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

  • This skill owns normalized reporting for an attempted command.
  • It may receive execution evidence from the main skill or a thin helper.
  • It does not choose the overall target on its own.
  • It does not perform broad paper analysis.
  • It does not own training startup, resume, or long-running training state.
  • It should not normalize risky code edits into acceptable practice.
  • It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.
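The "thin helper" that supplies execution evidence could be quite small. The following is a minimal sketch, not the contents of scripts/run_command.py, whose interface is not described here. The function name run_and_record, the record keys, and the default timeout are all assumptions made for illustration.

```python
import shlex
import subprocess
import time


def run_and_record(command: str, timeout_s: float = 600.0) -> dict:
    """Run one command without a shell and return a plain evidence record.

    Hypothetical helper: the record layout is an assumption, not a format
    defined by this skill.
    """
    started = time.time()
    try:
        proc = subprocess.run(
            shlex.split(command),   # no shell, so no implicit expansion
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        exit_code, stdout, stderr, note = proc.returncode, proc.stdout, proc.stderr, ""
    except FileNotFoundError as exc:
        # The command never started; keep that distinct from a failing run.
        exit_code, stdout, stderr, note = None, "", "", f"executable not found: {exc}"
    except subprocess.TimeoutExpired:
        exit_code, stdout, stderr, note = None, "", "", f"timed out after {timeout_s}s"
    return {
        "command": command,
        "exit_code": exit_code,     # None means the run did not complete
        "duration_s": round(time.time() - started, 3),
        "stdout_tail": stdout[-2000:],
        "stderr_tail": stderr[-2000:],
        "note": note,
    }
```

Recording the command string verbatim alongside the exit code and output tails keeps the evidence auditable without prescribing how the command itself was chosen.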

Input expectations

  • selected reproduction goal
  • runnable commands or smoke commands
  • environment and asset assumptions
  • optional patch metadata
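The inputs listed above can be pictured as a single record that is checked before anything runs. This is a sketch only: the RunRequest class, its field names, and the missing_inputs check are hypothetical and not part of the skill's documented interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRequest:
    """Hypothetical container for the inputs this skill expects."""

    goal: str                                                   # selected reproduction goal
    commands: list[str]                                         # runnable or smoke commands
    assumptions: dict[str, str] = field(default_factory=dict)   # environment and asset assumptions
    patch_metadata: Optional[dict] = None                       # optional patch metadata

    def missing_inputs(self) -> list[str]:
        """Return the names of required inputs that are empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.commands:
            missing.append("commands")
        if not self.assumptions:
            missing.append("assumptions")
        return missing
```

A non-empty result from missing_inputs() would correspond to the "when not to apply" cases, where the target or environment is still too undefined for a run to mean anything.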

Output expectations

  • execution result summary
  • standardized repro_outputs/ files
  • SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status
  • COMPARABILITY_REPORT.md for README/paper/baseline comparability
  • clear distinction between verified, partial, and blocked states
  • PATCHES.md when repo files changed
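One way to picture the outputs above is a small function that assigns a status and writes the named files. This is a sketch under stated assumptions, not the behavior of scripts/write_outputs.py: the exact rules separating verified, partial, and blocked, the choice to place every file inside the output directory, and the function names are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def classify_status(exit_code: Optional[int], expected_outputs_found: bool) -> str:
    """Map run evidence to a status (assumed rules, for illustration only)."""
    if exit_code is None or exit_code != 0:
        return "blocked"    # the command did not run to a clean exit
    if expected_outputs_found:
        return "verified"   # clean exit and the expected artifacts exist
    return "partial"        # clean exit, but evidence is incomplete


def write_reports(out_dir: Path, status: str, scientific_changes: list[str],
                  comparability_notes: list[str], patches: list[str]) -> list[str]:
    """Write the report files and return the names that were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    files = {
        "SCIENTIFIC_CHANGELOG.md": [f"Evidence status: {status}"]
        + (scientific_changes or ["No changes to scientific meaning recorded."]),
        "COMPARABILITY_REPORT.md": comparability_notes
        or ["No comparability notes recorded."],
    }
    if patches:
        # PATCHES.md is only produced when repo files changed.
        files["PATCHES.md"] = patches
    for name, lines in files.items():
        body = "\n".join(f"- {line}" for line in lines) + "\n"
        (out_dir / name).write_text(body, encoding="utf-8")
    return sorted(files)
```

Writing the changelog even when nothing changed keeps "no scientific changes" an explicit, auditable statement rather than an absence of evidence.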

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.