artifact-evaluation-prep

Artifact Evaluation Prep

Prepare a paper's code, data, checkpoints, scripts, and instructions so an external artifact reviewer can reproduce the paper-facing claims with minimal ambiguity.

Use this skill when:

a venue requires or offers artifact evaluation, reproducibility badges, or artifact appendices
the user needs reviewer-facing install, quickstart, demo, or reproduction instructions
a camera-ready or accepted paper needs an artifact package handoff
code, data, checkpoints, models, Docker images, or external services must be packaged
runtime, hardware, random seeds, expected outputs, or troubleshooting notes need to be made explicit
claims in the paper need to be mapped to runnable scripts or released artifacts

Do not use this skill as a general code-release skill. Use `release-code` for public repository hygiene, licensing, CITATION files, tags, and GitHub releases. Use this skill for reviewer-facing artifact execution and claim reproduction.

Pair this skill with:

`camera-ready-finalizer` to recover accepted-paper obligations and final claim/evidence state
`release-code` to prepare public repository hygiene after artifact obligations are clear
`reproducibility-audit` when environment, data, or execution drift needs a broader audit
`run-experiment` for generating or testing reproduction commands
`figure-results-review` when artifact outputs must match paper figures or tables
`citation-audit` when artifact metadata cites datasets, code, or prior artifacts
`research-project-memory` when artifact status, blockers, and reviewer-facing instructions should persist

Skill Directory Layout

```text
<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── artifact-audit.md
    ├── memory-writeback.md
    ├── package-manifest.md
    ├── report-template.md
    └── reviewer-instructions.md
```

Progressive Loading

Always read `references/artifact-audit.md`, `references/package-manifest.md`, and `references/reviewer-instructions.md`.
Read `references/report-template.md` before writing a saved artifact evaluation report.
Read `references/memory-writeback.md` when the project has `memory/`, component `.agent/` folders, or the user asks for persistent state.
If venue rules matter, verify current official artifact evaluation instructions before asserting deadlines, badge names, anonymity rules, upload fields, page limits, or required formats.

Core Principles

Artifact evaluation is a reviewer workflow, not just a code dump.
The artifact must reproduce the paper's important claims at an acceptable cost, or clearly document what it cannot reproduce.
Prefer one reliable quickstart and one complete reproduction path over many fragile commands.
Every command should state expected runtime, hardware, input, output, and success criteria.
Package only redistributable data, checkpoints, and dependencies; document restricted assets precisely.
Keep anonymity, licensing, and external-service assumptions explicit.
Treat smoke tests as required. An untested instruction file is not an artifact package.

Step 1 - Recover Evaluation Context

Collect:

venue and artifact evaluation track, if known
official artifact instructions, badge criteria, anonymity policy, and upload mechanism
accepted or submitted paper, appendix, supplementary material, and checklist
code repository, commit hash, branches, and worktrees
datasets, checkpoints, pretrained models, generated outputs, and external dependencies
hardware expectations: CPU/GPU type, memory, disk, runtime, network access
paper claims, figures, tables, and experiments that the artifact should support
constraints: private data, license limits, large files, cloud dependencies, nondeterminism, or reviewer time budget

If no venue is specified, produce a venue-agnostic artifact package but mark venue-specific fields as unresolved.

Step 2 - Map Claims to Artifact Paths

For each paper-facing claim or result, record:

claim or result ID
paper location
script, notebook, config, or command that supports it
input data or checkpoint
expected output file, metric, table, or figure
approximate runtime and hardware
deterministic tolerance or expected variance
reviewer priority: quickstart, core, optional, or not reproducible in package

Do not imply full reproducibility if only a smoke test or cached output is provided.
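The fields above could be captured as one record per claim. The following is a minimal sketch, not a prescribed schema: the class name, field names, priority labels, and the example values are all hypothetical placeholders, and the numeric comparison assumes the claim has a single headline metric with an absolute tolerance.

```python
from dataclasses import dataclass
from typing import List, Optional

# Reviewer priorities from the list above (the identifiers are assumptions).
PRIORITIES = {"quickstart", "core", "optional", "not_reproducible"}


@dataclass
class ClaimEntry:
    """One row of a claim-to-artifact map (all field names are hypothetical)."""
    claim_id: str
    paper_location: str
    command: str
    inputs: str
    expected_output: str
    runtime_and_hardware: str
    expected_value: Optional[float]  # headline metric, if the claim has one
    tolerance: float                 # allowed absolute deviation from expected_value
    priority: str

    def problems(self) -> List[str]:
        """Return human-readable gaps in this entry; an empty list means complete."""
        issues = []
        for name in ("claim_id", "paper_location", "command", "inputs",
                     "expected_output", "runtime_and_hardware"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if self.priority not in PRIORITIES:
            issues.append(f"unknown priority: {self.priority!r}")
        if self.tolerance < 0:
            issues.append("tolerance must be non-negative")
        return issues

    def matches(self, observed: float) -> bool:
        """True if an observed metric is within tolerance of the paper value."""
        if self.expected_value is None:
            raise ValueError("claim has no numeric expected value")
        return abs(observed - self.expected_value) <= self.tolerance


# Placeholder entry; paths, command, and numbers are illustrative only.
entry = ClaimEntry(
    claim_id="T2-row3",
    paper_location="Table 2, row 3",
    command="python scripts/eval.py --config configs/main.yaml",
    inputs="checkpoints/main.pt",
    expected_output="results/table2.csv",
    runtime_and_hardware="about 20 min on one GPU",
    expected_value=0.5,
    tolerance=0.25,
    priority="core",
)
print(entry.problems())
```

Keeping the tolerance next to the expected value makes it harder to report a match without saying how close the match has to be.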

Step 3 - Build the Artifact Manifest

Read `references/package-manifest.md`.

Create or update a manifest that lists:

repository URL or archive path
exact commit, tag, or checksum
directory layout
environment files and Docker images
data and checkpoint locations
reproduction scripts and configs
expected generated outputs
license and citation metadata
known limitations and unsupported claims

Prefer small, stable names such as `ARTIFACT.md`, `REPRODUCE.md`, or `docs/artifact_evaluation.md` unless the venue requires a specific filename.
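For the "exact commit, tag, or checksum" item, one possible approach is to record a SHA-256 digest for every packaged file. This is a sketch under stated assumptions: the function names and the JSON-like schema (`commit`, `files`, `path`, `bytes`, `sha256`) are hypothetical, and the actual manifest format should follow `references/package-manifest.md` and any venue requirements.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, commit: str) -> dict:
    """List every file under root with its size and checksum (hypothetical schema)."""
    paths = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX path so the manifest is stable across platforms.
    paths.sort(key=lambda p: p.relative_to(root).as_posix())
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in paths
    ]
    return {"commit": commit, "files": files}
```

A reviewer can then recompute the digests after download and compare them with the manifest before running anything expensive.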

Step 4 - Write Reviewer Instructions

Read `references/reviewer-instructions.md`.

Provide:

setup commands
quick smoke test under a short runtime budget
core reproduction commands for main paper claims
expected outputs and how to compare them with the paper
troubleshooting for common failures
hardware, storage, network, and time requirements
contact policy or anonymous support channel if allowed
limitations and optional extended runs

Instructions should be copy-pasteable and should not require the reviewer to infer hidden paths or environment variables.
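One way to keep each instruction copy-pasteable and complete is to generate it from a small record that refuses to render when a field is blank. The sketch below is illustrative only: the `ReviewerCommand` class, its field names, the output layout, and the example command are assumptions, not a format required by this skill.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewerCommand:
    """One reviewer-facing command (field names are hypothetical)."""
    title: str
    command: str
    runtime: str
    hardware: str
    inputs: str
    outputs: str
    success: str

    def render(self) -> str:
        """Format the entry as plain text, refusing to emit incomplete entries."""
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError(f"incomplete command entry, missing: {', '.join(empty)}")
        return "\n".join([
            self.title,
            f"  Command:  {self.command}",
            f"  Runtime:  {self.runtime}",
            f"  Hardware: {self.hardware}",
            f"  Input:    {self.inputs}",
            f"  Output:   {self.outputs}",
            f"  Success:  {self.success}",
        ])


# Placeholder values; replace them with the artifact's real command and budget.
quickstart = ReviewerCommand(
    title="Quickstart smoke test",
    command="python scripts/smoke_test.py",
    runtime="under 5 minutes",
    hardware="CPU only",
    inputs="data/sample/",
    outputs="results/smoke.json",
    success="script exits with status 0 and writes results/smoke.json",
)
print(quickstart.render())
```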

Step 5 - Smoke Test the Artifact

When allowed by the user and environment, run at least:

environment creation or dependency resolution
import or CLI sanity check
quickstart command
one representative data/checkpoint load
one expected-output comparison

If commands are too expensive, record the exact reason and create a minimal substitute test.
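The checks above can be driven by a small runner that enforces a time budget per command and records why a check failed. This is a minimal sketch: `run_check`, `SmokeResult`, and the two placeholder checks are hypothetical, and the real commands would come from the artifact's own quickstart.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class SmokeResult:
    name: str
    passed: bool
    detail: str


def run_check(name: str, argv: List[str], timeout_s: float) -> SmokeResult:
    """Run one command with a time budget and report pass/fail with a short reason."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return SmokeResult(name, False, f"timed out after {timeout_s}s")
    except FileNotFoundError:
        return SmokeResult(name, False, f"command not found: {argv[0]}")
    if proc.returncode != 0:
        # Keep only the tail of stderr so the report stays readable.
        detail = proc.stderr.strip()[-200:] or f"exit code {proc.returncode}"
        return SmokeResult(name, False, detail)
    return SmokeResult(name, True, "ok")


# Placeholder checks; replace the argv lists with the artifact's real commands.
CHECKS = [
    ("import sanity check", [sys.executable, "-c", "import json"], 30.0),
    ("CLI sanity check", [sys.executable, "--version"], 30.0),
]

results = [run_check(name, argv, budget) for name, argv, budget in CHECKS]
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"[{status}] {result.name}: {result.detail}")
```

Recording the failure reason, including a timeout, gives the exact explanation the step asks for when a command turns out to be too expensive to run.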

Step 6 - Handle Packaging Risks

Audit:

anonymization vs public release state
licenses for code, data, pretrained weights, and third-party assets
large-file strategy and checksums
private paths, credentials, API keys, and machine-specific assumptions
random seeds and nondeterminism
version pinning and dependency conflicts
reviewer time budget and failure recovery

Route public release issues to `release-code`; route environment drift to `reproducibility-audit` if available.
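Part of the audit for private paths and credentials can be approximated with a text scan. The sketch below is a rough first pass, not a substitute for a dedicated secret scanner: the two regular expressions, the function names, and the file-suffix list are assumptions chosen for illustration, and they will miss many real cases.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Illustrative patterns only; a real audit needs patterns tuned to the project.
PATTERNS = {
    "home directory path": re.compile(r"/(?:home|Users)/[A-Za-z0-9._-]+"),
    "possible secret assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def scan_text(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, label) for each line that matches a risk pattern."""
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                yield number, label


def scan_tree(root: Path,
              suffixes=(".py", ".sh", ".yaml", ".yml", ".json", ".md")) -> List[tuple]:
    """Scan text files under root and return (relative_path, line_number, label) hits."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend((path.relative_to(root).as_posix(), number, label)
                        for number, label in scan_text(text))
    return hits
```

Each hit still needs a human decision, since a matching path may be a documented example rather than a leaked machine-specific assumption.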

Step 7 - Write the Artifact Evaluation Report

Read `references/report-template.md`.

If saving to a project and no path is given, use:

```text
docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md
```

The report must include:

readiness decision
blocking issues
claim-to-artifact map
package manifest summary
smoke-test status
reviewer instruction status
risks, limitations, and reviewer-facing caveats
handoff to release, camera-ready, or memory
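The default path and the required sections can be checked mechanically before the report is saved. This is a sketch under assumptions: the helper names are hypothetical, and matching sections by lowercase substring is a simplification that a real template in `references/report-template.md` may not fit.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional

# Section names taken from the list above; substring matching is an assumption.
REQUIRED_SECTIONS = [
    "readiness decision",
    "blocking issues",
    "claim-to-artifact map",
    "package manifest summary",
    "smoke-test status",
    "reviewer instruction status",
    "risks, limitations, and reviewer-facing caveats",
    "handoff",
]


def default_report_path(project_root: Path, today: Optional[date] = None) -> Path:
    """Build docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md under the project."""
    day = today or date.today()
    name = f"artifact_evaluation_prep_{day.isoformat()}.md"
    return project_root / "docs" / "submission" / name


def missing_sections(report_text: str) -> List[str]:
    """Return the required section names that do not appear in the report text."""
    lowered = report_text.lower()
    return [section for section in REQUIRED_SECTIONS if section not in lowered]
```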

Step 8 - Write Back to Project Memory

Read `references/memory-writeback.md` when memory exists.

Update artifact status, reproduction commands, blockers, claim support, release actions, and final handoff notes without copying full command logs into memory.

Final Sanity Check

Before finalizing:

every important paper claim is either reproducible, smoke-tested, cached with explanation, or explicitly out of scope
quickstart instructions have expected outputs and runtime
hardware, data, checkpoints, licenses, and anonymity state are clear
package paths and links are stable
reviewer-facing failure modes are documented
public-release and camera-ready obligations are routed
project memory records artifact readiness and open blockers
