artifact-evaluation-prep

Artifact Evaluation Prep


Prepare a paper's code, data, checkpoints, scripts, and instructions so an external artifact reviewer can reproduce the paper-facing claims with minimal ambiguity.
Use this skill when:
  • a venue requires or offers artifact evaluation, reproducibility badges, or artifact appendices
  • the user needs reviewer-facing install, quickstart, demo, or reproduction instructions
  • a camera-ready or accepted paper needs an artifact package handoff
  • code, data, checkpoints, models, Docker images, or external services must be packaged
  • runtime, hardware, random seeds, expected outputs, or troubleshooting notes need to be made explicit
  • claims in the paper need to be mapped to runnable scripts or released artifacts
Do not use this skill as a general code-release skill. Use release-code for public repository hygiene, licensing, CITATION files, tags, and GitHub releases. Use this skill for reviewer-facing artifact execution and claim reproduction.
Pair this skill with:
  • camera-ready-finalizer to recover accepted-paper obligations and final claim/evidence state
  • release-code to prepare public repository hygiene after artifact obligations are clear
  • reproducibility-audit when environment, data, or execution drift needs a broader audit
  • run-experiment for generating or testing reproduction commands
  • figure-results-review when artifact outputs must match paper figures or tables
  • citation-audit when artifact metadata cites datasets, code, or prior artifacts
  • research-project-memory when artifact status, blockers, and reviewer-facing instructions should persist

Skill Directory Layout


<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── artifact-audit.md
    ├── memory-writeback.md
    ├── package-manifest.md
    ├── report-template.md
    └── reviewer-instructions.md

Progressive Loading


  • Always read references/artifact-audit.md, references/package-manifest.md, and references/reviewer-instructions.md.
  • Read references/report-template.md before writing a saved artifact evaluation report.
  • Read references/memory-writeback.md when the project has memory/, component .agent/ folders, or the user asks for persistent state.
  • If venue rules matter, verify current official artifact evaluation instructions before asserting deadlines, badge names, anonymity rules, upload fields, page limits, or required formats.

Core Principles


  • Artifact evaluation is a reviewer workflow, not just a code dump.
  • The artifact must reproduce the paper's important claims at an acceptable cost, or clearly document what it cannot reproduce.
  • Prefer one reliable quickstart and one complete reproduction path over many fragile commands.
  • Every command should state expected runtime, hardware, input, output, and success criteria.
  • Package only redistributable data, checkpoints, and dependencies; document restricted assets precisely.
  • Keep anonymity, licensing, and external-service assumptions explicit.
  • Treat smoke tests as required. An untested instruction file is not an artifact package.

Step 1 - Recover Evaluation Context


Collect:
  • venue and artifact evaluation track, if known
  • official artifact instructions, badge criteria, anonymity policy, and upload mechanism
  • accepted or submitted paper, appendix, supplementary material, and checklist
  • code repository, commit hash, branches, and worktrees
  • datasets, checkpoints, pretrained models, generated outputs, and external dependencies
  • hardware expectations: CPU/GPU type, memory, disk, runtime, network access
  • paper claims, figures, tables, and experiments that the artifact should support
  • constraints: private data, license limits, large files, cloud dependencies, nondeterminism, or reviewer time budget
If no venue is specified, produce a venue-agnostic artifact package but mark venue-specific fields as unresolved.

Step 2 - Map Claims to Artifact Paths


For each paper-facing claim or result, record:
  • claim or result ID
  • paper location
  • script, notebook, config, or command that supports it
  • input data or checkpoint
  • expected output file, metric, table, or figure
  • approximate runtime and hardware
  • deterministic tolerance or expected variance
  • reviewer priority: quickstart, core, optional, or not reproducible in package
Do not imply full reproducibility if only a smoke test or cached output is provided.
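The fields above can be kept in a small structured record so that gaps stay visible instead of being implied. The following is a minimal sketch, not a prescribed format: the class name `ClaimEntry`, the helper `unresolved_fields`, and every value in the example entry are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass

# Reviewer priorities as listed in Step 2.
PRIORITIES = ("quickstart", "core", "optional", "not reproducible in package")


@dataclass
class ClaimEntry:
    """One row of a claim-to-artifact map (hypothetical structure)."""

    claim_id: str
    paper_location: str
    command: str
    inputs: str
    expected_output: str
    runtime: str
    hardware: str
    tolerance: str
    priority: str

    def __post_init__(self):
        # Reject priorities outside the four categories named in Step 2.
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority!r}")


def unresolved_fields(entry: ClaimEntry) -> list:
    """Return the names of fields left empty so they can be flagged."""
    return [name for name, value in asdict(entry).items() if not value.strip()]


# Placeholder entry; the script path, table number, and budgets are invented.
entry = ClaimEntry(
    claim_id="C1",
    paper_location="Table 2",
    command="python scripts/reproduce_table2.py --config configs/table2.yaml",
    inputs="data/eval_split/",
    expected_output="outputs/table2.csv",
    runtime="about 30 minutes",
    hardware="1 GPU, 16 GB memory",
    tolerance="",  # left empty on purpose to show how a gap is reported
    priority="core",
)

print(json.dumps(asdict(entry), indent=2))
print("unresolved:", unresolved_fields(entry))
```

An entry whose tolerance or expected output is unresolved should not be presented to reviewers as fully reproducible.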

Step 3 - Build the Artifact Manifest


Read references/package-manifest.md.
Create or update a manifest that lists:
  • repository URL or archive path
  • exact commit, tag, or checksum
  • directory layout
  • environment files and Docker images
  • data and checkpoint locations
  • reproduction scripts and configs
  • expected generated outputs
  • license and citation metadata
  • known limitations and unsupported claims
Prefer small, stable names such as ARTIFACT.md, REPRODUCE.md, or docs/artifact_evaluation.md unless the venue requires a specific filename.
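For the checksum part of the manifest, one possible approach is to hash every packaged file with SHA-256 and record its size. The sketch below assumes nothing about the format in references/package-manifest.md; the function names and the demo file `checkpoints/model.bin` are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_file_manifest(root: Path) -> dict:
    """Map each file path (relative to root, POSIX style) to its size and SHA-256."""
    entries = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries[path.relative_to(root).as_posix()] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
    return entries


# Demo on a throwaway directory with a placeholder "checkpoint".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "checkpoints").mkdir()
    (root / "checkpoints" / "model.bin").write_bytes(b"abc")
    manifest = build_file_manifest(root)

print(json.dumps(manifest, indent=2))
```

Sorting the paths keeps the output stable across runs, which makes manifest diffs between package versions easier to read.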

Step 4 - Write Reviewer Instructions


Read references/reviewer-instructions.md.
Provide:
  • setup commands
  • quick smoke test under a short runtime budget
  • core reproduction commands for main paper claims
  • expected outputs and how to compare them with the paper
  • troubleshooting for common failures
  • hardware, storage, network, and time requirements
  • contact policy or anonymous support channel if allowed
  • limitations and optional extended runs
Instructions should be copy-pasteable and should not require the reviewer to infer hidden paths or environment variables.
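One way to catch hidden environment variables is to scan the instruction commands for shell variables that are used but never assigned. The sketch below is a rough heuristic under stated assumptions: it only recognizes `NAME=` and `export NAME=` assignments, it ignores whether the assignment comes before the use, and the set `ASSUMED_PRESENT` and the example commands are hypothetical.

```python
import re

# Matches $NAME and ${NAME} references.
VAR_USE = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")
# Matches NAME=... or export NAME=... at the start of a line.
VAR_DEF = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=", re.MULTILINE)
# Variables a reviewer's shell is assumed to provide already.
ASSUMED_PRESENT = {"HOME", "PATH", "PWD", "USER"}


def undefined_variables(commands: str) -> list:
    """List variables that are referenced but never assigned in the text."""
    defined = set(VAR_DEF.findall(commands)) | ASSUMED_PRESENT
    missing = []
    for name in VAR_USE.findall(commands):
        if name not in defined and name not in missing:
            missing.append(name)
    return missing


# Placeholder instruction snippet; the script path is invented.
commands = """\
export DATA_DIR=./data
python scripts/quickstart.py --data "$DATA_DIR" --out "${OUT_DIR}/quickstart"
"""

print(undefined_variables(commands))  # ['OUT_DIR']
```

Anything the check reports should either be assigned explicitly in the instructions or replaced with a literal path.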

Step 5 - Smoke Test the Artifact


When allowed by the user and environment, run at least:
  • environment creation or dependency resolution
  • import or CLI sanity check
  • quickstart command
  • one representative data/checkpoint load
  • one expected-output comparison
If commands are too expensive, record the exact reason and create a minimal substitute test.
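The checks above can be driven by a small runner that enforces a time budget per command and records the outcome. This is a sketch under assumptions: `run_check` and `within_tolerance` are hypothetical helpers, the two listed checks only exercise the Python interpreter and stand in for the artifact's real setup and quickstart commands, and the metric values are placeholders rather than results.

```python
import math
import subprocess
import sys
import time


def run_check(name: str, argv: list, timeout_s: float) -> dict:
    """Run one command and record pass, fail, or timeout with elapsed time."""
    start = time.monotonic()
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        status = "pass" if result.returncode == 0 else "fail"
        detail = result.stderr.strip()[-500:]  # keep only the tail of stderr
    except subprocess.TimeoutExpired:
        status, detail = "timeout", f"exceeded {timeout_s}s"
    except FileNotFoundError as exc:
        status, detail = "fail", str(exc)
    elapsed = round(time.monotonic() - start, 2)
    return {"name": name, "status": status, "seconds": elapsed, "detail": detail}


def within_tolerance(observed: float, expected: float, abs_tol: float) -> bool:
    """Expected-output comparison against a stated absolute tolerance."""
    return math.isclose(observed, expected, rel_tol=0.0, abs_tol=abs_tol)


# Stand-in checks; replace with the artifact's own commands and budgets.
checks = [
    ("interpreter starts", [sys.executable, "--version"], 30),
    ("import sanity check", [sys.executable, "-c", "import json"], 30),
]
results = [run_check(name, argv, budget) for name, argv, budget in checks]
for item in results:
    print(f"{item['status']:8s} {item['seconds']:6.2f}s  {item['name']}")

# Placeholder numbers: an observed metric compared with the paper's value.
print("metric ok:", within_tolerance(observed=0.8312, expected=0.83, abs_tol=0.005))
```

Recording the elapsed time alongside the status gives the runtime figures that the reviewer instructions need.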

Step 6 - Handle Packaging Risks


Audit:
  • anonymization vs public release state
  • licenses for code, data, pretrained weights, and third-party assets
  • large-file strategy and checksums
  • private paths, credentials, API keys, and machine-specific assumptions
  • random seeds and nondeterminism
  • version pinning and dependency conflicts
  • reviewer time budget and failure recovery
Route public release issues to release-code; route environment drift to reproducibility-audit if available.
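Part of the audit for private paths and credentials can be approximated with a text scan. The sketch below is illustrative only: the two regular expressions are assumptions, they are far from exhaustive, and a clean result does not show that a package is free of secrets. The names `PATTERNS`, `scan_text`, and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# Illustrative patterns; extend them for the project at hand.
PATTERNS = {
    "home directory path": re.compile(r"(/home/|/Users/)[A-Za-z0-9._-]+"),
    "credential-like assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"
    ),
}


def scan_text(text: str) -> list:
    """Return (line number, finding label) pairs for one file's contents."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


def scan_tree(root: Path, suffixes=(".py", ".sh", ".md", ".yaml", ".yml", ".json", ".txt")) -> dict:
    """Scan text-like files under root and report only files with findings."""
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                report[path.relative_to(root).as_posix()] = hits
    return report


# Placeholder content showing one machine-specific path and one credential.
sample = 'DATA_DIR = "/home/alice/datasets"\nAPI_KEY = "placeholder"\nseed = 0\n'
print(scan_text(sample))
# [(1, 'home directory path'), (2, 'credential-like assignment')]
```

Findings still need manual review, since a matched line may be a documented placeholder rather than a leak.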

Step 7 - Write the Artifact Evaluation Report


Read references/report-template.md.
If saving to a project and no path is given, use:
docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md
The report must include:
  • readiness decision
  • blocking issues
  • claim-to-artifact map
  • package manifest summary
  • smoke-test status
  • reviewer instruction status
  • risks, limitations, and reviewer-facing caveats
  • handoff to release, camera-ready, or memory

Step 8 - Write Back to Project Memory


Read references/memory-writeback.md when memory exists.
Update artifact status, reproduction commands, blockers, claim support, release actions, and final handoff notes without copying full command logs into memory.

Final Sanity Check


Before finalizing, confirm that:
  • every important paper claim is either reproducible, smoke-tested, cached with explanation, or explicitly out of scope
  • quickstart instructions have expected outputs and runtime
  • hardware, data, checkpoints, licenses, and anonymity state are clear
  • package paths and links are stable
  • reviewer-facing failure modes are documented
  • public-release and camera-ready obligations are routed to release-code and camera-ready-finalizer
  • project memory records artifact readiness and open blockers