artifact-evaluation-prep
Artifact Evaluation Prep
Prepare a paper's code, data, checkpoints, scripts, and instructions so an external artifact reviewer can reproduce the paper-facing claims with minimal ambiguity.
Use this skill when:
- a venue requires or offers artifact evaluation, reproducibility badges, or artifact appendices
- the user needs reviewer-facing install, quickstart, demo, or reproduction instructions
- a camera-ready or accepted paper needs an artifact package handoff
- code, data, checkpoints, models, Docker images, or external services must be packaged
- runtime, hardware, random seeds, expected outputs, or troubleshooting notes need to be made explicit
- claims in the paper need to be mapped to runnable scripts or released artifacts
Do not use this skill as a general code-release skill. Use `release-code` for public repository hygiene, licensing, CITATION files, tags, and GitHub releases. Use this skill for reviewer-facing artifact execution and claim reproduction.
Pair this skill with:
- `camera-ready-finalizer` to recover accepted-paper obligations and final claim/evidence state
- `release-code` to prepare public repository hygiene after artifact obligations are clear
- `reproducibility-audit` when environment, data, or execution drift needs a broader audit
- `run-experiment` for generating or testing reproduction commands
- `figure-results-review` when artifact outputs must match paper figures or tables
- `citation-audit` when artifact metadata cites datasets, code, or prior artifacts
- `research-project-memory` when artifact status, blockers, and reviewer-facing instructions should persist
Skill Directory Layout
```text
<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── artifact-audit.md
    ├── memory-writeback.md
    ├── package-manifest.md
    ├── report-template.md
    └── reviewer-instructions.md
```
Progressive Loading
- Always read `references/artifact-audit.md`, `references/package-manifest.md`, and `references/reviewer-instructions.md`.
- Read `references/report-template.md` before writing a saved artifact evaluation report.
- Read `references/memory-writeback.md` when the project has `memory/`, component `.agent/` folders, or the user asks for persistent state.
- If venue rules matter, verify current official artifact evaluation instructions before asserting deadlines, badge names, anonymity rules, upload fields, page limits, or required formats.
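The loading rules above can be sketched as a small selection function. This is an illustration only: the function names, the `saving_report` and `wants_persistent_state` flags, and the recursive search for `.agent/` folders are assumptions, not part of the skill.

```python
from pathlib import Path

# Reference files that are always read, per the rules above.
ALWAYS = [
    "references/artifact-audit.md",
    "references/package-manifest.md",
    "references/reviewer-instructions.md",
]


def has_memory_state(project_root):
    """True if the project has a memory/ folder or any .agent/ folder below it."""
    root = Path(project_root)
    if (root / "memory").is_dir():
        return True
    # Assumption: "component .agent/ folders" means .agent/ at any depth.
    return any(p.is_dir() for p in root.rglob(".agent"))


def references_to_load(project_root, saving_report=False, wants_persistent_state=False):
    """Return the reference files to read for this run (hypothetical helper)."""
    refs = list(ALWAYS)
    if saving_report:
        refs.append("references/report-template.md")
    if wants_persistent_state or has_memory_state(project_root):
        refs.append("references/memory-writeback.md")
    return refs
```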
Core Principles
- Artifact evaluation is a reviewer workflow, not just a code dump.
- The artifact must reproduce the paper's important claims at an acceptable cost, or clearly document what it cannot reproduce.
- Prefer one reliable quickstart and one complete reproduction path over many fragile commands.
- Every command should state expected runtime, hardware, input, output, and success criteria.
- Package only redistributable data, checkpoints, and dependencies; document restricted assets precisely.
- Keep anonymity, licensing, and external-service assumptions explicit.
- Treat smoke tests as required. An untested instruction file is not an artifact package.
Step 1 - Recover Evaluation Context
Collect:
- venue and artifact evaluation track, if known
- official artifact instructions, badge criteria, anonymity policy, and upload mechanism
- accepted or submitted paper, appendix, supplementary material, and checklist
- code repository, commit hash, branches, and worktrees
- datasets, checkpoints, pretrained models, generated outputs, and external dependencies
- hardware expectations: CPU/GPU type, memory, disk, runtime, network access
- paper claims, figures, tables, and experiments that the artifact should support
- constraints: private data, license limits, large files, cloud dependencies, nondeterminism, or reviewer time budget
If no venue is specified, produce a venue-agnostic artifact package but mark venue-specific fields as unresolved.
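Part of this context can be captured mechanically. The sketch below records the commit hash and basic machine facts, and marks anything it cannot determine as unresolved. The field names and the `UNRESOLVED` marker are hypothetical; GPU type, datasets, and venue rules still need to be collected by hand.

```python
import os
import platform
import shutil
import subprocess

UNRESOLVED = "UNRESOLVED"  # hypothetical marker for fields that still need an answer


def git_commit(repo_dir="."):
    """Return the current commit hash, or UNRESOLVED outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return UNRESOLVED
    return result.stdout.strip() or UNRESOLVED


def collect_context(repo_dir=".", venue=None):
    """Collect the machine-checkable part of the evaluation context."""
    free_bytes = shutil.disk_usage(repo_dir).free
    return {
        "venue": venue or UNRESOLVED,
        "official_instructions": UNRESOLVED,  # must be verified by a person
        "commit": git_commit(repo_dir),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        "free_disk_gb": round(free_bytes / 1e9, 1),
    }
```

Calling `collect_context()` with no venue yields a venue-agnostic record whose venue-specific fields are explicitly unresolved.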
Step 2 - Map Claims to Artifact Paths
For each paper-facing claim or result, record:
- claim or result ID
- paper location
- script, notebook, config, or command that supports it
- input data or checkpoint
- expected output file, metric, table, or figure
- approximate runtime and hardware
- deterministic tolerance or expected variance
- reviewer priority: quickstart, core, optional, or not reproducible in package
Do not imply full reproducibility if only a smoke test or cached output is provided.
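One possible shape for a claim map entry is sketched below, together with a check that flags core claims backed only by a smoke test or cached output. The field names, the `evidence` values, and the example entries are hypothetical and do not come from any real paper.

```python
from dataclasses import dataclass

# Reviewer priorities listed above, written as short hypothetical labels.
PRIORITIES = {"quickstart", "core", "optional", "not-reproducible"}


@dataclass
class ClaimEntry:
    claim_id: str
    paper_location: str
    command: str
    inputs: str
    expected_output: str
    runtime_hardware: str
    tolerance: str
    priority: str
    evidence: str = "full-run"  # assumed values: full-run, smoke-test, cached-output

    def __post_init__(self):
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")


def overstated_claims(entries):
    """Return IDs of core claims that lack a full reproduction run."""
    return [
        e.claim_id
        for e in entries
        if e.priority == "core" and e.evidence != "full-run"
    ]


# Illustrative entries only; IDs, paths, and numbers are made up.
example_entries = [
    ClaimEntry("C1", "Table 2", "python eval.py --config main.yaml",
               "checkpoints/main.pt", "results/table2.csv",
               "about 2 h on one GPU", "accuracy within 0.5 points", "core"),
    ClaimEntry("C2", "Figure 3", "python plot.py --cached",
               "cache/fig3.json", "figures/fig3.pdf",
               "under 1 min on CPU", "exact", "core", evidence="cached-output"),
]
```

Here `overstated_claims(example_entries)` returns `["C2"]`, which signals that the instructions for that claim should say it is regenerated from cached output rather than reproduced end to end.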
Step 3 - Build the Artifact Manifest
Read `references/package-manifest.md`.
Create or update a manifest that lists:
- repository URL or archive path
- exact commit, tag, or checksum
- directory layout
- environment files and Docker images
- data and checkpoint locations
- reproduction scripts and configs
- expected generated outputs
- license and citation metadata
- known limitations and unsupported claims
Prefer small, stable names such as `ARTIFACT.md`, `REPRODUCE.md`, or `docs/artifact_evaluation.md` unless the venue requires a specific filename.
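The checksum part of a manifest can be generated rather than typed. The following is a minimal sketch, assuming SHA-256 and a flat mapping from relative path to hash and size; the actual manifest layout in `references/package-manifest.md` may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative POSIX path) to its checksum and size."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        p.relative_to(root).as_posix(): {
            "sha256": sha256_of(p),
            "bytes": p.stat().st_size,
        }
        for p in files
    }
```

The resulting dictionary can be written out with `json.dump` next to the manifest so a reviewer can verify downloads before running anything.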
Step 4 - Write Reviewer Instructions
Read `references/reviewer-instructions.md`.
Provide:
- setup commands
- quick smoke test under a short runtime budget
- core reproduction commands for main paper claims
- expected outputs and how to compare them with the paper
- troubleshooting for common failures
- hardware, storage, network, and time requirements
- contact policy or anonymous support channel if allowed
- limitations and optional extended runs
Instructions should be copy-pasteable and should not require the reviewer to infer hidden paths or environment variables.
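One way to catch hidden environment variables is to scan the instruction commands for shell-style variable references and compare them with the variables the instructions actually document. This is a heuristic sketch; the function name and example commands are hypothetical, and it does not parse shell syntax fully.

```python
import re

# Matches $NAME and ${NAME}; positional parameters such as $1 are ignored.
ENV_VAR = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")


def undocumented_env_vars(commands, documented):
    """Return variables referenced in commands but never documented."""
    referenced = set()
    for command in commands:
        referenced.update(ENV_VAR.findall(command))
    return sorted(referenced - set(documented))


# Illustrative commands only.
example_commands = [
    "python eval.py --data ${DATA_DIR} --out $OUT_DIR",
    "python plot.py --in $OUT_DIR/metrics.json",
]
```

With `documented={"DATA_DIR"}`, the check returns `["OUT_DIR"]`, so the instructions should either define `OUT_DIR` in the setup step or replace it with a literal path.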
Step 5 - Smoke Test the Artifact
When allowed by the user and environment, run at least:
- environment creation or dependency resolution
- import or CLI sanity check
- quickstart command
- one representative data/checkpoint load
- one expected-output comparison
If commands are too expensive, record the exact reason and create a minimal substitute test.
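The expected-output comparison can be as simple as checking reported metrics against the paper's values within a stated tolerance. A minimal sketch, assuming metrics are flat name-to-number dictionaries and an absolute tolerance; the metric names and values are made up.

```python
import math


def compare_metrics(expected, actual, abs_tol):
    """Return a list of human-readable mismatches; an empty list means success."""
    problems = []
    for name, want in expected.items():
        if name not in actual:
            problems.append(f"{name}: missing from artifact output")
        elif not math.isclose(actual[name], want, rel_tol=0.0, abs_tol=abs_tol):
            problems.append(f"{name}: expected {want}, got {actual[name]}")
    return problems


# Illustrative numbers only.
paper_values = {"accuracy": 0.912, "f1": 0.880}
artifact_values = {"accuracy": 0.915}
```

Here `compare_metrics(paper_values, artifact_values, abs_tol=0.005)` reports only the missing `f1` metric, because the accuracy difference of 0.003 is inside the tolerance.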
Step 6 - Handle Packaging Risks
Audit:
- anonymization vs public release state
- licenses for code, data, pretrained weights, and third-party assets
- large-file strategy and checksums
- private paths, credentials, API keys, and machine-specific assumptions
- random seeds and nondeterminism
- version pinning and dependency conflicts
- reviewer time budget and failure recovery
Route public release issues to `release-code`; route environment drift to `reproducibility-audit` if available.
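Private paths and credential-like strings can be flagged with a rough text scan before packaging. The patterns below are illustrative heuristics chosen for this sketch, not an exhaustive or authoritative secret detector, and any hit still needs a manual look.

```python
import re

# Hypothetical heuristics: user home directories and key/secret assignments.
PATTERNS = {
    "machine-specific home path": re.compile(r"/(?:home|Users)/[A-Za-z0-9._-]+/"),
    "credential-like assignment": re.compile(
        r"(?:api[_-]?key|secret|token|password)\w*\s*[=:]\s*\S+",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line_number, label) pairs for lines that match a risk pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


# Illustrative input only; the path and key are fake.
example_source = (
    "data = '/home/alice/data/train.csv'\n"
    "OPENAI_API_KEY = 'sk-123'\n"
    "print('ok')\n"
)
```

Running `scan_text(example_source)` flags line 1 as a home path and line 2 as a credential-like assignment, while the last line passes.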
Step 7 - Write the Artifact Evaluation Report
Read `references/report-template.md`.
If saving to a project and no path is given, use:
```text
docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md
```
The report must include:
- readiness decision
- blocking issues
- claim-to-artifact map
- package manifest summary
- smoke-test status
- reviewer instruction status
- risks, limitations, and reviewer-facing caveats
- handoff to release, camera-ready, or memory
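The default path and the required sections can be checked with a few lines of code. This sketch assumes the date in the filename is the day the report is written and that each required section is identifiable by a short phrase; both are assumptions, and the helper names are hypothetical.

```python
from datetime import date
from pathlib import Path

# Short phrases standing in for the required sections listed above.
REQUIRED_SECTIONS = [
    "readiness decision",
    "blocking issues",
    "claim-to-artifact map",
    "package manifest summary",
    "smoke-test status",
    "reviewer instruction status",
    "risks",
    "handoff",
]


def default_report_path(project_root, today=None):
    """Build docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md under the project."""
    today = today or date.today()
    name = f"artifact_evaluation_prep_{today.isoformat()}.md"
    return Path(project_root) / "docs" / "submission" / name


def missing_sections(report_text):
    """Return required section phrases that do not appear in the report text."""
    lowered = report_text.lower()
    return [section for section in REQUIRED_SECTIONS if section not in lowered]
```

For example, `default_report_path("proj", date(2025, 1, 31))` gives `proj/docs/submission/artifact_evaluation_prep_2025-01-31.md` on a POSIX system.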
Step 8 - Write Back to Project Memory
Read `references/memory-writeback.md` when memory exists.
Update artifact status, reproduction commands, blockers, claim support, release actions, and final handoff notes without copying full command logs into memory.
Final Sanity Check
Before finalizing:
- every important paper claim is either reproducible, smoke-tested, cached with explanation, or explicitly out of scope
- quickstart instructions have expected outputs and runtime
- hardware, data, checkpoints, licenses, and anonymity state are clear
- package paths and links are stable
- reviewer-facing failure modes are documented
- public-release and camera-ready obligations are routed
- project memory records artifact readiness and open blockers