manuscript-provenance


Manuscript Provenance Audit


Pipeline position: Phase 2a (grounding audit). Runs in parallel with `manuscript-typography`. Depends on: content settled after Phase 1 fixes. Produces the macro manifest consumed by `manuscript-review` Pass 13 (Cross-Element Coherence).


Purpose


Verify that a manuscript is a faithful rendering of computational outputs. Every number, table, figure, category label, ordering, and threshold in the document must trace to a specific script, config file, or pipeline output. Manual data entry in a manuscript is a reproducibility defect.

This skill produces a provenance map — a structured report linking each manuscript artifact to its generating code — and flags every break in the chain.

Companion skill: `manuscript-review` audits the document as prose (structure, argumentation, citations). This skill audits whether the document's content is computationally grounded. Run both for complete pre-publication coverage.


Boundary Agreement with manuscript-review


| Concern | manuscript-review | This skill (manuscript-provenance) |
|---|---|---|
| Reproducibility | Does the paper describe enough to reproduce? (§6) | Does the code actually produce what the paper claims? (§1, §7) |
| Figures/Tables | Legible, accessible, well-formatted? (§12) | Generated by scripts, not manual entry? (§2, §3) |
| Rendered visuals | Readable at print scale? Floats near references? (§23) | Figure generation script produces correct format? (§3) |
| Hyperparameters | Listed in the paper with rationale? (§6) | Values trace to config files, not hardcoded? (§1, §8) |
| Code availability | Statement exists in the paper? (§17) | Repo URL valid, README accurate, pipeline works? (§11) |
| Terminology | Abbreviations consistent within document? (§14) | Terms match code identifiers? (§5) |
| Significant figures | Consistent precision within document? (§12) | Precision matches script output? (§2) |
| Figure format | Appropriate format for document quality? (§12) | Format generated by script, not manually exported? (§3) |
| Computational cost | Reported in the paper? (§7) | Values trace to benchmarking scripts? (§1) |
| Macro-prose coherence | Prose framing appropriate for injected value? (§24) | Value traced to code, macro manifest produced? (§4) |
| Cross-element consistency | Prose, captions, figures, tables mutually consistent? (§24) | All elements from same run/pipeline output? (§9) |

Rule: This skill never judges prose quality. manuscript-review never opens the codebase. Each reads the other's report when available.

Integration point (macro manifest): as part of the §4 audit, this skill produces a macro manifest, a structured list of every macro-injected value with:

Macro name (e.g., `\bestf`)
Resolved value (e.g., `0.847`)
Source (script + output file that generates it)
Location(s) in manuscript text (file, line number, surrounding sentence)
Classification (TRACED / MACRO-TRACED / CONFIG-TRACED / UNTRACED / STALE)

manuscript-review's Pass 13 (Cross-Element Coherence, §24) consumes this manifest to check whether the prose surrounding each injected value is appropriate for the actual numeric value. Provenance owns "is this value computationally grounded?" Review owns "does the text wrapping this value make sense given what the value is?"


Scope


In scope:

Numbers, metrics, percentages in manuscript text
Tables (content, ordering, formatting)
Figures (generation scripts, data sources)
LaTeX macros (`\newcommand`, `\def`, `\pgfmathsetmacro`)
Terminology, mode names, mechanism labels, category names
Ordering of items in enumerations, tables, discussion
Config values (thresholds, hyperparameters, model names)
Pipeline completeness (raw data → final PDF)
Timestamp consistency (scripts vs outputs)

Out of scope:

Prose quality (→ manuscript-review)
Citation hygiene (→ manuscript-review)
Argumentation structure (→ manuscript-review)
Code quality/style (separate concern)


Inputs


This audit requires TWO artifacts:

Manuscript source: LaTeX `.tex` files (preferred), or PDF/DOCX as a fallback
Codebase: the scripts, configs, and pipeline that generate manuscript content

If the user provides only one, ask for the other. LaTeX source is strongly preferred over compiled PDF — provenance auditing requires seeing the raw markup, macros, and input commands.


Workflow


Phase 1 — Inventory


1a. Manuscript Artifact Extraction

Read all `.tex` files (the main file plus any pulled in via `\input` or `\include`). Extract:

Inline values: bare numbers in running text (percentages, counts, metrics, p-values, confidence intervals, thresholds, sizes)
LaTeX macros: all `\newcommand`, `\def`, `\pgfmathsetmacro`, and custom command definitions that carry data values
Tables: full content of every `tabular`/`table` environment (cell values, row/column ordering, headers)
Figures: `\includegraphics` paths, caption content, referenced data
Input files: any `\input{generated/*.tex}` patterns that pull from script-generated LaTeX fragments
Labels and references: `\label`/`\ref` pairs for cross-referencing
Terminology: named modes, mechanisms, strategies, categories, method names used in prose
Ordered lists: any enumerated or ranked items (methods compared, features listed, results ordered)

Build an artifact registry — a flat list of every data-carrying element in the manuscript with its location (file, line number).
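The extraction step can be sketched in Python as below. The sketch assumes each `.tex` file is read as plain text. The regular expressions are heuristics, not a LaTeX parser. They skip macros that take arguments or nest braces in their bodies, and the bare-number pass will still report some non-data numbers such as years or section numbers. `Artifact`, `extract_artifacts`, and the pattern names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Artifact:
    kind: str        # "macro", "input", "figure", "label", or "bare_number"
    text: str        # macro name, path, label key, or number as written
    file: str
    line: int
    value: str = ""  # macro body, only for kind == "macro"


COMMENT = re.compile(r"(?<!\\)%.*$")  # an unescaped % starts a LaTeX comment
MACRO_DEF = re.compile(
    r"\\(?:(?:re)?newcommand\*?\{?|def|pgfmathsetmacro\{?)(\\[A-Za-z]+)\}?\{([^{}]*)\}"
)
INPUT = re.compile(r"\\(?:input|include)\{([^}]+)\}")
FIGURE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
LABEL = re.compile(r"\\label\{([^}]+)\}")
# Commands whose arguments are keys, paths, or definitions rather than running text.
NON_PROSE = re.compile(
    r"\\(?:label|ref|eqref|cite\w*|input|include|includegraphics|usepackage|"
    r"documentclass|begin|end|(?:re)?newcommand|def|pgfmathsetmacro)(?![A-Za-z])"
    r"\*?(?:\[[^\]]*\])?(?:\\[A-Za-z]+)?(?:\{[^{}]*\})*"
)
# A number not glued to letters or dots on either side (skips "F1", "12em", "v2.1").
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\.?\w)")


def extract_artifacts(tex: str, file: str) -> List[Artifact]:
    """Heuristic Phase 1a scan of one .tex file (not a full LaTeX parser)."""
    found: List[Artifact] = []
    for lineno, raw in enumerate(tex.splitlines(), start=1):
        line = COMMENT.sub("", raw)
        for m in MACRO_DEF.finditer(line):
            if re.search(r"\d", m.group(2)):  # keep only definitions that carry data
                found.append(Artifact("macro", m.group(1), file, lineno, m.group(2)))
        for kind, pattern in (("input", INPUT), ("figure", FIGURE), ("label", LABEL)):
            found.extend(Artifact(kind, m.group(1), file, lineno) for m in pattern.finditer(line))
        prose = NON_PROSE.sub(" ", line)  # blank out non-prose arguments first
        found.extend(
            Artifact("bare_number", m.group(0), file, lineno) for m in NUMBER.finditer(prose)
        )
    return found
```

To build the flat artifact registry, run this over the main file and every `\input`/`\include` target, then concatenate the results.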

1b. Codebase Mapping

Scan the project directory. Identify:

Pipeline entry points: `Makefile`, `snakemake`, `dvc.yaml`, `run.sh`, `main.py`, or equivalent orchestration
Analysis scripts: files that produce numbers, tables, figures
Config files: `config.toml`, `config.yaml`, `.env`, `params.yaml`, hyperparameter files
Output directories: where scripts write results (`results/`, `output/`, `figures/`, `tables/`, `generated/`)
Generated LaTeX fragments: `.tex` files in output directories that scripts produce for `\input` inclusion
Data files: CSVs, JSON, HDF5, pickles that intermediate results flow through

Build a source registry — a flat list of every code artifact that produces or configures manuscript content.
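Below is a minimal sketch of the source registry. It assumes the project tree can be walked from one root. The recognized names come from the lists above. `Snakefile` and the script suffixes (`.R`, `.jl`, `.sh`) are assumed additions, and the category labels are hypothetical. A real project will likely need its own names added.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, List, Optional

# Names from the lists above; Snakefile and the script suffixes are assumed extras.
ENTRY_POINTS = {"Makefile", "Snakefile", "dvc.yaml", "run.sh", "main.py"}
CONFIG_NAMES = {"config.toml", "config.yaml", "params.yaml", ".env"}
OUTPUT_DIRS = {"results", "output", "figures", "tables", "generated"}
DATA_SUFFIXES = {".csv", ".json", ".h5", ".hdf5", ".pkl", ".pickle"}
SCRIPT_SUFFIXES = {".py", ".R", ".r", ".jl", ".sh"}


def classify_path(rel: str) -> Optional[str]:
    """Assign one source-registry category to a project-relative path, or None."""
    p = PurePosixPath(rel)
    if p.name in ENTRY_POINTS:
        return "entry_point"
    if p.name in CONFIG_NAMES:
        return "config"
    if OUTPUT_DIRS & set(p.parts[:-1]):
        return "generated_latex" if p.suffix == ".tex" else "output"
    if p.suffix in DATA_SUFFIXES:
        return "data"
    if p.suffix in SCRIPT_SUFFIXES:
        return "script"
    return None


def build_source_registry(paths: Iterable[str]) -> Dict[str, List[str]]:
    registry: Dict[str, List[str]] = defaultdict(list)
    for rel in sorted(paths):
        kind = classify_path(rel)
        if kind is not None:
            registry[kind].append(rel)
    return dict(registry)


def scan_project(root: str) -> Dict[str, List[str]]:
    """Walk the project on disk, skipping version-control internals."""
    base = Path(root)
    rels = []
    for p in base.rglob("*"):
        rel = p.relative_to(base)
        if p.is_file() and ".git" not in rel.parts:
            rels.append(rel.as_posix())
    return build_source_registry(rels)
```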


Phase 2 — Provenance Tracing


For each entry in the artifact registry, attempt to establish a provenance chain: manuscript value → generated output → script → input data/config.

2a. Value Provenance

For every number in the manuscript:

Search for the value in script outputs (logs, result files, generated LaTeX)
Trace the output back to the script that produces it
Verify the script reads from data/config (not hardcoded)
Record the full chain or flag as UNTRACED

Classification:

TRACED — full chain from manuscript value to generating code
MACRO-TRACED — value defined in a LaTeX macro that is generated by a script
CONFIG-TRACED — value comes from a config file read by scripts
UNTRACED — no provenance chain found; manually entered
STALE — provenance chain exists but output is older than generating script
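One way to approach the value search is sketched below. It assumes script outputs have already been read into a `{path: text}` mapping. It also assumes the manuscript number arrives as printed, with thousands separators and `\%` stripped.

A value counts as found if some output number rounds to the same displayed digits, either directly or after scaling by 100. A hit only shows that the number exists in an output. Confirming that the script computes it, rather than hardcoding it, remains a manual step. STALE needs the timestamp comparison from Phase 3d and is not decided here. All function names are hypothetical.

```python
import re
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation
from typing import Collection, Dict, List, Tuple

# Numbers in output text, not glued to identifiers such as "f1" or "v2".
NUM = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def matches(displayed: str, candidate: str) -> bool:
    """True if candidate rounds to the manuscript's displayed digits, either as-is
    or scaled by 100 (a fraction reported as a percentage)."""
    shown = Decimal(displayed)
    quantum = Decimal(1).scaleb(shown.as_tuple().exponent)  # "84.7" -> 0.1
    raw = Decimal(candidate)
    for scaled in (raw, raw * 100):
        try:
            if scaled.quantize(quantum, rounding=ROUND_HALF_UP) == shown:
                return True
        except InvalidOperation:  # magnitude too large for the requested precision
            continue
    return False


def find_sources(displayed: str, outputs: Dict[str, str]) -> List[str]:
    """Paths of output files (path -> text) that contain a matching number."""
    return [
        path for path, text in outputs.items()
        if any(matches(displayed, tok) for tok in NUM.findall(text))
    ]


def classify_value(displayed: str, outputs: Dict[str, str],
                   config_paths: Collection[str] = ()) -> Tuple[str, List[str]]:
    hits = find_sources(displayed, outputs)
    if not hits:
        return "UNTRACED", hits
    if any(h.endswith(".tex") for h in hits):
        return "MACRO-TRACED", hits  # found in a generated LaTeX fragment
    if any(h in config_paths for h in hits):
        return "CONFIG-TRACED", hits
    return "TRACED", hits
```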

2b. Table Provenance

For each table:

Is the table content generated by a script (CSV → LaTeX, or direct LaTeX generation)?
Is the row/column ordering determined by code (sorted by metric, alphabetical, grouped by category) or manually arranged?
Do header labels match code-defined names?
Are formatting choices (bold for best, significant figures) applied by code?

Classification:

GENERATED — entire table produced by script
PARTIAL — some cells generated, some manual
MANUAL — no generation script found
ORDER-MANUAL — content generated but ordering is manually set
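A rough sketch of the table classification follows. It assumes both the manuscript table and the script's emitted table can be reduced to rows of cell strings. The parser ignores `\multicolumn`, escaped `\&`, and cell formatting such as `\textbf{}`, so a bolded best value will not match its plain counterpart. Treat the labels as a first pass for manual review. Function names are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]
RULES = re.compile(r"\\(?:hline|toprule|midrule|bottomrule)")
NUMERIC = re.compile(r"-?\d+(?:\.\d+)?(?:\\?%)?")


def parse_tabular(body: str) -> List[Row]:
    """Split a tabular body into rows and cells (ignores \\& escapes and multicolumn)."""
    rows = []
    for raw in body.split("\\\\"):
        raw = RULES.sub("", raw).strip()
        if raw:
            rows.append(tuple(cell.strip() for cell in raw.split("&")))
    return rows


def classify_table(manuscript: Sequence[Row], generated: Sequence[Row]) -> str:
    if list(manuscript) == list(generated):
        return "GENERATED"
    if sorted(manuscript) == sorted(generated):
        return "ORDER-MANUAL"  # same rows, different arrangement
    generated_cells = {c for row in generated for c in row}
    numeric = [c for row in manuscript for c in row if NUMERIC.fullmatch(c)]
    if any(c in generated_cells for c in numeric):
        return "PARTIAL"  # some data cells come from the script, others do not
    return "MANUAL"
```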

2c. Figure Provenance

For each figure:

Does a script produce the exact file referenced by `\includegraphics`?
Does the script use a deterministic seed for reproducibility?
Is the figure output path in the script consistent with the LaTeX reference?
Are figure parameters (colors, labels, axis ranges) set in code or manually edited post-generation?

Classification:

GENERATED — script produces the exact file
POST-EDITED — script generates base figure, but manual edits detected (e.g., Illustrator metadata, different checksum than script output)
MANUAL — no generating script found
STALE — generating script modified after figure file
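The figure checks can be sketched as follows. The sketch assumes the auditor reruns the figure script into a scratch directory, then passes in both files' bytes plus their modification times.

Byte equality assumes the plotting backend writes deterministic files. Many backends embed creation timestamps, so a checksum mismatch alone is weak evidence of editing until the script is configured to omit such metadata. The editor markers are illustrative, not an exhaustive list of fingerprints.

```python
import hashlib
from typing import Optional, Tuple

# Byte strings some editors leave in exported files; illustrative, not exhaustive.
EDITOR_MARKERS = (b"Adobe Illustrator", b"Inkscape")


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def classify_figure(figure: bytes, regenerated: Optional[bytes],
                    figure_mtime: float, script_mtime: float) -> Tuple[str, str]:
    """figure: bytes of the file \\includegraphics points at.
    regenerated: bytes from rerunning the script, or None if no script exists."""
    if regenerated is None:
        return "MANUAL", "no generating script found"
    if script_mtime > figure_mtime:
        return "STALE", "script modified after the figure file"
    if sha256(figure) == sha256(regenerated):
        return "GENERATED", "checksum matches fresh script output"
    markers = [m.decode() for m in EDITOR_MARKERS if m in figure]
    return "POST-EDITED", "checksum differs; editor metadata: " + (", ".join(markers) or "none")
```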

2d. Terminology Provenance

For each named mode, mechanism, category, or method label:

Is the term defined in code (enum, constant, config key, class name)?
Does the manuscript term match the code term exactly?
If the manuscript uses a display-friendly name, is there an explicit mapping in code or config?

Classification:

CODE-DEFINED — term matches code definition
MAPPED — explicit code→display mapping exists
UNMAPPED — term appears in manuscript but not in code
INCONSISTENT — term appears in both but differs (e.g., code says `greedy_search`, manuscript says "Greedy Search" in some places and "greedy approach" in others)
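The terminology rules map onto a small check like the one below. `Strategy` and `DISPLAY_NAMES` are hypothetical stand-ins for whatever enum, constants, or config keys a project uses. The hard part is collecting which manuscript phrases refer to which identifier. This sketch assumes the auditor has already done that.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Strategy(Enum):
    """Hypothetical code-side definition of method names."""
    GREEDY_SEARCH = "greedy_search"
    BEAM_SEARCH = "beam_search"


# Hypothetical explicit code -> display mapping, kept next to the enum or in config.
DISPLAY_NAMES: Dict[str, str] = {"beam_search": "Beam Search"}


def classify_term(identifier: str, variants: Iterable[str],
                  display_names: Dict[str, str]) -> str:
    """variants: every surface form the manuscript uses for this identifier."""
    forms = set(variants)
    if not forms:
        raise ValueError(f"{identifier!r} does not appear in the manuscript")
    if forms == {identifier}:
        return "CODE-DEFINED"
    if forms == {display_names.get(identifier)}:
        return "MAPPED"
    return "INCONSISTENT"


def unmapped_terms(manuscript_terms: Iterable[str], code_identifiers: Iterable[str],
                   display_names: Dict[str, str]) -> List[str]:
    """Manuscript terms with neither a code identifier nor a display mapping."""
    known = set(code_identifiers) | set(display_names.values())
    return sorted(set(manuscript_terms) - known)
```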

2e. Ordering Provenance

For each ordered list, ranked comparison, or sequenced enumeration:

Does code determine the ordering (sort by metric, alphabetical, enum order)?
Does the manuscript ordering match the code-determined order?
Are there items in the manuscript list not present in code output, or vice versa?

Classification:

CODE-ORDERED — ordering matches code output
MANUAL-ORDER — ordering differs from code output or no ordering logic in code
SUBSET-MISMATCH — manuscript lists different items than code produces
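Below is a sketch of the ordering comparison. It assumes the code-determined order is a descending sort on one metric, with ties broken alphabetically. This is a hypothetical convention; use whatever order the results script actually emits. Comparing as multisets first separates SUBSET-MISMATCH from pure reordering. Code with no ordering logic at all cannot be detected from the two lists alone.

```python
from collections import Counter
from typing import Dict, List, Sequence


def code_order_by_metric(scores: Dict[str, float], higher_is_better: bool = True) -> List[str]:
    """The order a results script would emit: best first, ties broken by name."""
    sign = -1 if higher_is_better else 1
    return sorted(scores, key=lambda name: (sign * scores[name], name))


def classify_order(manuscript: Sequence[str], code: Sequence[str]) -> str:
    if Counter(manuscript) != Counter(code):
        return "SUBSET-MISMATCH"  # different items, not just a different order
    if list(manuscript) == list(code):
        return "CODE-ORDERED"
    return "MANUAL-ORDER"
```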


Phase 3 — Infrastructure Audit


3a. LaTeX Macro Hygiene

Every data-carrying macro should be generated by a script, not hand-typed in the preamble
Pattern to detect: `\newcommand{\someMetric}{42.7}` defined directly in `.tex` files (bad) vs `\input{generated/metrics.tex}` where that file is script output (good)
Flag macros whose values appear nowhere in script outputs
Flag macros defined in the main `.tex` files that carry numeric/data values
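The first and third checks can be sketched as a scan for numeric macro bodies outside the script-output directory. The directory name `generated` is taken from the example above and is an assumption about the project layout. The second check, macros whose values appear in no script output, needs the output search from Phase 2a and is not repeated here.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# \newcommand, \renewcommand or \def whose whole body is a number such as 42.7 or 84.7\%.
DATA_MACRO = re.compile(
    r"\\(?:(?:re)?newcommand\*?\{?|def)(\\[A-Za-z]+)\}?\{\s*(-?\d[\d.,]*\s*(?:\\%)?)\s*\}"
)
GENERATED_DIRS = {"generated"}  # assumed name of the script-output directory


def hand_typed_macros(files: Dict[str, str]) -> List[Tuple[str, int, str, str]]:
    """files maps path -> LaTeX source. Returns (path, line, macro, value) for each
    numeric macro defined outside a generated directory."""
    flagged = []
    for path, text in files.items():
        if GENERATED_DIRS & set(PurePosixPath(path).parts[:-1]):
            continue  # written by a script: the pattern this audit wants
        for lineno, line in enumerate(text.splitlines(), start=1):
            for m in DATA_MACRO.finditer(line):
                flagged.append((path, lineno, m.group(1), m.group(2).strip()))
    return flagged
```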

3b. Pipeline Completeness

Does a single command reproduce all manuscript artifacts from raw data?
Is the pipeline documented (Makefile, README, CI config)?
Are intermediate steps cached or do they require full re-execution?
Are random seeds fixed for reproducibility?
Are software versions pinned (requirements.txt, environment.yml, lock files)?

3c. Config/Code Separation

Are hyperparameters, thresholds, model names in config files?
Are file paths relative (portable) or absolute (fragile)?
Are credentials, API keys, or machine-specific paths absent from committed code?
Is there a single config entry point or are settings scattered across scripts?

3d. Stale Output Detection

Compare modification timestamps: script vs its output files
Flag outputs that are older than their generating scripts (stale)
Flag outputs with no corresponding script (orphaned)
Flag scripts with no corresponding output (dead code or unrun)
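A sketch of the stale/orphaned/dead classification follows. It assumes a `script -> outputs` map has already been read from the pipeline definition, for instance the targets of a Makefile or the `outs` of `dvc.yaml`. It also assumes the caller supplies modification times.

Git does not preserve file modification times. On a fresh clone, commit times (for example `git log -1 --format=%ct -- <path>`) may therefore be a more meaningful input than `os.path.getmtime`.

```python
from typing import Dict, Iterable, List


def audit_timestamps(produces: Dict[str, List[str]], mtimes: Dict[str, float],
                     found_outputs: Iterable[str]) -> Dict[str, List[str]]:
    """produces: script -> outputs it writes.
    mtimes: path -> modification time for every file that exists (scripts included).
    found_outputs: every file actually present in the output directories."""
    stale, dead, claimed = [], [], set()
    for script, outputs in produces.items():
        claimed.update(outputs)
        existing = [o for o in outputs if o in mtimes]
        if not existing:
            dead.append(script)  # never run, or its outputs were deleted
        stale.extend(o for o in existing if mtimes[o] < mtimes[script])
    return {
        "stale": sorted(stale),
        "orphaned": sorted(set(found_outputs) - claimed),
        "dead": sorted(dead),
    }
```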

3e. Version Pinning

Are dependencies locked (requirements.txt with versions, conda environment.yml, poetry.lock, package-lock.json)?
Are data versions tracked (DVC, git-lfs, data checksums)?
Is the manuscript itself versioned alongside code (same repo, tagged releases)?
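For pip-style `requirements.txt` files, the pinning question can be approximated as below. A line counts as pinned only if it uses `==` (or `===`) with a concrete version. This is only a sketch. It does not resolve `-r` includes, and it treats wildcard pins such as `==1.26.*` as pinned. It also does not cover lock files like `poetry.lock` or `package-lock.json`, which record exact versions themselves.

```python
import re
from typing import List

# Package name, optional [extras], then == or === and a concrete version.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?\s*===?\s*[^\s;,]+")


def unpinned_requirements(text: str) -> List[str]:
    """Requirement lines in a pip requirements file that lack an exact pin."""
    loose = []
    for line in text.splitlines():
        req = line.split("#", 1)[0].strip()
        if not req or req.startswith("-"):
            continue  # blank line, comment, or pip option such as -r / --hash
        if not PINNED.match(req):
            loose.append(req)
    return loose
```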


Phase 4 — Cross-Reference and Manifest Generation


4a. Macro Manifest Generation

Produce the macro manifest — the primary handoff artifact to manuscript-review. For every data-carrying macro identified in Phase 1a and traced in Phase 2a:

```text
Macro: \bestf
Value: 0.847
Source: results/metrics.json → scripts/generate_latex_macros.py → generated/metrics.tex
Locations:
  - paper.tex:142 — "achieving an F1 score of \bestf{}"
  - paper.tex:287 — "The \bestf{} result represents a substantial improvement"
  - abstract.tex:8 — "...with \bestf{} F1 score"
Classification: MACRO-TRACED
```

Also include every bare number (not a macro) found in Phase 1a that carries data (metrics, counts, parameters) — these are values that SHOULD be macros but aren't:

```text
Bare value: 50
Location: paper.tex:198 — "convergence after 50 epochs"
Should-be-macro: YES — this is a training parameter, should trace to config
Classification: UNTRACED (no macro, no provenance)
```

Save the manifest as `[manuscript-name]-macro-manifest.json` alongside the provenance report. This file is consumed by manuscript-review Pass 13 (Cross-Element Coherence) to verify prose-value appropriateness.
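A sketch of how the manifest could be assembled and saved is shown below. It assumes the macro values, source chains, and classifications come out of Phase 2a. It also assumes the manuscript sources are available as a `{path: text}` mapping.

Location context is simply the stripped source line, and lines that define the macro are skipped. All function names are hypothetical. `ensure_ascii=False` keeps characters such as `→` readable in the saved file.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Tuple


def macro_locations(macro: str, files: Dict[str, str]) -> List[dict]:
    """Every use of a macro name such as '\\bestf', with file, line, and context."""
    use = re.compile(re.escape(macro) + r"(?![A-Za-z])")  # \bestf but not \bestfold
    definition = re.compile(
        r"\\(?:(?:re)?newcommand\*?\{?|def)" + re.escape(macro) + r"(?![A-Za-z])"
    )
    locations = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if use.search(line) and not definition.search(line):
                locations.append({"file": path, "line": lineno, "context": line.strip()})
    return locations


def build_manifest(macros: Dict[str, Tuple[str, str, str]], bare_numbers: List[dict],
                   files: Dict[str, str]) -> dict:
    """macros maps name -> (value, source_chain, classification)."""
    return {
        "macros": [
            {"name": name, "value": value, "source_chain": chain,
             "locations": macro_locations(name, files), "classification": cls}
            for name, (value, chain, cls) in macros.items()
        ],
        "bare_numbers": bare_numbers,
    }


def save_manifest(manifest: dict, manuscript_name: str, directory: str = ".") -> Path:
    path = Path(directory) / f"{manuscript_name}-macro-manifest.json"
    path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```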

4b. Cross-Reference with manuscript-review

If a manuscript-review report exists for this manuscript, load it and:

Map UNTRACED values to manuscript-review §6 (Methodology) and §7 (Results) findings — provenance gaps often co-occur with reproducibility concerns
Flag terminology inconsistencies as potential §14 (Abbreviations) or §15 (Notation) issues in the manuscript-review framework
Feed HIGH-priority provenance issues as §6/§7 failures
Feed macro manifest into manuscript-review §24 (Cross-Element Coherence) findings — macro values whose surrounding prose uses inappropriate qualitative language ("marginal" for 14.3%, "dramatic" for 0.3%) are §24 failures

If no manuscript-review report exists, recommend running it as a companion audit and note that the macro manifest is available for its Pass 13.


Phase 5 — Report Generation


Load `references/checklist.md` and `references/report-template.md`:

```text
Read references/checklist.md
Read references/report-template.md
```

Generate the provenance report following the template structure:

Provenance Summary — overall score, breakdown by category
Provenance Map — each manuscript artifact linked to its source
Defect Registry — every UNTRACED, STALE, MANUAL, INCONSISTENT finding
Infrastructure Assessment — pipeline, config, versioning status
Remediation Queue — prioritized fixes
Checklist Status — full checklist with pass/fail per checkpoint


Phase 6 — Output


Save two files in the manuscript directory:

`[manuscript-name]-provenance-report.md`: the full provenance report
`[manuscript-name]-macro-manifest.json`: the structured macro manifest for consumption by manuscript-review Pass 13

The macro manifest JSON structure:

```json
{
  "macros": [
    {
      "name": "\\bestf",
      "value": "0.847",
      "source_chain": "results/metrics.json → scripts/generate_latex_macros.py → generated/metrics.tex",
      "locations": [
        {
          "file": "paper.tex",
          "line": 142,
          "context": "achieving an F1 score of \\bestf{}"
        },
        {
          "file": "paper.tex",
          "line": 287,
          "context": "The \\bestf{} result represents a substantial improvement"
        }
      ],
      "classification": "MACRO-TRACED"
    }
  ],
  "bare_numbers": [
    {
      "value": "50",
      "location": {
        "file": "paper.tex",
        "line": 198,
        "context": "convergence after 50 epochs"
      },
      "section": "methodology",
      "should_be_macro": true,
      "rationale": "Training parameter — should trace to config",
      "classification": "UNTRACED"
    }
  ]
}
```
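Because manuscript-review reads this file, a structural check before handing it over may catch mistakes early. The validator below is a sketch. It encodes only the fields and classification labels shown in the example structure and is not a published schema.

```python
from typing import List

CLASSES = {"TRACED", "MACRO-TRACED", "CONFIG-TRACED", "UNTRACED", "STALE"}
MACRO_KEYS = {"name", "value", "source_chain", "locations", "classification"}
BARE_KEYS = {"value", "location", "section", "should_be_macro", "rationale", "classification"}
LOCATION_KEYS = {"file", "line", "context"}


def _check_location(loc: dict, where: str, errors: List[str]) -> None:
    missing = LOCATION_KEYS - set(loc)
    if missing:
        errors.append(f"{where} missing {sorted(missing)}")
    elif not isinstance(loc["line"], int):
        errors.append(f"{where}.line is not an integer")


def validate_manifest(manifest: dict) -> List[str]:
    """Return a list of problems; an empty list means the manifest has the expected shape."""
    errors: List[str] = []
    for key, required in (("macros", MACRO_KEYS), ("bare_numbers", BARE_KEYS)):
        entries = manifest.get(key)
        if not isinstance(entries, list):
            errors.append(f"{key!r} must be a list")
            continue
        for i, entry in enumerate(entries):
            where = f"{key}[{i}]"
            missing = required - set(entry)
            if missing:
                errors.append(f"{where} missing {sorted(missing)}")
            if entry.get("classification") not in CLASSES:
                errors.append(f"{where}.classification {entry.get('classification')!r} is not allowed")
            if key == "macros":
                for j, loc in enumerate(entry.get("locations", [])):
                    _check_location(loc, f"{where}.locations[{j}]", errors)
            elif "location" in entry:
                _check_location(entry["location"], f"{where}.location", errors)
    return errors
```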

Present to the user:

Provenance coverage percentage (TRACED / total artifacts)
Count of UNTRACED / STALE / MANUAL findings by severity
Count of bare numbers that should be macros
Top 5 remediation actions
Pipeline completeness verdict
Note that the macro manifest is available for manuscript-review Pass 13
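The headline numbers can be computed from the list of findings roughly as follows. Two things here are assumptions of this sketch rather than part of the report template: which labels count as traced for the coverage percentage, and the `fix` and `should_be_macro` fields on each finding.

```python
from collections import Counter
from typing import List

# Which labels count as "traced" for coverage is an assumption; adjust to the checklist.
TRACED_LABELS = {"TRACED", "MACRO-TRACED", "CONFIG-TRACED", "GENERATED",
                 "CODE-DEFINED", "MAPPED", "CODE-ORDERED"}
COUNTED_DEFECTS = {"UNTRACED", "STALE", "MANUAL"}
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def summarize(findings: List[dict], top_n: int = 5) -> dict:
    """findings: one dict per artifact with 'classification', 'severity' (None if clean),
    and, for defects, a hypothetical 'fix' string and optional 'should_be_macro' flag."""
    total = len(findings)
    traced = sum(f["classification"] in TRACED_LABELS for f in findings)
    defects = Counter(
        (f["classification"], f["severity"])
        for f in findings if f["classification"] in COUNTED_DEFECTS
    )
    ranked = sorted((f for f in findings if f.get("severity")),
                    key=lambda f: SEVERITY_RANK[f["severity"]])
    return {
        "coverage_pct": round(100.0 * traced / total, 1) if total else 0.0,
        "defects": dict(defects),
        "should_be_macro": sum(bool(f.get("should_be_macro")) for f in findings),
        "top_actions": [f["fix"] for f in ranked[:top_n]],
    }
```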


Severity Classification


CRITICAL — Value in manuscript has no provenance chain AND is a key result (main finding, abstract metric, table headline number). This means the paper's core claims cannot be verified from code.
HIGH — Value/table/figure is untraced or stale, and appears in results or methodology sections. Reproducibility gap.
MEDIUM — Terminology mismatch, manual ordering, partial table generation, config values hardcoded in scripts. Maintenance and consistency risk.
LOW — Minor issues: display-name mapping missing but terms are close, non-critical figures without generation scripts, cosmetic post-editing of generated figures.
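The severity rules can be written down as a small function. Section names, the `key_result` flag, and the `HARDCODED-CONFIG` label are hypothetical inputs. The "terms are close" judgement for LOW is not automated here. Every UNMAPPED term therefore lands in LOW and should be escalated by hand when the names diverge.

```python
from typing import Optional

NO_CHAIN = {"UNTRACED", "MANUAL"}
MEDIUM_LABELS = {"INCONSISTENT", "MANUAL-ORDER", "ORDER-MANUAL", "PARTIAL",
                 "SUBSET-MISMATCH", "HARDCODED-CONFIG"}  # HARDCODED-CONFIG is hypothetical
LOW_LABELS = {"UNMAPPED", "POST-EDITED", "UNTRACED", "MANUAL", "STALE"}


def severity(classification: str, section: str, key_result: bool = False) -> Optional[str]:
    """Map one finding to a severity level; None means no defect."""
    if classification in NO_CHAIN and key_result:
        return "CRITICAL"  # a headline number nobody can regenerate
    if classification in (NO_CHAIN | {"STALE"}) and section in {"results", "methodology"}:
        return "HIGH"
    if classification in MEDIUM_LABELS:
        return "MEDIUM"
    if classification in LOW_LABELS:
        return "LOW"  # e.g. a non-critical figure outside results/methodology
    return None
```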


Core Principles


Binary provenance. Every artifact is either traced or not. No "partially reproducible" — partial means broken.
Code is truth. When manuscript and code disagree, the manuscript is wrong until proven otherwise. Flag the disagreement; do not assume the manuscript author "meant to" override code output.
Macros over magic numbers. Every data value in LaTeX should be a macro. Every macro should be generated. No exceptions for "obvious" values.
Pipeline as proof. If `make` (or equivalent) does not produce the PDF from raw data, the manuscript is not reproducible. Partial pipelines get partial credit, not a pass.
Config is not code. Hyperparameters, thresholds, model names, file paths — all belong in config files, not scattered through script bodies.
Ordering is data. The sequence of items in a table or enumeration is an assertion. It must come from code (sort order, enum definition) not from the author's sense of what "looks right."
Timestamps matter. A figure generated last month from a script modified yesterday is suspect. Stale outputs are provenance failures.
Companion, not replacement. This audit checks computational grounding. manuscript-review checks document quality. Both are needed. Neither subsumes the other.


Example Invocation Patterns


User says any of:

"Check provenance"
"Are my numbers from code"
"Audit my pipeline"
"Verify reproducibility"
"Check manuscript against scripts"
"Provenance audit"
"Are my tables generated"
"Do my figures come from scripts"
"/manuscript-provenance"

All trigger this skill.
