tech-article-reproducibility
Tech Article Reproducibility
Measure the quality of a technical article from the angle of "can a reader reproduce the same thing on their machine?" This is an independent axis from prose-style evaluation (mizchi-blog-style) or logical evaluation. The premise: the most important thing about a technical article is whether a reader can reproduce it on their own machine.
When to use
- Final pre-publication check on a technical article draft
- Hands-on articles / tutorial articles
- Tool introduction articles / setup articles
- Verifying an article that claims "it worked"
When not to use:
- Conceptual explainer articles (nothing to reproduce)
- Poems / opinion pieces
- Self-contained small tidbits
Reproducibility check axes (10 axes)
Score each axis on a 0–2 scale, 20 points total → converted to a 10-point scale.
| # | Axis | 0 (NG) | 1 (partial) | 2 (OK) |
|---|---|---|---|---|
| 1 | Environment prerequisites stated | No OS / version / required tools listed | Partially listed | Everything listed (OS, lang version, CLI tools) |
| 2 | Code completeness | Fragments only, imports/setup omitted | Only the main part | Full, copy-pasteable form that runs |
| 3 | Command accuracy | Placeholders left as-is, unexplained | Some placeholders remain | Runnable as-is |
| 4 | Version dependency stated | No mention | Partial | Explicit, e.g. "works on v3.x", "v2 or earlier behaves as X" |
| 5 | Full config files included | Excerpts only | Main keys only | Full minimal working config |
| 6 | Expected output shown | None | Explained in prose | Actual output / screenshot |
| 7 | Handling of errors | Not mentioned | One case touched on | Several major errors + how to handle them |
| 8 | Project prerequisites stated | Author-environment assumptions are implicit | Partially stated | Paths / repo structure / existing config all stated |
| 9 | Link health | Links broken or require auth | Some require auth | All accessible publicly |
| 10 | Author-specific knowledge stated | Helpers / dotfiles assumed implicitly | Partially stated | Fully stated or not required |
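As a quick sanity check on the arithmetic, here is a minimal sketch (TypeScript, hypothetical helper name) of summing the ten 0–2 axis scores and halving the 20-point total to get the 10-point scale:

```ts
// Hypothetical helper: sums the ten 0–2 axis scores (max 20)
// and halves the total to convert to the 10-point scale.
type AxisScore = 0 | 1 | 2;

function reproducibilityScore(axes: AxisScore[]): { total: number; outOf10: number } {
  if (axes.length !== 10) throw new Error("expected exactly 10 axis scores");
  const total = axes.reduce<number>((sum, s) => sum + s, 0);
  return { total, outOf10: total / 2 };
}

// Example: strong on environment and code, weak on error handling and prerequisites
console.log(reproducibilityScore([2, 2, 2, 1, 2, 2, 0, 1, 1, 2])); // { total: 15, outOf10: 7.5 }
```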
Evaluation workflow
For evaluating technical articles, use the same subagent dispatch as empirical-prompt-tuning. The difference is that the subagent plays the role of "a first-time reader trying to reproduce the work" rather than "an executor."
- Decide on the target article
- Dispatch a subagent (template below)
- Extract "reproduction sticking points" from the returned evaluation
- Add / fix text in the article to address those sticking points
- If needed, re-evaluate with a fresh subagent
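If you script the loop above, it could look roughly like the sketch below. `dispatchSubagent` and `applyFixes` are hypothetical stand-ins for however you actually run the reader-role subagent and edit the draft; the 14-point threshold comes from the score bands later in this document.

```ts
// Sketch of the evaluate–fix loop. The two callbacks are hypothetical
// placeholders, not real APIs: plug in your own dispatch and editing steps.
interface Evaluation {
  total: number;            // reproducibility score, 0–20
  stickingPoints: string[]; // top 5 sticking points, with line numbers
}

async function improveUntilPublishable(
  articlePath: string,
  prompt: string,
  dispatchSubagent: (prompt: string) => Promise<Evaluation>,
  applyFixes: (path: string, stickingPoints: string[]) => Promise<void>,
): Promise<Evaluation> {
  let evaluation = await dispatchSubagent(prompt);
  while (evaluation.total < 14) { // 14+ = "okay to publish" band
    await applyFixes(articlePath, evaluation.stickingPoints);
    evaluation = await dispatchSubagent(prompt); // re-evaluate with a fresh subagent
  }
  return evaluation;
}
```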
subagent dispatch template
You are a reader interested in <the article's subject area> but new to <the tech stack>.
You are going to read this article and try to reproduce the same thing in your local environment.
Target article
<path to the article file>
Evaluation axes (10 reproducibility axes)
Score each axis 0–2. Refer to the rubric in the skill:
/Users/mz/.claude/skills/tech-article-reproducibility/SKILL.md
- Environment prerequisites stated
- Code completeness
- Command accuracy
- Version dependency stated
- Full config files included
- Expected output shown
- Handling of errors
- Project prerequisites stated
- Link health (actually verify with WebFetch)
- Author-specific knowledge stated
Tasks
- While reading the article, imagine "where would I get stuck if I reproduced this on my own machine?"
- Score each axis 0–2 with quoted evidence
- List the top 5 sticking points with line numbers
Report structure
- Reproducibility score: X/20 (breakdown table)
- Top 5 sticking points: <line number> <quote> → <why it sticks>
- Missing information: list of things that should be added to the article
- Overall verdict: what percentage chance (subjective) do you have of reproducing this after reading the article
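If you fill the template programmatically, the placeholders could be substituted along these lines (a sketch only; `buildReviewerPrompt` is a hypothetical helper, and just the opening of the template is shown):

```ts
// Hypothetical helper that fills the <...> placeholders in the dispatch template.
// Only the opening lines are reproduced here; append the rest of the template as-is.
function buildReviewerPrompt(subjectArea: string, techStack: string, articlePath: string): string {
  return [
    `You are a reader interested in ${subjectArea} but new to ${techStack}.`,
    `You are going to read this article and try to reproduce the same thing in your local environment.`,
    ``,
    `Target article`,
    articlePath,
  ].join("\n");
}

// Example (illustrative values only)
console.log(buildReviewerPrompt("browser rendering", "Rust + WebAssembly", "./drafts/wasm-renderer.md"));
```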
How to read the score
- 18-20: Publishable as a hands-on piece; almost no additional information needed
- 14-17: Some googling required, but reproducible; okay to publish
- 10-13: Information outside the article is required to reproduce; revisions recommended
- 9 or below: Hard to reproduce; rethink the article's premise or position it as something other than a hands-on piece
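Read mechanically, the bands map to verdicts roughly like this (a sketch over the 0–20 total, not part of the original rubric):

```ts
// Maps a 0–20 reproducibility total to the verdict bands above.
function verdict(total: number): string {
  if (total >= 18) return "publishable as a hands-on piece; almost no additions needed";
  if (total >= 14) return "some googling required, but reproducible; okay to publish";
  if (total >= 10) return "needs information outside the article; revisions recommended";
  return "hard to reproduce; rethink the premise or reposition the article";
}
```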
Pitfalls
- The evaluator's background knowledge is too high: if you don't explicitly tell the subagent to play a "beginner role," it will judge "enough information" from an expert's viewpoint. Emphasize "first-time reader" in the prompt
- Ignoring link health: links that are alive at publication time can break a year later. Separately check whether reproduction is possible using only live links (a minimal check sketch follows this list)
- Inlining all sample code: reproducibility goes up, but the article bloats. A hybrid approach that combines inline code with a link to the repository is realistic
- Reproducibility ≠ prose quality: an article can be highly reproducible yet hard to read. Combine with mizchi-blog-style and similar to measure both axes
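For the link-health check, a minimal sketch (Node 18+ `fetch`, TypeScript) that pulls markdown links out of a draft and flags anything that does not respond to a plain, unauthenticated request:

```ts
// Extracts markdown links from a draft and flags URLs that fail an
// unauthenticated request. A sketch only: some servers reject HEAD, and a
// 200 does not prove the page still says what the article relies on.
import { readFile } from "node:fs/promises";

async function checkLinks(articlePath: string): Promise<void> {
  const text = await readFile(articlePath, "utf8");
  const urls = [...text.matchAll(/\]\((https?:\/\/[^)\s]+)\)/g)].map((m) => m[1]);
  for (const url of urls) {
    try {
      const res = await fetch(url, { method: "HEAD", redirect: "follow" });
      if (!res.ok) console.log(`NG ${res.status} ${url}`);
    } catch {
      console.log(`NG (unreachable) ${url}`);
    }
  }
}

checkLinks(process.argv[2] ?? "draft.md").catch(console.error);
```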
Related
- empirical-prompt-tuning — meta-skill for subagent dispatch + iterative improvement
- mizchi-blog-style — evaluation on the prose-style axis (independent from this skill)