
Tech Article Reproducibility


Measure the quality of a technical article from the angle of "can a reader reproduce the same thing on their machine?" This is an independent axis from prose-style evaluation (mizchi-blog-style) or logical evaluation. The premise: the most important thing about a technical article is whether a reader can reproduce it on their own machine.

When to use


  • Final pre-publication check on a technical article draft
  • Hands-on articles / tutorial articles
  • Tool introduction articles / setup articles
  • Verifying an article that claims "it worked"
When not to use:
  • Conceptual explainer articles (nothing to reproduce)
  • Poems / opinion pieces
  • Self-contained small tidbits

Reproducibility check axes (10 axes)


Score each axis on a 0–2 scale, 20 points total → converted to a 10-point scale.
Scale per axis: 0 = NG, 1 = partial, 2 = OK.
  1. Environment prerequisites stated: 0 = no OS / version / required tools listed; 1 = partially listed; 2 = everything listed (OS, lang version, CLI tools)
  2. Code completeness: 0 = fragments only, imports/setup omitted; 1 = only the main part; 2 = full, copy-pasteable form that runs
  3. Command accuracy: 0 = placeholders left as-is (<your-token> etc. without explanation); 1 = some placeholders; 2 = runnable as-is
  4. Version dependency stated: 0 = no mention; 1 = partial; 2 = explicit, e.g. "works on v3.x", "v2 or earlier behaves as X"
  5. Full config files included: 0 = excerpts only; 1 = main keys only; 2 = full minimal working config
  6. Expected output shown: 0 = none; 1 = explained in prose; 2 = actual output / screenshot
  7. Handling of errors: 0 = not mentioned; 1 = one case touched on; 2 = several major errors plus how to handle them
  8. Project prerequisites stated: 0 = author-environment assumptions are implicit; 1 = partially stated; 2 = paths / repo structure / existing config all stated
  9. Link health: 0 = links broken or require auth; 1 = some require auth; 2 = all accessible publicly
  10. Author-specific knowledge stated: 0 = helpers / dotfiles assumed implicitly; 1 = partially stated; 2 = fully stated or not required
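The scoring arithmetic above can be sketched in a few lines. This is illustrative, not part of the skill itself; the axis names are shorthand for the rubric rows:

```python
# Ten axes, each scored 0-2, summed to a 20-point total, then halved
# to get the 10-point scale.
AXES = [
    "environment prerequisites", "code completeness", "command accuracy",
    "version dependency", "full config files", "expected output",
    "error handling", "project prerequisites", "link health",
    "author-specific knowledge",
]

def to_ten_point(scores):
    """Sum per-axis scores (each 0, 1, or 2) and halve for the 10-point scale."""
    assert set(scores) == set(AXES), "score every axis exactly once"
    assert all(s in (0, 1, 2) for s in scores.values())
    return sum(scores.values()) / 2
```

For example, a perfect article (all 2s) converts to 10.0; dropping link health to 1 gives 9.5.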

Evaluation workflow


For evaluating technical articles, use the same subagent dispatch as empirical-prompt-tuning. The difference is that the subagent plays the role of "a first-time reader trying to reproduce the work" rather than "an executor."
  1. Pin down the target article
  2. subagent dispatch (template below)
  3. Extract "reproduction sticking points" from the returned evaluation
  4. Add / fix text in the article to address those sticking points
  5. If needed, re-evaluate with a fresh subagent

subagent dispatch template


You are a reader interested in <the article's subject area> but new to <the tech stack>.
You are going to read this article and try to reproduce the same thing in your local environment.

Target article

<path to the article file>

Evaluation axes (10 reproducibility axes)

Score each axis 0–2. Refer to the rubric in the tech-article-reproducibility skill: /Users/mz/.claude/skills/tech-article-reproducibility/SKILL.md
  1. Environment prerequisites stated
  2. Code completeness
  3. Command accuracy
  4. Version dependency stated
  5. Full config files included
  6. Expected output shown
  7. Handling of errors
  8. Project prerequisites stated
  9. Link health (actually verify with WebFetch)
  10. Author-specific knowledge stated

Tasks

  1. While reading the article, imagine "where would I get stuck if I reproduced this on my own machine?"
  2. Score each axis 0–2 with quoted evidence
  3. List the top 5 sticking points with line numbers

Report structure

  • Reproducibility score: X/20 (breakdown table)
  • Top 5 sticking points: <line number> <quote> <why it sticks>
  • Missing information: list of things that should be added to the article
  • Overall verdict: your subjective probability (as a percentage) of reproducing this after reading the article

How to read the score


  • 18-20: Publishable as a hands-on piece; almost no additional information needed
  • 14-17: Some googling required, but reproducible; okay to publish
  • 10-13: Information outside the article is required to reproduce; revisions recommended
  • 9 or below: Hard to reproduce; rethink the article's premise or position it as something other than a hands-on piece
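When scripting the workflow, the bands above can be expressed as a small lookup function. The return strings here are paraphrases of the bands, not fixed values from the skill:

```python
def verdict(total):
    """Map a 20-point reproducibility total onto the interpretation bands."""
    if not 0 <= total <= 20:
        raise ValueError("total must be in 0..20")
    if total >= 18:
        return "publishable as a hands-on piece; almost no additions needed"
    if total >= 14:
        return "some googling required, but reproducible; okay to publish"
    if total >= 10:
        return "outside information required to reproduce; revisions recommended"
    return "hard to reproduce; reposition the article"
```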

Pitfalls


  • The evaluator's background knowledge is too high: if you don't explicitly tell the subagent to play a "beginner role," it will judge "enough information" from an expert's viewpoint. Emphasize "first-time reader" in the prompt
  • Ignoring link health: links that are alive at publication time can break a year later. Separately check whether reproduction is possible using only live links
  • Inlining all sample code: reproducibility goes up, but the article bloats. A hybrid approach that combines inline code with a link to the repository is realistic
  • Reproducibility ≠ prose quality: an article can be highly reproducible yet hard to read. Combine with mizchi-blog-style and similar skills to measure both axes
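The link-health check, at least, can be partly automated. A minimal sketch, assuming the links have already been fetched and only their HTTP status codes are in hand; the category names mirror axis 9 of the rubric:

```python
def classify_status(status):
    """Map an HTTP status code onto the link-health categories.
    A rough sketch: redirects, timeouts, etc. would need extra handling."""
    if 200 <= status < 300:
        return "publicly accessible"
    if status in (401, 403):  # authentication / authorization required
        return "auth required"
    return "broken"  # 404s, 5xx, and everything else

def link_health_score(statuses):
    """0-2 score for axis 9 from the statuses of all links in the article."""
    cats = {classify_status(s) for s in statuses}
    if cats == {"publicly accessible"}:
        return 2
    if "broken" not in cats:
        return 1  # some links need auth, none are dead
    return 0
```

In the dispatch template this corresponds to the instruction to "actually verify with WebFetch" rather than trusting that links worked at publication time.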

Related


  • empirical-prompt-tuning — meta-skill for subagent dispatch + iterative improvement
  • mizchi-blog-style — evaluation on the prose-style axis (independent from this skill)