doc-drift-detector

Documentation Drift Detector

The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.

Quick Start

1. Run full drift analysis on a repository

python scripts/drift_analyzer.py /path/to/repo

2. Score documentation freshness

python scripts/doc_staleness_scorer.py /path/to/repo

3. Validate API docs against Python source

python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

4. Check all markdown links

python scripts/link_checker.py /path/to/repo

JSON output for any tool

python scripts/drift_analyzer.py /path/to/repo --json

Set failure threshold for CI

python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

All tools support `--help` for full usage details.

---

Core Workflows

Workflow 1: Full Drift Analysis

Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.

Basic analysis

python scripts/drift_analyzer.py /path/to/repo

Analyze with custom doc patterns

python scripts/drift_analyzer.py /path/to/repo --doc-patterns "*.md,*.rst,*.txt"

JSON output for tooling

python scripts/drift_analyzer.py /path/to/repo --json

Only show high-severity drift

python scripts/drift_analyzer.py /path/to/repo --min-severity high

Analyze specific directory

python scripts/drift_analyzer.py /path/to/repo --scope src/

**What it does:**

1. Discovers all documentation files in the repo
2. For each doc, identifies the code directories it describes (via path proximity and content references)
3. Compares the doc's last-modified date against the git history of its associated code
4. Identifies specific changes (renamed files, moved directories, changed function signatures)
5. Classifies each drift instance by category and severity
6. Generates an actionable report with specific file:line references
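
Steps 3 and 5 of the list above can be sketched roughly as follows. This is a minimal illustration, not the actual `drift_analyzer.py` code: the function names and the severity thresholds are assumptions made for the example, and only the `git log` invocation is standard git.

```python
import subprocess
from typing import List


def commit_timestamps(repo: str, path: str) -> List[int]:
    """Return Unix commit timestamps (newest first) for commits touching `path`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.split()]


def commits_since_doc_update(doc_times: List[int], code_times: List[int]) -> int:
    """Count code commits that are newer than the doc's most recent commit."""
    if not doc_times:
        return len(code_times)  # doc never committed: every code commit counts
    last_doc_update = max(doc_times)
    return sum(1 for t in code_times if t > last_doc_update)


def severity(changed_commits: int) -> str:
    """Map a commit count to a severity label (hypothetical thresholds)."""
    if changed_commits >= 20:
        return "high"
    if changed_commits >= 5:
        return "medium"
    return "low" if changed_commits else "none"
```

A caller would pass the output of `commit_timestamps(repo, "docs/api.md")` and `commit_timestamps(repo, "src/")` to `commits_since_doc_update`. Keeping the comparison separate from the git call makes it easy to test without a repository.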

**Output example:**

Documentation Drift Report

Repository: /path/to/repo
Scan date: 2026-03-18
Docs found: 12
Drifted: 5

HIGH SEVERITY: docs/api.md (last updated: 2026-01-15)
  - 23 code files changed since doc update
  - 4 functions renamed in src/handlers/
  - 2 new modules undocumented
  Category: Factual + Structural
  Recommendation: Manual update required

MEDIUM SEVERITY: README.md (last updated: 2026-02-28)
  - Installation section references removed dependency
  - Version string outdated (says 1.8.0, current 2.0.0)
  Category: Factual + Temporal
  Recommendation: Auto-fixable (version), Manual (installation)

Workflow 2: API Documentation Validation

Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.

Validate API docs against source

python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

Scan entire docs directory

python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

JSON output

python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

Include private methods in validation

python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private

**What it detects:**

- Functions/classes present in code but missing from docs
- Functions/classes documented but no longer in code (removed or renamed)
- Parameter mismatches (missing params, wrong types, wrong defaults)
- Deprecated items still documented as current
- Return type mismatches
- Module-level docstring drift

**How it works:**

The tool uses Python's `ast` module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.
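
As a rough illustration of the AST step, the sketch below extracts parameter names per function and compares them with what a doc claims. It is not the validator's real code: `extract_signatures`, `find_mismatches`, and the dictionary shapes are hypothetical, and the doc-parsing side is assumed to have already produced `documented_sigs`.

```python
import ast
from typing import Dict, List


def param_names(func: ast.AST) -> List[str]:
    """List parameter names in declaration order, marking *args and **kwargs."""
    a = func.args
    names = [p.arg for p in a.posonlyargs + a.args]
    if a.vararg:
        names.append("*" + a.vararg.arg)
    names += [p.arg for p in a.kwonlyargs]
    if a.kwarg:
        names.append("**" + a.kwarg.arg)
    return names


def extract_signatures(source: str, include_private: bool = False) -> Dict[str, List[str]]:
    """Map function name -> parameter names. Same-named methods overwrite each other."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if include_private or not node.name.startswith("_"):
                result[node.name] = param_names(node)
    return result


def find_mismatches(code_sigs: Dict[str, List[str]],
                    documented_sigs: Dict[str, List[str]]) -> List[str]:
    """Report undocumented, removed, and parameter-mismatched functions."""
    issues = []
    for name in sorted(code_sigs.keys() - documented_sigs.keys()):
        issues.append(f"undocumented: {name}")
    for name in sorted(documented_sigs.keys() - code_sigs.keys()):
        issues.append(f"documented but missing from code: {name}")
    for name in sorted(code_sigs.keys() & documented_sigs.keys()):
        if code_sigs[name] != documented_sigs[name]:
            issues.append(f"parameter mismatch: {name}")
    return issues
```

Because the sketch only reads the syntax tree, it never imports or executes the code being checked.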

Workflow 3: README Health Check

Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.

Check README health

python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

Check with custom sections

python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"

**Validates:**

- Required sections are present (Installation, Usage, API Reference, Contributing, License)
- Version strings match package version (package.json, setup.py, pyproject.toml)
- File references in README actually exist
- Badge URLs are well-formed
- Code examples reference existing files/functions
- Table of contents matches actual headings

Workflow 4: Link Integrity Audit

Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.

Check all markdown links

python scripts/link_checker.py /path/to/repo

Include external URL checks (slower, makes HTTP requests)

python scripts/link_checker.py /path/to/repo --check-external

Check specific file

python scripts/link_checker.py /path/to/repo/README.md

JSON output

python scripts/link_checker.py /path/to/repo --json

Only show broken links

python scripts/link_checker.py /path/to/repo --broken-only

**What it checks:**

- Local file references (`[link](path/to/file.md)`) -- does the file exist?
- Anchor references (`[link](#section-name)`) -- does the heading exist?
- Cross-document anchors (`[link](other.md#section)`) -- does the file and heading exist?
- Relative path correctness (catches `../` errors)
- Case sensitivity issues (links that break on case-sensitive Linux filesystems but resolve silently on macOS)
- Image references -- do referenced images exist?
- Duplicate anchors that would cause ambiguous links
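
The file and anchor checks above might look something like this in outline. It is a simplified sketch under stated assumptions: `slugify` approximates GitHub-style heading slugs, the regexes ignore edge cases such as links or `#` lines inside code blocks, and none of the names are taken from `link_checker.py` itself.

```python
import re
from pathlib import Path
from typing import List

LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")          # [text](target) and ![alt](target)
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)  # http:, https:, mailto:, ...


def slugify(heading: str) -> str:
    """Approximate slug: lowercase, strip special characters, spaces to hyphens."""
    text = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return text.replace(" ", "-")


def broken_links(doc: Path) -> List[str]:
    """Return a description of every local link in `doc` that does not resolve."""
    problems = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target):
            continue  # external links are out of scope for this sketch
        path_part, _, anchor = target.partition("#")
        target_file = (doc.parent / path_part) if path_part else doc
        if not target_file.is_file():
            problems.append(f"{target}: file not found")
            continue
        if anchor and target_file.suffix == ".md":
            headings = HEADING_RE.findall(target_file.read_text(encoding="utf-8"))
            if anchor not in {slugify(h) for h in headings}:
                problems.append(f"{target}: anchor not found")
    return problems
```

Resolving each target relative to `doc.parent` is what catches `../` mistakes: the path is interpreted from the linking file's directory, not from the repository root.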

Workflow 5: Continuous Doc Monitoring

Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.
GitHub Actions example:
yaml
name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]

jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json

      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50

      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md

      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .

      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json
Pre-commit hook:
bash
#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet
if [ $? -ne 0 ]; then
  echo "Documentation is critically stale. Update docs before committing."
  exit 1
fi

---

Tools

| Tool | Purpose | Lines | Key Feature |
|------|---------|-------|-------------|
| `drift_analyzer.py` | Full drift analysis between code and docs | ~550 | Git history comparison with code-to-doc mapping |
| `doc_staleness_scorer.py` | Score documentation freshness 0-100 | ~450 | Weighted multi-dimensional scoring |
| `api_doc_validator.py` | Validate API docs against Python source | ~400 | AST-based signature extraction and comparison |
| `link_checker.py` | Audit all markdown links and anchors | ~400 | Local file, anchor, and cross-document validation |

All tools:
  • Python 3.8+ standard library only
  • Support `--json` for machine-readable output
  • Support `--help` for usage details
  • Use non-zero exit codes on failure (CI/CD compatible)
  • Work on any OS (Windows, macOS, Linux)

Staleness Scoring

Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions:

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Last Updated | 20% | How recently the doc file was modified relative to its associated code |
| Code-Doc Alignment | 30% | Whether documented items (functions, classes, files) still exist and match |
| Link Health | 15% | Percentage of links that resolve correctly |
| Completeness | 20% | Whether expected sections are present and non-empty |
| Accuracy | 15% | Whether version strings, file paths, and other verifiable facts are correct |

Score interpretation:

| Score | Label | Action |
|-------|-------|--------|
| 90-100 | Excellent | No action needed |
| 70-89 | Good | Minor updates recommended |
| 50-69 | Stale | Updates needed before next release |
| 30-49 | Critical | Immediate attention required |
| 0-29 | Abandoned | Full rewrite likely needed |
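
To make the arithmetic concrete, here is a small sketch of how the weights and labels above could combine. The dimension keys, the function names, and the choice to normalize by the sum of the weights are assumptions for illustration; the scorer's actual internals may differ.

```python
DEFAULT_WEIGHTS = {
    "updated": 0.20,
    "alignment": 0.30,
    "links": 0.15,
    "completeness": 0.20,
    "accuracy": 0.15,
}

# (minimum score, label) pairs, checked from highest to lowest
LABELS = [(90, "Excellent"), (70, "Good"), (50, "Stale"), (30, "Critical"), (0, "Abandoned")]


def staleness_score(dimensions, weights=None):
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(weights.values())  # assumed normalization for custom weights
    weighted = sum(dimensions[name] * w for name, w in weights.items())
    return round(weighted / total_weight, 1)


def label_for(score):
    """Translate a numeric score into the label from the interpretation table."""
    for floor, label in LABELS:
        if score >= floor:
            return label
    return "Abandoned"
```

For example, dimension scores of 50, 80, 100, 60, and 90 (in the order of the table) give 10 + 24 + 15 + 12 + 13.5 = 74.5, which falls in the "Good" band.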
Customization:

Override default weights

python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

Set staleness thresholds

python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

---

Drift Categories

Every detected drift instance is classified into one or more categories:

Structural Drift

Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.
Detection: Compare actual document headings against expected headings for that document type.
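
A heading comparison of this kind could be sketched as below. The function name and the case-insensitive substring matching are assumptions for the example (so that a required name like "API" is satisfied by a heading "API Reference"); the real tool's matching rules are not specified here.

```python
import re
from typing import List

HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def missing_sections(markdown: str, required: List[str]) -> List[str]:
    """Return required section names that no heading in the document contains."""
    headings = [h.strip().lower() for h in HEADING_RE.findall(markdown)]
    return [name for name in required
            if not any(name.lower() in heading for heading in headings)]
```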

Factual Drift

Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.
Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).

Referential Drift

Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.
Detection: Link checker validates every reference against the filesystem and document structure.

Temporal Drift

Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.
Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.
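
For the version-string part, a minimal sketch might look like this. It assumes three-part numeric versions only and a caller that already knows the current version (for instance from a package manifest); the names are hypothetical, and the simple regex will also match the first three parts of an IP address or a longer dotted number.

```python
import re
from typing import List, Tuple

VERSION_RE = re.compile(r"\bv?(\d+\.\d+\.\d+)\b")


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def outdated_versions(doc_text: str, current: str) -> List[str]:
    """Return distinct versions mentioned in the doc that are older than `current`."""
    current_key = parse_version(current)
    found = set(VERSION_RE.findall(doc_text))
    return sorted((v for v in found if parse_version(v) < current_key), key=parse_version)
```

Comparing integer tuples instead of raw strings matters here: as strings, "1.10.2" would sort before "1.8.0".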

Semantic Drift

Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.
Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.

Auto-Fix vs Manual-Fix Classification

Not all drift can be fixed programmatically. The tools classify each issue:

Auto-Fixable (safe to automate)

  • Version string updates -- replace old version with current from package manifest
  • Date updates -- update "last modified" timestamps
  • Broken local links -- suggest correct path when file was moved (git log tracks renames)
  • Missing table of contents entries -- generate from actual headings
  • Removed file references -- flag for deletion or suggest replacement

Manual-Fix Required (needs human judgment)

  • Architectural description changes -- requires understanding intent
  • API usage examples -- new examples need domain context
  • Migration guides -- require understanding of breaking changes
  • Getting started rewrites -- narrative flow needs human touch
  • Security documentation updates -- compliance implications require review

Semi-Automated (template + human review)

  • New function documentation -- generate skeleton from AST, human fills description
  • Changelog entries -- generate from git commits, human edits for clarity
  • README section additions -- provide template, human adds content

The drift report marks each issue with `[AUTO]`, `[MANUAL]`, or `[SEMI]` tags.

Integration Points

With CI/CD Pipelines

All tools return non-zero exit codes when issues are found:
  • Exit 0: No issues (or all within threshold)
  • Exit 1: Issues found exceeding threshold
  • Exit 2: Tool error (invalid arguments, missing files)
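
A pipeline wrapper could interpret these exit codes along the following lines. This is only a sketch: `run_check` and `gate` are hypothetical helper names, and the policy of letting a tool error outrank ordinary findings is an assumption, not documented behavior.

```python
import subprocess

STATUS_BY_EXIT_CODE = {0: "pass", 1: "issues-found", 2: "tool-error"}


def run_check(command):
    """Run one CLI check and translate its exit code into a status string."""
    completed = subprocess.run(command)
    return STATUS_BY_EXIT_CODE.get(completed.returncode, "unknown")


def gate(commands):
    """Return the exit code a CI step should use after running every check."""
    statuses = [run_check(cmd) for cmd in commands]
    if "tool-error" in statuses or "unknown" in statuses:
        return 2  # a misconfigured check should not be mistaken for stale docs
    return 1 if "issues-found" in statuses else 0


# Example call (paths as used elsewhere in this document):
# gate([["python", "scripts/link_checker.py", "."],
#       ["python", "scripts/doc_staleness_scorer.py", ".", "--threshold", "50"]])
```

Keeping exit code 2 distinct lets a pipeline tell a broken setup apart from genuinely drifted documentation.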

With Code Review

Add drift analysis to PR checks. When a PR modifies code in `src/`, automatically check whether docs in `docs/` need updates. The drift analyzer can scope its analysis to only changed directories.

With Documentation Generators

Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.

With Release Processes

Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.

With Other Skills

  • code-reviewer -- include doc drift in PR review reports
  • senior-devops -- integrate into deployment pipelines
  • senior-qa -- documentation quality as part of QA checklist

Reference Guides

| Guide | Description |
|-------|-------------|
| Documentation Standards | README structure, API docs, changelogs, ADRs, docs-as-code |
| Drift Prevention Guide | Coupling strategies, CI gates, review checklists, prevention patterns |

Assets

| Asset | Description |
|-------|-------------|
| Drift Report Template | Template for drift analysis reports |
| Sample Drift Data | Sample JSON for testing and demonstration |

Anti-Patterns

  • Ignoring drift until release -- run drift analysis in CI on every PR, not as a release-day scramble
  • Treating all drift as equal -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
  • Manual-only doc updates -- use `[AUTO]` fixes for version strings and broken links; reserve human effort for semantic and architectural drift
  • Shallow clone in CI -- `fetch-depth: 1` breaks git history comparison; always use `fetch-depth: 0` for drift analysis
  • Skipping link checks on internal docs -- cross-document anchor references break silently on refactors; run `link_checker.py` on every markdown change

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `drift_analyzer.py` reports zero docs found | Repository has non-standard doc extensions or docs are in ignored directories (e.g., `node_modules`, `dist`) | Use `--doc-patterns "*.md,*.rst,*.txt"` to explicitly specify extensions |
| Staleness scores are unexpectedly low | Docs reference files that were reorganized or moved to new directories | Run `link_checker.py` first to identify broken references, fix them, then re-score |
| API validator finds no source signatures | Source path points to a non-Python directory or all functions are `_`-prefixed private | Verify `source_path` contains `.py` files; add `--include-private` if the API surface uses private names |
| Link checker flags valid anchors as broken | Heading text contains special characters, inline code, or emoji that alter the slug | Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text |
| Git history comparison shows no changes | Shallow clone lacks full commit history (common in CI) | Clone with `fetch-depth: 0` or pass `--scope` to narrow the analysis window |
| External URL checks hang or time out | Target servers are slow or block automated HEAD requests | Omit `--check-external` for local-only validation, or run external checks in a separate non-blocking job |
| Drift report marks everything as `[MANUAL]` | Most detected drift is semantic or architectural, not auto-fixable | This is expected for large refactors; focus on `[AUTO]` and `[SEMI]` items first, then triage `[MANUAL]` items by severity |
Success Criteria

  • Zero stale docs older than 90 days -- every documentation file has been updated within the last 90 days relative to its associated code changes
  • Aggregate staleness score above 80/100 -- the repository-wide freshness score stays in the "Good" or "Excellent" range
  • Link integrity above 99% -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
  • API doc coverage above 95% -- at least 95% of public functions and classes have corresponding entries in API documentation
  • Zero high-severity drift issues in CI -- pull requests with high or critical drift are blocked before merge
  • Version string accuracy at 100% -- every version reference in documentation matches the current release tag or package manifest
  • Drift report turnaround under 60 seconds -- full drift analysis completes in under one minute for repositories with up to 500 documentation files

Scope & Limitations

Covers:
  • Detection of documentation drift against git history for any git repository
  • AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
  • Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
  • Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement
Does NOT cover:
  • Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
  • External URL uptime monitoring -- `--check-external` performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
  • Automatic documentation rewriting -- tools classify issues as `[AUTO]`, `[SEMI]`, or `[MANUAL]` but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
  • Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines

Integration Points

| Skill | Integration | Data Flow |
|-------|-------------|-----------|
| code-reviewer | Include drift report in PR review comments | `drift_analyzer.py --json` output feeds into review checklists as a documentation health section |
| senior-devops | Add staleness gate to CI/CD pipelines | `doc_staleness_scorer.py --threshold 50` returns exit code 1 on failure, blocking deploys |
| senior-qa | Documentation quality as part of QA acceptance | `link_checker.py --json` output merges into QA dashboards alongside test coverage metrics |
| senior-fullstack | Validate generated project docs post-scaffold | Run `api_doc_validator.py` against scaffolded `docs/` directory to confirm generated API docs match source |
| senior-secops | Audit security documentation currency | `drift_analyzer.py --scope security/` detects when security docs fall behind policy changes |
| senior-architect | Architecture decision record (ADR) freshness | `doc_staleness_scorer.py --required-sections "Status,Context,Decision,Consequences"` validates ADR completeness |

Tool Reference

drift_analyzer.py

Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.
Usage:
bash
python scripts/drift_analyzer.py <repo_path> [options]
Parameters:
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `repo_path` | positional | (required) | Path to the git repository to analyze |
| `--json` | flag | off | Output the full drift report as JSON |
| `--min-severity` | choice | `low` | Minimum severity to include in report. Choices: `critical`, `high`, `medium`, `low`, `info` |
| `--scope` | string | `""` (all) | Limit code analysis to a subdirectory (e.g., `src/`) |
| `--doc-patterns` | string | `*.md,*.rst,*.txt,*.adoc` | Comma-separated file patterns for documentation discovery |
Example:
bash
python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json
Output Formats:
  • Human-readable (default): Grouped by severity with `[AUTO]`/`[SEMI]`/`[MANUAL]` fix-type tags, category labels, and a fix-type summary
  • JSON (`--json`): Structured object with `repository`, `scan_date`, `summary` (counts by severity, category, fix type), and `issues` array
Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)
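
Downstream tooling might consume the JSON report roughly as follows. The top-level `issues` key comes from the format described above, but the per-issue `severity` field name is an assumption, as is the helper itself; check the actual JSON output before relying on it.

```python
import json
from collections import Counter

# Severity names follow the --min-severity choices, ordered least to most severe
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def count_blocking_issues(report_json: str, min_severity: str = "high") -> dict:
    """Count issues at or above `min_severity`, keyed by severity name."""
    report = json.loads(report_json)
    threshold = SEVERITY_RANK[min_severity]
    counts = Counter(
        issue["severity"]
        for issue in report.get("issues", [])
        # assumed field name: each issue carries a "severity" string
        if SEVERITY_RANK.get(issue.get("severity"), 0) >= threshold
    )
    return dict(counts)
```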

doc_staleness_scorer.py

Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.
Usage:
bash
python scripts/doc_staleness_scorer.py <repo_path> [options]
Parameters:
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `repo_path` | positional | (required) | Path to the git repository to score |
| `--json` | flag | off | Output the full scoring report as JSON |
| `--threshold` | float | (none) | Fail with exit code 1 if aggregate score falls below this value |
| `--readme-focus` | flag | off | Only score README files (filenames starting with `readme`) |
| `--required-sections` | string | `Installation,Usage,API,Contributing,License` | Comma-separated section names for completeness scoring |
| `--quiet` | flag | off | Only print the aggregate score number (no report) |
| `--weight-updated` | float | `0.20` | Weight for the "last updated" dimension |
| `--weight-alignment` | float | `0.30` | Weight for the "code-doc alignment" dimension |
| `--weight-links` | float | `0.15` | Weight for the "link health" dimension |
| `--weight-completeness` | float | `0.20` | Weight for the "completeness" dimension |
| `--weight-accuracy` | float | `0.15` | Weight for the "accuracy" dimension |
Example:
bash
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet
Output Formats:
  • Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
  • JSON (`--json`): Structured object with `aggregate_score`, `aggregate_label`, `total_documents`, and `documents` array (each with `total_score`, `label`, and per-dimension scores/details)
  • Quiet (`--quiet`): Single line with the aggregate score (e.g., `72.3`)
Exit Codes: 0 = score at or above threshold (or no threshold set), 1 = score below threshold, 2 = tool error
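
To make the weighting concrete, here is a minimal sketch of how five per-dimension scores could combine into one aggregate score and a CI exit code. The default weights are the documented ones; the dimension key names, both function names, and the normalization by the weight total are assumptions for illustration, not the tool's confirmed implementation.

```python
# Documented default weights; the dictionary keys are hypothetical names.
DEFAULT_WEIGHTS = {
    "updated": 0.20,
    "alignment": 0.30,
    "links": 0.15,
    "completeness": 0.20,
    "accuracy": 0.15,
}


def weighted_score(dimension_scores, weights=DEFAULT_WEIGHTS):
    """Combine 0-100 dimension scores into a single 0-100 weighted average."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(dimension_scores[name] * w for name, w in weights.items())
    # Dividing by the total keeps the result on a 0-100 scale even if
    # custom weights do not sum to exactly 1.0 (an assumption of this sketch).
    return raw / total_weight


def gate_exit_code(score, threshold=None):
    """Mirror the documented gate: 1 only when the score falls below the threshold."""
    if threshold is None:
        return 0
    return 1 if score < threshold else 0


scores = {"updated": 100, "alignment": 50, "links": 100, "completeness": 80, "accuracy": 60}
aggregate = weighted_score(scores)
```

With the default weights, the sample scores contribute 20 + 15 + 15 + 16 + 9, so the aggregate is approximately 75 and a `--threshold 60` gate would pass.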


api_doc_validator.py

Purpose: Extract function and class signatures from Python source files using the `ast` module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.
Usage:
bash
python scripts/api_doc_validator.py <source_path> <doc_path> [options]
Parameters:
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `source_path` | positional | (required) | Path to a Python source file or directory |
| `doc_path` | positional | (required) | Path to API documentation file (`.md`) or directory |
| `--json` | flag | off | Output the validation report as JSON |
| `--recursive` | flag | off | Recursively scan the doc directory for markdown files |
| `--include-private` | flag | off | Include `_`-prefixed private functions and classes in validation |
Example:
bash
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json
Output Formats:
  • Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
  • JSON (`--json`): Structured object with `summary` (counts by type and severity) and `issues` array (each with `type`, `severity`, `name`, file/line references, and `description`)
Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error
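
The core idea, extracting signatures with `ast` and diffing them against what the docs claim, can be sketched as follows. This is a simplified illustration: the function names, the shape of the `documented` mapping, and the issue labels are hypothetical, and the real validator's markdown parsing is not shown.

```python
import ast


def extract_signatures(source, include_private=False):
    """Map each function or class name in the source to its parameter names."""
    signatures = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
        elif isinstance(node, ast.ClassDef):
            params = []
        else:
            continue
        if node.name.startswith("_") and not include_private:
            continue  # skip private items unless explicitly requested
        signatures[node.name] = params
    return signatures


def compare(signatures, documented):
    """Classify differences between source signatures and documented ones."""
    return {
        "undocumented": sorted(set(signatures) - set(documented)),
        "phantom": sorted(set(documented) - set(signatures)),
        "param_mismatch": sorted(
            name for name in set(signatures) & set(documented)
            if signatures[name] != documented[name]
        ),
    }


SOURCE = '''
def load(path, strict=False):
    pass

def save(path):
    pass

def _helper():
    pass
'''

# Hypothetical result of parsing an API doc: name -> documented parameters.
documented = {"load": ["path"], "old_fn": []}
report = compare(extract_signatures(SOURCE), documented)
```

Here `save` is in the source but not the docs, `old_fn` is documented but gone from the source, and `load` has gained a `strict` parameter the docs do not mention.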


link_checker.py

Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.
Usage:
bash
python scripts/link_checker.py <path> [options]
Parameters:
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `path` | positional | (required) | File or directory to check (single `.md` file or directory for recursive scan) |
| `--json` | flag | off | Output the link check report as JSON |
| `--broken-only` | flag | off | Only show broken links in the report (omit valid links from output) |
| `--check-external` | flag | off | Also validate external URLs via HTTP HEAD requests (slower, makes network requests) |
Example:
bash
python scripts/link_checker.py /path/to/repo --broken-only --json
Output Formats:
  • Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
  • JSON (`--json`): Structured object with `summary` (counts), `broken_links` array (each with source file, line, text, target, type, error), `duplicate_anchors` map, and optionally `all_links` (when `--broken-only` is not set)
Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error
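
As a rough illustration of anchor validation and duplicate-anchor detection, the sketch below derives a slug for each heading and checks same-document `#anchor` links against the set of slugs. The slug rule (lowercase, drop punctuation, spaces to hyphens) is an assumption modeled on common markdown renderers, and the function names are hypothetical; the tool's real rules may differ, for example by ignoring headings inside code fences.

```python
import re
from collections import Counter

# ATX headings ("# Title") and inline links that point at a same-document anchor.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
ANCHOR_LINK_RE = re.compile(r"\[[^\]]*\]\(#([^)\s]+)\)")


def slugify(heading):
    """Approximate a renderer's heading anchor (assumed rule, not the tool's)."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation
    return re.sub(r"\s+", "-", slug)      # collapse whitespace into hyphens


def check_anchors(markdown):
    """Return (broken anchor targets, duplicate heading anchors) for one document."""
    counts = Counter(slugify(h) for h in HEADING_RE.findall(markdown))
    duplicates = sorted(slug for slug, n in counts.items() if n > 1)
    broken = sorted({a for a in ANCHOR_LINK_RE.findall(markdown) if a not in counts})
    return broken, duplicates


DOC = """# Quick Start
See [usage](#usage) and [missing](#no-such-section).
## Usage
## Usage
"""

broken, duplicates = check_anchors(DOC)
```

In this sample, `#no-such-section` has no matching heading and the two `Usage` headings collide on the same anchor, which under the exit-code rule above would yield exit code 1.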

Last Updated: 2026-03-18
Version: 2.0.0
Tools: 4 Python CLI tools, 0 external dependencies
Compatibility: Python 3.8+, any OS, any git repository