doc-drift-detector

Documentation Drift Detector

The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.

Quick Start

bash

# 1. Run full drift analysis on a repository
python scripts/drift_analyzer.py /path/to/repo

# 2. Score documentation freshness
python scripts/doc_staleness_scorer.py /path/to/repo

# 3. Validate API docs against Python source
python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

# 4. Check all markdown links
python scripts/link_checker.py /path/to/repo

# JSON output for any tool
python scripts/drift_analyzer.py /path/to/repo --json

# Set failure threshold for CI
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

All tools support `--help` for full usage details.

---

Core Workflows

Workflow 1: Full Drift Analysis

Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.

bash

# Basic analysis
python scripts/drift_analyzer.py /path/to/repo

# Analyze with custom doc patterns
python scripts/drift_analyzer.py /path/to/repo --doc-patterns ".md,.rst,*.txt"

# JSON output for tooling
python scripts/drift_analyzer.py /path/to/repo --json

# Only show high-severity drift
python scripts/drift_analyzer.py /path/to/repo --min-severity high

# Analyze specific directory
python scripts/drift_analyzer.py /path/to/repo --scope src/


**What it does:**

1. Discovers all documentation files in the repo
2. For each doc, identifies the code directories it describes (via path proximity and content references)
3. Compares the doc's last-modified date against the git history of its associated code
4. Identifies specific changes (renamed files, moved directories, changed function signatures)
5. Classifies each drift instance by category and severity
6. Generates an actionable report with specific file:line references
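
Steps 3 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `drift_analyzer.py` code: the function names and the severity cut-offs are hypothetical, and the only external call is `git log -1 --format=%ct -- <path>`, which prints the Unix timestamp of the last commit touching a path.

```python
import subprocess
from typing import List


def last_commit_timestamp(repo: str, path: str) -> int:
    """Return the Unix timestamp of the last commit touching `path` (0 if none)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else 0


def changed_since(doc_ts: int, code_timestamps: List[int]) -> int:
    """Count code changes that are newer than the doc's last update."""
    return sum(1 for ts in code_timestamps if ts > doc_ts)


def drift_severity(changes: int) -> str:
    """Map a change count to a severity label (cut-offs are illustrative only)."""
    if changes == 0:
        return "none"
    if changes < 5:
        return "low"
    if changes < 20:
        return "medium"
    return "high"
```

In this sketch a doc is compared only by timestamp; the real tool also inspects renames and signatures, which a date comparison alone cannot reveal.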

**Output example:**

Documentation Drift Report

Repository: /path/to/repo
Scan date: 2026-03-18
Docs found: 12
Drifted: 5

HIGH SEVERITY: docs/api.md (last updated: 2026-01-15)
  - 23 code files changed since doc update
  - 4 functions renamed in src/handlers/
  - 2 new modules undocumented
  Category: Factual + Structural
  Recommendation: Manual update required

MEDIUM SEVERITY: README.md (last updated: 2026-02-28)
  - Installation section references removed dependency
  - Version string outdated (says 1.8.0, current 2.0.0)
  Category: Factual + Temporal
  Recommendation: Auto-fixable (version), Manual (installation)

Workflow 2: API Documentation Validation

Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.

bash

# Validate API docs against source
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

# Scan entire docs directory
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

# JSON output
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

# Include private methods in validation
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private


**What it detects:**

- Functions/classes present in code but missing from docs
- Functions/classes documented but no longer in code (removed or renamed)
- Parameter mismatches (missing params, wrong types, wrong defaults)
- Deprecated items still documented as current
- Return type mismatches
- Module-level docstring drift

**How it works:**

The tool uses Python's `ast` module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.
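
The AST step described above might look something like the sketch below. It is not the validator's real code: `extract_signatures` and `compare` are hypothetical names, the documented side is assumed to be already parsed into a name-to-parameters mapping, and methods that share a name across classes would collide in this simplified version.

```python
import ast
from typing import Dict, List


def extract_signatures(source: str, include_private: bool = False) -> Dict[str, List[str]]:
    """Map each function name in `source` to its parameter names, in signature order."""
    signatures: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("_") and not include_private:
                continue  # skip private names unless explicitly requested
            args = node.args
            params = [a.arg for a in args.posonlyargs + args.args]
            if args.vararg:
                params.append("*" + args.vararg.arg)
            params.extend(a.arg for a in args.kwonlyargs)
            if args.kwarg:
                params.append("**" + args.kwarg.arg)
            signatures[node.name] = params
    return signatures


def compare(source_sigs: Dict[str, List[str]],
            documented: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report undocumented, phantom, and parameter-mismatched functions."""
    shared = set(source_sigs) & set(documented)
    return {
        "undocumented": sorted(set(source_sigs) - set(documented)),
        "phantom": sorted(set(documented) - set(source_sigs)),
        "mismatched": sorted(n for n in shared if source_sigs[n] != documented[n]),
    }
```

Because `ast.parse` never imports or executes the source, this kind of check is safe to run on untrusted or partially broken packages, as long as each file is syntactically valid.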

Workflow 3: README Health Check

Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.

bash

# Check README health
python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

# Check with custom sections
python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"


**Validates:**

- Required sections are present (Installation, Usage, API Reference, Contributing, License)
- Version strings match package version (package.json, setup.py, pyproject.toml)
- File references in README actually exist
- Badge URLs are well-formed
- Code examples reference existing files/functions
- Table of contents matches actual headings
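
As a rough illustration of the required-sections check, the sketch below scans ATX headings (`#` to `######`) and reports which required names have no match. The function name and the substring-matching rule are assumptions, not the scorer's documented behavior, and headings inside fenced code blocks are not filtered out here.

```python
import re
from typing import List

# Matches one ATX heading per line and captures its text.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def missing_sections(markdown: str, required: List[str]) -> List[str]:
    """Return required section names that appear in no heading (case-insensitive)."""
    headings = [h.strip().lower() for h in HEADING_RE.findall(markdown)]
    return [
        name for name in required
        if not any(name.lower() in heading for heading in headings)
    ]
```

Substring matching lets `API` satisfy a heading such as "API Reference", at the cost of occasional false positives on short names.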

Workflow 4: Link Integrity Audit

Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.

bash

# Check all markdown links
python scripts/link_checker.py /path/to/repo

# Include external URL checks (slower, makes HTTP requests)
python scripts/link_checker.py /path/to/repo --check-external

# Check specific file
python scripts/link_checker.py /path/to/repo/README.md

# JSON output
python scripts/link_checker.py /path/to/repo --json

# Only show broken links
python scripts/link_checker.py /path/to/repo --broken-only


**What it checks:**

- Local file references (`[link](path/to/file.md)`) -- does the file exist?
- Anchor references (`[link](#section-name)`) -- does the heading exist?
- Cross-document anchors (`[link](other.md#section)`) -- does the file and heading exist?
- Relative path correctness (catches `../` errors)
- Case sensitivity issues (links that break on Linux but resolve silently on macOS)
- Image references -- do referenced images exist?
- Duplicate anchors that would cause ambiguous links
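
The same-document anchor check can be approximated as below. This is a sketch only: `slugify` follows the rule of thumb given in Troubleshooting (lowercase, special characters stripped, spaces to hyphens) and may differ from the slug rules of a particular renderer, and both function names are hypothetical.

```python
import re
from typing import List, Set

# [text](#anchor) links that point inside the same document.
LINK_RE = re.compile(r"\[[^\]]*\]\(#([^)\s]+)\)")
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def slugify(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_anchors(markdown: str) -> List[str]:
    """Return same-document anchors that match no heading slug."""
    slugs: Set[str] = {slugify(h) for h in HEADING_RE.findall(markdown)}
    return [a for a in LINK_RE.findall(markdown) if a.lower() not in slugs]
```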

Workflow 5: Continuous Doc Monitoring

Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.

GitHub Actions example:

yaml

name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]

jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json

      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50

      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md

      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .

      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json

Pre-commit hook:

bash

#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet
if [ $? -ne 0 ]; then
  echo "Documentation is critically stale. Update docs before committing."
  exit 1
fi

---

Tools

Tool	Purpose	Lines	Key Feature
`drift_analyzer.py`	Full drift analysis between code and docs	~550	Git history comparison with code-to-doc mapping
`doc_staleness_scorer.py`	Score documentation freshness 0-100	~450	Weighted multi-dimensional scoring
`api_doc_validator.py`	Validate API docs against Python source	~400	AST-based signature extraction and comparison
`link_checker.py`	Audit all markdown links and anchors	~400	Local file, anchor, and cross-document validation

All tools:

- Python 3.8+ standard library only
- Support `--json` for machine-readable output
- Support `--help` for usage details
- Use non-zero exit codes on failure (CI/CD compatible)
- Work on any OS (Windows, macOS, Linux)

Staleness Scoring

Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions:

Dimension	Weight	What It Measures
Last Updated	20%	How recently the doc file was modified relative to its associated code
Code-Doc Alignment	30%	Whether documented items (functions, classes, files) still exist and match
Link Health	15%	Percentage of links that resolve correctly
Completeness	20%	Whether expected sections are present and non-empty
Accuracy	15%	Whether version strings, file paths, and other verifiable facts are correct

Score interpretation:

Score	Label	Action
90-100	Excellent	No action needed
70-89	Good	Minor updates recommended
50-69	Stale	Updates needed before next release
30-49	Critical	Immediate attention required
0-29	Abandoned	Full rewrite likely needed
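
The weighting and labels in the two tables above can be worked through in a few lines. This sketch assumes each dimension is already scored 0-100 and that weights are normalized by their sum; the normalization and the function names are assumptions, since the scorer's internals are not shown here.

```python
from typing import Dict

# Default weights from the dimension table above.
DEFAULT_WEIGHTS = {
    "updated": 0.20,
    "alignment": 0.30,
    "links": 0.15,
    "completeness": 0.20,
    "accuracy": 0.15,
}

# Lower bound of each band from the score interpretation table.
LABELS = [(90, "Excellent"), (70, "Good"), (50, "Stale"), (30, "Critical"), (0, "Abandoned")]


def staleness_score(dimensions: Dict[str, float],
                    weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine per-dimension scores (0-100) into one weighted score."""
    total_weight = sum(weights.values())
    weighted = sum(dimensions[name] * w for name, w in weights.items())
    return round(weighted / total_weight, 1)


def label_for(score: float) -> str:
    """Return the interpretation label for a 0-100 score."""
    for floor, label in LABELS:
        if score >= floor:
            return label
    return "Abandoned"
```

For example, dimension scores of 80, 60, 100, 50, and 90 (in table order) give 16 + 18 + 15 + 10 + 13.5 = 72.5, which falls in the "Good" band.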

Customization:

bash

# Override default weights
python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

# Set staleness thresholds
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

---

Drift Categories

Every detected drift instance is classified into one or more categories:

Structural Drift

Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.

Detection: Compare actual document headings against expected headings for that document type.

Factual Drift

Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.

Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).
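
One of these cross-references, file existence, can be sketched as follows. The regex, the function name, and the idea of passing in a file listing (for instance the output of `git ls-files`) are assumptions made for illustration; a backticked version string such as `2.0.0` would also match this simple pattern and need filtering in practice.

```python
import re
from typing import Iterable, List

# Backticked tokens that look like a path with a file extension.
PATH_RE = re.compile(r"`([\w./-]+\.\w+)`")


def missing_file_refs(doc_text: str, existing_files: Iterable[str]) -> List[str]:
    """Return backticked file paths in the doc that are absent from the repo listing."""
    known = set(existing_files)
    referenced = sorted(set(PATH_RE.findall(doc_text)))
    return [path for path in referenced if path not in known]
```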

Referential Drift

Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.

Detection: Link checker validates every reference against the filesystem and document structure.

Temporal Drift

Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.

Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.
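
A version-string comparison along these lines might be sketched as below. It assumes plain three-part `MAJOR.MINOR.PATCH` versions and that the current version has already been read from a manifest or tag; the function name is hypothetical, and pre-release suffixes are not handled.

```python
import re
from typing import List, Tuple

# Three-part versions, with an optional leading "v" that is not captured.
VERSION_RE = re.compile(r"\bv?(\d+\.\d+\.\d+)\b")


def _key(version: str) -> Tuple[int, ...]:
    """Turn '1.8.0' into (1, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def stale_versions(doc_text: str, current: str) -> List[str]:
    """Return version strings mentioned in the doc that are older than `current`."""
    found = sorted(set(VERSION_RE.findall(doc_text)), key=_key)
    return [v for v in found if _key(v) < _key(current)]
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `1.10.0` as older than `1.9.0`.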

Semantic Drift

Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.

Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.
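
A very rough version of this coverage comparison is sketched below. It assumes a precomputed mapping from code directory to file count and treats a directory as "covered" when its last path component appears anywhere in the doc text; the function name, the `min_files` cut-off, and that matching rule are all hypothetical simplifications.

```python
from typing import Dict, List


def uncovered_directories(doc_text: str, file_counts: Dict[str, int],
                          min_files: int = 5) -> List[str]:
    """Return sizable code directories whose name never appears in the doc."""
    text = doc_text.lower()
    return sorted(
        directory for directory, count in file_counts.items()
        if count >= min_files
        and directory.strip("/").split("/")[-1].lower() not in text
    )
```

With the "simple REST API" example above, directories such as `src/graphql` and `src/grpc` would be flagged once they grow past the file-count cut-off without being mentioned.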

Auto-Fix vs Manual-Fix Classification

Not all drift can be fixed programmatically. The tools classify each issue:

Auto-Fixable (safe to automate)

- Version string updates -- replace old version with current from package manifest
- Date updates -- update "last modified" timestamps
- Broken local links -- suggest correct path when file was moved (git log tracks renames)
- Missing table of contents entries -- generate from actual headings
- Removed file references -- flag for deletion or suggest replacement

Manual-Fix Required (needs human judgment)

- Architectural description changes -- requires understanding intent
- API usage examples -- new examples need domain context
- Migration guides -- require understanding of breaking changes
- Getting started rewrites -- narrative flow needs human touch
- Security documentation updates -- compliance implications require review

Semi-Automated (template + human review)

- New function documentation -- generate skeleton from AST, human fills description
- Changelog entries -- generate from git commits, human edits for clarity
- README section additions -- provide template, human adds content

The drift report marks each issue with `[AUTO]`, `[MANUAL]`, or `[SEMI]` tags.

Integration Points

With CI/CD Pipelines

All tools return non-zero exit codes when issues are found:

- Exit 0: No issues (or all within threshold)
- Exit 1: Issues found exceeding threshold
- Exit 2: Tool error (invalid arguments, missing files)

With Code Review

Add drift analysis to PR checks. When a PR modifies code in `src/`, automatically check whether docs in `docs/` need updates. The drift analyzer can scope its analysis to only changed directories.

With Documentation Generators

Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.

With Release Processes

Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.

With Other Skills

- code-reviewer -- include doc drift in PR review reports
- senior-devops -- integrate into deployment pipelines
- senior-qa -- documentation quality as part of QA checklist

Reference Guides

Guide	Description
Documentation Standards	README structure, API docs, changelogs, ADRs, docs-as-code
Drift Prevention Guide	Coupling strategies, CI gates, review checklists, prevention patterns

Assets

Asset	Description
Drift Report Template	Template for drift analysis reports
Sample Drift Data	Sample JSON for testing and demonstration

Anti-Patterns

- Ignoring drift until release -- run drift analysis in CI on every PR, not as a release-day scramble
- Treating all drift as equal -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
- Manual-only doc updates -- use `[AUTO]` fixes for version strings and broken links; reserve human effort for semantic and architectural drift
- Shallow clone in CI -- `fetch-depth: 1` breaks git history comparison; always use `fetch-depth: 0` for drift analysis
- Skipping link checks on internal docs -- cross-document anchor references break silently on refactors; run `link_checker.py` on every markdown change

Troubleshooting

Problem	Cause	Solution
`drift_analyzer.py` reports zero docs found	Repository has non-standard doc extensions or docs are in ignored directories (e.g., `node_modules`, `dist`)	Use `--doc-patterns ".md,.rst,*.txt"` to explicitly specify extensions
Staleness scores are unexpectedly low	Docs reference files that were reorganized or moved to new directories	Run `link_checker.py` first to identify broken references, fix them, then re-score
API validator finds no source signatures	Source path points to a non-Python directory or all functions are `_`-prefixed private	Verify `source_path` contains `.py` files; add `--include-private` if the API surface uses private names
Link checker flags valid anchors as broken	Heading text contains special characters, inline code, or emoji that alter the slug	Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text
Git history comparison shows no changes	Shallow clone lacks full commit history (common in CI)	Clone with `fetch-depth: 0` or pass `--scope` to narrow the analysis window
External URL checks hang or time out	Target servers are slow or block automated HEAD requests	Omit `--check-external` for local-only validation, or run external checks in a separate non-blocking job
Drift report marks everything as `[MANUAL]`	Most detected drift is semantic or architectural, not auto-fixable	This is expected for large refactors; focus on `[AUTO]` and `[SEMI]` items first, then triage `[MANUAL]` items by severity

Success Criteria

- Zero stale docs older than 90 days -- every documentation file has been updated within 90 days of the latest change to its associated code
- Aggregate staleness score above 80/100 -- the repository-wide freshness score stays in the "Good" or "Excellent" range
- Link integrity above 99% -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
- API doc coverage above 95% -- at least 95% of public functions and classes have corresponding entries in API documentation
- Zero high-severity drift issues in CI -- pull requests with high or critical drift are blocked before merge
- Version string accuracy at 100% -- every version reference in documentation matches the current release tag or package manifest
- Drift report turnaround under 60 seconds -- full drift analysis completes in under one minute for repositories with up to 500 documentation files

Scope & Limitations

Covers:

- Detection of documentation drift against git history for any git repository
- AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
- Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
- Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement

Does NOT cover:

- Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
- External URL uptime monitoring -- `--check-external` performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
- Automatic documentation rewriting -- tools classify issues as `[AUTO]`, `[SEMI]`, or `[MANUAL]` but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
- Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines

Integration Points

Skill	Integration	Data Flow
code-reviewer	Include drift report in PR review comments	`drift_analyzer.py --json` output feeds into review checklists as a documentation health section
senior-devops	Add staleness gate to CI/CD pipelines	`doc_staleness_scorer.py --threshold 50` returns exit code 1 on failure, blocking deploys
senior-qa	Documentation quality as part of QA acceptance	`link_checker.py --json` output merges into QA dashboards alongside test coverage metrics
senior-fullstack	Validate generated project docs post-scaffold	Run `api_doc_validator.py` against scaffolded `docs/` directory to confirm generated API docs match source
senior-secops	Audit security documentation currency	`drift_analyzer.py --scope security/` detects when security docs fall behind policy changes
senior-architect	Architecture decision record (ADR) freshness	`doc_staleness_scorer.py --required-sections "Status,Context,Decision,Consequences"` validates ADR completeness

Tool Reference

drift_analyzer.py

Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.

Usage:

bash

python scripts/drift_analyzer.py <repo_path> [options]

Parameters:

Flag	Type	Default	Description
`repo_path`	positional	(required)	Path to the git repository to analyze
`--json`	flag	off	Output the full drift report as JSON
`--min-severity`	choice	`low`	Minimum severity to include in report. Choices: `critical`, `high`, `medium`, `low`, `info`
`--scope`	string	`""` (all)	Limit code analysis to a subdirectory (e.g., `src/`)
`--doc-patterns`	string	`.md,.rst,.txt,.adoc`	Comma-separated file patterns for documentation discovery

Example:

bash

python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json

Output Formats:

- Human-readable (default): Grouped by severity with `[AUTO]`/`[SEMI]`/`[MANUAL]` fix-type tags, category labels, and a fix-type summary
- JSON (`--json`): Structured object with `repository`, `scan_date`, `summary` (counts by severity, category, fix type), and `issues` array

Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)
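
Downstream tooling could consume the `--json` report roughly as sketched below. Only the top-level keys (`repository`, `scan_date`, `summary`, `issues`) come from the description above; the per-issue `severity` field and the function name are assumptions made for illustration and should be checked against real output.

```python
import json
from collections import Counter
from typing import Dict


def severity_counts(report_json: str) -> Dict[str, int]:
    """Count issues per severity in a drift report.

    Assumes each entry in `issues` carries a `severity` key (hypothetical).
    """
    report = json.loads(report_json)
    issues = report.get("issues", [])
    return dict(Counter(issue.get("severity", "unknown") for issue in issues))
```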

doc_staleness_scorer.py

Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.

Usage:

bash

python scripts/doc_staleness_scorer.py <repo_path> [options]

Parameters:

Flag	Type	Default	Description
`repo_path`	positional	(required)	Path to the git repository to score
`--json`	flag	off	Output the full scoring report as JSON
`--threshold`	float	(none)	Fail with exit code 1 if aggregate score falls below this value
`--readme-focus`	flag	off	Only score README files (filenames starting with `readme`)
`--required-sections`	string	`Installation,Usage,API,Contributing,License`	Comma-separated section names for completeness scoring
`--quiet`	flag	off	Only print the aggregate score number (no report)
`--weight-updated`	float	`0.20`	Weight for the "last updated" dimension
`--weight-alignment`	float	`0.30`	Weight for the "code-doc alignment" dimension
`--weight-links`	float	`0.15`	Weight for the "link health" dimension
`--weight-completeness`	float	`0.20`	Weight for the "completeness" dimension
`--weight-accuracy`	float	`0.15`	Weight for the "accuracy" dimension

Example:

bash

python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet

Output Formats:

- Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
- JSON (`--json`): Structured object with `aggregate_score`, `aggregate_label`, `total_documents`, and `documents` array (each with `total_score`, `label`, and per-dimension scores/details)
- Quiet (`--quiet`): Single line with the aggregate score (e.g., `72.3`)

Exit Codes: 0 = score above threshold (or no threshold set), 1 = score below threshold, 2 = tool error
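
The threshold gate implied by these exit codes can be sketched as follows. The function names are hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption based on the wording "falls below this value".

```python
from typing import Optional


def parse_quiet_output(line: str) -> float:
    """Parse the single-line `--quiet` output (e.g. '72.3') into a float."""
    return float(line.strip())


def exit_code_for(score: float, threshold: Optional[float] = None) -> int:
    """Return 0 when no threshold is set or the score meets it, else 1."""
    if threshold is None or score >= threshold:
        return 0
    return 1
```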

api_doc_validator.py

Purpose: Extract function and class signatures from Python source files using the `ast` module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.
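To make the extraction step concrete, here is a minimal sketch of collecting function and class names with the standard `ast` module. It is not the validator's own code: `extract_signatures` is a hypothetical helper, methods are keyed by bare name rather than qualified name, and only plain positional parameters are recorded.

```python
import ast


def extract_signatures(source, include_private=False):
    """Return {name: [parameter names]} for functions and classes in source text.

    Simplifications (assumptions of this sketch): a class maps to an empty
    list, and *args, keyword-only parameters, and defaults are ignored.
    """
    signatures = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = [arg.arg for arg in node.args.args]
        elif isinstance(node, ast.ClassDef):
            params = []
        else:
            continue
        # Skip `_`-prefixed names unless asked, mirroring --include-private.
        if node.name.startswith("_") and not include_private:
            continue
        signatures[node.name] = params
    return signatures


SAMPLE = '''
def load(path, strict=False):
    return path

def _helper(x):
    return x

class Parser:
    def parse(self, text):
        return text
'''

print(extract_signatures(SAMPLE))
```

Parsing with `ast` reads the source without importing or executing it, so the check has no side effects and does not need the project's dependencies installed.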

Usage:

bash

python scripts/api_doc_validator.py <source_path> <doc_path> [options]

Parameters:

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `source_path` | positional | (required) | Path to a Python source file or directory |
| `doc_path` | positional | (required) | Path to API documentation file (`.md`) or directory |
| `--json` | flag | off | Output the validation report as JSON |
| `--recursive` | flag | off | Recursively scan the doc directory for markdown files |
| `--include-private` | flag | off | Include `_`-prefixed private functions and classes in validation |

Example:

bash

python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json

Output Formats:

Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
JSON (`--json`): Structured object with `summary` (counts by type and severity) and an `issues` array (each entry with `type`, `severity`, `name`, file/line references, and `description`)

Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error
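The comparison between source and docs can be pictured as two set differences. The sketch below assumes that documented items appear as markdown headings whose text starts with a backticked name, which is a convention chosen for illustration rather than something this document specifies. `documented_names`, `compare`, and `exit_code` are hypothetical helpers, and treating only phantom entries as high severity is likewise an assumption drawn from the example in the exit-code description.

```python
import re

# Assumed convention: API entries are headings such as "### `load(path)`".
HEADING_NAME_RE = re.compile(r"^#{1,6}\s+`([A-Za-z_]\w*)")


def documented_names(markdown):
    """Collect the names found in backticked markdown headings."""
    names = set()
    for line in markdown.splitlines():
        match = HEADING_NAME_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def compare(source_names, doc_names):
    """Split mismatches into undocumented code and phantom documentation."""
    source, docs = set(source_names), set(doc_names)
    return {
        "undocumented": sorted(source - docs),  # in code, missing from docs
        "phantom": sorted(docs - source),       # in docs, missing from code
    }


def exit_code(result):
    """Return 1 when phantom docs exist (assumed high severity), else 0."""
    return 1 if result["phantom"] else 0


DOCS = "## API\n\n### `load(path)`\n\n### `save(path, data)`\n"
result = compare({"load", "parse"}, documented_names(DOCS))
print(result, exit_code(result))
```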

link_checker.py

Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.
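For in-document anchors, validation amounts to turning each heading into a slug and checking that every `#anchor` link points at one. The following sketch approximates GitHub-style slugs and covers only inline `[text](#anchor)` links. `slugify` and `check_anchors` are hypothetical helpers, the slug rules are an assumption, and fenced code blocks are not skipped, so a `#` line inside a fence would be miscounted as a heading.

```python
import re
from collections import Counter

LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
HEADING_RE = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def slugify(heading):
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"\s", "-", text.strip())


def check_anchors(markdown):
    """Return (broken_anchor_links, duplicate_anchors) for one document."""
    lines = markdown.splitlines()
    slugs = [slugify(m.group(1)) for m in map(HEADING_RE.match, lines) if m]
    counts = Counter(slugs)
    duplicates = sorted(slug for slug, n in counts.items() if n > 1)
    broken = []
    for number, line in enumerate(lines, start=1):
        for _text, target in LINK_RE.findall(line):
            if target.startswith("#") and target[1:] not in counts:
                broken.append((number, target))
    return broken, duplicates


SAMPLE = "\n".join([
    "# Guide",
    "## Quick Start",
    "See [setup](#quick-start) and [missing](#install).",
    "## Quick Start",
])

print(check_anchors(SAMPLE))
```

Duplicate headings matter because two headings with the same slug make an anchor link ambiguous, which is presumably why they are reported alongside broken links.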

Usage:

bash

python scripts/link_checker.py <path> [options]

Parameters:

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `path` | positional | (required) | File or directory to check (single `.md` file or directory for recursive scan) |
| `--json` | flag | off | Output the link check report as JSON |
| `--broken-only` | flag | off | Only show broken links in the report (omit valid links from output) |
| `--check-external` | flag | off | Also validate external URLs via HTTP HEAD requests (slower, makes network requests) |
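An external check of the kind `--check-external` describes can be sketched with the standard library's `urllib`. This is an illustration only: `build_head_request` and `url_is_reachable` are hypothetical names, the User-Agent string and the 5-second timeout are placeholders, and the real tool may handle redirects, retries, or servers that reject HEAD differently.

```python
import urllib.error
import urllib.request


def build_head_request(url):
    """Create a HEAD request so only headers, not the body, are fetched."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )


def url_is_reachable(url, timeout=5.0):
    """Return True when the server answers HEAD with a status below 400."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, malformed URLs, DNS failures, and timeouts all count
        # as unreachable in this sketch.
        return False
```

HEAD keeps each check cheap, though some servers answer HEAD with an error while serving GET normally, so a stricter checker might fall back to GET before reporting a link as broken.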

Example:

bash

python scripts/link_checker.py /path/to/repo --broken-only --json

Output Formats:

Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
JSON (`--json`): Structured object with `summary` (counts), a `broken_links` array (each entry with source file, line, text, target, type, error), a `duplicate_anchors` map, and optionally `all_links` (when `--broken-only` is not set)

Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error

Last Updated: 2026-03-18 Version: 2.0.0 Tools: 4 Python CLI tools, 0 external dependencies Compatibility: Python 3.8+, any OS, any git repository
