doc-drift-detector
Documentation Drift Detector
The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.
Quick Start
```bash
# 1. Run full drift analysis on a repository
python scripts/drift_analyzer.py /path/to/repo

# 2. Score documentation freshness
python scripts/doc_staleness_scorer.py /path/to/repo

# 3. Validate API docs against Python source
python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

# 4. Check all markdown links
python scripts/link_checker.py /path/to/repo

# JSON output for any tool
python scripts/drift_analyzer.py /path/to/repo --json

# Set failure threshold for CI
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```
All tools support `--help` for full usage details.

---

Core Workflows
Workflow 1: Full Drift Analysis
Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.
```bash
# Basic analysis
python scripts/drift_analyzer.py /path/to/repo

# Analyze with custom doc patterns
python scripts/drift_analyzer.py /path/to/repo --doc-patterns "*.md,*.rst,*.txt"

# JSON output for tooling
python scripts/drift_analyzer.py /path/to/repo --json

# Only show high-severity drift
python scripts/drift_analyzer.py /path/to/repo --min-severity high

# Analyze specific directory
python scripts/drift_analyzer.py /path/to/repo --scope src/
```
**What it does:**
1. Discovers all documentation files in the repo
2. For each doc, identifies the code directories it describes (via path proximity and content references)
3. Compares the doc's last-modified date against the git history of its associated code
4. Identifies specific changes (renamed files, moved directories, changed function signatures)
5. Classifies each drift instance by category and severity
6. Generates an actionable report with specific file:line references
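Step 3 of the list above can be sketched with plain `git log` calls. This is a minimal illustration under stated assumptions, not the analyzer's actual implementation: `last_commit_time` and `files_changed_since` are hypothetical helper names, and the sketch assumes `git` is on the `PATH`.

```python
import subprocess
from typing import Dict, List


def last_commit_time(repo: str, path: str) -> int:
    """Return the Unix timestamp of the last commit touching `path` (0 if none)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else 0


def files_changed_since(doc_time: int, code_times: Dict[str, int]) -> List[str]:
    """List code files whose last commit is newer than the doc's last commit."""
    return sorted(path for path, t in code_times.items() if t > doc_time)
```

In this sketch, `code_times` would be built by calling `last_commit_time` for each code file mapped to the doc; a non-empty result suggests the doc may have drifted.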
**Output example:**
```
Documentation Drift Report
Repository: /path/to/repo
Scan date: 2026-03-18
Docs found: 12
Drifted: 5

HIGH SEVERITY:
  docs/api.md (last updated: 2026-01-15)
    - 23 code files changed since doc update
    - 4 functions renamed in src/handlers/
    - 2 new modules undocumented
    Category: Factual + Structural
    Recommendation: Manual update required

MEDIUM SEVERITY:
  README.md (last updated: 2026-02-28)
    - Installation section references removed dependency
    - Version string outdated (says 1.8.0, current 2.0.0)
    Category: Factual + Temporal
    Recommendation: Auto-fixable (version), Manual (installation)
```
Workflow 2: API Documentation Validation
Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.
```bash
# Validate API docs against source
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

# Scan entire docs directory
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

# JSON output
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

# Include private methods in validation
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private
```
**What it detects:**
- Functions/classes present in code but missing from docs
- Functions/classes documented but no longer in code (removed or renamed)
- Parameter mismatches (missing params, wrong types, wrong defaults)
- Deprecated items still documented as current
- Return type mismatches
- Module-level docstring drift
**How it works:**
The tool uses Python's `ast` module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.
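The extraction step can be illustrated with a short sketch. It is an assumption about the general approach rather than the validator's real code: `extract_signatures` is a hypothetical name, and the sketch flattens methods and top-level functions into one dictionary, so same-named functions would overwrite each other.

```python
import ast
from typing import Dict, List


def extract_signatures(source: str, include_private: bool = False) -> Dict[str, List[str]]:
    """Map each function name found in `source` to its parameter names."""
    signatures: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_") and not include_private:
            continue  # skip private helpers unless explicitly requested
        args = node.args
        params = [a.arg for a in args.posonlyargs + args.args]
        if args.vararg:
            params.append("*" + args.vararg.arg)
        params += [a.arg for a in args.kwonlyargs]
        if args.kwarg:
            params.append("**" + args.kwarg.arg)
        signatures[node.name] = params
    return signatures
```

Comparing the keys of this dictionary against the names found in the markdown would give the "missing from docs" and "no longer in code" sets described above.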
Workflow 3: README Health Check
Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.
```bash
# Check README health
python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

# Check with custom sections
python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"
```
**Validates:**
- Required sections are present (Installation, Usage, API Reference, Contributing, License)
- Version strings match package version (package.json, setup.py, pyproject.toml)
- File references in README actually exist
- Badge URLs are well-formed
- Code examples reference existing files/functions
- Table of contents matches actual headings
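The required-sections check in the list above could look roughly like the following. This is a sketch under assumptions: `missing_sections` is a hypothetical helper, it only recognizes `#`-style headings, and it does not skip `#` comment lines inside fenced code blocks.

```python
import re
from typing import List


def missing_sections(markdown: str, required: List[str]) -> List[str]:
    """Return required section names that have no matching heading (case-insensitive)."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", markdown, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]
```

An exact-match comparison like this would treat "API Reference" and "API" as different sections, so a real check would likely need aliases or prefix matching.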
Workflow 4: Link Integrity Audit
Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.
```bash
# Check all markdown links
python scripts/link_checker.py /path/to/repo

# Include external URL checks (slower, makes HTTP requests)
python scripts/link_checker.py /path/to/repo --check-external

# Check specific file
python scripts/link_checker.py /path/to/repo/README.md

# JSON output
python scripts/link_checker.py /path/to/repo --json

# Only show broken links
python scripts/link_checker.py /path/to/repo --broken-only
```
**What it checks:**
- Local file references (`[link](path/to/file.md)`) -- does the file exist?
- Anchor references (`[link](#section-name)`) -- does the heading exist?
- Cross-document anchors (`[link](other.md#section)`) -- does the file and heading exist?
- Relative path correctness (catches `../` errors)
- Case sensitivity issues (mismatches that resolve silently on macOS's default case-insensitive filesystem but break on Linux)
- Image references -- do referenced images exist?
- Duplicate anchors that would cause ambiguous links
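Anchor validation depends on turning heading text into a slug. The sketch below approximates the common rule (lowercase, strip special characters, spaces to hyphens); `slugify` and `broken_anchors` are hypothetical names, and real renderers differ in edge cases such as emoji and duplicate headings.

```python
import re
from typing import List


def slugify(heading: str) -> str:
    """Approximate a heading slug: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, and spaces
    return text.replace(" ", "-")


def broken_anchors(markdown: str) -> List[str]:
    """Return same-document anchors (`[text](#slug)`) that match no heading slug."""
    slugs = {
        slugify(m.group(1))
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    }
    anchors = re.findall(r"\]\(#([^)\s]+)\)", markdown)
    return [a for a in anchors if a not in slugs]
```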
Workflow 5: Continuous Doc Monitoring
Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.
GitHub Actions example:
```yaml
name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]

jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json
      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50
      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md
      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .
      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json
```
Pre-commit hook:
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet
if [ $? -ne 0 ]; then
  echo "Documentation is critically stale. Update docs before committing."
  exit 1
fi
```

---

Tools
| Tool | Purpose | Lines | Key Feature |
|---|---|---|---|
| `drift_analyzer.py` | Full drift analysis between code and docs | ~550 | Git history comparison with code-to-doc mapping |
| `doc_staleness_scorer.py` | Score documentation freshness 0-100 | ~450 | Weighted multi-dimensional scoring |
| `api_doc_validator.py` | Validate API docs against Python source | ~400 | AST-based signature extraction and comparison |
| `link_checker.py` | Audit all markdown links and anchors | ~400 | Local file, anchor, and cross-document validation |
All tools:
- Python 3.8+ standard library only
- Support `--json` for machine-readable output
- Support `--help` for usage details
- Use non-zero exit codes on failure (CI/CD compatible)
- Work on any OS (Windows, macOS, Linux)
Staleness Scoring
Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Last Updated | 20% | How recently the doc file was modified relative to its associated code |
| Code-Doc Alignment | 30% | Whether documented items (functions, classes, files) still exist and match |
| Link Health | 15% | Percentage of links that resolve correctly |
| Completeness | 20% | Whether expected sections are present and non-empty |
| Accuracy | 15% | Whether version strings, file paths, and other verifiable facts are correct |
Score interpretation:
| Score | Label | Action |
|---|---|---|
| 90-100 | Excellent | No action needed |
| 70-89 | Good | Minor updates recommended |
| 50-69 | Stale | Updates needed before next release |
| 30-49 | Critical | Immediate attention required |
| 0-29 | Abandoned | Full rewrite likely needed |
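The two tables above combine into a small calculation. The following sketch assumes each dimension is already scored 0-100 and that weights are normalized by their sum; the dimension keys and function names are illustrative, not the scorer's actual internals.

```python
from typing import Dict

# Default weights taken from the dimension table above.
DEFAULT_WEIGHTS = {
    "updated": 0.20,
    "alignment": 0.30,
    "links": 0.15,
    "completeness": 0.20,
    "accuracy": 0.15,
}

# (lower bound, label) pairs from the score interpretation table, highest first.
LABELS = [(90, "Excellent"), (70, "Good"), (50, "Stale"), (30, "Critical"), (0, "Abandoned")]


def staleness_score(dimensions: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-dimension scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(dimensions[name] * w for name, w in weights.items()) / total_weight


def label_for(score: float) -> str:
    """Return the label of the first band whose lower bound the score reaches."""
    return next(label for floor, label in LABELS if score >= floor)
```

For example, dimension scores of 50, 80, 100, 60, and 90 (in table order) give 10 + 24 + 15 + 12 + 13.5 = 74.5, which falls in the "Good" band.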
Customization:
```bash
# Override default weights
python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

# Set staleness thresholds
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60
```

---

Drift Categories
Every detected drift instance is classified into one or more categories:
Structural Drift
Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.
Detection: Compare actual document headings against expected headings for that document type.
Factual Drift
Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.
Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).
Referential Drift
Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.
Detection: Link checker validates every reference against the filesystem and document structure.
Temporal Drift
Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.
Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.
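A rough sketch of the version-string part of this check follows. It assumes three-part versions and a `current` value already read from a git tag or package manifest; `stale_versions` is a hypothetical helper, and it would also flag unrelated version numbers (for example, pinned dependencies), so results need filtering in practice.

```python
import re
from typing import List, Tuple

# Matches three-part versions such as 1.8.0 or v2.0.0.
VERSION_RE = re.compile(r"\bv?(\d+\.\d+\.\d+)\b")


def stale_versions(doc_text: str, current: str) -> List[Tuple[int, str]]:
    """Return (line_number, version) for each version string that differs from `current`."""
    hits = []
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        for match in VERSION_RE.finditer(line):
            if match.group(1) != current:
                hits.append((lineno, match.group(1)))
    return hits
```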
Semantic Drift
Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.
Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.
Auto-Fix vs Manual-Fix Classification
Not all drift can be fixed programmatically. The tools classify each issue:
Auto-Fixable (safe to automate)
- Version string updates -- replace old version with current from package manifest
- Date updates -- update "last modified" timestamps
- Broken local links -- suggest correct path when file was moved (git log tracks renames)
- Missing table of contents entries -- generate from actual headings
- Removed file references -- flag for deletion or suggest replacement
Manual-Fix Required (needs human judgment)
- Architectural description changes -- requires understanding intent
- API usage examples -- new examples need domain context
- Migration guides -- require understanding of breaking changes
- Getting started rewrites -- narrative flow needs human touch
- Security documentation updates -- compliance implications require review
Semi-Automated (template + human review)
- New function documentation -- generate skeleton from AST, human fills description
- Changelog entries -- generate from git commits, human edits for clarity
- README section additions -- provide template, human adds content
The drift report marks each issue with `[AUTO]`, `[MANUAL]`, or `[SEMI]` tags.
Integration Points
With CI/CD Pipelines
All tools return non-zero exit codes when issues are found:
- Exit 0: No issues (or all within threshold)
- Exit 1: Issues found exceeding threshold
- Exit 2: Tool error (invalid arguments, missing files)
With Code Review
Add drift analysis to PR checks. When a PR modifies code in `src/`, automatically check whether docs in `docs/` need updates. The drift analyzer can scope its analysis to only changed directories.
With Documentation Generators
Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.
With Release Processes
Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.
With Other Skills
- code-reviewer -- include doc drift in PR review reports
- senior-devops -- integrate into deployment pipelines
- senior-qa -- documentation quality as part of QA checklist
Reference Guides
| Guide | Description |
|---|---|
| Documentation Standards | README structure, API docs, changelogs, ADRs, docs-as-code |
| Drift Prevention Guide | Coupling strategies, CI gates, review checklists, prevention patterns |
Assets
| Asset | Description |
|---|---|
| Drift Report Template | Template for drift analysis reports |
| Sample Drift Data | Sample JSON for testing and demonstration |
Anti-Patterns
- Ignoring drift until release -- run drift analysis in CI on every PR, not as a release-day scramble
- Treating all drift as equal -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
- Manual-only doc updates -- use `[AUTO]` fixes for version strings and broken links; reserve human effort for semantic and architectural drift
- Shallow clone in CI -- `fetch-depth: 1` breaks git history comparison; always use `fetch-depth: 0` for drift analysis
- Skipping link checks on internal docs -- cross-document anchor references break silently on refactors; run `link_checker.py` on every markdown change
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Repository has non-standard doc extensions or docs are in ignored directories (e.g., | Use |
| Staleness scores are unexpectedly low | Docs reference files that were reorganized or moved to new directories | Run |
| API validator finds no source signatures | Source path points to a non-Python directory or all functions are | Verify |
| Link checker flags valid anchors as broken | Heading text contains special characters, inline code, or emoji that alter the slug | Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text |
| Git history comparison shows no changes | Shallow clone lacks full commit history (common in CI) | Clone with |
| External URL checks hang or time out | Target servers are slow or block automated HEAD requests | Omit |
| Drift report marks everything as | Most detected drift is semantic or architectural, not auto-fixable | This is expected for large refactors; focus on |
Success Criteria
- Zero stale docs older than 90 days -- every documentation file has been updated within the last 90 days relative to its associated code changes
- Aggregate staleness score above 80/100 -- the repository-wide freshness score stays in the "Good" or "Excellent" range
- Link integrity above 99% -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
- API doc coverage above 95% -- at least 95% of public functions and classes have corresponding entries in API documentation
- Zero high-severity drift issues in CI -- pull requests with high or critical drift are blocked before merge
- Version string accuracy at 100% -- every version reference in documentation matches the current release tag or package manifest
- Drift report turnaround under 60 seconds -- full drift analysis completes in under one minute for repositories with up to 500 documentation files
Scope & Limitations
Covers:
- Detection of documentation drift against git history for any git repository
- AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
- Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
- Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement
Does NOT cover:
- Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
- External URL uptime monitoring -- `--check-external` performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
- Automatic documentation rewriting -- tools classify issues as `[AUTO]`, `[SEMI]`, or `[MANUAL]` but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
- Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| code-reviewer | Include drift report in PR review comments | |
| senior-devops | Add staleness gate to CI/CD pipelines | |
| senior-qa | Documentation quality as part of QA acceptance | |
| senior-fullstack | Validate generated project docs post-scaffold | Run |
| senior-secops | Audit security documentation currency | |
| senior-architect | Architecture decision record (ADR) freshness | |
Tool Reference
drift_analyzer.py
Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.
Usage:
```bash
python scripts/drift_analyzer.py <repo_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `repo_path` | positional | (required) | Path to the git repository to analyze |
| `--json` | flag | off | Output the full drift report as JSON |
| `--min-severity` | choice | | Minimum severity to include in report. Choices: |
| `--scope` | string | | Limit code analysis to a subdirectory (e.g., |
| `--doc-patterns` | string | | Comma-separated file patterns for documentation discovery |
Example:
```bash
python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json
```
Output Formats:
- Human-readable (default): Grouped by severity with `[AUTO]`/`[SEMI]`/`[MANUAL]` fix-type tags, category labels, and a fix-type summary
- JSON (`--json`): Structured object with `repository`, `scan_date`, `summary` (counts by severity, category, fix type), and `issues` array
Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)
doc_staleness_scorer.py
Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.
Usage:
```bash
python scripts/doc_staleness_scorer.py <repo_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `repo_path` | positional | (required) | Path to the git repository to score |
| `--json` | flag | off | Output the full scoring report as JSON |
| `--threshold` | float | (none) | Fail with exit code 1 if aggregate score falls below this value |
| `--readme-focus` | flag | off | Only score README files (filenames starting with |
| `--required-sections` | string | | Comma-separated section names for completeness scoring |
| `--quiet` | flag | off | Only print the aggregate score number (no report) |
| `--weight-updated` | float | | Weight for the "last updated" dimension |
| `--weight-alignment` | float | | Weight for the "code-doc alignment" dimension |
| `--weight-links` | float | | Weight for the "link health" dimension |
| `--weight-completeness` | float | | Weight for the "completeness" dimension |
| `--weight-accuracy` | float | | Weight for the "accuracy" dimension |
Example:
```bash
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet
```
Output Formats:
- Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
- JSON (`--json`): Structured object with `aggregate_score`, `aggregate_label`, `total_documents`, and `documents` array (each with `total_score`, `label`, and per-dimension scores/details)
- Quiet (`--quiet`): Single line with the aggregate score (e.g., `72.3`)
Exit Codes: 0 = score above threshold (or no threshold set), 1 = score below threshold, 2 = tool error
api_doc_validator.py
Purpose: Extract function and class signatures from Python source files using the `ast` module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.
Usage:
```bash
python scripts/api_doc_validator.py <source_path> <doc_path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `source_path` | positional | (required) | Path to a Python source file or directory |
| `doc_path` | positional | (required) | Path to API documentation file ( |
| `--json` | flag | off | Output the validation report as JSON |
| `--recursive` | flag | off | Recursively scan the doc directory for markdown files |
| `--include-private` | flag | off | Include |
Example:
```bash
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json
```
Output Formats:
- Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
- JSON (`--json`): Structured object with `summary` (counts by type and severity) and `issues` array (each with `type`, `severity`, `name`, file/line references, and `description`)
Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error
link_checker.py
Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.
Usage:
```bash
python scripts/link_checker.py <path> [options]
```
Parameters:
| Flag | Type | Default | Description |
|---|---|---|---|
| `path` | positional | (required) | File or directory to check (single |
| `--json` | flag | off | Output the link check report as JSON |
| `--broken-only` | flag | off | Only show broken links in the report (omit valid links from output) |
| `--check-external` | flag | off | Also validate external URLs via HTTP HEAD requests (slower, makes network requests) |
Example:
```bash
python scripts/link_checker.py /path/to/repo --broken-only --json
```
Output Formats:
- Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
- JSON (`--json`): Structured object with `summary` (counts), `broken_links` array (each with source file, line, text, target, type, error), `duplicate_anchors` map, and optionally `all_links` (when `--broken-only` is not set)
Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error
Last Updated: 2026-03-18
Version: 2.0.0
Tools: 4 Python CLI tools, 0 external dependencies
Compatibility: Python 3.8+, any OS, any git repository