Comprehensive documentation quality system combining automated validation with ToolUniverse-specific auditing. Detects outdated commands, circular navigation, inconsistent terminology, auto-generated file conflicts, broken links, and structural problems. Use when reviewing documentation, before releases, after refactoring, or when the user asks to audit, optimize, or improve documentation quality.
```bash
npx skill4agent add mims-harvard/tooluniverse devtu-docs-quality
```

`scripts/validate_documentation.py`:

```python
#!/usr/bin/env python3
"""Documentation validator for ToolUniverse"""
import glob
import re
import sys
from pathlib import Path

DOCS_ROOT = Path("docs")

# ToolUniverse-specific patterns: (deprecated regex, suggested replacement)
DEPRECATED_PATTERNS = [
    (r"python -m tooluniverse\.server", "tooluniverse-server"),
    (r"600\+?\s+tools", "1000+ tools"),
    (r"750\+?\s+tools", "1000+ tools"),
]


def is_false_positive(match, content):
    """Smart context checking to avoid false positives"""
    # Inspect 100 characters on either side of the match
    start = max(0, match.start() - 100)
    end = min(len(content), match.end() + 100)
    context = content[start:end].lower()
    # Skip if discussing deprecation itself
    if any(kw in context for kw in ['deprecated', 'old version', 'migration']):
        return True
    # Skip technical values (ports, dimensions, etc.)
    # Note: this is a substring test, so 'port' also matches 'support' or 'import'
    if any(kw in context for kw in ['width', 'height', 'port', '":"']):
        return True
    return False


def validate_file(filepath):
    """Check one file for issues"""
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()
    issues = []
    # Check deprecated patterns
    for old_pattern, new_text in DEPRECATED_PATTERNS:
        for match in re.finditer(old_pattern, content):
            if is_false_positive(match, content):
                continue
            # Convert the character offset into a 1-based line number
            line_num = content[:match.start()].count('\n') + 1
            issues.append({
                'file': filepath,
                'line': line_num,
                'severity': 'HIGH',
                'found': match.group(),
                'suggestion': new_text
            })
    return issues


# Scan all docs
all_issues = []
for doc_file in glob.glob(str(DOCS_ROOT / "**/*.md"), recursive=True):
    all_issues.extend(validate_file(doc_file))
for doc_file in glob.glob(str(DOCS_ROOT / "**/*.rst"), recursive=True):
    all_issues.extend(validate_file(doc_file))

# Report: a non-zero exit code lets CI fail the build
if all_issues:
    print(f"❌ Found {len(all_issues)} issues\n")
    for issue in all_issues:
        print(f"{issue['file']}:{issue['line']} [{issue['severity']}]")
        print(f"  Found: {issue['found']}")
        print(f"  Should be: {issue['suggestion']}\n")
    sys.exit(1)
else:
    print("✅ Documentation validation passed")
    sys.exit(0)
```
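To see how the pattern list and the context filter interact, the sketch below runs the same two steps on in-memory strings instead of files. The helper name `find_deprecated`, the `SKIP_KEYWORDS` constant, and the sample sentences are assumptions made for illustration; only the patterns and keywords are taken from the validator.

```python
import re

# Same patterns as the validator
DEPRECATED_PATTERNS = [
    (r"python -m tooluniverse\.server", "tooluniverse-server"),
    (r"600\+?\s+tools", "1000+ tools"),
    (r"750\+?\s+tools", "1000+ tools"),
]

# Same keywords as is_false_positive, merged into one list (hypothetical name)
SKIP_KEYWORDS = ['deprecated', 'old version', 'migration',
                 'width', 'height', 'port', '":"']


def find_deprecated(text, window=100):
    """Return (found, suggestion) pairs, skipping matches whose
    surrounding context contains one of the skip keywords."""
    results = []
    for pattern, suggestion in DEPRECATED_PATTERNS:
        for match in re.finditer(pattern, text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            context = text[start:end].lower()
            if any(kw in context for kw in SKIP_KEYWORDS):
                continue
            results.append((match.group(), suggestion))
    return results


flagged = find_deprecated("ToolUniverse ships 600+ tools for biomedical research.")
skipped = find_deprecated("The old version advertised 600+ tools.")
# Substring matching: 'port' inside 'supports' also suppresses the match
surprise = find_deprecated("ToolUniverse supports 600+ tools.")

print(flagged)   # [('600+ tools', '1000+ tools')]
print(skipped)   # []
print(surprise)  # []
```

The third case shows a side effect worth knowing about: because the keyword test is a plain substring check, an ordinary sentence containing "supports" is treated as a technical value and skipped.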
```bash
# Extract and test commands
grep -r "^\s*\$\s*" docs/ | while IFS= read -r line; do
  cmd=$(echo "$line" | sed 's/.*\$ //' | cut -d' ' -f1)
  if ! command -v "$cmd" &> /dev/null; then
    echo "❌ Command not found: $cmd in $line"
  fi
done
```

```python
import glob
import re
from pathlib import Path


def check_rst_links(docs_root):
    """Validate :doc: references"""
    pattern = r':doc:`([^`]+)`'
    for rst_file in glob.glob(f"{docs_root}/**/*.rst", recursive=True):
        with open(rst_file, encoding='utf-8') as f:
            content = f.read()
        for match in re.finditer(pattern, content):
            ref = match.group(1)
            # Check if target exists (resolved against docs_root)
            possible = [f"{ref}.rst", f"{ref}.md", f"{ref}/index.rst"]
            if not any(Path(docs_root, p).exists() for p in possible):
                print(f"❌ Broken link in {rst_file}: {ref}")
```
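The link check resolves every reference against the docs root and treats the whole role content as a path. Sphinx also accepts the `Title <target>` form, resolves targets without a leading slash relative to the referencing file, and resolves targets with a leading slash from the source root. A minimal sketch that accounts for these cases follows; the function names and the temporary file layout are hypothetical, and the candidate suffixes are kept the same as in `check_rst_links`.

```python
import re
import tempfile
from pathlib import Path

DOC_ROLE = re.compile(r':doc:`([^`]+)`')
TITLED = re.compile(r'^.*<([^>]+)>\s*$', re.DOTALL)


def resolve_doc_target(raw, rst_file, docs_root):
    """Return the target path (without suffix) for one :doc: reference."""
    titled = TITLED.match(raw)
    target = (titled.group(1) if titled else raw).strip()
    if target.startswith('/'):
        # Leading slash: relative to the documentation source root
        return Path(docs_root) / target.lstrip('/')
    # Otherwise: relative to the directory of the referencing file
    return Path(rst_file).parent / target


def find_broken_doc_links(docs_root):
    """Return (file, reference) pairs whose target cannot be found."""
    docs_root = Path(docs_root)
    broken = []
    for rst_file in sorted(docs_root.rglob("*.rst")):
        content = rst_file.read_text(encoding="utf-8")
        for match in DOC_ROLE.finditer(content):
            base = resolve_doc_target(match.group(1), rst_file, docs_root)
            candidates = [Path(f"{base}.rst"), Path(f"{base}.md"), base / "index.rst"]
            if not any(p.exists() for p in candidates):
                broken.append((rst_file.relative_to(docs_root).as_posix(), match.group(1)))
    return broken


# Throwaway docs tree used only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide").mkdir()
    (root / "guide" / "quickstart.rst").write_text("Quickstart\n", encoding="utf-8")
    (root / "faq.rst").write_text("FAQ\n", encoding="utf-8")
    (root / "index.rst").write_text(
        ":doc:`Quick start <guide/quickstart>` :doc:`/faq` :doc:`missing`\n",
        encoding="utf-8",
    )
    broken = find_broken_doc_links(root)

print(broken)  # [('index.rst', 'missing')]
```

Returning the findings as a list rather than printing them makes it easier to feed them into the same issue report as the other checks.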
```python
# Define standard terms
TERMINOLOGY = {
    'api_endpoint': ['endpoint', 'url', 'route', 'path'],
    'tool_count': ['tools', 'resources', 'integrations'],
}


def check_terminology(content):
    """Find inconsistent terminology"""
    for standard, variations in TERMINOLOGY.items():
        counts = {v: content.lower().count(v) for v in variations}
        # Flag a concept when more than two of its variants are in use
        if len([c for c in counts.values() if c > 0]) > 2:
            return f"Inconsistent terminology: {counts}"
    return None
```
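Because `check_terminology` counts substrings, a word such as "pathway" is counted as a use of "path". The sketch below is one possible variant that counts whole words only and reports every affected concept instead of stopping at the first. The names `count_terms` and `check_terminology_by_word` and the sample sentence are assumptions for illustration.

```python
import re

TERMINOLOGY = {
    'api_endpoint': ['endpoint', 'url', 'route', 'path'],
    'tool_count': ['tools', 'resources', 'integrations'],
}


def count_terms(content, variations):
    """Count whole-word, case-insensitive occurrences of each variation."""
    return {
        v: len(re.findall(rf"\b{re.escape(v)}\b", content, flags=re.IGNORECASE))
        for v in variations
    }


def check_terminology_by_word(content, max_variants=2):
    """Return {standard: counts} for every concept that uses more than
    max_variants different terms."""
    report = {}
    for standard, variations in TERMINOLOGY.items():
        counts = count_terms(content, variations)
        if sum(1 for c in counts.values() if c > 0) > max_variants:
            report[standard] = counts
    return report


sample = "Use curl to reach the pathway router. Each endpoint lists its tools."
substring_counts = {v: sample.lower().count(v) for v in TERMINOLOGY['api_endpoint']}
word_counts = count_terms(sample, TERMINOLOGY['api_endpoint'])

print(substring_counts)  # {'endpoint': 1, 'url': 1, 'route': 1, 'path': 1}
print(word_counts)       # {'endpoint': 1, 'url': 0, 'route': 0, 'path': 0}
print(check_terminology_by_word(sample))  # {}
```

With substring counting, the sample sentence appears to use four variants and would be flagged; with whole-word counting only "endpoint" is counted.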
```bash
# Find cross-references
grep -r ":doc:\`" docs/*.rst | grep -E "(quickstart|getting_started|installation)"
```

Paths named in this step:

- `docs/index.rst`
- `quickstart.rst`
- `getting_started.rst`
- `docs/faq.rst`
- `docs/help/faq.rst`
- `docs/installation.rst`
- `docs/quickstart.rst`
- `docs/getting_started.rst`
- `docs/guide/building_ai_scientists/`

```bash
# Find MCP config duplication
rg "MCP.*configuration" docs/ -l | wc -l
rg "pip install tooluniverse" docs/ -l | wc -l
```
```bash
# Find all tool count mentions
rg "[0-9]+\+?\s+(tools|resources|integrations)" docs/ --no-filename | sort -u
```

Auto-generated files and their generators:

- `docs/tools/*_tools.rst` (`generate_config_index.py`)
- `docs/api/*.rst` (`sphinx-apidoc`)

```rst
.. AUTO-GENERATED - DO NOT EDIT MANUALLY
.. Generated by: docs/generate_config_index.py
.. Last updated: 2024-02-05
..
.. To modify, edit source files and regenerate.
```

```bash
head -5 docs/tools/*_tools.rst | grep "AUTO-GENERATED"
```

```bash
grep -A 20 "\[project.scripts\]" pyproject.toml
```

CLI entry points named in this check: `tooluniverse-expert-feedback`, `tooluniverse-expert-feedback-web`, `generate-mcp-tools`. The reference file for them is `docs/reference/cli_tools.rst`.
```bash
# Find all env vars in code
rg "os\.getenv|os\.environ" src/tooluniverse/ -o | sort -u
rg "TOOLUNIVERSE_[A-Z_]+" src/tooluniverse/ -o | sort -u
```

Environment variable patterns: `TOOLUNIVERSE_CACHE_*`, `TOOLUNIVERSE_LOG_*`, `TOOLUNIVERSE_LLM_*`, `*_API_KEY`

Reference files: `docs/reference/environment_variables.rst`, `.env.template`

Glossary: `docs/glossary.rst` (`:term:` references)

`.github/workflows/deploy-docs.yml`:

```yaml
- name: Regenerate tool documentation
  run: |
    cd docs
    python generate_config_index.py
    python generate_remote_tools_docs.py
    python generate_tool_reference.py
```

`docs/api/`

| Severity | Definition | Examples | Timeline |
|---|---|---|---|
| CRITICAL | Blocks release | Broken builds, dangerous instructions | Immediate |
| HIGH | Blocks users | Wrong commands, broken setup | Same day |
| MEDIUM | Causes confusion | Inconsistent terminology, unclear examples | Same week |
| LOW | Reduces quality | Long files, minor formatting | Future task |
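The severity levels above can be applied mechanically to a list of findings. The sketch below sorts issues from most to least severe, counts them per level, and reports whether any of them blocks a release. The function names and the file names in the sample data are hypothetical; the issue dictionaries use the same `severity` key as the validation script.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]

# Timeline column of the severity table
TIMELINE = {
    "CRITICAL": "Immediate",
    "HIGH": "Same day",
    "MEDIUM": "Same week",
    "LOW": "Future task",
}


def triage(issues):
    """Sort issues from most to least severe (stable within a level)."""
    return sorted(issues, key=lambda issue: SEVERITY_ORDER.index(issue["severity"]))


def summarize(issues):
    """Count issues per severity level, including levels with no issues."""
    counts = {level: 0 for level in SEVERITY_ORDER}
    for issue in issues:
        counts[issue["severity"]] += 1
    return counts


def blocks_release(issues):
    """Per the table, CRITICAL is the level defined as blocking a release."""
    return any(issue["severity"] == "CRITICAL" for issue in issues)


# Sample findings with placeholder file names
issues = [
    {"file": "docs/a.rst", "line": 3, "severity": "LOW"},
    {"file": "docs/b.rst", "line": 10, "severity": "HIGH"},
    {"file": "docs/c.rst", "line": 7, "severity": "MEDIUM"},
    {"file": "docs/d.rst", "line": 1, "severity": "HIGH"},
]

ordered = [issue["file"] for issue in triage(issues)]
counts = summarize(issues)

print(ordered)                 # ['docs/b.rst', 'docs/d.rst', 'docs/c.rst', 'docs/a.rst']
print(counts)                  # {'CRITICAL': 0, 'HIGH': 2, 'MEDIUM': 1, 'LOW': 1}
print(blocks_release(issues))  # False
print(TIMELINE["HIGH"])        # Same day
```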
```markdown
# Documentation Quality Report

**Date**: [date]
**Scope**: Automated validation + ToolUniverse audit

## Executive Summary
- Files scanned: X
- Issues found: Y (Critical: A, High: B, Medium: C, Low: D)

## Critical Issues
1. **[Issue]** - Location: file:line
   - Problem: [description]
   - Fix: [action]
   - Effort: [time]

## Automated Validation Results
- Deprecated commands: X instances
- Inconsistent counts: Y instances
- Broken links: Z instances

## ToolUniverse-Specific Findings
- Circular navigation: [yes/no]
- Tool count variations: [list]
- Missing CLI docs: [list]
- Auto-generated headers: X missing

## Recommendations
1. Immediate (today): [list]
2. This week: [list]
3. Next sprint: [list]

## Validation Command
Run `python scripts/validate_documentation.py` to verify fixes.
```

`.github/workflows/validate-docs.yml`:

```yaml
name: Validate Documentation
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run validation
        run: python scripts/validate_documentation.py
      - name: Check auto-generated headers
        run: |
          for f in docs/tools/*_tools.rst; do
            if ! head -1 "$f" | grep -q "AUTO-GENERATED"; then
              echo "Missing header: $f"
              exit 1
            fi
          done
```

| Issue | Detection | Fix |
|---|---|---|
| Deprecated command | | Replace with |
| Wrong tool count | | Change to "1000+ tools" |
| Circular nav | Manual trace | Remove back-references |
| Missing header | | Add AUTO-GENERATED header |
| Undocumented CLI | Check pyproject.toml | Add to cli_tools.rst |
| Missing env var | | Add to env vars reference |
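For the missing env var row, one way to sketch the comparison between code and reference is to collect variable names from both texts and take the difference. The regular expression below is an assumption derived from the `TOOLUNIVERSE_*` and `*_API_KEY` patterns (it also accepts digits), the function names are hypothetical, and the variable names in the sample text are invented for the example.

```python
import re

# Names starting with TOOLUNIVERSE_ or ending in _API_KEY
ENV_VAR = re.compile(r"\b(?:TOOLUNIVERSE_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY)\b")


def env_vars_in(text):
    """Return the set of environment variable names mentioned in text."""
    return set(ENV_VAR.findall(text))


def undocumented_env_vars(source_text, reference_text):
    """Return names used in the source but absent from the reference."""
    return sorted(env_vars_in(source_text) - env_vars_in(reference_text))


# Invented sample inputs; real input would be read from the source tree
# and from the environment variable reference
source = (
    'cache_dir = os.getenv("TOOLUNIVERSE_CACHE_DIR")\n'
    'level = os.environ.get("TOOLUNIVERSE_LOG_LEVEL", "INFO")\n'
    'key = os.environ["EXAMPLE_API_KEY"]\n'
)
reference = "TOOLUNIVERSE_CACHE_DIR\n    Description goes here.\n"

missing_vars = undocumented_env_vars(source, reference)
print(missing_vars)  # ['EXAMPLE_API_KEY', 'TOOLUNIVERSE_LOG_LEVEL']
```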