Documentation Quality Assurance

Systematic documentation quality system combining automated validation scripts with ToolUniverse-specific structural audits.

When to Use

Pre-release documentation review
After major refactoring (commands, APIs, tool counts changed)
User reports confusing or outdated documentation
Circular navigation or structural problems suspected
Want to establish automated validation pipeline

Approach: Two-Phase Strategy

Phase A: Automated Validation (15-20 min)

Create validation scripts for systematic detection
Test commands, links, terminology consistency
Priority-based fixes (blockers → polish)

Phase B: ToolUniverse-Specific Audit (20-25 min)

Circular navigation checks
MCP configuration duplication
Tool count consistency
Auto-generated file conflicts

Phase A: Automated Validation

A1. Build Validation Script

Create

scripts/validate_documentation.py

python

#!/usr/bin/env python3
"""Documentation validator for ToolUniverse"""

import re
import glob
from pathlib import Path

DOCS_ROOT = Path("docs")

# ToolUniverse-specific patterns
DEPRECATED_PATTERNS = [
    (r"python -m tooluniverse\.server", "tooluniverse-server"),
    (r"600\+?\s+tools", "1000+ tools"),
    (r"750\+?\s+tools", "1000+ tools"),
]

def is_false_positive(match, content):
    """Smart context checking to avoid false positives"""
    start = max(0, match.start() - 100)
    end = min(len(content), match.end() + 100)
    context = content[start:end].lower()
    
    # Skip if discussing deprecation itself
    if any(kw in context for kw in ['deprecated', 'old version', 'migration']):
        return True
    
    # Skip technical values (ports, dimensions, etc.)
    if any(kw in context for kw in ['width', 'height', 'port', '":"']):
        return True
    
    return False

def validate_file(filepath):
    """Check one file for issues"""
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()
    
    issues = []
    
    # Check deprecated patterns
    for old_pattern, new_text in DEPRECATED_PATTERNS:
        matches = re.finditer(old_pattern, content)
        for match in matches:
            if is_false_positive(match, content):
                continue
            
            line_num = content[:match.start()].count('\n') + 1
            issues.append({
                'file': filepath,
                'line': line_num,
                'severity': 'HIGH',
                'found': match.group(),
                'suggestion': new_text
            })
    
    return issues

# Scan all docs
all_issues = []
for doc_file in glob.glob(str(DOCS_ROOT / "**/*.md"), recursive=True):
    all_issues.extend(validate_file(doc_file))

for doc_file in glob.glob(str(DOCS_ROOT / "**/*.rst"), recursive=True):
    all_issues.extend(validate_file(doc_file))

# Report
if all_issues:
    print(f"❌ Found {len(all_issues)} issues\n")
    for issue in all_issues:
        print(f"{issue['file']}:{issue['line']} [{issue['severity']}]")
        print(f"  Found: {issue['found']}")
        print(f"  Should be: {issue['suggestion']}\n")
    exit(1)
else:
    print("✅ Documentation validation passed")
    exit(0)

A2. Command Accuracy Check

Test that commands in docs actually work:

bash

# Extract and test commands
grep -r "^\s*\$\s*" docs/ | while read line; do
    cmd=$(echo "$line" | sed 's/.*\$ //' | cut -d' ' -f1)
    if ! command -v "$cmd" &> /dev/null; then
        echo "❌ Command not found: $cmd in $line"
    fi
done

A3. Link Integrity Check

For RST docs:

python

def check_rst_links(docs_root):
    """Validate :doc: references"""
    pattern = r':doc:`([^`]+)`'
    
    for rst_file in glob.glob(f"{docs_root}/**/*.rst", recursive=True):
        with open(rst_file) as f:
            content = f.read()
        
        matches = re.finditer(pattern, content)
        for match in matches:
            ref = match.group(1)
            
            # Check if target exists
            possible = [f"{ref}.rst", f"{ref}.md", f"{ref}/index.rst"]
            if not any(Path(docs_root, p).exists() for p in possible):
                print(f"❌ Broken link in {rst_file}: {ref}")

A4. Terminology Consistency

Track variations and standardize:

python

# Define standard terms
TERMINOLOGY = {
    'api_endpoint': ['endpoint', 'url', 'route', 'path'],
    'tool_count': ['tools', 'resources', 'integrations'],
}

def check_terminology(content):
    """Find inconsistent terminology"""
    for standard, variations in TERMINOLOGY.items():
        counts = {v: content.lower().count(v) for v in variations}
        if len([c for c in counts.values() if c > 0]) > 2:
            return f"Inconsistent terminology: {counts}"
    return None

Phase B: ToolUniverse-Specific Audit

B1. Circular Navigation Check

Issue: Documentation pages that reference each other in loops.

Check manually:

bash

# Find cross-references
grep -r ":doc:\`" docs/*.rst | grep -E "(quickstart|getting_started|installation)"

Checklist:

Is there a clear "Start Here" on
```
docs/index.rst
```
?
Does navigation follow linear path: index → quickstart → getting_started → guides?
No "you should have completed X first" statements that create dependency loops?

Common patterns to fix:

```
quickstart.rst
```
→ "See getting_started"
```
getting_started.rst
```
→ "Complete quickstart first"

B2. Duplicate Content Check

Common duplicates in ToolUniverse:

Multiple FAQs:
```
docs/faq.rst
```
and
```
docs/help/faq.rst
```

Getting started:

docs/installation.rst

docs/quickstart.rst

docs/getting_started.rst

MCP configuration: All files in
```
docs/guide/building_ai_scientists/
```

Detection:

bash

# Find MCP config duplication
rg "MCP.*configuration" docs/ -l | wc -l
rg "pip install tooluniverse" docs/ -l | wc -l

Action: Consolidate or clearly differentiate

B3. Tool Count Consistency

Standard: Use "1000+ tools" consistently.

Detection:

bash

# Find all tool count mentions
rg "[0-9]+\+?\s+(tools|resources|integrations)" docs/ --no-filename | sort -u

Check:

Are different numbers used (600, 750, 1195)?
Is "1000+ tools" used consistently?
Exact counts avoided in favor of "1000+"?

B4. Auto-Generated File Headers

Auto-generated directories:

docs/tools/*_tools.rst

(from

generate_config_index.py

)

```
docs/api/*.rst
```
(from
```
sphinx-apidoc
```
)

Required header:

rst

.. AUTO-GENERATED - DO NOT EDIT MANUALLY
.. Generated by: docs/generate_config_index.py
.. Last updated: 2024-02-05
.. 
.. To modify, edit source files and regenerate.

Check:

bash

head -5 docs/tools/*_tools.rst | grep "AUTO-GENERATED"

B5. CLI Tools Documentation

Check pyproject.toml for all CLIs:

bash

grep -A 20 "\[project.scripts\]" pyproject.toml

Common undocumented:

```
tooluniverse-expert-feedback
```
```
tooluniverse-expert-feedback-web
```
```
generate-mcp-tools
```

Action: Ensure all in

docs/reference/cli_tools.rst

B6. Environment Variables

Discovery:

bash

# Find all env vars in code
rg "os\.getenv|os\.environ" src/tooluniverse/ -o | sort -u
rg "TOOLUNIVERSE_[A-Z_]+" src/tooluniverse/ -o | sort -u

Categories to document:

Cache:
```
TOOLUNIVERSE_CACHE_*
```
Logging:
```
TOOLUNIVERSE_LOG_*
```
LLM:
```
TOOLUNIVERSE_LLM_*
```
API keys:
```
*_API_KEY
```

Check:

Does

docs/reference/environment_variables.rst

exist?

Are variables categorized?
Each has: default, description, example?
Is there
```
.env.template
```
at project root?

B7. ToolUniverse-Specific Jargon

Terms to define on first use:

Tool Specification
EFO ID
MCP, SMCP
Compact Mode
Tool Finder
AI Scientist

Check:

Is there
```
docs/glossary.rst
```
?
Terms defined inline with
```
:term:
```
references?
Glossary linked from main index?

B8. CI/CD Documentation Regeneration

Required in
.github/workflows/deploy-docs.yml
:

yaml

- name: Regenerate tool documentation
  run: |
    cd docs
    python generate_config_index.py
    python generate_remote_tools_docs.py
    python generate_tool_reference.py

Check:

CI/CD regenerates docs before build?
Regeneration happens BEFORE Sphinx build?
```
docs/api/
```
excluded from cache?

Priority Framework

Issue Severity

Severity	Definition	Examples	Timeline
CRITICAL	Blocks release	Broken builds, dangerous instructions	Immediate
HIGH	Blocks users	Wrong commands, broken setup	Same day
MEDIUM	Causes confusion	Inconsistent terminology, unclear examples	Same week
LOW	Reduces quality	Long files, minor formatting	Future task

Fix Order

Run automated validation → Fix HIGH issues
Check circular navigation → Fix CRITICAL loops
Verify tool counts → Standardize to "1000+"
Check auto-generated headers → Add missing
Validate CLI docs → Document all from pyproject.toml
Check env vars → Create reference page
Review jargon → Create/update glossary
Verify CI/CD → Add regeneration steps

Validation Checklist

Before considering docs "done":

Accuracy

Automated validation passes
All commands tested
Version numbers current
Counts match reality

Structure (ToolUniverse-specific)

No circular navigation
Clear "Start Here" entry point
Linear learning path
Max 2-3 level hierarchy

Consistency

"1000+ tools" everywhere
Same terminology throughout
Auto-generated files have headers
All CLIs documented

Completeness

All features documented
All CLIs in pyproject.toml covered
All env vars documented
Glossary includes all jargon

Output: Audit Report

markdown

# Documentation Quality Report

**Date**: [date]
**Scope**: Automated validation + ToolUniverse audit

## Executive Summary
- Files scanned: X
- Issues found: Y (Critical: A, High: B, Medium: C, Low: D)

## Critical Issues
1. **[Issue]** - Location: file:line
   - Problem: [description]
   - Fix: [action]
   - Effort: [time]

## Automated Validation Results
- Deprecated commands: X instances
- Inconsistent counts: Y instances
- Broken links: Z instances

## ToolUniverse-Specific Findings
- Circular navigation: [yes/no]
- Tool count variations: [list]
- Missing CLI docs: [list]
- Auto-generated headers: X missing

## Recommendations
1. Immediate (today): [list]
2. This week: [list]
3. Next sprint: [list]

## Validation Command
Run `python scripts/validate_documentation.py` to verify fixes

CI/CD Integration

Add to

.github/workflows/validate-docs.yml

yaml

name: Validate Documentation
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run validation
        run: python scripts/validate_documentation.py
      - name: Check auto-generated headers
        run: |
          for f in docs/tools/*_tools.rst; do
            if ! head -1 "$f" | grep -q "AUTO-GENERATED"; then
              echo "Missing header: $f"
              exit 1
            fi
          done

Common Issues Quick Reference

Issue	Detection	Fix
Deprecated command	`rg "old-cmd" docs/`	Replace with `new-cmd`
Wrong tool count	`rg "[0-9]+ tools" docs/`	Change to "1000+ tools"
Circular nav	Manual trace	Remove back-references
Missing header	`head -1 file.rst`	Add AUTO-GENERATED header
Undocumented CLI	Check pyproject.toml	Add to cli_tools.rst
Missing env var	`rg "os.getenv" src/`	Add to env vars reference

Best Practices

Automate first - Build validation before manual audit
Context matters - Smart pattern matching avoids false positives
Fix systematically - Batch similar issues together
Validate continuously - Add to CI/CD pipeline
ToolUniverse-specific last - Automated checks catch most issues

Success Criteria

Documentation quality achieved when:

✅ Automated validation reports 0 HIGH issues
✅ No circular navigation
✅ "1000+ tools" used consistently
✅ All auto-generated files have headers
✅ All CLIs from pyproject.toml documented
✅ All env vars have reference page
✅ Glossary covers all technical terms
✅ CI/CD validates on every PR

devtu-docs-quality

NPX Install

Tags

SKILL.md Content

Documentation Quality Assurance

When to Use

Approach: Two-Phase Strategy

Phase A: Automated Validation

A1. Build Validation Script

A2. Command Accuracy Check

A3. Link Integrity Check

A4. Terminology Consistency

Phase B: ToolUniverse-Specific Audit

B1. Circular Navigation Check

B2. Duplicate Content Check

B3. Tool Count Consistency

B4. Auto-Generated File Headers

B5. CLI Tools Documentation

B6. Environment Variables

B7. ToolUniverse-Specific Jargon

B8. CI/CD Documentation Regeneration

Priority Framework

Issue Severity

Fix Order

Validation Checklist

Accuracy

Structure (ToolUniverse-specific)

Consistency

Completeness

Output: Audit Report

CI/CD Integration

Common Issues Quick Reference

Best Practices

Success Criteria