Use when generating PDFs from Markdown with Pandoc. Covers differences from Python-Markdown, blank-line rules, fix scripts for labels/anchors/metadata, and the visual testing workflow.
```bash
npx skill4agent add securityronin/ronin-marketplace pandoc-pdf-generation
```

Key differences between the two renderers:

| Feature | Python-Markdown (MkDocs) | Pandoc (PDF) |
|---|---|---|
| Roman-numeral list markers | ❌ Not supported | ✅ Supported |
| Grid tables | ⚠️ Needs extension | ✅ Native support |
| LaTeX commands | ❌ Renders as text | ✅ Native support |
| Nested list indent | 4 spaces (strict) | More flexible |
| Footnote continuation | 4-space indent required | More flexible |
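Check that the same content renders correctly in both outputs:

```bash
# Generate both outputs
mkdocs build --clean
./scripts/generate-pdf.sh

# Check MkDocs HTML rendering
grep -A 5 "For complete details, see:" site/soc2-type1/index.html
# Should show: <ul><li>...</li></ul>

# Check Pandoc PDF rendering
pdftotext output/Documentation.pdf - | grep -A 5 "For complete details, see:"
# Should show: • Bullet point
```

Visual testing workflow:

1. Edit the Markdown sources in `docs/pdf-source/`. Pandoc-only LaTeX such as `\pagebreak` works in the PDF but renders as literal text in MkDocs.
2. Regenerate the PDF with `./scripts/generate-pdf.sh`.
3. Run `open output/Documentation.pdf` (macOS; use `xdg-open` on Linux) and scroll through every page.
4. Look for literal `##` characters and for list items that run into the label above them, such as `**Access Removal:**`. The source needs a blank line between the label and the list:

   ```markdown
   **Access Removal:**

   - Item one
   - Item two
   ```

5. Commit the regenerated PDF:

   ```bash
   git add output/Documentation.pdf
   git commit -m "docs: regenerate PDF with [specific improvements]"
   ```

Headings must use Markdown heading syntax:

```markdown
# ✅ CORRECT - Header
## User Identification and Authentication
```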
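```markdown
# ❌ WRONG - Plain text
User Identification and Authentication
```

Without the `##` prefix, the title renders as ordinary body text.

Lists need a blank line between the label that introduces them and the first item:

```markdown
# ✅ CORRECT
**Access Removal:**

- Termination: Immediate revocation
- Role change: Adjusted within 5 days

# ❌ WRONG - No blank line
**Access Removal:**
- Termination: Immediate revocation
```

This applies to lists using either `-` or `*` markers.

Unicode box-drawing characters (for example in directory-tree diagrams) may be missing from the default LaTeX fonts, and the build logs a warning such as:

```
[WARNING] Missing character: There is no ├ (U+251C) in font [lmmono10-regular]
```

`lmmono10-regular` is Latin Modern Mono, the font used for code blocks. There are two fixes. Use XeLaTeX with fonts that include these glyphs (assuming the DejaVu fonts are installed), setting `monofont` as well as `mainfont` so code blocks are covered:

```bash
# Add to the pandoc command
--pdf-engine=xelatex
--variable mainfont="DejaVu Sans"
--variable monofont="DejaVu Sans Mono"
```

Or replace the characters with ASCII:

```bash
# Replace tree-diagram characters with ASCII
# (-i.bak works with both BSD/macOS and GNU sed; delete file.md.bak afterwards)
sed -i.bak -e 's/├/+/g' -e 's/└/+/g' -e 's/─/-/g' -e 's/│/|/g' file.md
```

For wide tables, rotate the page to landscape:

```markdown
\blandscape

| Col 1 | Col 2 | Col 3 |
|-------|-------|-------|
| Data  | Data  | Data  |

\elandscape
```

`\blandscape` and `\elandscape` are not built in: define them in `header-includes` as `\newcommand{\blandscape}{\begin{landscape}}` and `\newcommand{\elandscape}{\end{landscape}}`, and load the `pdflscape` package. Writing `\begin{landscape}` and `\end{landscape}` directly around the table does not work, because Pandoc passes everything inside a LaTeX environment through as raw LaTeX and leaves the Markdown table unconverted.

Or reduce the font size around the table:

```markdown
\small

| Col 1 | Col 2 | Col 3 |
|-------|-------|-------|
| Data  | Data  | Data  |

\normalsize
```

Force a page break before a section:

```markdown
\pagebreak

## Next Section
```

Page headers and margins (pandoc options):

```bash
--variable pagestyle=headings
--variable geometry:margin=1in
```

Prevent widows and orphans (a paragraph's last or first line stranded alone on a page):

```latex
\widowpenalty=10000
\clubpenalty=10000
```

A list placed directly after a label renders inline in the PDF:

```
Technology Changes: - New system implementations - Software upgrades - Infrastructure modifications
```

Add a blank line after every `**Label:**` line that introduces a list:

```markdown
# ❌ WRONG - No blank line
**Technology Changes:**
- New system implementations
- Software upgrades
```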
```markdown
# ✅ CORRECT - Blank line after label
**Technology Changes:**

- New system implementations
- Software upgrades
```

To find bold labels that are directly followed by a list:

```bash
# Find all bold labels immediately followed by lists
grep -n '^\*\*[^*]*:\*\*$' file.md | while IFS= read -r line; do
  num=${line%%:*}                        # line number from grep -n output
  next=$((num + 1))
  nextline=$(sed -n "${next}p" file.md)
  if [[ $nextline =~ ^[-*] ]]; then
    echo "Line $num: Missing blank line after bold label"
  fi
done
```

`fix_pandoc_lists.py` adds the missing blank lines automatically.

A heading that follows an HTML anchor without a blank line is not recognized as a heading, so the PDF shows literal text such as `## Fraud Risk Assessment`:

```markdown
# ❌ WRONG - No blank line after anchor
<a name="fraud-risk"></a>
## Fraud Risk Assessment

# ✅ CORRECT - Blank line after anchor
<a name="fraud-risk"></a>

## Fraud Risk Assessment
```

To find anchors directly followed by a heading line:

```bash
# Find anchors immediately followed by headers
grep -n '^<a name=' file.md | while IFS= read -r line; do
  num=${line%%:*}                        # line number from grep -n output
  next=$((num + 1))
  nextline=$(sed -n "${next}p" file.md)
  if [[ $nextline =~ ^# ]]; then
    echo "Line $num: Missing blank line after anchor"
  fi
done
```

`fix_pandoc_anchors.py` adds these blank lines automatically.

Consecutive `**Label:** value` lines run together into one paragraph, for example:

```
Title: Report Name Author: Your Name Date: January 2025
```

```markdown
# ❌ WRONG - No blank lines between
**Organization:** Example Corp
**Audit Type:** SOC 2 Type 1
**Scope:** Security (CC1-CC9)
```
```markdown
# ✅ CORRECT - Blank lines between each
**Organization:** Example Corp

**Audit Type:** SOC 2 Type 1

**Scope:** Security (CC1-CC9)
```

To find consecutive bold-label lines:

```bash
# Find consecutive bold label lines
# (-F: makes $1 the line number from the grep -n output)
grep -n '^\*\*[^*]*:\*\* ' file.md | \
  awk -F: 'NR > 1 && $1 == prev + 1 {print "Lines " prev "-" $1 ": Consecutive bold labels"} {prev = $1}'
```

`fix_pandoc_metadata.py` adds these blank lines automatically.

Plain-text labels ending in `:` have the same problem. In the PDF:

```
The security program aligns with: - SOC 2 - ISO 27001 - NIST Framework
```

```markdown
# ❌ WRONG - No blank line
The security program aligns with:
- SOC 2 Trust Services Criteria
- ISO 27001 control framework

# ✅ CORRECT - Blank line after plain text label
The security program aligns with:

- SOC 2 Trust Services Criteria
- ISO 27001 control framework
```

Each fix script handles one of these cases. `python3 fix_pandoc_lists.py` adds a blank line after any label ending in `:` (`**Label:**` or plain `Text:`) that is directly followed by a list:

```
Processing 03-risk-assessment.md...
Line 186: Added blank line after '**Technology Changes:**'
Line 265: Added blank line after 'The security program aligns with:'
✅ Fixed 03-risk-assessment.md
```

`python3 fix_pandoc_anchors.py` adds a blank line between `<a name="..."></a>` and a following `## Header`:

```
Processing 03-risk-assessment.md...
Line 141: Added blank line after '<a name="fraud-risk"></a>'
✅ Fixed 03-risk-assessment.md
```

`python3 fix_pandoc_metadata.py` separates consecutive `**Label:** value` lines:

```
Processing index.md...
Line 3: Added blank line after '**Organization:** Example Corp'
Line 4: Added blank line after '**Audit Type:** SOC 2 Type 1'
✅ Fixed index.md
```

Full fix-and-regenerate cycle:

```bash
# Fix all Pandoc formatting issues
python3 fix_pandoc_lists.py      # Lists after labels
python3 fix_pandoc_anchors.py    # Anchors before headers
python3 fix_pandoc_metadata.py   # Consecutive metadata
```
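The sources of these three scripts are not shown here. As a rough illustration of the rules they apply, the following is a minimal Python sketch, assuming Markdown text in and fixed text out. `fix_blank_lines`, `needs_blank_line`, and the regexes are hypothetical names, not the scripts' actual code, and unlike the scripts this sketch neither logs per-line messages nor rewrites files in place:

```python
import re

# Hypothetical re-implementation of the three blank-line rules (not the actual scripts).
LIST_ITEM_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")   # "- item", "* item", "1. item"
LABEL_RE = re.compile(r":(?:\*\*)?\s*$")                 # line ends in "Text:" or "**Label:**"
ANCHOR_RE = re.compile(r'^<a name="[^"]*"></a>\s*$')     # <a name="fraud-risk"></a>
HEADING_RE = re.compile(r"^#{1,6}\s")                    # "## Header"
BOLD_FIELD_RE = re.compile(r"^\*\*[^*]+:\*\*\s+\S")      # "**Label:** value"


def needs_blank_line(line: str, nxt: str) -> bool:
    """Return True if Pandoc needs a blank line between `line` and `nxt`."""
    if LABEL_RE.search(line) and not LIST_ITEM_RE.match(line) and LIST_ITEM_RE.match(nxt):
        return True  # lists rule: label directly followed by a list item
    if ANCHOR_RE.match(line) and HEADING_RE.match(nxt):
        return True  # anchors rule: <a name> directly followed by a heading
    if BOLD_FIELD_RE.match(line) and BOLD_FIELD_RE.match(nxt):
        return True  # metadata rule: consecutive "**Label:** value" lines
    return False


def fix_blank_lines(text: str) -> str:
    """Insert the missing blank lines; lines inside ``` fences are left alone."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        out.append(line)
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not in_fence and needs_blank_line(line, nxt):
            out.append("")
    return "\n".join(out)
```

To apply it to a file, read the file, pass its text through `fix_blank_lines`, and write the result back. Running it twice gives the same result, since a line followed by a blank line never matches a rule. Skipping fenced code blocks is a choice made in this sketch so that deliberate ❌ examples inside fences are left untouched.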
```bash
# Regenerate PDF
./scripts/generate-pdf.sh

# Visual verification
open output/Documentation.pdf
```

Pandoc command variants. Basic PDF with XeLaTeX (better Unicode font support):

```bash
pandoc file.md -o output.pdf \
  --from markdown \
  --to pdf \
  --pdf-engine=xelatex
```

With a table of contents and numbered sections:

```bash
pandoc file.md -o output.pdf \
  --from markdown \
  --to pdf \
  --pdf-engine=xelatex \
  --toc \
  --toc-depth=3 \
  --number-sections
```

With title, author, and date metadata:

```bash
pandoc file.md -o output.pdf \
  --from markdown \
  --to pdf \
  --pdf-engine=xelatex \
  --metadata title="Document Title" \
  --metadata author="Author Name" \
  --metadata date="$(date +%Y-%m-%d)"
```

With a custom LaTeX template:

```bash
pandoc file.md -o output.pdf \
  --from markdown \
  --to pdf \
  --pdf-engine=xelatex \
  --template=custom-template.tex
```

PDF test checklist template:

```markdown
## PDF Generation Test - [DATE]

### Generation Phase
- [ ] Script runs without errors
- [ ] PDF file created
- [ ] File size reasonable (< 10MB for typical docs)

### Visual Inspection Phase
- [ ] Opened PDF and scrolled through ALL pages
- [ ] Cover page correct
- [ ] TOC complete and accurate
- [ ] All headers styled correctly (no literal `##`)
- [ ] All bullets formatted as lists (not inline)
- [ ] All numbered lists formatted correctly (not inline)
- [ ] Bold/plain labels before lists properly spaced
- [ ] Metadata fields on separate lines (not run together)
- [ ] All tables fit on pages
- [ ] No obviously bad page breaks
- [ ] No missing content
- [ ] Font rendering acceptable

### Specific Checks (from user feedback)
- [ ] [Specific section] renders correctly
- [ ] [Specific formatting] matches intent
- [ ] [Specific issue] is fixed

### Final Validation
- [ ] PDF matches markdown source intent
- [ ] All user-reported issues addressed
- [ ] Ready for commit

**Issues Found:** [List any issues]

**Next Steps:** [What needs fixing]
```

Automated basic checks, `scripts/test-pdf.sh`:

```bash
#!/bin/bash
# Test PDF generation and basic quality checks
set -e

# Generate PDF
./scripts/generate-pdf.sh

PDF="output/Documentation.pdf"
MIN_PAGES=50   # adjust for your document (this one is ~89 pages)

# Check file exists
if [ ! -f "$PDF" ]; then
  echo "❌ PDF not generated"
  exit 1
fi

# Check file size (should be between 100KB and 10MB); BSD/macOS stat first, then GNU stat
SIZE=$(stat -f%z "$PDF" 2>/dev/null || stat -c%s "$PDF")
if [ "$SIZE" -lt 100000 ]; then
  echo "⚠️ WARNING: PDF seems too small ($SIZE bytes)"
elif [ "$SIZE" -gt 10000000 ]; then
  echo "⚠️ WARNING: PDF seems too large ($SIZE bytes)"
else
  # numfmt is GNU coreutils; fall back to raw bytes where it is missing (e.g. stock macOS)
  echo "✅ PDF size OK: $(numfmt --to=iec-i --suffix=B "$SIZE" 2>/dev/null || echo "$SIZE bytes")"
fi

# Check page count (using pdfinfo if available)
if command -v pdfinfo &> /dev/null; then
  PAGES=$(pdfinfo "$PDF" | awk '/^Pages:/ {print $2}')
  echo "📄 Pages: $PAGES"
  if [ "$PAGES" -lt "$MIN_PAGES" ]; then
    echo "⚠️ WARNING: Expected at least $MIN_PAGES pages, got $PAGES"
  fi
fi

echo ""
echo "✅ Basic checks passed!"
echo "📋 Next: Open PDF and visually inspect"
echo "   open $PDF"
```

Quick checks on the Markdown source:

```bash
# Check for labels before lists (no blank line); matches both "Text:" and "**Label:**"
grep -B1 '^[-*] ' file.md | grep -E ':(\*\*)?$'

# Check for anchors before headers (no blank line)
grep -A1 '^<a name=' file.md | grep '^#'

# Check for consecutive bold labels
# (uniq -c only counts identical lines, so compare grep -n line numbers instead)
grep -n '^\*\*[^*]*:\*\* ' file.md | \
  awk -F: 'NR > 1 && $1 == prev + 1 {print "Line " $1 ": follows another bold label"} {prev = $1}'
```
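The source checks above cannot confirm what actually reached the PDF. The following is a minimal Python sketch, assuming the input is the text printed by `pdftotext output/Documentation.pdf -`, that flags two of the symptoms described earlier: a label followed by inline `- item` text, and literal `##` heading markup. `find_rendering_problems` and both regexes are hypothetical heuristics rather than part of this skill's scripts, so expect occasional false positives:

```python
import re

# Hypothetical heuristics; tune them for your own documents.
# A list merged into its label, e.g. "Technology Changes: - New system implementations - ..."
INLINE_LIST_RE = re.compile(r":\s+[-*•]\s+\S")
# Heading markup that Pandoc left as literal text, e.g. "## Fraud Risk Assessment"
LITERAL_HEADING_RE = re.compile(r"(?:^|\s)#{2,6}\s+\S")


def find_rendering_problems(pdf_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, problem, text) for suspicious lines of pdftotext output."""
    problems = []
    for lineno, line in enumerate(pdf_text.splitlines(), start=1):
        if INLINE_LIST_RE.search(line):
            problems.append((lineno, "list rendered inline", line.strip()))
        if LITERAL_HEADING_RE.search(line):
            problems.append((lineno, "literal heading markup", line.strip()))
    return problems
```

For example, pass it `subprocess.run(["pdftotext", "output/Documentation.pdf", "-"], capture_output=True, text=True, check=True).stdout`. Line numbers refer to the extracted text, not to PDF pages, so search the PDF for the reported text to locate each problem.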