pdf-toc-bookmarks

PDF TOC to Bookmarks

Extract table of contents from PDF pages and create clickable bookmarks using visual analysis and PyMuPDF.

When to Use This Skill

Use this skill when:
  • User wants to add bookmarks to a PDF that lacks them
  • PDF has printed TOC pages but no clickable navigation
  • User mentions "bookmarks," "navigation," "TOC," or "table of contents" for PDFs

Workflow

1. Extract TOC Pages as Images

Extract TOC pages at high resolution for visual analysis:
```python
from scripts.extract_toc_images import extract_toc_images

extract_toc_images(
    pdf_path="/path/to/file.pdf",
    toc_start_page=10,  # First TOC page (1-based)
    toc_end_page=16,    # Last TOC page (1-based)
    output_dir="/mnt/user-data/outputs/toc_images"
)
```
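
The source of `scripts.extract_toc_images` is not shown here, so the following is only a minimal sketch of what such a helper might do with PyMuPDF. The function names `toc_page_jobs` and `render_toc_pages` are hypothetical, and the `page_NNN.png` naming is an assumption inferred from the image paths used in the next step.

```python
from pathlib import Path


def toc_page_jobs(toc_start_page, toc_end_page, output_dir):
    """Return (zero_based_index, output_path) pairs for a 1-based inclusive page range."""
    if toc_start_page < 1 or toc_end_page < toc_start_page:
        raise ValueError("expected 1 <= toc_start_page <= toc_end_page")
    out = Path(output_dir)
    return [
        (page_no - 1, out / f"page_{page_no:03d}.png")  # assumed naming scheme
        for page_no in range(toc_start_page, toc_end_page + 1)
    ]


def render_toc_pages(pdf_path, toc_start_page, toc_end_page, output_dir, zoom=2.0):
    """Render each TOC page to a PNG at the given zoom factor."""
    import fitz  # PyMuPDF; imported lazily so toc_page_jobs has no dependency

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    doc = fitz.open(pdf_path)
    try:
        for index, path in toc_page_jobs(toc_start_page, toc_end_page, output_dir):
            # PyMuPDF indexes pages from 0, hence the index computed above
            pix = doc[index].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            pix.save(str(path))
    finally:
        doc.close()
```

Keeping the page-range arithmetic in its own function makes the 1-based to 0-based conversion easy to check without opening a PDF.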

2. Analyze Images Visually

Use Claude's vision capabilities to read the TOC structure from the images:
```python
from view import view

# View each TOC image
view("/mnt/user-data/outputs/toc_images/page_010.png")
view("/mnt/user-data/outputs/toc_images/page_011.png")
# ... view all TOC pages
```

Manually transcribe the TOC structure into a Python list:
```python
toc = [
    [1, "Chapter 1: Introduction", 10],
    [2, "1.1 Overview", 11],
    [3, "1.1.1 Background", 12],
    [2, "1.2 Methods", 15],
    [1, "Chapter 2: Results", 20],
]
```
Format: [level, title, printed_page_number]
  • level: Hierarchy depth (1=chapter, 2=section, 3=subsection)
  • title: Exact text from TOC
  • printed_page_number: Page number as printed in book (not PDF page index)
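
A hand-transcribed list with many entries is easy to get subtly wrong, so a quick structural check before writing bookmarks can help. The sketch below uses a hypothetical helper, `validate_toc`, which is not part of this skill's scripts. It checks the constraints PyMuPDF's documentation describes for `set_toc` (the first entry is level 1, and a level may exceed the previous one by at most 1), plus a few basic sanity checks.

```python
def validate_toc(toc):
    """Return a list of problems found in a [level, title, page] list (empty if none)."""
    problems = []
    previous_level = 0  # forces the first entry to be level 1
    for row_number, entry in enumerate(toc, start=1):
        if len(entry) != 3:
            problems.append(f"row {row_number}: expected [level, title, page]")
            continue
        level, title, page = entry
        if not isinstance(level, int) or level < 1:
            problems.append(f"row {row_number}: level must be an integer >= 1")
            continue
        if level > previous_level + 1:
            problems.append(
                f"row {row_number}: level jumps from {previous_level} to {level}"
            )
        if not str(title).strip():
            problems.append(f"row {row_number}: empty title")
        if not isinstance(page, int) or page < 1:
            problems.append(f"row {row_number}: page must be a positive integer")
        previous_level = level
    return problems
```

An empty result means the list is structurally plausible. It does not confirm that titles or page numbers were transcribed correctly.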

3. Determine Page Offset

Calculate the offset between printed page numbers and PDF page positions:
```python
import fitz

doc = fitz.open("/path/to/file.pdf")

# Find a known page (e.g., where Chapter 1 starts)
# If the book says "page 13" but it's PDF page 17, offset = 4
printed_page_number = 13  # number printed on the page
actual_pdf_page = 17      # position of that page in the PDF
offset = actual_pdf_page - printed_page_number
```
Common offsets: 0-10 pages (cover, copyright, and preface pages are typically not numbered).
4. Add Bookmarks to PDF

```python
from scripts.add_bookmarks import add_bookmarks

add_bookmarks(
    pdf_path="/path/to/input.pdf",
    toc_list=toc,
    page_offset=4,  # Offset calculated above
    output_path="/mnt/user-data/outputs/file_with_bookmarks.pdf"
)
```
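
The `add_bookmarks` script itself is not reproduced here. As a rough sketch of what it presumably does, the code below shifts each printed page number by the offset and hands the result to PyMuPDF's `set_toc`. The names `shift_toc` and `add_bookmarks_sketch` are hypothetical, and the range check is an added assumption rather than documented behaviour of the real script.

```python
def shift_toc(toc_list, page_offset):
    """Convert printed page numbers to 1-based PDF page numbers."""
    return [[level, title, page + page_offset] for level, title, page in toc_list]


def add_bookmarks_sketch(pdf_path, toc_list, page_offset, output_path):
    """Write toc_list into the PDF outline after applying the offset."""
    import fitz  # PyMuPDF, imported lazily so shift_toc has no dependency

    doc = fitz.open(pdf_path)
    try:
        shifted = shift_toc(toc_list, page_offset)
        last_page = doc.page_count
        out_of_range = [row for row in shifted if not 1 <= row[2] <= last_page]
        if out_of_range:
            # Usually a sign of a wrong offset or a mistyped page number
            raise ValueError(f"{len(out_of_range)} entries point outside the document")
        doc.set_toc(shifted)  # set_toc expects 1-based page numbers
        doc.save(output_path)
    finally:
        doc.close()
```

Saving to a separate output path leaves the input file untouched, which matches the call shown above.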

5. Verify and Deliver

Open the output PDF in a viewer to confirm that the bookmarks navigate to the correct pages, then provide a download link to the user.
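
Besides the manual check in a viewer, the written outline can be read back and compared with the intended list. This is a minimal sketch with hypothetical helper names (`toc_mismatches`, `read_bookmarks`). It assumes the expected list already has the page offset applied, since PyMuPDF's `get_toc` reports 1-based PDF page numbers.

```python
def toc_mismatches(expected, actual):
    """Return (position, expected_row, actual_row) for every differing entry."""
    mismatches = []
    for position in range(max(len(expected), len(actual))):
        want = list(expected[position]) if position < len(expected) else None
        # Compare only level, title, and page, ignoring any extra fields
        got = list(actual[position][:3]) if position < len(actual) else None
        if want != got:
            mismatches.append((position, want, got))
    return mismatches


def read_bookmarks(pdf_path):
    """Read the outline of a PDF as [level, title, page] rows."""
    import fitz  # PyMuPDF, imported lazily

    doc = fitz.open(pdf_path)
    try:
        return doc.get_toc()
    finally:
        doc.close()
```

A typical use would be `toc_mismatches(shifted_toc, read_bookmarks(output_path))`, where `shifted_toc` is the offset-adjusted list. An empty result shows the outline was written as intended, but only a viewer check confirms that each target page is the right one.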

Critical Insights

Vision > Text Parsing: OCR/regex for TOC extraction is unreliable due to formatting variations. Visual analysis by Claude is faster and more accurate.
Page Offset: Always verify the offset. Test by clicking a bookmark and checking that it lands on the correct page.
Hierarchy Levels: Maintain consistent level numbering (1=chapter, 2=section, 3=subsection) for proper nesting in the PDF viewer.

PyMuPDF Reference

```python
import fitz

# Open PDF
doc = fitz.open("file.pdf")

# Extract page as image
page_index = 0  # placeholder: index of the page to render
page = doc[page_index]  # 0-based
mat = fitz.Matrix(2.0, 2.0)  # 2x zoom
pix = page.get_pixmap(matrix=mat)
pix.save("output.png")

# Set TOC (bookmarks)
toc = [
    [1, "Chapter 1", 10],  # level, title, page (1-based)
    [2, "Section 1.1", 11],
]
doc.set_toc(toc)

# Save PDF
doc.save("output.pdf")
doc.close()
```

Example Session

User: "Add bookmarks to this PDF based on the table of contents"

1. Extract TOC pages (10-16) as images
2. View images and transcribe TOC structure
3. Calculate page offset (e.g., 4 pages)
4. Create toc list with 182 entries
5. Add bookmarks using add_bookmarks script
6. Provide download link

Result: PDF with hierarchical bookmarks matching printed TOC

Time Savings

  • Traditional (regex/OCR): 30-60 min with errors
  • Vision-based (this skill): 10-15 min with high accuracy