ebook-extractor


Ebook Text Extractor


Overview


Extract plain text from EPUB, MOBI, and PDF files using Python scripts. No LLM calls - pure text extraction.

Supported Formats


| Format | Tool Used | Notes |
|--------|-----------|-------|
| EPUB | `ebooklib` + `BeautifulSoup` | Direct parsing, preserves structure |
| MOBI | Calibre `ebook-convert` | Converts to EPUB first, then extracts |
| PDF | `PyMuPDF` (`fitz`) | Fast, handles most PDFs well |
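
As an illustration of the EPUB row, here is a minimal sketch of how `ebooklib` and `BeautifulSoup` can be combined. It is not the skill's actual `extract_epub.py`; the function names `extract_epub_text` and `join_chapters` are hypothetical, and the items come back in whatever order `ebooklib` returns them, which may not match the reading order for every book.

```python
def join_chapters(chunks):
    """Join per-chapter text, dropping empty chapters and outer whitespace."""
    cleaned = [chunk.strip() for chunk in chunks]
    return "\n\n".join(chunk for chunk in cleaned if chunk)


def extract_epub_text(path):
    """Return the plain text of an EPUB, one block per document item."""
    # Imported lazily so join_chapters() stays usable without the dependencies.
    import ebooklib
    from ebooklib import epub
    from bs4 import BeautifulSoup

    book = epub.read_epub(path)
    chunks = []
    # ITEM_DOCUMENT selects the XHTML content files (chapters, front matter).
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        chunks.append(soup.get_text(separator="\n"))
    return join_chapters(chunks)
```

Calling `extract_epub_text("book.epub")` would return a single string with chapters separated by blank lines.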

Usage


**Unified extractor (auto-detects format):**
```bash
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.epub
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.mobi
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.pdf
```
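
The document does not say how the unified extractor detects the format. One plausible approach is to dispatch on the file extension, sketched below. The mapping and the `pick_script` helper are assumptions for illustration; only the three script names come from this README.

```python
from pathlib import Path

# Assumed mapping from file extension to the format-specific script.
SCRIPT_FOR_SUFFIX = {
    ".epub": "extract_epub.py",
    ".mobi": "extract_mobi.py",
    ".pdf": "extract_pdf.py",
}


def pick_script(path):
    """Return the script name for a file, based only on its extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive: .EPUB == .epub
    if suffix not in SCRIPT_FOR_SUFFIX:
        shown = suffix or "(no extension)"
        raise ValueError(f"Unsupported ebook format: {shown}")
    return SCRIPT_FOR_SUFFIX[suffix]
```

Extension-based detection is simple but trusts the filename; a stricter variant could inspect the file's leading bytes instead.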
**Output options:**
```bash
# To stdout (default)
python3 scripts/extract.py book.epub

# To file
python3 scripts/extract.py book.epub -o output.txt
python3 scripts/extract.py book.epub > output.txt
```
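
The stdout-by-default behaviour with an optional `-o` file can be sketched with `argparse` as follows. This is an assumed structure, not the script's real source; `build_parser` and `write_text` are hypothetical names, and only the `-o` flag is taken from the commands above.

```python
import argparse
import sys


def build_parser():
    """Build a parser accepting an input path and an optional -o target."""
    parser = argparse.ArgumentParser(description="Extract plain text from an ebook.")
    parser.add_argument("input", help="path to the ebook file")
    parser.add_argument("-o", dest="output", default=None,
                        help="write text to this file instead of stdout")
    return parser


def write_text(text, output=None):
    """Write to stdout when no output path is given, else to the file."""
    if output is None:
        sys.stdout.write(text)
    else:
        with open(output, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Writing to stdout by default is what makes the shell redirect form (`> output.txt`) equivalent to `-o output.txt`.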

**Format-specific scripts:**
```bash
python3 scripts/extract_epub.py book.epub
python3 scripts/extract_mobi.py book.mobi
python3 scripts/extract_pdf.py book.pdf
```
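
For MOBI, the format table says the file is converted to EPUB first. A minimal sketch of that conversion step, assuming Calibre's `ebook-convert` is on `PATH`, might look like this. The helper names `convert_command` and `mobi_to_epub` are hypothetical and not taken from the skill's scripts.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(mobi_path, epub_path):
    """Build the Calibre command line: ebook-convert INPUT OUTPUT."""
    return ["ebook-convert", str(mobi_path), str(epub_path)]


def mobi_to_epub(mobi_path, work_dir):
    """Convert a MOBI file to EPUB inside work_dir and return the new path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    epub_path = Path(work_dir) / (Path(mobi_path).stem + ".epub")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(convert_command(mobi_path, epub_path),
                   check=True, capture_output=True)
    return epub_path
```

The resulting EPUB can then go through the same extraction path as a native EPUB file, ideally with `work_dir` being a temporary directory that is cleaned up afterwards.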

Setup

```bash
# One-command setup (installs all dependencies)
~/.claude/skills/ebook-extractor/setup.sh

# Or manually:
pip install -r ~/.claude/skills/ebook-extractor/requirements.txt
brew install calibre  # macOS, for MOBI support
```
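
After setup, a quick preflight check can confirm that the dependencies are importable. The sketch below assumes the import names `ebooklib`, `bs4`, and `fitz` (matching the tools in the format table) and that Calibre exposes `ebook-convert` on `PATH`; the function names are hypothetical.

```python
import importlib.util
import shutil

# Import names assumed from the tools listed in the format table.
PYTHON_MODULES = ("ebooklib", "bs4", "fitz")


def missing_modules(names=PYTHON_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def calibre_available():
    """True when Calibre's ebook-convert executable is on PATH."""
    return shutil.which("ebook-convert") is not None


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules() or "none")
    print("Calibre ebook-convert found:", calibre_available())
```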

Script Location


~/.claude/skills/ebook-extractor/scripts/

Common Issues


| Problem | Solution |
|---------|----------|
| Missing package | Run `setup.sh` or `pip install -r requirements.txt` |
| MOBI fails | Ensure Calibre is installed: `brew install calibre` |
| PDF garbled | Some PDFs are image-based; OCR needed (not supported) |
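
To tell an image-based PDF apart from a text PDF before reaching for OCR, one rough heuristic is to check how much text PyMuPDF returns per page. The sketch below is an illustration only: the threshold of 20 characters per page is an arbitrary assumption, and `looks_image_based` and `extract_pdf_pages` are hypothetical names.

```python
def looks_image_based(page_texts, min_chars_per_page=20):
    """Heuristic: True when pages carry almost no extractable text."""
    if not page_texts:
        return False  # nothing to judge
    total = sum(len(text.strip()) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page


def extract_pdf_pages(path):
    """Return the text of each page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the heuristic works without it

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]
```

If `looks_image_based(extract_pdf_pages("book.pdf"))` returns `True`, the file is probably a scan and would need an OCR step, which this tool does not provide.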