ebook-extractor
# Ebook Text Extractor
## Overview
Extract plain text from EPUB, MOBI, and PDF files using Python scripts. No LLM calls are involved: this is pure text extraction.
## Supported Formats
| Format | Tool Used | Notes |
|---|---|---|
| EPUB | | Direct parsing, preserves structure |
| MOBI | Calibre | Converts to EPUB first, then extracts |
| PDF | | Fast, handles most PDFs well |
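Because an EPUB is a ZIP archive of XHTML documents, "direct parsing" can be illustrated with the standard library alone. The sketch below is not the skill's `extract_epub.py` (the EPUB library it uses is not named here); it is a minimal illustration under stated assumptions: it reads `META-INF/container.xml` to find the OPF package file, walks the spine in reading order, and strips tags from each content document. The function name `epub_to_text` is hypothetical, and content documents are assumed to be UTF-8 with unescaped `href`s.

```python
import posixpath
import re
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser

# Namespaces from the EPUB Open Container Format (OCF) and OPF package specs.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Tags treated as block boundaries: each one starts/ends a line of output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextCollector(HTMLParser):
    """Collects visible text from one XHTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            # Collapse source whitespace (including newlines) to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def epub_to_text(path):
    """Return an EPUB's plain text, following the spine (reading order)."""
    with zipfile.ZipFile(path) as zf:
        # META-INF/container.xml names the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        opf_dir = posixpath.dirname(opf_path)

        # Manifest hrefs are relative to the OPF file's directory.
        manifest = {
            item.get("id"): posixpath.join(opf_dir, item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }

        chunks = []
        for itemref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            collector = _TextCollector()
            # Assumption: content documents are UTF-8 encoded.
            collector.feed(zf.read(manifest[itemref.get("idref")]).decode("utf-8"))
            collector.close()
            chunks.append("".join(collector.parts))

    # Drop blank lines and surrounding whitespace left by the markup.
    lines = (line.strip() for line in "\n".join(chunks).split("\n"))
    return "\n".join(line for line in lines if line)
```

For example, `print(epub_to_text("book.epub"))`. Following the spine rather than the manifest matters because the manifest lists resources in arbitrary order, while the spine defines the reading order.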
## Usage
**Unified extractor (auto-detects format):**
```bash
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.epub
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.mobi
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.pdf
```

**Output options:**
```bash
# To stdout (default)
python3 scripts/extract.py book.epub

# To file
python3 scripts/extract.py book.epub -o output.txt
python3 scripts/extract.py book.epub > output.txt
```
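The unified entry point's documented behavior (choose an extractor from the file type; print to stdout unless `-o` is given) can be sketched with `argparse`. This is a hypothetical skeleton, not the actual `extract.py`: it assumes format detection by file extension, and `extract_epub`, `extract_mobi`, and `extract_pdf` are placeholder names standing in for the format-specific scripts.

```python
import argparse
import sys
from pathlib import Path


# Hypothetical stand-ins for the format-specific extractors.
def extract_epub(path):
    raise NotImplementedError("EPUB extraction goes here")


def extract_mobi(path):
    raise NotImplementedError("convert with Calibre, then extract the EPUB")


def extract_pdf(path):
    raise NotImplementedError("PDF extraction goes here")


# Assumption: the format is detected from the file extension, case-insensitively.
EXTRACTORS = {".epub": extract_epub, ".mobi": extract_mobi, ".pdf": extract_pdf}


def pick_extractor(path):
    """Return the extractor for `path`, or raise ValueError for other formats."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported format: {suffix or '(no extension)'}")
    return EXTRACTORS[suffix]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Extract plain text from an ebook.")
    parser.add_argument("input", help="path to an .epub, .mobi, or .pdf file")
    parser.add_argument("-o", dest="output", help="write to this file instead of stdout")
    args = parser.parse_args(argv)

    text = pick_extractor(args.input)(args.input)
    if args.output:
        Path(args.output).write_text(text, encoding="utf-8")
    else:
        sys.stdout.write(text)  # default: stdout, so shell redirection also works
    return 0
```

As a script, this would be wired up with `sys.exit(main())` under an `if __name__ == "__main__":` guard. Writing to stdout by default is what makes the `-o output.txt` and `> output.txt` forms interchangeable.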
**Format-specific scripts:**
```bash
python3 scripts/extract_epub.py book.epub
python3 scripts/extract_mobi.py book.mobi
python3 scripts/extract_pdf.py book.pdf
```

## Setup
```bash
# One-command setup (installs all dependencies)
~/.claude/skills/ebook-extractor/setup.sh

# Or manually:
pip install -r ~/.claude/skills/ebook-extractor/requirements.txt
brew install calibre  # macOS, for MOBI support
```

## Script Location
`~/.claude/skills/ebook-extractor/scripts/`

## Common Issues
| Problem | Solution |
|---|---|
| Missing package | Run |
| MOBI fails | Ensure Calibre is installed: |
| PDF text garbled | Some PDFs are image-based and need OCR, which is not supported |
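For the "MOBI fails" case, a script can check for Calibre before attempting a conversion. Calibre ships an `ebook-convert` command-line tool that takes an input file and an output file and infers both formats from their extensions. The sketch below assumes the MOBI path works this way and that `ebook-convert` is on `PATH`; the helper names `calibre_available` and `mobi_to_epub` are hypothetical, not the skill's own functions.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def calibre_available():
    """True if Calibre's ebook-convert CLI is on PATH."""
    return shutil.which("ebook-convert") is not None


def mobi_to_epub(mobi_path, out_dir=None):
    """Convert a MOBI file to EPUB with ebook-convert; return the EPUB path."""
    exe = shutil.which("ebook-convert")
    if exe is None:
        raise RuntimeError(
            "ebook-convert not found; install Calibre (e.g. `brew install calibre` on macOS)"
        )
    out_dir = Path(out_dir or tempfile.mkdtemp())
    epub_path = out_dir / (Path(mobi_path).stem + ".epub")
    # ebook-convert picks the input and output formats from the file extensions.
    subprocess.run([exe, str(mobi_path), str(epub_path)], check=True, capture_output=True)
    return epub_path
```

Failing early with an explicit message is easier to diagnose than a `FileNotFoundError` from `subprocess`. The returned EPUB path can then go through the EPUB extractor.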