mineru
MinerU PDF Parser
Parse PDF documents into structured Markdown using the MinerU API.
Quick Start
Set API token
export MINERU_TOKEN="your-jwt-token"
Parse single PDF
python mineru_api.py --file ./document.pdf --output ./output/
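If the parser needs to be called from another Python program, the single-file command above can be wrapped in a small helper. This is only a sketch: it assumes `mineru_api.py` sits in the current working directory and accepts exactly the `--file` and `--output` flags shown above. The helper names `build_parse_command` and `run_parse` are hypothetical and not part of the tool.

```python
import subprocess
import sys


def build_parse_command(pdf_path, output_dir, script="mineru_api.py"):
    """Build the argv list for the single-file command from Quick Start.

    Assumption: the script accepts only the --file and --output flags
    shown in this document; no other flags are used here.
    """
    return [
        sys.executable, script,
        "--file", str(pdf_path),
        "--output", str(output_dir),
    ]


def run_parse(pdf_path, output_dir, script="mineru_api.py"):
    """Run the parser as a child process and return its exit code.

    MINERU_TOKEN is inherited from the current environment, so it must
    already be exported before this function is called.
    """
    completed = subprocess.run(build_parse_command(pdf_path, output_dir, script))
    return completed.returncode
```

Calling `run_parse("./document.pdf", "./output/")` is then equivalent to the shell command above; a non-zero return value means the script reported a failure.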
Features
- Multi-format Output: Markdown, JSON, DOCX
- Formula Recognition: LaTeX formula extraction
- Table Extraction: Structured table parsing
- OCR Support: Scanned PDF processing
- Batch Processing: Concurrent processing of multiple PDFs using async
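The batch feature is not documented in detail here, so the following does not show how `mineru_api.py` batches files internally. It is a minimal sketch of one way to drive the single-file command concurrently with `asyncio`, assuming the `--file` and `--output` flags from Quick Start. The names `parse_one`, `parse_many`, and `max_concurrent` are hypothetical.

```python
import asyncio
import sys


async def parse_one(pdf_path, output_dir, semaphore, script="mineru_api.py"):
    """Run the single-file command for one PDF; return (path, exit code)."""
    async with semaphore:  # wait for a free slot before starting a process
        proc = await asyncio.create_subprocess_exec(
            sys.executable, script,
            "--file", str(pdf_path),
            "--output", str(output_dir),
        )
        return str(pdf_path), await proc.wait()


async def parse_many(pdf_paths, output_dir, max_concurrent=4, worker=parse_one):
    """Parse several PDFs, running at most max_concurrent at a time.

    Results come back in the same order as pdf_paths.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [worker(path, output_dir, semaphore) for path in pdf_paths]
    return await asyncio.gather(*tasks)
```

A call such as `asyncio.run(parse_many(["a.pdf", "b.pdf"], "./output/"))` returns a list of `(path, exit_code)` pairs. The semaphore caps how many child processes run at once, which is usually worth keeping low when each one calls a remote API.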
Authentication
Get your free token at: https://open.walab.ai/
export MINERU_TOKEN="your-token-here"
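For code that talks to the API directly, the token can be read from the same environment variable. The sketch below assumes the token is sent as an HTTP `Authorization: Bearer` header, which is a common convention for JWTs but is not confirmed by this document. The function name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env=None):
    """Build request headers from the MINERU_TOKEN environment variable.

    Assumption: the API expects the Bearer scheme; check the service's
    own documentation before relying on this.
    """
    env = os.environ if env is None else env
    token = env.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError("MINERU_TOKEN is not set; export it first")
    return {"Authorization": f"Bearer {token}"}
```

Failing early when the variable is missing or empty gives a clearer error than an authentication failure returned later by the server.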