mineru

MinerU PDF Parser

Parse PDF documents into structured Markdown using the MinerU API.

Quick Start

bash
# Set API token
export MINERU_TOKEN="your-jwt-token"

# Parse a single PDF
python mineru_api.py --file ./document.pdf --output ./output/
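The same Quick Start call can also be driven from Python. The following is a minimal sketch, assuming `mineru_api.py` sits in the current working directory and accepts the `--file` and `--output` flags shown above. The helper names `build_command` and `run_parse` are hypothetical and not part of MinerU.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, output_dir, script="mineru_api.py"):
    """Build the argument list for the documented CLI call (hypothetical helper)."""
    return [sys.executable, script, "--file", str(pdf_path), "--output", str(output_dir)]


def run_parse(pdf_path, output_dir):
    """Run the parser for one PDF; raises CalledProcessError on a non-zero exit."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    # Create the output directory up front so the script has somewhere to write.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(pdf_path, output_dir), check=True)
```

Calling `run_parse("./document.pdf", "./output/")` would then mirror the shell command, with `MINERU_TOKEN` inherited from the environment.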

Features

  • Multi-format Output: Markdown, JSON, DOCX
  • Formula Recognition: LaTeX formula extraction
  • Table Extraction: Structured table parsing
  • OCR Support: Scanned PDF processing
  • Batch Processing: Concurrent processing of multiple PDFs via async
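The batch-processing feature is not shown in this section, so the following is only a sketch of one way to parse several PDFs concurrently. It assumes the single-file CLI call (`mineru_api.py --file ... --output ...`) is simply launched once per document. The names `parse_one`, `parse_many`, and `max_concurrent` are hypothetical, and the script may well offer its own batch mode.

```python
import asyncio
import sys


async def parse_one(pdf_path, output_dir, limit):
    """Launch the documented CLI for one PDF and return (path, exit code)."""
    async with limit:  # cap the number of parser processes running at once
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "mineru_api.py",
            "--file", str(pdf_path), "--output", str(output_dir),
        )
        return pdf_path, await proc.wait()


async def parse_many(pdf_paths, output_dir, max_concurrent=4, worker=parse_one):
    """Run the worker over all PDFs; results come back in input order."""
    limit = asyncio.Semaphore(max_concurrent)
    tasks = [worker(path, output_dir, limit) for path in pdf_paths]
    return await asyncio.gather(*tasks)
```

A call such as `asyncio.run(parse_many(["a.pdf", "b.pdf"], "./output/"))` would return one `(path, exit_code)` pair per input. The semaphore keeps the number of simultaneous requests bounded, which matters if the API enforces rate limits.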

Authentication

Get your free token at: https://open.walab.ai/
bash
export MINERU_TOKEN="your-token-here"
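A script that relies on this variable can check for it early and fail with a clear message. The following is a minimal sketch. The helper name `get_mineru_token` is hypothetical, and how `mineru_api.py` itself reads the token is not shown in this document.

```python
import os


def get_mineru_token(env=None):
    """Return the token from MINERU_TOKEN, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "MINERU_TOKEN is not set; get a token at https://open.walab.ai/ "
            "and export it before running the parser."
        )
    return token
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.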