mineru-pdf

MinerU PDF

Overview

Parse a PDF locally with MinerU on CPU. The default output is Markdown + JSON. Extract tables and images only when requested.

Quick start (single PDF)

Run from the skill directory:

```bash
./scripts/mineru_parse.sh /path/to/file.pdf
```

Optional examples:
```bash
./scripts/mineru_parse.sh /path/to/file.pdf --format json
./scripts/mineru_parse.sh /path/to/file.pdf --tables --images
```

When to read references

If the flags shown here differ from those of your wrapper, or you need advanced defaults (backend, method, device, threads, format mapping), read:
  • references/mineru-cli.md

Output conventions

  • Output root defaults to `./mineru-output/`.
  • MinerU creates the per-document subfolder under the output root (e.g., `./mineru-output/<basename>/...`).
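
To make the convention above concrete, here is a minimal sketch of a helper that predicts where the output for a given PDF should land. `mineru_output_dir` is a hypothetical name, not part of the skill, and the sketch assumes that `<basename>` is the PDF file name without its `.pdf` extension; check an actual run before relying on that.

```bash
#!/usr/bin/env bash
# Sketch: predict the per-document output folder for a PDF.
# Assumption: <basename> is the file name with the .pdf extension removed.

# mineru_output_dir is a hypothetical helper for illustration only.
mineru_output_dir() {
  local pdf="$1"
  local root="${2:-./mineru-output}"   # default output root from this document
  local name
  name="$(basename "$pdf")"            # drop leading directories
  name="${name%.pdf}"                  # drop the .pdf extension, if present
  printf '%s/%s\n' "${root%/}" "$name" # avoid a doubled slash in the result
}
```

After a parse, something like `ls -R "$(mineru_output_dir /path/to/file.pdf)"` would then list whatever MinerU wrote for that document, provided the naming assumption holds.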

Batching

Default is single-PDF parsing. Only implement batch folder parsing if explicitly requested.
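
If batch parsing is explicitly requested, one possible shape is a plain loop over the single-PDF wrapper. The sketch below is an assumption, not something the skill ships: `parse_pdf_folder` and the `MINERU_PARSE` override are hypothetical names, and it handles only `*.pdf` files directly inside the given folder (no recursion).

```bash
#!/usr/bin/env bash
# Sketch: batch parsing by calling the single-PDF wrapper once per file.
# parse_pdf_folder and MINERU_PARSE are hypothetical names for illustration.

parse_pdf_folder() {
  local dir="$1"
  # Wrapper path from this document; override via MINERU_PARSE if it differs.
  local parser="${MINERU_PARSE:-./scripts/mineru_parse.sh}"
  local pdf failed=0

  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue            # glob matched nothing, skip
    if ! "$parser" "$pdf"; then
      printf 'failed: %s\n' "$pdf" >&2   # report and keep going
      failed=$((failed + 1))
    fi
  done

  [ "$failed" -eq 0 ]                    # non-zero status if any PDF failed
}
```

Continuing past a failed file rather than stopping is a design choice here, so that one bad PDF does not block the rest of the folder; the function's exit status still signals that something went wrong.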