ocr-super-surya

OCR Super Surya

GPU-optimized OCR using Surya.

When to Use

Keywords: OCR, extract text from image, text recognition, 画像から文字 (Japanese for "text from image")
Extracting text from screenshots, photos, or scanned images
Processing PDFs with embedded images
Multi-language document OCR (90+ languages including Japanese)

Features

Feature	Description
Accuracy	Higher than Tesseract in Surya's published benchmark (0.97 vs 0.88)
GPU	PyTorch-based, CUDA optimized
Languages	90+ including CJK
Layout	Document layout, table recognition
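For scale, the two accuracy scores above can be compared directly, or through their shortfall from a perfect score of 1. Treating 1 - score as an error measure is an interpretation for illustration, not a metric the benchmark itself reports:

```latex
\frac{0.97}{0.88} \approx 1.10
\qquad\text{and}\qquad
\frac{1 - 0.88}{1 - 0.97} = \frac{0.12}{0.03} = 4
```

Read this way, Surya's score is about 10% higher, and its shortfall from a perfect score is about one quarter of Tesseract's.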

Quick Start

Installation

1. Check GPU

python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

2. Install (with CUDA if a GPU is available)

pip install surya-ocr

If CUDA is False but the machine has a GPU, reinstall PyTorch with CUDA support:

pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
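The one-line GPU check in step 1 can be folded into a small helper that also honors an explicit device override. This is a sketch: `pick_torch_device` is a hypothetical name, and the override assumes Surya's `TORCH_DEVICE` environment variable, which forces the device Surya runs on (confirm against the Surya README for your version). If PyTorch is not installed, it falls back to CPU.

```python
import os
from typing import Mapping, Optional


def pick_torch_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Return an explicit TORCH_DEVICE override if set (assumed Surya setting),
    else "cuda" when PyTorch sees a usable GPU, else "cpu"."""
    env = os.environ if env is None else env
    override = env.get("TORCH_DEVICE")
    if override:
        return override
    try:
        import torch  # only needed for the CUDA probe
    except ImportError:
        return "cpu"  # no PyTorch: nothing can run on the GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_torch_device()
    print(f"Device: {device}")
    if device == "cpu":
        print("No CUDA: if this machine has a GPU, reinstall PyTorch from the CUDA wheel index.")
```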

Usage

CLI

python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

Or use surya directly

surya_ocr image.png --output_dir ./results
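`surya_ocr` is expected to write its results as JSON rather than plain text. The sketch below flattens that JSON into a text file. It assumes the output lands in `./results/<input name>/results.json` and that the file maps each input name to a list of pages whose `text_lines` entries carry a `text` field; check both against the files your Surya version actually writes. `results_json_to_text` and `convert` are hypothetical names.

```python
import json
from pathlib import Path


def results_json_to_text(results: dict) -> str:
    """Flatten parsed surya_ocr JSON into plain text.

    Assumed schema: {input_name: [page, ...]}, where each page has a
    "text_lines" list whose items carry a "text" field.
    """
    pages = []
    for page_list in results.values():
        for page in page_list:
            lines = [line["text"] for line in page.get("text_lines", [])]
            pages.append("\n".join(lines))
    # A blank line between pages keeps page boundaries visible.
    return "\n\n".join(pages)


def convert(results_path: Path, output_path: Path) -> None:
    """Read results.json and write the flattened text as UTF-8."""
    results = json.loads(results_path.read_text(encoding="utf-8"))
    output_path.write_text(results_json_to_text(results), encoding="utf-8")
```

For example, `convert(Path("results/image/results.json"), Path("result.txt"))`, where the `results/image/` folder name is itself an assumption about the CLI's layout.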

Python API

python

from PIL import Image
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor

image = Image.open("document.png")

# Load the models (weights are downloaded on first use).
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# Detect text lines, then recognize the text in each one.
predictions = recognition_predictor([image], det_predictor=detection_predictor)

# One result per input image; each holds the recognized lines.
for page in predictions:
    for line in page.text_lines:
        print(line.text)
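To meet the "output saved to specified file" criterion, the loop in the Python API example can be replaced by a writer. This sketch relies only on the `page.text_lines[i].text` attributes that example already uses; `predictions_to_text` and `save_predictions` are hypothetical names, and the demo uses stand-in objects so it runs without downloading models.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Iterable


def predictions_to_text(predictions: Iterable) -> str:
    """Join recognized lines per page, with a blank line between pages."""
    return "\n\n".join(
        "\n".join(line.text for line in page.text_lines) for page in predictions
    )


def save_predictions(predictions: Iterable, output_path: str) -> Path:
    """Write the recognized text to output_path as UTF-8."""
    path = Path(output_path)
    path.write_text(predictions_to_text(predictions), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Stand-ins shaped like the page results iterated in the API example.
    fake = [SimpleNamespace(text_lines=[SimpleNamespace(text="Hello"),
                                        SimpleNamespace(text="OCR")])]
    print(predictions_to_text(fake))
```

With real results, call `save_predictions(predictions, "result.txt")` after the recognition step.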

GPU Configuration

Variable	Default (GPU)	Description
`RECOGNITION_BATCH_SIZE`	512	Lower this if VRAM is limited
`DETECTOR_BATCH_SIZE`	36	Lower this if you hit out-of-memory (OOM) errors

bash

export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
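The same override can be applied per run from Python. This sketch copies the current environment, sets the two batch-size variables from the table, and launches the `surya_ocr` command shown earlier in a subprocess, so the values are in place before Surya loads its settings. `surya_env` and `run_surya_ocr` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def surya_env(recognition_batch_size: Optional[int] = None,
              detector_batch_size: Optional[int] = None,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and override Surya's batch-size variables.

    None leaves a variable untouched so its default applies; environment
    values must be strings.
    """
    env = dict(os.environ if base is None else base)
    if recognition_batch_size is not None:
        env["RECOGNITION_BATCH_SIZE"] = str(recognition_batch_size)
    if detector_batch_size is not None:
        env["DETECTOR_BATCH_SIZE"] = str(detector_batch_size)
    return env


def run_surya_ocr(image: str, output_dir: str, **batch_sizes: int) -> None:
    """Run the surya_ocr CLI with the given batch-size overrides."""
    cmd: List[str] = ["surya_ocr", image, "--output_dir", output_dir]
    subprocess.run(cmd, env=surya_env(**batch_sizes), check=True)
```

For example, `run_surya_ocr("image.png", "./results", recognition_batch_size=256)` mirrors the shell example above.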

Scripts

Script	Description
`scripts/ocr_helper.py`	Helper script with automatic retry on out-of-memory (OOM) errors and batch processing
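One common way to combine batching with OOM auto-retry is to halve the batch size whenever a batch runs out of memory and retry from the same position. The sketch below shows that pattern; it is not the actual contents of `scripts/ocr_helper.py`, which this page does not show, and `run_batched_with_oom_retry`, `is_oom`, and `process_batch` are hypothetical names. With real PyTorch code you would typically also call `torch.cuda.empty_cache()` before retrying.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def is_oom(exc: BaseException) -> bool:
    # PyTorch's CUDA OOM message contains "out of memory"; matching on the
    # text avoids importing torch in this sketch.
    return "out of memory" in str(exc).lower()


def run_batched_with_oom_retry(
    items: Sequence[T],
    process_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int,
    min_batch_size: int = 1,
) -> List[R]:
    """Process items in batches, halving the batch size after each OOM."""
    results: List[R] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except Exception as exc:
            # Re-raise anything that is not an OOM, or an OOM at the floor size.
            if not is_oom(exc) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same start index with a smaller batch
        start += len(batch)
    return results
```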

Done Criteria

CUDA available (if a GPU is present)
Text extracted from target image
Output saved to specified file
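The three done criteria can be checked mechanically. In this sketch, `check_done` and `gpu_probably_present` are hypothetical names, and detecting a GPU by looking for `nvidia-smi` on `PATH` is an assumed heuristic that only covers NVIDIA drivers.

```python
import shutil
from pathlib import Path
from typing import Dict


def check_done(output_path: str, cuda_available: bool,
               gpu_present: bool) -> Dict[str, bool]:
    """Evaluate the done criteria; every value should be True."""
    path = Path(output_path)
    text = path.read_text(encoding="utf-8") if path.is_file() else ""
    return {
        # CUDA is only required when the machine actually has a GPU.
        "cuda_ok": cuda_available or not gpu_present,
        "text_extracted": bool(text.strip()),
        "output_saved": path.is_file(),
    }


def gpu_probably_present() -> bool:
    # Heuristic: NVIDIA driver installs put nvidia-smi on PATH.
    return shutil.which("nvidia-smi") is not None
```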

License

This skill: CC BY-NC 4.0
Surya: GPL-3.0 (code); a commercial license is required above $2M in annual revenue
