ocr-super-surya

OCR Super Surya

GPU-optimized OCR using Surya.

When to Use

  • Requests that mention OCR, "extract text from image", "text recognition", or 画像から文字 (Japanese: "text from image")
  • Extracting text from screenshots, photos, or scanned images
  • Processing PDFs with embedded images
  • Multi-language document OCR (90+ languages including Japanese)

Features

| Feature | Description |
|---------|-------------|
| Accuracy | Higher than Tesseract (score 0.97 vs 0.88) |
| GPU | PyTorch-based, CUDA-optimized |
| Languages | 90+, including CJK |
| Layout | Document layout analysis, table recognition |

Quick Start

Installation

1. Check GPU

python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

2. Install (with CUDA if GPU available)

pip install surya-ocr

If the check prints CUDA: False but the machine has a GPU, reinstall PyTorch with a CUDA build:

pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
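
The GPU check from step 1 can also be run from Python in a way that does not fail when PyTorch is missing. This is a minimal sketch; `cuda_status` is a hypothetical helper name, not part of Surya or PyTorch.

```python
import importlib.util


def cuda_status() -> str:
    """Return a one-line summary of whether PyTorch can see a CUDA device."""
    # Avoid an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds, which hints
        # that a CUDA build of PyTorch needs to be installed.
        return (
            f"CUDA unavailable (torch {torch.__version__}, "
            f"built for CUDA: {torch.version.cuda})"
        )

    return f"CUDA available: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_status())
```

A result of "built for CUDA: None" on a machine with a GPU suggests a CPU-only wheel, which is the case the reinstall command above addresses.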

Usage

CLI

python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

Or use surya directly

surya_ocr image.png --output_dir ./results

Python API

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

# Load the page image to recognize
image = Image.open("document.png")

# Build the predictors (this loads the models)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# Detect text lines, then recognize the text in each one
predictions = recognition_predictor([image], det_predictor=detection_predictor)
for page in predictions:  # one result per input image
    for line in page.text_lines:
        print(line.text)
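
When using the Python API instead of the CLI, the recognized lines still need to be written to a file to meet the "output saved" criterion. The following is a minimal sketch that relies only on the `text_lines` and `text` attributes used above; `save_text` is a hypothetical helper, not part of Surya.

```python
from pathlib import Path
from typing import Iterable


def save_text(predictions: Iterable, output_path: str) -> int:
    """Write one recognized line per row; return the number of lines written."""
    # Flatten all pages into a single list of line strings.
    lines = [line.text for page in predictions for line in page.text_lines]
    # Terminate the file with a newline only when there is content.
    body = "\n".join(lines) + ("\n" if lines else "")
    Path(output_path).write_text(body, encoding="utf-8")
    return len(lines)
```

For example, `save_text(predictions, "result.txt")` after the recognition call would store every line of every page. UTF-8 is chosen explicitly so that CJK output is not affected by the platform default encoding.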

GPU Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM |
| DETECTOR_BATCH_SIZE | 36 | Reduce if OOM |

export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
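
The same variables can be set for a single run from Python without exporting them in the shell. This sketch assumes the `surya_ocr` command is on the PATH and reads the two variables from its environment, as the table above describes; `surya_env` and `run_surya` are hypothetical helper names.

```python
import os
import subprocess
from typing import Optional


def surya_env(recognition_batch: Optional[int] = None,
              detector_batch: Optional[int] = None) -> dict:
    """Copy the current environment and override Surya's batch-size variables."""
    env = os.environ.copy()
    if recognition_batch is not None:
        env["RECOGNITION_BATCH_SIZE"] = str(recognition_batch)
    if detector_batch is not None:
        env["DETECTOR_BATCH_SIZE"] = str(detector_batch)
    return env


def run_surya(image_path: str, output_dir: str, **batch_sizes) -> int:
    """Run the surya_ocr CLI with per-call batch sizes; return its exit code."""
    cmd = ["surya_ocr", image_path, "--output_dir", output_dir]
    return subprocess.run(cmd, env=surya_env(**batch_sizes)).returncode
```

Calling `run_surya("image.png", "./results", recognition_batch=256)` would mirror the shell example, while leaving the parent process environment unchanged.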

Scripts

| Script | Description |
|--------|-------------|
| scripts/ocr_helper.py | Helper with OOM auto-retry, batch support |
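
The source of `scripts/ocr_helper.py` is not shown here, so the following only illustrates one way an OOM auto-retry could work: halve the batch size after each out-of-memory error until the job succeeds or a floor is reached. It is a sketch under that assumption, not the script's actual implementation, and `run_with_oom_retry` is a hypothetical name. It applies to in-process calls, where PyTorch raises CUDA out-of-memory failures as `RuntimeError` with "out of memory" in the message.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_oom_retry(job: Callable[[int], T],
                       start_batch: int = 512,
                       min_batch: int = 1) -> T:
    """Call job(batch_size), halving the batch size after each OOM error."""
    batch = start_batch
    while True:
        try:
            return job(batch)
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            # Re-raise unrelated errors, and give up once the floor is reached.
            if not is_oom or batch <= min_batch:
                raise
            batch = max(min_batch, batch // 2)
```

The caller decides how the batch size is applied inside `job`, for example by passing it to a subprocess environment. Keeping that choice outside the retry loop avoids assuming when Surya reads its settings.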

Done Criteria

  • CUDA available (if GPU present)
  • Text extracted from target image
  • Output saved to specified file
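
The last two criteria can be checked mechanically. This is a small sketch, assuming the output is a UTF-8 text file such as the `result.txt` from the CLI example; `output_ok` is a hypothetical helper name.

```python
from pathlib import Path


def output_ok(output_path: str) -> bool:
    """True if the output file exists and holds at least one non-whitespace character."""
    path = Path(output_path)
    return path.is_file() and bool(path.read_text(encoding="utf-8").strip())
```

A file that exists but contains only whitespace is treated as a failure, since it indicates that no text was extracted.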

License

  • This skill: CC BY-NC 4.0
  • Surya: GPL-3.0 (code); a commercial license is required for organizations with more than $2M in revenue