paddleocr-text-recognition

PaddleOCR Text Recognition Skill

When to Use This Skill

Invoke this skill in the following situations:
  • Extract text from images (screenshots, photos, scans, charts)
  • Read text from PDF or document images
  • Perform OCR on any visual content containing text
  • Parse structured documents (invoices, receipts, forms, tables)
  • Recognize text in photos taken by mobile phones
  • Extract text from URLs pointing to images or PDFs
Do not use this skill in the following situations:
  • Plain text files that can be read directly with the Read tool
  • Code files or markdown documents
  • Tasks that do not involve image-to-text conversion

How to Use This Skill

MANDATORY RESTRICTIONS - DO NOT VIOLATE
  1. ONLY use the PaddleOCR Text Recognition API - Execute the script
    python scripts/ocr_caller.py
  2. NEVER use Claude's built-in vision - Do NOT read images yourself
  3. NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
  4. IF the API call fails - Display the error message and STOP immediately
  5. NO fallback methods - Do NOT attempt OCR any other way
If the script execution fails (API not configured, network error, etc.):
  • Show the error message to the user
  • Do NOT offer to help using your vision capabilities
  • Do NOT ask "Would you like me to try reading it?"
  • Simply stop and wait for the user to fix the configuration

Basic Workflow

  1. Identify the input source:
    • User provides a URL: use the `--file-url` parameter
    • User provides a local file path: use the `--file-path` parameter
    • User uploads an image: save it first, then use `--file-path`
  2. Execute OCR:
    ```bash
    python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
    ```
    Or for local files:
    ```bash
    python scripts/ocr_caller.py --file-path "file path" --pretty
    ```
    Save the result to a file (recommended):
    ```bash
    python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty
    ```
  3. Parse the JSON response:
    • Check the `ok` field: `true` means success, `false` means error
    • Extract text: the `text` field contains all recognized text
    • Handle errors: if `ok` is false, display `error.message`
  4. Present the results to the user:
    • Display the extracted text in a readable format
    • If the text is empty, the image may contain no text
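Steps 1 and 2 can be sketched in Python as a small helper that picks the flag from the input. This is a minimal sketch, not part of the skill's scripts: the name `build_ocr_command` is hypothetical, and treating only `http`/`https` inputs as URLs is an assumption.

```python
from typing import List
from urllib.parse import urlparse


def build_ocr_command(source: str, pretty: bool = True) -> List[str]:
    """Build the ocr_caller.py argument list for a URL or a local file path.

    Hypothetical helper. Assumption: only http(s) inputs are treated as URLs;
    anything else (including a saved upload) is passed as a local file path.
    """
    scheme = urlparse(source).scheme.lower()
    flag = "--file-url" if scheme in ("http", "https") else "--file-path"
    command = ["python", "scripts/ocr_caller.py", flag, source]
    if pretty:
        command.append("--pretty")
    return command


if __name__ == "__main__":
    print(build_ocr_command("https://example.com/invoice.jpg"))
    # ['python', 'scripts/ocr_caller.py', '--file-url', 'https://example.com/invoice.jpg', '--pretty']
    print(build_ocr_command("./document.pdf"))
```

A Windows path such as `C:\scans\page.png` parses with scheme `c`, so it still falls through to `--file-path`.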

IMPORTANT: Complete Output Display

CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
  • The script returns the full JSON, with the complete text content in the `text` field
  • You MUST display the entire `text` content to the user, no matter how long it is
  • Do NOT use phrases like "Here's a summary" or "The text begins with..."
  • Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
  • The user expects to see ALL the recognized text, not a preview or excerpt
Correct approach:
I've extracted the text from the image. Here's the complete content:

[Display the entire text here]
Incorrect approach:
I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)

Usage Examples

URL OCR:
```bash
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```
Local File OCR:
```bash
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```
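The same invocation can be driven from Python with a thin wrapper that runs the script exactly once. This is a sketch under assumptions: `run_ocr` is a hypothetical name, the script is assumed to print the JSON described under "Understanding the Output" to stdout, and the 300-second timeout is arbitrary. If no JSON comes back, the wrapper reports stderr as the error rather than trying any other OCR route, in line with the no-fallback rule.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def run_ocr(args: List[str], script: str = "scripts/ocr_caller.py",
            timeout: float = 300) -> Dict[str, Any]:
    """Run ocr_caller.py exactly once and return its JSON output as a dict.

    Hypothetical wrapper. Assumes the script prints its result JSON to stdout.
    subprocess.TimeoutExpired is not caught and propagates to the caller.
    """
    completed = subprocess.run(
        [sys.executable, script, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    try:
        return json.loads(completed.stdout)
    except json.JSONDecodeError:
        # No JSON on stdout: report stderr as-is; never fall back to another OCR method.
        message = completed.stderr.strip() or f"{script} exited with code {completed.returncode}"
        return {"ok": False, "text": "", "result": None, "error": {"message": message}}


if __name__ == "__main__":
    payload = run_ocr(["--file-path", "./document.pdf", "--pretty"])
    print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Using `sys.executable` runs the script with the same interpreter as the caller, which avoids picking up a different `python` from `PATH`.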

Understanding the Output

The script outputs JSON with the following structure:
```json
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
```
Key fields:
  • `ok`: `true` for success, `false` for error
  • `text`: complete recognized text
  • `result`: raw API response (for debugging)
  • `error`: error details if `ok` is false
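Reading these fields can be sketched as a small function that either returns the full text or raises with `error.message`. This is a minimal sketch; the name `extract_text` is hypothetical and it relies only on the `ok`, `text`, and `error.message` fields documented above.

```python
import json


def extract_text(raw_output: str) -> str:
    """Return the complete `text` field, or raise with the script's error message.

    Hypothetical helper built on the documented fields (ok, text, error.message).
    """
    payload = json.loads(raw_output)
    if not payload.get("ok"):
        error = payload.get("error") or {}
        message = error.get("message") if isinstance(error, dict) else str(error)
        raise RuntimeError(message or "OCR failed without an error message")
    # Return the text unmodified so it can be shown in full; never truncate here.
    return payload.get("text") or ""


if __name__ == "__main__":
    sample = '{"ok": true, "text": "Hello\\nWorld", "result": {}, "error": null}'
    text = extract_text(sample)
    print(text if text else "(empty: the image may contain no text)")
```

An empty string is a successful call with no recognized text, which is different from a failure, so the two cases are kept apart.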

First-Time Configuration

When the API is not configured, the error will show:
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
  1. Show the exact error message to the user (including the URL)
  2. Tell the user to provide credentials:
    Please visit the URL above to get your API_URL and TOKEN.
    Once you have them, send them to me and I'll configure it automatically.
  3. When the user provides credentials, accept any format, for example:
    • API_URL=https://xxx.paddleocr.com/ocr, TOKEN=abc123...
    • Here's my API: https://xxx and token: abc123
    • Copy-pasted code format
    • Any other reasonable format
  4. Parse the credentials from the user's message:
    • Extract the API_URL value (look for URLs with paddleocr.com or similar)
    • Extract the TOKEN value (a long alphanumeric string, usually 40+ characters)
  5. Configure automatically:
    ```bash
    python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
    ```
  6. If configuration succeeds:
    • Inform the user: "Configuration complete! Running OCR now..."
    • Retry the original OCR task
  7. If configuration fails:
    • Show the error
    • Ask the user to verify the credentials
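Steps 4 and 5 can be sketched as heuristic parsing followed by building the documented `configure.py` command. This is a sketch under stated assumptions: `parse_credentials` and `build_configure_command` are hypothetical names, a URL is assumed to end at whitespace or a common separator, and an unlabeled token is assumed to be a run of 40 or more ASCII letters and digits. If either value comes back as `None`, ask the user instead of guessing.

```python
import re
from typing import List, Optional, Tuple

# Heuristic patterns (assumptions): URLs end at whitespace or common separators;
# a labeled token follows "token:" or "token="; an unlabeled token is 40+ letters/digits.
URL_PATTERN = re.compile(r"https?://[^\s,;'\"<>]+")
LABELED_TOKEN_PATTERN = re.compile(r"token\s*[:=]\s*['\"]?([A-Za-z0-9_-]+)", re.IGNORECASE)
BARE_TOKEN_PATTERN = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Pull an API URL and a token out of a free-form user message (hypothetical helper)."""
    urls = [u.rstrip(".)") for u in URL_PATTERN.findall(message)]
    # Prefer a paddleocr.com URL; otherwise take the first URL found.
    api_url = next((u for u in urls if "paddleocr.com" in u), urls[0] if urls else None)

    # Remove URLs first so query strings and path segments are not mistaken for a token.
    without_urls = URL_PATTERN.sub(" ", message)
    labeled = LABELED_TOKEN_PATTERN.search(without_urls)
    if labeled:
        token: Optional[str] = labeled.group(1)
    else:
        bare = BARE_TOKEN_PATTERN.search(without_urls)
        token = bare.group(0) if bare else None
    return api_url, token


def build_configure_command(api_url: str, token: str) -> List[str]:
    """Argument list for the configure.py invocation shown in step 5."""
    return ["python", "scripts/configure.py", "--api-url", api_url, "--token", token]


if __name__ == "__main__":
    url, tok = parse_credentials("API_URL=https://xxx.paddleocr.com/ocr, TOKEN=abc123")
    print(url, tok)
    if url and tok:
        print(build_configure_command(url, tok))
```

Passing the values as separate list elements (for example to `subprocess.run`) avoids shell-quoting problems with tokens that contain unusual characters.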

Error Handling

Authentication failed:
API_ERROR: Authentication failed (403). Check your token.
  • The token is invalid; reconfigure with the correct credentials
Quota exceeded:
API_ERROR: API rate limit exceeded (429)
  • The daily API quota is exhausted; tell the user to wait or upgrade
No text detected:
  • The `text` field is empty
  • The image may be blank, corrupted, or contain no text
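The cases above can be sketched as a dispatch on the parsed output. This is a minimal sketch: `next_step` and its returned labels are hypothetical, and matching on the `CONFIG_ERROR`, `(403)`, and `(429)` substrings assumes `error.message` carries the messages shown in this section. In every failure case the error message is still shown to the user first; the label only decides what to say next.

```python
from typing import Any, Dict


def next_step(payload: Dict[str, Any]) -> str:
    """Map a parsed ocr_caller.py payload to the follow-up this section describes.

    Hypothetical helper; the returned labels are illustrative. Matching on the
    "CONFIG_ERROR", "(403)" and "(429)" substrings assumes error.message carries
    the messages shown in the error-handling section.
    """
    if payload.get("ok"):
        return "show_full_text" if payload.get("text") else "report_no_text_detected"
    error = payload.get("error") or {}
    message = error.get("message", "") if isinstance(error, dict) else str(error)
    if message.startswith("CONFIG_ERROR"):
        return "start_configuration_workflow"
    if "(403)" in message:
        return "ask_user_to_reconfigure_token"
    if "(429)" in message:
        return "tell_user_to_wait_or_upgrade"
    # Any other failure: show the message and stop, with no fallback OCR.
    return "show_error_and_stop"


if __name__ == "__main__":
    print(next_step({"ok": False, "error": {"message": "API_ERROR: API rate limit exceeded (429)"}}))
```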

Tips for Better Results

If recognition quality is poor, suggest:
  • Check if the image is clear and contains text
  • Provide a higher-resolution image if possible

Reference Documentation

For an in-depth understanding of the OCR system, refer to:
  • references/output_schema.md - Output format specification
  • references/provider_api.md - Provider API contract
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).

Testing the Skill

To verify the skill is working properly:
```bash
python scripts/smoke_test.py
```
This tests configuration and API connectivity.