paddleocr-text-recognition

PaddleOCR Text Recognition Skill

When to Use This Skill

Invoke this skill in the following situations:

- Extract text from images (screenshots, photos, scans, charts)
- Read text from PDF or document images
- Perform OCR on any visual content containing text
- Parse structured documents (invoices, receipts, forms, tables)
- Recognize text in photos taken with mobile phones
- Extract text from URLs pointing to images or PDFs

Do not use this skill in the following situations:

- Plain text files that can be read directly with the Read tool
- Code files or Markdown documents
- Tasks that do not involve image-to-text conversion

How to Use This Skill

MANDATORY RESTRICTIONS - DO NOT VIOLATE

- ONLY use the PaddleOCR Text Recognition API - Execute the script `python scripts/ocr_caller.py`
- NEVER use Claude's built-in vision - Do NOT read images yourself
- NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
- IF the API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt OCR any other way

If the script execution fails (API not configured, network error, etc.):

- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try reading it?"
- Simply stop and wait for the user to fix the configuration

Basic Workflow

Identify the input source:
- User provides a URL: use the `--file-url` parameter
- User provides a local file path: use the `--file-path` parameter
- User uploads an image: save it first, then use `--file-path`

Execute OCR:

```bash
python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
```

Or for local files:

```bash
python scripts/ocr_caller.py --file-path "file path" --pretty
```

Save result to file (recommended):

```bash
python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty
```

Parse the JSON response:
- Check the `ok` field: `true` means success, `false` means error
- Extract text: the `text` field contains all recognized text
- Handle errors: if `ok` is false, display `error.message`

Present results to the user:
- Display the extracted text in a readable format
- If the text is empty, the image may contain no text
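
The parse step above can be sketched in Python as follows. This is only an illustration, not part of the skill's scripts: `OcrError` and `extract_text` are hypothetical names, and the sketch assumes that `error` is an object carrying a `message` key, as the `error.message` reference suggests.

```python
import json


class OcrError(Exception):
    """Hypothetical exception for a response whose `ok` field is false."""


def extract_text(raw_json: str) -> str:
    """Return the recognized text, or raise OcrError with the reported message."""
    response = json.loads(raw_json)
    if not response.get("ok"):
        error = response.get("error")
        # Assumption: `error` is an object with a `message` key.
        if isinstance(error, dict):
            raise OcrError(error.get("message", "unknown error"))
        raise OcrError(str(error))
    # An empty string is a valid result: the image may contain no text.
    return response.get("text", "")
```

Raising on failure rather than returning a default keeps the "display the error and stop" rule explicit: the caller has to handle `OcrError` and cannot silently continue with empty text.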

IMPORTANT: Complete Output Display

CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.

- The script returns the full JSON with the complete text content in the `text` field
- You MUST display the entire `text` content to the user, no matter how long it is
- Do NOT use phrases like "Here's a summary" or "The text begins with..."
- Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
- The user expects to see ALL the recognized text, not a preview or excerpt

Correct approach:

I've extracted the text from the image. Here's the complete content:

[Display the entire text here]

Incorrect approach:

I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)

Usage Examples

URL OCR:

```bash
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```

Local File OCR:

```bash
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```
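
When the command is assembled programmatically, the choice between `--file-url` and `--file-path` can be made from the input itself. The sketch below is a hypothetical helper (`build_ocr_command` is not part of the skill) and assumes that any source with an `http` or `https` scheme counts as a URL and everything else as a local path. It uses only the flags shown above.

```python
import sys
from typing import List, Optional
from urllib.parse import urlparse


def build_ocr_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the argument list for scripts/ocr_caller.py (illustrative helper)."""
    # Assumption: http/https sources are URLs, everything else is a local path.
    scheme = urlparse(source).scheme.lower()
    flag = "--file-url" if scheme in ("http", "https") else "--file-path"
    command = [sys.executable, "scripts/ocr_caller.py", flag, source]
    if output is not None:
        # Optionally save the JSON result to a file, as recommended.
        command += ["--output", output]
    command.append("--pretty")
    return command
```

Returning an argument list rather than a single shell string means a path or URL containing spaces or quotes is passed through without shell escaping, for example via `subprocess.run(build_ocr_command("./document.pdf"))`.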

Understanding the Output

The script outputs JSON with the following structure:

```json
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
```

Key fields:

- `ok`: `true` for success, `false` for error
- `text`: Complete recognized text
- `result`: Raw API response (for debugging)
- `error`: Error details if `ok` is false

First-Time Configuration

When the API is not configured:

The error will show:

CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com

Configuration workflow:

Show the exact error message to the user (including the URL).

Tell the user to provide credentials:

Please visit the URL above to get your API_URL and TOKEN.
Once you have them, send them to me and I'll configure it automatically.

When the user provides credentials, accept any format:
- `API_URL=https://xxx.paddleocr.com/ocr, TOKEN=abc123...`
- `Here's my API: https://xxx and token: abc123`
- Copy-pasted code format
- Any other reasonable format

Parse credentials from the user's message:
- Extract the API_URL value (look for URLs with paddleocr.com or similar)
- Extract the TOKEN value (long alphanumeric string, usually 40+ chars)

Configure automatically:

```bash
python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
```

If configuration succeeds:
- Inform the user: "Configuration complete! Running OCR now..."
- Retry the original OCR task

If configuration fails:
- Show the error
- Ask the user to verify the credentials
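
The credential-parsing step can be approximated with a few regular expressions. The following is a heuristic sketch only, not the skill's implementation: `parse_credentials` and the three patterns are hypothetical. It assumes the first `http(s)` URL in the message is the API_URL, and that the token either follows a `token:` or `token=` label or appears as a run of 40 or more alphanumeric characters.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns; tune them to the formats users actually send.
URL_PATTERN = re.compile(r"https?://[^\s,;\"'<>]+")
LABELED_TOKEN_PATTERN = re.compile(r"token\s*[:=]\s*([A-Za-z0-9_\-]+)", re.IGNORECASE)
LONG_TOKEN_PATTERN = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_url, token) found in a free-form message, or None for each miss."""
    url_match = URL_PATTERN.search(message)
    api_url = url_match.group(0).rstrip(".") if url_match else None

    # Remove the URL first so a long path segment is not mistaken for the token.
    remainder = message.replace(api_url, " ") if api_url else message

    labeled = LABELED_TOKEN_PATTERN.search(remainder)
    if labeled:
        return api_url, labeled.group(1)

    # Fall back to the "long alphanumeric string" rule of thumb.
    unlabeled = LONG_TOKEN_PATTERN.search(remainder)
    return api_url, unlabeled.group(0) if unlabeled else None
```

If either value comes back as `None`, it is safer to ask the user to resend the credentials than to guess.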

Error Handling

Authentication failed:

API_ERROR: Authentication failed (403). Check your token.

The token is invalid; reconfigure with the correct credentials.

Quota exceeded:

API_ERROR: API rate limit exceeded (429)

The daily API quota is exhausted; tell the user to wait or upgrade.

No text detected:

The `text` field is empty. The image may be blank, corrupted, or contain no text.
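
The error cases above, together with the `CONFIG_ERROR` case from first-time configuration, can be summarized as a small lookup. This is a sketch under assumptions: `suggest_action` and its return labels are hypothetical, and it matches on the message fragments quoted in this document rather than on any documented error-code field.

```python
def suggest_action(error_message: str) -> str:
    """Map an error message to a next step (labels are illustrative only)."""
    if error_message.startswith("CONFIG_ERROR"):
        # API not configured yet: run the first-time configuration workflow.
        return "configure"
    if "(403)" in error_message:
        # Authentication failed: the token is invalid.
        return "reconfigure"
    if "(429)" in error_message:
        # Quota exhausted: the user has to wait or upgrade.
        return "wait_or_upgrade"
    # Anything else: show the error to the user and stop, with no fallback OCR.
    return "show_error_and_stop"
```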

Tips for Better Results

If recognition quality is poor, suggest:

- Check that the image is clear and contains text
- Provide a higher-resolution image if possible

Reference Documentation

For in-depth understanding of the OCR system, refer to:

- `references/output_schema.md` - Output format specification
- `references/provider_api.md` - Provider API contract

Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).

Testing the Skill

To verify the skill is working properly:

```bash
python scripts/smoke_test.py
```

This tests configuration and API connectivity.
