paddleocr-text-recognition

PaddleOCR Text Recognition Skill

When to Use This Skill

Invoke this skill in the following situations:

Extract text from images (screenshots, photos, scans)
Extract text from PDFs or document images
Extract text and positions from structured documents (invoices, receipts, forms, tables)
Extract text from URLs or local files that point to images/PDFs

Do not use this skill in the following situations:

Plain text files that can be read directly with the Read tool
Code files or markdown documents
Tasks that do not involve image-to-text conversion

How to Use This Skill

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use PaddleOCR Text Recognition API - Execute the script `python scripts/ocr_caller.py`
NEVER read images directly - Do NOT read images yourself
NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt OCR any other way

If the script execution fails (API not configured, network error, etc.):

Show the error message to the user
Do NOT offer to help using your vision capabilities
Do NOT ask "Would you like me to try reading it?"
Simply stop and wait for the user to fix the configuration

Basic Workflow

Identify the input source:
- User provides URL: Use the `--file-url` parameter
- User provides local file path: Use the `--file-path` parameter
- User uploads image: Save it first, then use `--file-path`
Input type note:
- Supported file types depend on the model and endpoint configuration.
- Follow the official endpoint/API documentation for the exact supported formats.
Execute OCR:
```bash
python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
```
Or for local files:
```bash
python scripts/ocr_caller.py --file-path "file path" --pretty
```
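The choice between `--file-url` and `--file-path` can be sketched in Python. This is only an illustration, not part of the skill: `build_ocr_command` is a hypothetical helper, and treating any source that starts with `http://` or `https://` as a URL is an assumption.

```python
from typing import List


def build_ocr_command(source: str, pretty: bool = True) -> List[str]:
    """Build the argument list for scripts/ocr_caller.py (hypothetical helper)."""
    # Assumption: sources starting with http:// or https:// are URLs;
    # anything else is treated as a local file path.
    is_url = source.lower().startswith(("http://", "https://"))
    flag = "--file-url" if is_url else "--file-path"
    command = ["python", "scripts/ocr_caller.py", flag, source]
    if pretty:
        command.append("--pretty")
    return command
```

Returning a list rather than a single string means the command can be handed to `subprocess.run` without shell quoting, which matters for file paths that contain spaces.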
Default behavior: save raw JSON to a temp file:
- If `--output` is omitted, the script saves automatically under the system temp directory
- Default path pattern: `<system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json`
- If `--output` is provided, it overrides the default temp-file destination
- If `--stdout` is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the absolute saved path on stderr: `Result saved to: /absolute/path/...`
- In default/custom save mode, read and parse the saved JSON file before responding
- Use `--stdout` only when you explicitly want to skip file persistence
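In save mode the saved path has to be recovered from stderr. A minimal sketch of that step, assuming the line starts with exactly `Result saved to: ` and that stderr may also contain unrelated lines; `extract_saved_path` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

SAVED_PREFIX = "Result saved to: "


def extract_saved_path(stderr_text: str) -> Optional[Path]:
    """Return the path from the 'Result saved to:' line, or None if absent."""
    for line in stderr_text.splitlines():
        line = line.strip()
        # Assumption: the script prints the prefix verbatim at the start of a line.
        if line.startswith(SAVED_PREFIX):
            return Path(line[len(SAVED_PREFIX):].strip())
    return None
```

A `None` result means either `--stdout` was used or the run failed before saving, so the caller should check the script's exit status and stdout before assuming a file exists.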
Parse JSON response:
- In default/custom save mode, load JSON from the saved file path shown by the script
- Check the `ok` field: `true` means success, `false` means error
- Extract text: the `text` field contains all recognized text
- If `--stdout` is used, parse the stdout JSON directly
- Handle errors: If `ok` is false, display `error.message`
Present results to user:
- Display extracted text in a readable format
- If the text is empty, the image may contain no text
- In save mode, always tell the user the saved file path and that full raw JSON is available there
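The parsing and error-handling steps above could look roughly like this in Python. The field names `ok`, `text`, and `error.message` come from this document; `OcrError`, `load_result`, and `extract_text` are hypothetical names, and treating `error` as an object with a `message` key is an assumption based on the `error.message` reference.

```python
import json
from typing import Any, Dict


class OcrError(Exception):
    """Raised when the result JSON reports ok == false."""


def load_result(path: str) -> Dict[str, Any]:
    """Load the raw JSON that the script saved to disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_text(payload: Dict[str, Any]) -> str:
    """Return the full recognized text, or raise OcrError with error.message."""
    if not payload.get("ok"):
        error = payload.get("error")
        # Assumption: error is an object carrying a "message" key.
        if isinstance(error, dict):
            raise OcrError(error.get("message", "unknown error"))
        raise OcrError(str(error))
    # An empty string is a valid result: the image may contain no text.
    return payload.get("text") or ""
```

The function returns the text untouched, so the caller can display it in full rather than a preview.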

IMPORTANT: Complete Output Display

CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.

The output JSON contains the complete output, including the full text in the `text` field
You MUST display the entire `text` content to the user, no matter how long it is
Do NOT use phrases like "Here's a summary" or "The text begins with..."
Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
The user expects to see ALL the recognized text, not a preview or excerpt

Correct approach:

I've extracted the text from the image. Here's the complete content:

[Display the entire text here]

Incorrect approach:

I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)

Usage Examples

Example 1: URL OCR:

```bash
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```

Example 2: Local File OCR:

```bash
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```

Example 3: OCR With Explicit File Type:

```bash
python scripts/ocr_caller.py --file-url "https://example.com/input" --file-type 1 --pretty
```

Example 4: Print JSON Without Saving:

```bash
python scripts/ocr_caller.py --file-url "https://example.com/input" --stdout --pretty
```

Understanding the Output

The output JSON structure is as follows:

```json
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
```

Key fields:

- `ok`: `true` for success, `false` for error
- `text`: Complete recognized text
- `result`: Raw API response (for debugging)
- `error`: Error details if `ok` is false

Raw result location (default): the temp-file path printed by the script on stderr

First-Time Configuration

You can generally assume that the required environment variables have already been configured. Only when an OCR task fails should you analyze the error message to determine whether it is caused by a configuration issue. If it is indeed a configuration problem, you should notify the user to fix it.

When API is not configured:

The error will show:

CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com

Configuration workflow:

Show the exact error message to the user (including the URL).
Guide the user to configure securely:
- Recommend configuring through the host application's standard method (e.g., settings file, environment variable UI) rather than pasting credentials in chat.
- List the required environment variables:
  - `PADDLEOCR_OCR_API_URL`
  - `PADDLEOCR_ACCESS_TOKEN`
  - Optional: `PADDLEOCR_OCR_TIMEOUT`
If the user provides credentials in chat anyway (accept any reasonable format), for example:
- `PADDLEOCR_OCR_API_URL=https://xxx.paddleocr.com/ocr, PADDLEOCR_ACCESS_TOKEN=abc123...`
- `Here's my API: https://xxx and token: abc123`
- Copy-pasted code format
- Any other reasonable format
- Security note: Warn the user that credentials shared in chat may be stored in conversation history. Recommend setting them through the host application's configuration instead when possible.
Then parse and validate the values:
- Extract `PADDLEOCR_OCR_API_URL` (look for URLs with `paddleocr.com` or similar)
- Confirm `PADDLEOCR_OCR_API_URL` is a full endpoint ending with `/ocr`
- Extract `PADDLEOCR_ACCESS_TOKEN` (long alphanumeric string, usually 40+ chars)
Ask the user to confirm the environment is configured.
Retry only after confirmation:
- Once the user confirms the environment variables are available, retry the original OCR task
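Before retrying, the validation rules above can be checked mechanically. The sketch below is an illustration under stated assumptions: `find_config_problems` is a hypothetical helper, it checks only that the two required variables are set and that the URL ends with `/ocr`, and it deliberately does not enforce a token length because the 40+ character figure is described as typical, not required.

```python
from typing import List, Mapping

REQUIRED_VARS = ("PADDLEOCR_OCR_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def find_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    url = env.get("PADDLEOCR_OCR_API_URL", "")
    # The URL must be a full endpoint ending with /ocr.
    if url and not url.endswith("/ocr"):
        problems.append("PADDLEOCR_OCR_API_URL should be a full endpoint ending with /ocr")
    return problems
```

Calling `find_config_problems(os.environ)` reports problems by variable name only, so the token value itself never appears in the output.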

Error Handling

Authentication failed:

API_ERROR: Authentication failed (403). Check your token.

The token is invalid; reconfigure with correct credentials

Quota exceeded:

API_ERROR: API rate limit exceeded (429)

The daily API quota is exhausted; tell the user to wait or upgrade

No text detected:

The `text` field is empty
Image may be blank, corrupted, or contain no text
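The error cases above can be told apart by their message text. A minimal sketch, assuming the messages keep the shapes shown in this document (a `CONFIG_ERROR` prefix, and `(403)` or `(429)` in parentheses); `classify_error` and its category labels are hypothetical.

```python
def classify_error(message: str) -> str:
    """Map an error message to a coarse category (hypothetical labels)."""
    # Assumption: messages follow the formats quoted in this document.
    if message.startswith("CONFIG_ERROR"):
        return "configuration"
    if "(403)" in message:
        return "authentication"
    if "(429)" in message:
        return "quota"
    return "other"
```

Whatever the category, the original message should still be shown to the user verbatim; the category only decides which follow-up advice applies.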

Tips for Better Results

If recognition quality is poor, suggest:

Check if the image is clear and contains text
Provide a higher resolution image if possible

Reference Documentation

For in-depth understanding of the OCR system, refer to:

`references/output_schema.md` - Output format specification

Note: Model version, capabilities, and supported file formats are determined by your API endpoint (`PADDLEOCR_OCR_API_URL`) and its official API documentation.

Testing the Skill

To verify the skill is working properly:

```bash
python scripts/smoke_test.py
```

This tests configuration and API connectivity.
