paddleocr-text-recognition
PaddleOCR Text Recognition Skill
When to Use This Skill
Invoke this skill in the following situations:
- Extract text from images (screenshots, photos, scans)
- Extract text from PDFs or document images
- Extract text and positions from structured documents (invoices, receipts, forms, tables)
- Extract text from URLs or local files that point to images/PDFs
Do not use this skill in the following situations:
- Plain text files that can be read directly with the Read tool
- Code files or markdown documents
- Tasks that do not involve image-to-text conversion
How to Use This Skill
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
- ONLY use the PaddleOCR Text Recognition API - Execute the script `python scripts/ocr_caller.py`
- NEVER read images directly - Do NOT read images yourself
- NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt OCR any other way
If the script execution fails (API not configured, network error, etc.):
- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try reading it?"
- Simply stop and wait for the user to fix the configuration
Basic Workflow
1. Identify the input source:
   - User provides a URL: use the `--file-url` parameter
   - User provides a local file path: use the `--file-path` parameter
   - User uploads an image: save it first, then use `--file-path`

   Input type note:
   - Supported file types depend on the model and endpoint configuration.
   - Follow the official endpoint/API documentation for the exact supported formats.

2. Execute OCR:
   ```bash
   python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
   ```
   Or for local files:
   ```bash
   python scripts/ocr_caller.py --file-path "file path" --pretty
   ```
   Default behavior: the raw JSON is saved to a temp file.
   - If `--output` is omitted, the script saves the result automatically under the system temp directory
   - Default path pattern: `<system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json`
   - If `--output` is provided, it overrides the default temp-file destination
   - If `--stdout` is provided, JSON is printed to stdout and no file is saved
   - In save mode, the script prints the absolute saved path on stderr: `Result saved to: /absolute/path/...`
   - In default/custom save mode, read and parse the saved JSON file before responding
   - Use `--stdout` only when you explicitly want to skip file persistence

3. Parse the JSON response:
   - In default/custom save mode, load the JSON from the saved file path shown by the script
   - If `--stdout` is used, parse the stdout JSON directly
   - Check the `ok` field: `true` means success, `false` means error
   - Extract text: the `text` field contains all recognized text
   - Handle errors: if `ok` is `false`, display `error.message`

4. Present results to the user:
   - Display the extracted text in a readable format
   - If the text is empty, the image may contain no text
   - In save mode, always tell the user the saved file path and that the full raw JSON is available there
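The workflow above can be driven from Python roughly as follows. This is a minimal sketch, not part of the skill: the helper names (`default_results_dir`, `build_ocr_command`, `parse_saved_path`, `run_ocr`) are hypothetical, and it assumes that exactly one input source is passed per call and that `--stdout` and `--output` are not combined. It relies only on the flags and the `Result saved to:` stderr line documented above.

```python
import json
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional

SCRIPT = "scripts/ocr_caller.py"  # script path used throughout this skill

# Matches the stderr line documented for save mode: "Result saved to: /absolute/path/..."
_SAVED_RE = re.compile(r"^Result saved to: (.+)$", re.MULTILINE)


def default_results_dir() -> Path:
    """Directory implied by the documented default pattern <system-temp>/paddleocr/..."""
    return Path(tempfile.gettempdir()) / "paddleocr" / "text-recognition" / "results"


def build_ocr_command(file_url: Optional[str] = None, file_path: Optional[str] = None,
                      output: Optional[str] = None, stdout: bool = False,
                      pretty: bool = True) -> List[str]:
    """Build the argv for one OCR call from the documented flags."""
    # Assumption: exactly one input source per call.
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    # Assumption: --stdout skips persistence, so combining it with --output is rejected.
    if stdout and output is not None:
        raise ValueError("--stdout and --output should not be combined")
    cmd = [sys.executable, SCRIPT]
    cmd += ["--file-url", file_url] if file_url is not None else ["--file-path", file_path]
    if stdout:
        cmd.append("--stdout")
    elif output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd


def parse_saved_path(stderr_text: str) -> Optional[str]:
    """Return the absolute path the script reported on stderr, if any."""
    match = _SAVED_RE.search(stderr_text)
    return match.group(1).strip() if match else None


def run_ocr(cmd: List[str]) -> Dict[str, Any]:
    """Run the script and load its JSON from stdout (--stdout) or from the saved file."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if "--stdout" in cmd:
        if not proc.stdout.strip():
            raise RuntimeError(proc.stderr.strip() or f"exit code {proc.returncode}")
        return json.loads(proc.stdout)
    saved = parse_saved_path(proc.stderr)
    if saved is None:
        # No result file: surface the script's own error output unchanged and stop.
        raise RuntimeError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return json.loads(Path(saved).read_text(encoding="utf-8"))
```

For example, `run_ocr(build_ocr_command(file_path="./document.pdf"))` would return the parsed JSON from the saved file. `run_ocr` needs the real script and a configured endpoint; on failure it only re-raises the script's error output and never attempts another OCR method, in line with the restrictions above.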
IMPORTANT: Complete Output Display
CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
- The output JSON contains the complete output, including the full text in the `text` field
- You MUST display the entire `text` content to the user, no matter how long it is
- Do NOT use phrases like "Here's a summary" or "The text begins with..."
- Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
- The user expects to see ALL the recognized text, not a preview or excerpt
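As a sketch of these display rules, the hypothetical helper below assembles a reply string that always contains the full `text` value, notes when nothing was recognized, and mentions the saved file path in save mode. The helper name and message wording are illustrative assumptions, not part of the skill.

```python
from typing import Optional


def format_ocr_reply(text: str, saved_path: Optional[str] = None) -> str:
    """Assemble a reply containing the complete recognized text (never sliced)."""
    parts = []
    if text.strip():
        parts.append("I've extracted the text from the image. Here's the complete content:")
        parts.append(text)  # the full `text` field, not a preview or summary
    else:
        parts.append("No text was recognized; the image may be blank, corrupted, or contain no text.")
    if saved_path:
        # Save mode: always point the user at the full raw JSON.
        parts.append(f"The full raw JSON is available at: {saved_path}")
    return "\n\n".join(parts)
```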
Correct approach:
```
I've extracted the text from the image. Here's the complete content:

[Display the entire text here]
```
Incorrect approach:
```
I found some text in the image. Here's a preview:

"The quick brown fox..." (truncated)
```
Usage Examples
Example 1: URL OCR:
```bash
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```
Example 2: Local File OCR:
```bash
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```
Example 3: OCR With Explicit File Type:
```bash
python scripts/ocr_caller.py --file-url "https://example.com/input" --file-type 1 --pretty
```
Example 4: Print JSON Without Saving:
```bash
python scripts/ocr_caller.py --file-url "https://example.com/input" --stdout --pretty
```
Understanding the Output
The output JSON structure is as follows:
```json
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
```
Key fields:
- `ok`: `true` for success, `false` for error
- `text`: Complete recognized text
- `result`: Raw API response (for debugging)
- `error`: Error details if `ok` is `false`

Raw result location (default): the temp-file path printed by the script on stderr.
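A minimal sketch of reading this structure in Python, assuming only the four fields shown above. `parse_result`, `extract_text`, and `OCRError` are hypothetical names; the sketch also assumes `error` is an object carrying `message` (as the workflow's `error.message` implies) and otherwise falls back to its string form.

```python
import json
from typing import Any, Dict


class OCRError(RuntimeError):
    """Raised when the script reports ok == false."""


def parse_result(raw: str) -> Dict[str, Any]:
    """Parse the script's JSON (saved-file contents or --stdout output) and check its shape."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or "ok" not in payload:
        raise ValueError("unexpected output: missing 'ok' field")
    return payload


def extract_text(payload: Dict[str, Any]) -> str:
    """Return the complete recognized text, or raise OCRError carrying error.message."""
    if payload["ok"] is True:
        # May be "" when the image contains no text; that is not an error.
        return payload.get("text") or ""
    error = payload.get("error")
    # Assumption: error is an object with a "message" key.
    message = error.get("message") if isinstance(error, dict) else error
    raise OCRError(str(message) if message else "OCR failed without an error message")
```

In save mode, pass the contents of the saved file to `parse_result`; with `--stdout`, pass the captured stdout instead.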
First-Time Configuration
You can generally assume that the required environment variables have already been configured. Only when an OCR task fails should you analyze the error message to determine whether it is caused by a configuration issue. If it is indeed a configuration problem, you should notify the user to fix it.
When the API is not configured:

The error will show:
```
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
```
Configuration workflow:
1. Show the exact error message to the user (including the URL).

2. Guide the user to configure securely:
   - Recommend configuring through the host application's standard method (e.g., settings file, environment variable UI) rather than pasting credentials in chat.
   - List the required environment variables:
     - `PADDLEOCR_OCR_API_URL`
     - `PADDLEOCR_ACCESS_TOKEN`
     - Optional: `PADDLEOCR_OCR_TIMEOUT`

3. If the user provides credentials in chat anyway (accept any reasonable format), for example:
   - `PADDLEOCR_OCR_API_URL=https://xxx.paddleocr.com/ocr, PADDLEOCR_ACCESS_TOKEN=abc123...`
   - `Here's my API: https://xxx and token: abc123`
   - Copy-pasted code format
   - Any other reasonable format

   Security note: warn the user that credentials shared in chat may be stored in conversation history. Recommend setting them through the host application's configuration instead when possible.

   Then parse and validate the values:
   - Extract `PADDLEOCR_OCR_API_URL` (look for URLs with `paddleocr.com` or similar)
   - Confirm `PADDLEOCR_OCR_API_URL` is a full endpoint ending with `/ocr`
   - Extract `PADDLEOCR_ACCESS_TOKEN` (long alphanumeric string, usually 40+ chars)

4. Ask the user to confirm the environment is configured.

5. Retry only after confirmation:
   - Once the user confirms the environment variables are available, retry the original OCR task
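The environment check and the parse-and-validate step can be sketched as below. This is a heuristic, not the skill's actual parser: the regexes, the 40-character minimum for the token (taken from the "usually 40+ chars" guidance), and the helper names `missing_config`, `parse_credentials`, and `validate_credentials` are illustrative assumptions. Short or non-alphanumeric tokens will not be picked up; in that case, ask the user rather than guess.

```python
import os
import re
from typing import Dict, List, Mapping, Optional

REQUIRED_VARS = ("PADDLEOCR_OCR_API_URL", "PADDLEOCR_ACCESS_TOKEN")
OPTIONAL_VARS = ("PADDLEOCR_OCR_TIMEOUT",)  # listed for completeness; never required

# Heuristics (assumptions): URLs stop at whitespace, commas, semicolons or quotes;
# tokens are runs of 40+ ASCII letters and digits.
_URL_RE = re.compile(r"https?://[^\s,;'\"]+")
_TOKEN_RE = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def missing_config(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def parse_credentials(message: str) -> Dict[str, Optional[str]]:
    """Pull a candidate API URL and token out of free-form chat text."""
    urls = [u.rstrip(".)") for u in _URL_RE.findall(message)]
    # Prefer a paddleocr.com URL; otherwise fall back to the first URL seen.
    url = next((u for u in urls if "paddleocr.com" in u), urls[0] if urls else None)
    tokens = _TOKEN_RE.findall(message)
    return {
        "PADDLEOCR_OCR_API_URL": url,
        "PADDLEOCR_ACCESS_TOKEN": tokens[0] if tokens else None,
    }


def validate_credentials(creds: Mapping[str, Optional[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the values look usable."""
    problems = []
    url = creds.get("PADDLEOCR_OCR_API_URL")
    if not url:
        problems.append("No API URL found; ask the user for the full endpoint.")
    elif not url.endswith("/ocr"):
        problems.append("PADDLEOCR_OCR_API_URL must be the full endpoint ending with /ocr.")
    if not creds.get("PADDLEOCR_ACCESS_TOKEN"):
        problems.append("No access token found (expected a long alphanumeric string).")
    return problems
```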
Error Handling
Authentication failed:
```
API_ERROR: Authentication failed (403). Check your token.
```
- The token is invalid; reconfigure with the correct credentials

Quota exceeded:
```
API_ERROR: API rate limit exceeded (429)
```
- The daily API quota is exhausted; inform the user to wait or upgrade

No text detected:
- The `text` field is empty
- The image may be blank, corrupted, or contain no text
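The error cases above could be told apart by matching the documented message prefix and HTTP status code, as in this sketch. The category names, the `(NNN)` status pattern, and the guidance strings are assumptions based only on the example messages shown in this section. An empty `text` with `ok` set to `true` is not an error and is handled when presenting results.

```python
import re
from typing import Optional

# Guidance paraphrases this section; the category keys are illustrative.
GUIDANCE = {
    "config": "The API is not configured. Show the exact error (including the URL) and follow the configuration workflow.",
    "auth": "Authentication failed: the token is invalid. Reconfigure with the correct credentials.",
    "quota": "The daily API quota is exhausted. Tell the user to wait or upgrade.",
    "other": "Display the error message and stop; do not fall back to another OCR method.",
}

_STATUS_RE = re.compile(r"\((\d{3})\)")


def classify_error(message: str) -> str:
    """Map a script error message to one of the cases documented above."""
    message = message.strip()
    if message.startswith("CONFIG_ERROR"):
        return "config"
    match = _STATUS_RE.search(message)
    status: Optional[int] = int(match.group(1)) if match else None
    if status == 403:
        return "auth"
    if status == 429:
        return "quota"
    return "other"
```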
Tips for Better Results
If recognition quality is poor, suggest:
- Check if the image is clear and contains text
- Provide a higher resolution image if possible
Reference Documentation
For in-depth understanding of the OCR system, refer to:
- `references/output_schema.md` - Output format specification

Note: Model version, capabilities, and supported file formats are determined by your API endpoint (`PADDLEOCR_OCR_API_URL`) and its official API documentation.
Testing the Skill
To verify the skill is working properly:
```bash
python scripts/smoke_test.py
```
This tests configuration and API connectivity.