paddleocr-text-recognition
PaddleOCR Text Recognition Skill
When to Use This Skill
Invoke this skill in the following situations:
- Extract text from images (screenshots, photos, scans, charts)
- Read text from PDF or document images
- Perform OCR on any visual content containing text
- Parse structured documents (invoices, receipts, forms, tables)
- Recognize text in photos taken by mobile phones
- Extract text from URLs pointing to images or PDFs
Do not use this skill in the following situations:
- Plain text files that can be read directly with the Read tool
- Code files or markdown documents
- Tasks that do not involve image-to-text conversion
How to Use This Skill
MANDATORY RESTRICTIONS - DO NOT VIOLATE
- ONLY use the PaddleOCR Text Recognition API - Execute the script `python scripts/ocr_caller.py`
- NEVER use Claude's built-in vision - Do NOT read images yourself
- NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt OCR any other way
If the script execution fails (API not configured, network error, etc.):
- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try reading it?"
- Simply stop and wait for the user to fix the configuration
Basic Workflow
1. Identify the input source:
   - User provides a URL: use the `--file-url` parameter
   - User provides a local file path: use the `--file-path` parameter
   - User uploads an image: save it first, then use `--file-path`
2. Execute OCR:
   ```bash
   python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
   ```
   Or for local files:
   ```bash
   python scripts/ocr_caller.py --file-path "file path" --pretty
   ```
   Save the result to a file (recommended):
   ```bash
   python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty
   ```
3. Parse the JSON response:
   - Check the `ok` field: `true` means success, `false` means error
   - Extract text: the `text` field contains all recognized text
   - Handle errors: if `ok` is false, display `error.message`
4. Present results to the user:
   - Display the extracted text in a readable format
   - If the text is empty, the image may contain no text
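The input-selection and execution steps above can be sketched in Python. This is a minimal illustration, not part of the skill's scripts: `build_ocr_command` and `run_ocr` are hypothetical helper names, and the sketch assumes `scripts/ocr_caller.py` prints its JSON to stdout (or writes the same JSON to the `--output` file when that flag is given). It treats any `http://` or `https://` source as a URL and everything else, including a saved upload, as a local path.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

OCR_SCRIPT = "scripts/ocr_caller.py"  # script path used throughout this skill


def build_ocr_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the ocr_caller.py invocation for a URL or a local file path."""
    # Remote sources go through --file-url; local files (including saved
    # uploads) go through --file-path.
    is_url = source.lower().startswith(("http://", "https://"))
    flag = "--file-url" if is_url else "--file-path"
    cmd = [sys.executable, OCR_SCRIPT, flag, source]
    if output is not None:
        cmd += ["--output", output]
    cmd.append("--pretty")
    return cmd


def run_ocr(source: str, output: Optional[str] = None) -> dict:
    """Run the script and return its parsed JSON result (assumed format)."""
    proc = subprocess.run(build_ocr_command(source, output),
                          capture_output=True, text=True)
    # Assumption: with --output the same JSON is written to that file.
    out_file = Path(output) if output else None
    if out_file is not None and out_file.exists():
        raw = out_file.read_text(encoding="utf-8")
    else:
        raw = proc.stdout
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON (e.g. a crash): wrap whatever was printed as an error.
        message = proc.stderr.strip() or proc.stdout.strip() or "unknown error"
        return {"ok": False, "text": "", "result": None,
                "error": {"message": message}}
```

For example, `run_ocr("./document.pdf")` issues the same invocation as the local-file command, using the current interpreter instead of a bare `python`.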
IMPORTANT: Complete Output Display
CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
- The script returns the full JSON with the complete text content in the `text` field
- You MUST display the entire `text` content to the user, no matter how long it is
- Do NOT use phrases like "Here's a summary" or "The text begins with..."
- Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
- The user expects to see ALL the recognized text, not a preview or excerpt
Correct approach:
I've extracted the text from the image. Here's the complete content:
[Display the entire text here]
Incorrect approach:
I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)
Usage Examples
URL OCR:
```bash
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```
Local File OCR:
```bash
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```
Understanding the Output
The script outputs a JSON structure as follows:
```json
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
```
Key fields:
- `ok`: `true` for success, `false` for error
- `text`: Complete recognized text
- `result`: Raw API response (for debugging)
- `error`: Error details if `ok` is false
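The field semantics above translate into a small rendering routine. The sketch below is an illustration, not part of the skill: `render_ocr_result` is a hypothetical name, and it assumes `error` is either `null` or an object with a `message` string, as the `error.message` reference in the workflow suggests. It never truncates the `text` field, in line with the complete-output rule.

```python
from typing import Any, Dict


def render_ocr_result(payload: Dict[str, Any]) -> str:
    """Turn the ocr_caller.py JSON into the reply shown to the user."""
    if not payload.get("ok"):
        error = payload.get("error")
        # Expected shape: {"message": "..."}; fall back to str() otherwise.
        if isinstance(error, dict):
            message = error.get("message", "Unknown error")
        else:
            message = str(error) if error else "Unknown error"
        # Show the error and stop: no fallback OCR, no offer to read the image.
        return f"OCR failed: {message}"

    text = payload.get("text") or ""
    if not text.strip():
        return ("No text was detected. The image may be blank, corrupted, "
                "or contain no text.")

    # Return the complete text: never summarize or truncate it.
    return ("I've extracted the text from the image. "
            "Here's the complete content:\n\n" + text)
```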
First-Time Configuration
When the API is not configured, the error will show:
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
1. Show the exact error message to the user (including the URL).
2. Tell the user to provide credentials:
   Please visit the URL above to get your API_URL and TOKEN. Once you have them, send them to me and I'll configure it automatically.
3. When the user provides credentials, accept any format, for example:
   - `API_URL=https://xxx.paddleocr.com/ocr, TOKEN=abc123...`
   - `Here's my API: https://xxx and token: abc123`
   - Copy-pasted code format
   - Any other reasonable format
4. Parse the credentials from the user's message:
   - Extract the API_URL value (look for URLs with paddleocr.com or similar)
   - Extract the TOKEN value (a long alphanumeric string, usually 40+ characters)
5. Configure automatically:
   ```bash
   python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
   ```
6. If configuration succeeds:
   - Inform the user: "Configuration complete! Running OCR now..."
   - Retry the original OCR task
7. If configuration fails:
   - Show the error
   - Ask the user to verify the credentials
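The credential-parsing and configuration steps can be approximated with regular expressions. This is a heuristic sketch under stated assumptions, not the skill's own parser: `parse_credentials` and `build_configure_command` are hypothetical names, an explicit `TOKEN=` or `token:` label is preferred over the bare 40+ character fallback, and a URL on a paddleocr.com host is preferred over any other URL in the message.

```python
import re
import sys
from typing import List, Optional, Tuple

URL_RE = re.compile(r"https?://[^\s,;'\"<>]+")
# Matches "TOKEN=abc", "token: abc", "Token = abc" (label is case-insensitive).
LABELED_TOKEN_RE = re.compile(r"token\s*[:=]\s*['\"]?([A-Za-z0-9_\-]+)", re.IGNORECASE)
# Fallback: a bare alphanumeric string of 40+ characters.
LONG_TOKEN_RE = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Heuristically pull (API_URL, TOKEN) out of a free-form user message."""
    # Strip sentence punctuation that may trail a pasted URL.
    urls = [u.rstrip(".)") for u in URL_RE.findall(message)]
    # Prefer a paddleocr.com URL, else take the first URL found.
    api_url = next((u for u in urls if "paddleocr.com" in u),
                   urls[0] if urls else None)

    labeled = LABELED_TOKEN_RE.search(message)
    if labeled:
        token = labeled.group(1)
    else:
        bare = LONG_TOKEN_RE.search(message)
        token = bare.group(0) if bare else None
    return api_url, token


def build_configure_command(api_url: str, token: str) -> List[str]:
    """Argument list for: python scripts/configure.py --api-url URL --token TOKEN"""
    return [sys.executable, "scripts/configure.py",
            "--api-url", api_url, "--token", token]
```

If either value comes back as `None`, ask the user to resend the credentials rather than guessing.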
Error Handling
Authentication failed:
API_ERROR: Authentication failed (403). Check your token.
- The token is invalid; reconfigure with the correct credentials
Quota exceeded:
API_ERROR: API rate limit exceeded (429)
- The daily API quota is exhausted; tell the user to wait or upgrade
No text detected:
- The `text` field is empty
- The image may be blank, corrupted, or contain no text
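The error cases above map onto a small set of next actions. A minimal sketch, assuming the error messages follow the exact formats shown in this section; the function name `next_action` and the action labels (`configure`, `reconfigure`, `wait_or_upgrade`, `stop`) are hypothetical and only illustrate the decision logic.

```python
import re

# Hypothetical action labels; the hints paraphrase the guidance in this skill.
ACTIONS = {
    "configure": "Show the error (with its URL) and ask for API_URL and TOKEN.",
    "reconfigure": "Token is invalid; reconfigure with correct credentials.",
    "wait_or_upgrade": "Quota exhausted; tell the user to wait or upgrade.",
    "stop": "Show the error message and stop. Do not attempt other OCR.",
}

# HTTP status codes appear in parentheses, e.g. "(403)" or "(429)".
STATUS_RE = re.compile(r"\((\d{3})\)")


def next_action(error_message: str) -> str:
    """Classify an ocr_caller.py error message into a next-action label."""
    if error_message.startswith("CONFIG_ERROR"):
        return "configure"
    status = STATUS_RE.search(error_message)
    if error_message.startswith("API_ERROR") and status:
        code = status.group(1)
        if code == "403":
            return "reconfigure"
        if code == "429":
            return "wait_or_upgrade"
    # Anything else (network errors, unknown API errors): display and stop.
    return "stop"
```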
Tips for Better Results
If recognition quality is poor, suggest:
- Check if the image is clear and contains text
- Provide a higher resolution image if possible
Reference Documentation
For in-depth understanding of the OCR system, refer to:
- `references/output_schema.md` - Output format specification
- `references/provider_api.md` - Provider API contract
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).
Testing the Skill
To verify the skill is working properly:
```bash
python scripts/smoke_test.py
```
This tests the configuration and API connectivity.