Loading...
Loading...
Found 83 Skills
Extract vendor, date, items, amounts, and total from receipt images using OCR and pattern matching with structured JSON output.
Parse PDF into Markdown/JSON/DOCX using MinerU API. Extract text, tables, formulas with OCR support. Use when converting PDF documents, extracting content from scanned papers, or batch processing PDF files.
Implement computer vision features including text recognition (OCR), face detection, barcode scanning, image segmentation, object tracking, and document scanning in iOS apps. Covers both the modern Swift-native Vision API (iOS 16+) and legacy VNRequest patterns, VisionKit DataScannerViewController for live camera scanning, and VNCoreMLRequest for custom model inference. Use when adding OCR, barcode scanning, face detection, or custom Core ML model inference with Vision.
Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.
Extract contact information from business card images using OCR - name, company, email, phone, address.
Convert PDFs to Markdown using Mistral OCR API with image extraction. Use when you need to extract structured text and images from PDFs, especially for scanned documents or documents with complex formatting. Outputs Markdown with embedded images.
Analyzes and processes images using Claude's vision capabilities. Supports OCR, image classification, diagram comparison, chart analysis, visual Q&A, and more. Use when users need to understand, extract, or analyze visual content.
Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.
图片分析与识别,可分析本地图片、网络图片、视频、文件。适用于 OCR、物体识别、场景理解等。当用户发送图片或要求分析图片时必须使用此技能。
[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qwencloud-vision). DO NOT TRIGGER when: user wants to generate/create images (use qwencloud-image-generation), generate videos (use qwencloud-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.
OCR Web Service integration. Manage Documents. Use when the user wants to interact with OCR Web Service data.
Guide for extracting code or pseudocode from images using OCR and implementing it correctly. This skill should be used when tasks involve reading code, pseudocode, or algorithms from images (PNG, JPG, screenshots) and executing or implementing the extracted logic.