Search Results: ocr

Found 169 Skills

AI & Machine Learning | mugnimaestra/video-frames...

video-frames

Extract frames from video files using ffmpeg for AI/LLM analysis. Use when (1) the user asks to analyze, describe, or summarize a video file, (2) the user wants to extract frames or screenshots from a video, (3) the user provides a video file (.mp4, .mov, .avi, .mkv, .webm, etc.) and asks questions about its visual content, (4) the user wants to identify scenes, objects, or events in a video, (5) the user wants timestamps overlaid on extracted frames for temporal reference. Converts video into JPEG frames that can be attached to LLM prompts as images. Requires ffmpeg on PATH. Supports scene-change detection, model-aware optimization (Claude/OpenAI/Gemini), quality presets (efficient/balanced/detailed/ocr), grayscale and high-contrast OCR mode, and automatic FPS calculation via --max-frames.
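The listing does not say how the skill derives a frame rate from --max-frames. One plausible reading is to divide the frame budget by the clip duration. The sketch below illustrates that reading only: the function names, the output pattern, and the idea of passing the result to ffmpeg's `fps` filter are assumptions, not the skill's actual implementation.

```python
from typing import List


def fps_for_max_frames(duration_seconds: float, max_frames: int) -> float:
    """Return a sampling rate (frames per second) sized to a frame budget.

    Hypothetical helper: assumes frames are spread evenly over the clip.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    return max_frames / duration_seconds


def ffmpeg_args(video_path: str, fps: float, max_frames: int,
                out_pattern: str = "frame_%04d.jpg") -> List[str]:
    """Build an ffmpeg command line that samples JPEG frames at the given rate.

    The fps filter sets the sampling rate; -frames:v caps the output count
    in case rounding would otherwise produce one frame too many.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps:.6f}",
        "-frames:v", str(max_frames),
        out_pattern,
    ]


if __name__ == "__main__":
    # A 120-second clip with a budget of 30 frames samples one frame every 4 s.
    rate = fps_for_max_frames(120.0, 30)
    print(" ".join(ffmpeg_args("input.mp4", rate, 30)))
```

The duration would have to come from elsewhere (for example, ffprobe); that step is left out here.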

1 scripts/Attention

Automation | mbruhler/claude-orchestra...

orchestration:creating-workflows

Use when user says "create workflow", "create a workflow", "design workflow", "orchestrate", "automate multiple steps", "coordinate agents", "multi-agent workflow". Creates orchestration workflows from natural language using Socratic questioning to plan multi-agent workflows with visualization.

AI & Machine Learning | microsoft/github-copilot-...

azure-ai

Use for Azure AI: Search, Speech, OpenAI, Document Intelligence. Helps with search, vector/hybrid search, speech-to-text, text-to-speech, transcription, OCR. USE FOR: AI Search, query search, vector search, hybrid search, semantic search, speech-to-text, text-to-speech, transcribe, OCR, convert text to speech. DO NOT USE FOR: Function apps/Functions (use azure-functions), databases (azure-postgres/azure-kusto), general Azure resources.

154.8k

Testing & QA | sunfmin/autocraft

journey-loop

Orchestrates a continuous journey-builder → refine → restart loop. Runs journey-builder and refine-journey sequentially, improving the skill each iteration. Loops until all spec requirements are covered by journeys and the score reaches 95%.

Testing & QA | sunfmin/autocraft

journey-builder

Build and test the longest uncovered user journey from spec.md. Reads the product spec, checks existing journeys, picks the longest untested path, writes a UI test with screenshots at every step, then runs 3 polish rounds (testability → refactor UI test → UI review) until everything is clean. Use when the user says "next journey", "add journey", "test the next flow", "journey builder", or "cover more user paths".

Testing & QA | sunfmin/autocraft

refine-journey

Evaluate the output of a journey-builder run, identify instruction gaps, and edit the project root AGENTS.md (or add pitfalls to the gist) to fix those gaps. Does NOT modify the journey-builder skill itself.

AI & Machine Learning | renocrypt/latex-arxiv-ski...

collaborating-with-claude

Use the Claude Code CLI to consult Claude and delegate coding tasks for prototyping, debugging, and code review. Supports multi-turn sessions via SESSION_ID. Optimized for low-token, file/line-based handoff.

1 scripts/Checked

AI & Machine Learning | microsoft/agent-skills

azure-ai-document-intelligence-dotnet

Azure AI Document Intelligence SDK for .NET. Extract text, tables, and structured data from documents using prebuilt and custom models. Use for invoice processing, receipt extraction, ID document analysis, and custom document models. Triggers: "Document Intelligence", "DocumentIntelligenceClient", "form recognizer", "invoice extraction", "receipt OCR", "document analysis .NET".

Document Processing | davila7/claude-code-templ...

markitdown

Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.

3 scripts/Checked

AI & Machine Learning | zrt-ai-lab/opencode-skill...

image-service

Multimodal image processing skill, supporting text-to-image, image-to-image, image-to-text, long image stitching, marketing material packs, product design images, element disassembly diagrams, and social media image sets. Triggered when the user mentions keywords such as "draw", "generate image", "draw XX", "image processing", "image-to-image", "OCR", "image recognition", "stitch long image", "infographic", "illustration", "product image", "material pack", "marketing material", "detail page", "e-commerce image", "design drawing", "exploded view", "disassembly", "image set", "nine-grid", etc. Note: If the user requests a video (including illustrations + voiceover), use the video-creator skill instead.

🇨🇳 | Chinese (translated)

5 scripts/Checked

Document Processing | intellectronica/agent-ski...

markdown-converter

Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.

AI & Machine Learning | wyattowalsh/agents

host-panel

Host simulated panel discussions and debates among AI-simulated domain experts. Supports roundtable, Oxford-style, and Socratic formats with heterogeneous expert personas, anti-groupthink mechanisms, and structured synthesis. Use when exploring complex topics from multiple expert perspectives, testing argument strength, academic brainstorming, or understanding trade-offs in decisions. NOT for one-on-one conversations, simple Q&A, or real-time debates.
