Search Results: ocr

Found 169 Skills

Backend Developmentberget1411/enable-banking...

enable-banking-sweden

Implement and troubleshoot Sweden-specific Enable Banking behavior for Swedish ASPSPs, BankID/Mobile BankID SCA, personnummer/Swedish SSN handling, redirect and decoupled authentication, Swedish domestic SEK payments, SEPA EUR payments, Bankgirot/OCR/remittance rules, Swedish business account authorization, sandbox availability, and ASPSP-specific quirks for Swedbank, SEB, Handelsbanken, Nordea, Länsförsäkringar Bank, Danske Bank, and American Express. Use when Codex needs country-specific Open Banking guidance for Sweden.

🇺🇸|EnglishTranslated

AI & Machine Learningbinhmuc/autobot-review

ai-multimodal

Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.

🇺🇸|EnglishTranslated

7 scripts/Attention

AI & Machine Learninged3dai/ed3d-plugins

brainstorming

Use when creating or developing anything, before writing code or implementation plans - refines rough ideas into fully-formed designs through structured Socratic questioning, alternative exploration, and incremental validation

🇺🇸|EnglishTranslated

Code Qualityposit-dev/skills

critical-code-reviewer

Conduct rigorous, adversarial code reviews with zero tolerance for mediocrity. Use when users ask to "critically review" my code or a PR, "critique my code", "find issues in my code", or "what's wrong with this code". Identifies security holes, lazy patterns, edge case failures, and bad practices across Python, R, JavaScript/TypeScript, SQL, and front-end code. Scrutinizes error handling, type safety, performance, accessibility, and code quality. Provides structured feedback with severity tiers (Blocking, Required, Suggestions) and specific, actionable recommendations.

🇺🇸|EnglishTranslated

AI & Machine Learningcinience/alicloud-skills

alicloud-ai-multimodal-qwen-vl

Understand images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A, visual analysis, OCR-like extraction, chart/table reading, or screenshot understanding workflows.

🇺🇸|EnglishTranslated

1 scripts/Checked

Project Managementjoelhooks/joelclaw

adr-skill

Create and maintain Architecture Decision Records (ADRs) optimized for agentic coding workflows. Use when you need to propose, write, update, accept/reject, deprecate, or supersede an ADR; bootstrap an adr folder and index; consult existing ADRs before implementing changes; or enforce ADR conventions. This skill uses Socratic questioning to capture intent before drafting, and validates output against an agent-readiness checklist.

🇺🇸|EnglishTranslated

3 scripts/Checked

AI & Machine Learningsamhvw8/dot-claude

ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

🇺🇸|EnglishTranslated

6 scripts/Attention

Platform Servicesvm0-ai/vm0-skills

kommo

Kommo (formerly amoCRM) API. Use when user mentions "Kommo", "amoCRM", "CRM", or sales pipeline management.

🇺🇸|EnglishTranslated

AI & Machine Learningtensorlakeai/tensorlake-s...

tensorlake

TensorLake SDK for building agentic workflows, sandboxed code execution, and document parsing/extraction. Use when the user mentions tensorlake, or asks about TensorLake APIs/docs/capabilities. Also use when the user is building AI agents or agentic applications that need serverless workflow orchestration (parallel map/reduce DAGs), sandboxed execution of LLM-generated code, or document parsing, structured extraction, and OCR from PDFs/images. Works with any LLM provider (OpenAI, Anthropic), agent framework (LangChain, CrewAI, LlamaIndex), database, or API as the infrastructure layer.

🇺🇸|EnglishTranslated

AI & Machine Learningglebis/claude-skills

vision-bench

Score and compare images using vision LLMs as judges. YAML-defined criteria presets for 11 use cases (text-to-image, photorealism, document OCR, charts, UI, portrait, product, scientific, invoice, alt-text, artistic style). Supports OpenAI, Anthropic, Gemini, Mistral, and OpenRouter as judge providers. Keys auto-decrypted via SOPS + age.

🇺🇸|EnglishTranslated

4 scripts/Checked

Tools & Utilitiesmvanhorn/printing-press-l...

pp-agent-capture

macOS screen capture, window recording, GIF conversion, and agent evidence bundles from the terminal. Built on ScreenCaptureKit for window-level targeting ffmpeg cannot do. Use when the user wants a screenshot of a specific window or app, a screen recording, a GIF conversion, a before/after diff, an evidence bundle for a PR, OCR text from a window, a terminal VHS recording, a Remotion render, or wants to watch a UI for changes. Requires macOS Screen Recording permission on first run.

🇺🇸|EnglishTranslated

Document Processingypares/agent-skills

read-bin-docs

Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.

🇺🇸|EnglishTranslated

1 scripts/Checked