Loading...
Loading...
Found 118 Skills
Process media files (video, audio, images, documents) using Transloadit. Use when asked to encode video to HLS/MP4, generate thumbnails, resize or watermark images, extract audio, concatenate clips, add subtitles, OCR documents, or run any media processing pipeline. Covers 86+ processing robots for file transformation at scale.
PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation.
dontbesilent Execution Diagnosis. Diagnose the real reason behind your 'know what to do but fail to act' using the Adlerian psychology framework. Triggers: /dbs-unblock, /self-check, 'I know what to do but can't do it', 'why do I always procrastinate' Execution block diagnosis using Adlerian psychology framework. Trigger: /dbs-unblock, "I know what to do but can't do it", "why do I procrastinate"
Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".
Socratic questioning protocol + user communication. MANDATORY for complex requests, new features, or unclear requirements. Includes progress reporting and error handling.
MCP server with 39 tools for Word, Excel, PowerPoint, PDF, OCR operations
Guides structured ideation through Socratic questioning to explore problems, opportunities, and solutions. Use when brainstorming features, exploring use cases, or thinking through new ideas.
Extract tables from PDFs and images to CSV or Excel. Support for scanned documents with OCR, multi-page PDFs, and complex table structures.
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Implement computer vision features including text recognition (OCR), face detection, barcode scanning, image segmentation, object tracking, and document scanning in iOS apps. Covers both the modern Swift-native Vision API (iOS 16+) and legacy VNRequest patterns, VisionKit DataScannerViewController for live camera scanning, and VNCoreMLRequest for custom model inference. Use when adding OCR, barcode scanning, face detection, or custom Core ML model inference with Vision.
Extract text from PDFs for LLM consumption. Use when processing PDFs for RAG, document analysis, or text extraction. Supports API services (Mistral OCR) and local tools (PyMuPDF, pdfplumber). Handles text-based PDFs, tables, and scanned documents with OCR.
Guide learning and deep understanding through proven methodologies (Socratic, Feynman, Problem-Based). Use when user says "help me understand", "teach me", "explain this", "learn about", "socratic", "feynman", "problem-based", "I don't understand", "confused about", "why does", or wants to truly grasp a concept.