Found 102 Skills
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Comprehensive epigenomics and gene regulation analysis integrating ENCODE functional genomics data, JASPAR transcription factor binding motifs, SCREEN cis-regulatory elements, ReMap TF binding sites, RegulomeDB variant regulatory scoring, 4D Nucleome chromatin conformation, and Ensembl regulatory features. Performs regulatory element cataloging, transcription factor analysis, variant regulatory impact scoring, chromatin conformation mapping, and gene-centric regulatory landscape profiling. Use when asked about gene regulation, enhancers, promoters, transcription factor binding, epigenetic modifications, chromatin structure, regulatory variants, or non-coding genome function.
Audio forensics and voice recovery guidelines for CSI-level audio analysis. This skill should be used when recovering voice from low-quality or low-volume audio, enhancing degraded recordings, performing forensic audio analysis, or transcribing difficult audio. Triggers on tasks involving audio enhancement, noise reduction, voice isolation, forensic authentication, or audio transcription.
Vox single-entry voice orchestration skill. Handles environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use when the user describes only the goal without providing specific commands.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Subtitle generation and burning. Workflow: transcription → dictionary correction → review → burning. Trigger words: add subtitles, generate subtitles, subtitles.
Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring on-device vs server-based recognition, handling authorization flows, or adopting the new SpeechAnalyzer API (iOS 26+) for modern async/await speech-to-text.
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High-quality transcription with multiple model sizes.
ML-powered Karaoke app in Rust using Bevy, WhisperX, and Demucs for stem separation, lyrics transcription, and pitch scoring.
Connect to PAXS AI platform to create meetings, upload recordings, and generate transcriptions and meeting notes. Use this skill when a user wants to transcribe audio, create meeting notes, or interact with the PAXS platform.
Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or archiving video references.
Execute TwinMind primary workflow: Meeting transcription and summary generation. Use when implementing meeting capture, building transcription features, or automating meeting documentation. Trigger with phrases like "twinmind transcription workflow", "meeting transcription", "capture meeting with twinmind".