Search Results: audio-transcription

Found 42 Skills

Tools & Utilities · 0xdarkmatter/claude-mods

markitdown

Convert local documents to Markdown using Microsoft's markitdown CLI. Best for: PDF, Word, Excel, PowerPoint, images (OCR), audio. Can fetch URLs, but Jina is faster for web pages. Triggers on: convert to markdown, read PDF, parse document, extract text from, docx, xlsx, pptx, OCR image, local file.

AI & Machine Learning · samhvw8/dot-claude

ai-multimodal

Multimodal AI processing via the Google Gemini API (2M-token context window). Capabilities: audio (transcription up to 9.5 hr, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection up to 6 hr, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

6 scripts · Attention

AI & Machine Learning · assemblyai/assemblyai-ski...

assemblyai

Use when implementing speech-to-text, audio transcription, real-time streaming STT, audio intelligence features, or voice AI using AssemblyAI APIs or SDKs. Use when the user mentions AssemblyAI, voice agents, transcription, speaker diarization, PII redaction of audio, LLM Gateway for audio understanding, or applying LLMs to transcripts. Also use when building voice agents with LiveKit or Pipecat that need speech-to-text, or when the user is working with any audio/video processing pipeline that could benefit from transcription, even if they don't mention AssemblyAI by name.

AI & Machine Learning · steipete/clawdis

openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

1 script · Checked

AI & Machine Learning · agntswrm/agent-media

audio-transcribe

Transcribes audio to text with timestamps and optional speaker identification. Use when you need to convert speech to text, create subtitles, transcribe meetings, or process voice recordings.

Tools & Utilities · sickn33/antigravity-aweso...

audio-transcriber

Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration.

3 scripts · Attention

AI & Machine Learning · kouko/monkey-knowledge-yo...

mk-youtube-audio-transcribe

Transcribe audio to text using local whisper.cpp. Use when the user wants to convert audio/video to text, get a transcription, or run speech-to-text.

13 scripts · Attention

AI & Machine Learning · thinkfleetai/thinkfleet-e...

local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after the model is downloaded. High-quality transcription with multiple model sizes.

AI & Machine Learning · sarvamai/skills

speech-to-text

Transcribe audio to text using Sarvam AI's Saaras model. Handles speech recognition, transcription, and voice interfaces for 23 Indian languages. Supports 5 output modes, auto language detection, WebSocket streaming, and batch diarization. Use when converting speech to text or building voice-enabled apps.

AI & Machine Learning · cinience/alicloud-skills

aliyun-qwen-asr

Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

1 script · Checked

Document Processing · coroboros/agent-skills

markitdown

Convert any document to Markdown with Microsoft's `markitdown` CLI — PDF, Word, Excel, PowerPoint, HTML, CSV, JSON, XML, ZIP, EPub, images (OCR/EXIF), audio (transcription), and YouTube URLs. Use whenever the user wants to extract text from a binary document, transcribe audio, OCR an image, scrape a YouTube transcript, or pre-process a file for an LLM context window — even when they just say "convert this pdf", "what's in this docx", "transcribe this mp3", or "get the text out of this".

1 script · Attention

AI & Machine Learning · cinience/alicloud-skills

alicloud-ai-audio-asr

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

1 script · Checked