Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.
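A minimal sketch of the dictionary-rule pass, assuming a small hand-written table (a real skill would load a learned, per-user correction database; the phrases below are made up):

    import re

    # Hypothetical correction table; a real skill loads a learned database.
    CORRECTIONS = {
        "pier review": "peer review",      # ASR mishearing
        "their going": "they're going",    # homophone slip
    }

    def apply_rules(text: str, rules: dict[str, str]) -> str:
        for wrong, right in rules.items():
            # Case-insensitive whole-phrase replacement.
            text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
        return text

    print(apply_rules("The pier review says their going ahead.", CORRECTIONS))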
Local speech-to-text with the Whisper CLI (no API key).
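A sketch of driving the Whisper CLI from Python, assuming openai-whisper is installed and ffmpeg is on PATH ("meeting.mp3" is a placeholder):

    import subprocess

    # Writes meeting.txt next to the input; --model small trades speed for accuracy.
    subprocess.run(
        ["whisper", "meeting.mp3", "--model", "small", "--output_format", "txt"],
        check=True,
    )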
Use a local FunASR service to transcribe audio or video files into timestamped Markdown files, supporting common formats such as mp4, mov, mp3, wav, and m4a. This skill should be used when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.
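A sketch using the FunASR Python package directly, assuming the typical paraformer-zh pipeline (the skill itself may instead call a wrapped local service):

    from funasr import AutoModel

    # Paraformer ASR with VAD segmentation and punctuation restoration.
    model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
    result = model.generate(input="talk.wav")
    print(result[0]["text"])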
Refine speech transcripts (interviews, speeches, podcasts, meetings) into more readable article paragraphs. Trigger this skill when users mention terms like "subtitle refinement", "transcript polish", "subtitle polishing", "organize video subtitles into articles", or "interview text organization", when they are processing interview records, optimizing transcription text, or organizing speech-to-text output, or when they need long dialogue or speech transcripts turned into readable articles. It is suitable for transcripts of solo speeches or multi-person conversations; it must retain the original sentences and wording and reject high-level summarization. Trigger the skill even if the user only says "help me organize this text" and attaches obviously colloquial text.
ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe
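A sketch using the ElevenLabs Python SDK rather than the inference.sh CLI the skill wraps; parameter names follow the public speech-to-text API, and the file name is a placeholder:

    from elevenlabs.client import ElevenLabs

    client = ElevenLabs(api_key="YOUR_KEY")
    with open("podcast.mp3", "rb") as f:
        result = client.speech_to_text.convert(
            file=f,
            model_id="scribe_v1",
            diarize=True,  # tag speakers in the transcript
        )
    print(result.text)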
Use when deploying ANY machine learning model on-device, converting models to CoreML, compressing models, or implementing speech-to-text. Covers CoreML conversion, MLTensor, model compression (quantization/palettization/pruning), stateful models, KV-cache, multi-function models, async prediction, SpeechAnalyzer, SpeechTranscriber.
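A minimal CoreML conversion sketch with coremltools, assuming a traced PyTorch module (the toy linear layer stands in for a real model):

    import coremltools as ct
    import torch

    model = torch.nn.Linear(4, 2).eval()
    example = torch.rand(1, 4)
    traced = torch.jit.trace(model, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        compute_units=ct.ComputeUnit.ALL,  # CPU, GPU, or Neural Engine
    )
    mlmodel.save("Linear.mlpackage")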
Speech-to-text transcription using Whisper with word-level timestamps. Use when users ask to transcribe audio or video to text, generate subtitles, or recognize speech.
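A sketch with the openai-whisper Python API, which exposes the word-level timestamps this skill relies on ("clip.mp4" is a placeholder):

    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("clip.mp4", word_timestamps=True)
    for segment in result["segments"]:
        for word in segment["words"]:
            print(f'{word["start"]:7.2f} -> {word["end"]:7.2f}  {word["word"]}')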
Transcribe local or remote audio into durable text and timestamp artifacts using hosted Whisper models. Use this when the job is speech-to-text from audio files and you need request/response persistence, optional timestamps, and subtitle-ready outputs.
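One way to realize this against OpenAI's hosted whisper-1, assuming the word-level verbose_json response is what gets persisted (the file names are placeholders):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("interview.wav", "rb") as f:
        resp = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",       # segment timestamps
            timestamp_granularities=["word"],     # word-level timing
        )

    # Persist the raw response next to the plain text for durability.
    with open("interview.transcript.json", "w") as out:
        out.write(resp.model_dump_json())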
Local speech-to-text using OpenAI Whisper. Runs fully offline after the model download. High-quality transcription with multiple model sizes.
Generate speech, music, and sound effects using ModelsLab's v7 Voice API. Supports text-to-speech, speech-to-text, speech-to-speech, music generation, sound effects, dubbing, song extension, and song inpainting via ElevenLabs and Inworld models.
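A rough HTTP sketch only; the endpoint path and payload field names below are assumptions for illustration, not confirmed ModelsLab v7 names, so check the Voice API docs before use:

    import requests

    resp = requests.post(
        "https://modelslab.com/api/v7/voice/text-to-speech",  # assumed path
        json={
            "key": "YOUR_API_KEY",     # assumed: key in the body, as in older ModelsLab APIs
            "text": "Hello there!",
            "voice_id": "some-voice",  # hypothetical field name
        },
        timeout=60,
    )
    print(resp.json())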
Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or an OpenAI-compatible endpoint). Extracts audio from video, sends it to a configurable ASR endpoint, and outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录 (transcription), 语音转文字 (speech-to-text), 录音转文字 (recording-to-text), or has a meeting recording, lecture, interview, or screen recording to transcribe.
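A sketch of the two steps, extracting audio with ffmpeg and posting it to an OpenAI-compatible endpoint; the base_url and model name are placeholders for whatever ASR service is configured:

    import subprocess
    from openai import OpenAI

    # Pull mono 16 kHz audio out of the screen recording.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "recording.mp4", "-vn", "-ac", "1",
         "-ar", "16000", "audio.wav"],
        check=True,
    )

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    with open("audio.wav", "rb") as f:
        text = client.audio.transcriptions.create(model="qwen3-asr", file=f).text
    print(text)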
Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.
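A sketch with the Groq Python SDK, following its documented transcription call (the file name is a placeholder):

    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment
    with open("note.m4a", "rb") as f:
        resp = client.audio.transcriptions.create(
            file=("note.m4a", f.read()),
            model="whisper-large-v3",
        )
    print(resp.text)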