Search Results: speech-recognition

Found 14 Skills

Mobile Development | dpearson2699/swift-ios-sk...

speech-recognition

Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring on-device vs server-based recognition, handling authorization flows, or adopting the new SpeechAnalyzer API (iOS 26+) for modern async/await speech-to-text.

🇺🇸 | English | Translated

AI & Machine Learning | davila7/claude-code-templ...

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

🇺🇸 | English | Translated

AI & Machine Learning | qodex-ai/ai-agent-skills

voice-ai-integration

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice assistants, voice-controlled applications, audio interfaces, or hands-free AI systems.

🇺🇸 | English | Translated

4 scripts/Checked

AI & Machine Learning | bytedance/agentkit-sample...

byted-voice-to-text

Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).

🇨🇳 | Chinese | Translated

5 scripts/Attention
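
The two modes described for byted-voice-to-text suggest a simple routing rule. The sketch below is illustrative only: the function name `choose_asr_mode` and the constants are hypothetical, it is not the skill's actual code, and it assumes the Express limit means both "at most 2 hours" and "at most 100 MB" (with 1 MB taken as 1024 * 1024 bytes).

```python
# Limits are taken from the listing above; all names here are hypothetical.
EXPRESS_MAX_SECONDS = 2 * 60 * 60       # Express Edition: up to 2 hours
EXPRESS_MAX_BYTES = 100 * 1024 * 1024   # Express Edition: up to 100 MB (assumed binary MB)
STANDARD_MAX_SECONDS = 5 * 60 * 60      # Standard Edition: up to 5 hours


def choose_asr_mode(duration_seconds: float, size_bytes: int) -> str:
    """Pick 'express' (synchronous) or 'standard' (asynchronous) for an audio input."""
    # Assumption: Express requires both the duration and the size limit to hold.
    if duration_seconds <= EXPRESS_MAX_SECONDS and size_bytes <= EXPRESS_MAX_BYTES:
        return "express"
    # Otherwise fall back to the asynchronous mode if the duration still fits.
    if duration_seconds <= STANDARD_MAX_SECONDS:
        return "standard"
    raise ValueError("audio exceeds the 5-hour Standard Edition limit")
```

A short voice message would route to `"express"`, while a three-hour recording would route to `"standard"`.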

Frontend Development | syncfusion/angular-ui-com...

syncfusion-angular-speech-to-text

Implement the Syncfusion Angular SpeechToText component. Use this skill for real-time speech-to-text conversion with text transcripts, custom button appearance and tooltips, recognition event handling, multiple language support with localization and RTL, error handling, and security best practices for microphone access and data transmission.

🇺🇸 | English | Translated

Frontend Development | daffy0208/ai-dev-standard...

voice-interface-builder

Expert in building voice interfaces, speech recognition, and text-to-speech systems.

🇺🇸 | English | Translated

Tools & Utilities | sugarforever/01coder-agen...

subtitle-correction

Correct subtitle files (.srt) generated from speech recognition. Use when the user uploads subtitle files and asks to correct, fix, or proofread subtitles, especially for technical content like programming tutorials, AI/ML courses, or any content with domain-specific terminology. Supports Chinese and English subtitles with intelligent error detection and correction while preserving exact timeline information.

🇺🇸 | English | Translated

1 scripts/Checked
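
To make "preserving exact timeline information" concrete, here is a minimal sketch of one way to correct subtitle text while leaving cue numbers and timing lines untouched. It is not the skill's own implementation: `correct_srt` and the plain find-and-replace `corrections` mapping are hypothetical, and the listing's "intelligent error detection" is not modeled.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def correct_srt(srt_text: str, corrections: dict[str, str]) -> str:
    """Apply text corrections to subtitle lines only (hypothetical helper)."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Cue numbers and timing lines pass through unchanged.
        # Limitation: a subtitle line consisting only of digits is also skipped.
        if stripped.isdigit() or TIMING_RE.match(stripped):
            out.append(line)
            continue
        for wrong, right in corrections.items():
            line = line.replace(wrong, right)
        out.append(line)
    return "\n".join(out)
```

For example, passing `{"pie torch": "PyTorch"}` fixes a common mis-transcription in the subtitle text without changing any timestamp.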

AI & Machine Learning | bytedance/agentkit-sample...

byted-las-asr-pro

ASR (Automatic Speech Recognition): enhanced speech-to-text built on the Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when:
- Input is a video file (mp4/mov/mkv); the audio track is extracted automatically
- Audio needs denoising before recognition
- File exceeds 512MB or 5 hours (no size limit)
- Audio source is a TOS internal path (tos://bucket/key)
- Need structured JSON output with timestamped utterances and metadata
- Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering
Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.

🇺🇸 | English | Translated

1 scripts/Checked
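
The selection criteria listed for byted-las-asr-pro can be read as a checklist. The following sketch encodes that checklist under stated assumptions: `needs_asr_pro` and its parameters are hypothetical names, 1 MB is taken as 1024 * 1024 bytes, and the real skill may weigh these conditions differently.

```python
import os

# Values come from the listing above; all names are hypothetical.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}
MAX_BYTES = 512 * 1024 * 1024   # 512 MB threshold (assumed binary MB)
MAX_SECONDS = 5 * 60 * 60       # 5-hour threshold


def needs_asr_pro(source: str, size_bytes: int, duration_seconds: float,
                  denoise: bool = False, extended_analysis: bool = False) -> bool:
    """Return True if any listed criterion for the enhanced ASR skill applies."""
    if source.startswith("tos://"):          # TOS internal path
        return True
    extension = os.path.splitext(source)[1].lower()
    if extension in VIDEO_EXTENSIONS:        # video input, audio track must be extracted
        return True
    if size_bytes > MAX_BYTES or duration_seconds > MAX_SECONDS:
        return True
    # Denoising, structured output, diarization, etc. are passed in as flags.
    return denoise or extended_analysis
```

A short `.wav` file with no extra requirements returns `False`, which would leave it to a lighter-weight ASR skill.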

AI & Machine Learning | tavus-engineering/tavus-s...

tavus-cvi-persona

Configure Tavus CVI personas with custom LLMs, TTS engines, perception, and turn-taking. Use when customizing AI behavior, bringing your own LLM, configuring voice/TTS, enabling vision with Raven, or tuning conversation flow with Sparrow.

🇺🇸 | English | Translated

AI & Machine Learning | marswaveai/skills

asr

Transcribe audio files to text using local speech recognition. Triggers on: "转录" (transcribe), "transcribe", "语音转文字" (speech to text), "ASR", "识别音频" (recognize audio), "把这段音频转成文字" (convert this audio to text).

🇺🇸 | English | Translated

Frontend Development | syncfusion/blazor-ui-comp...

syncfusion-blazor-speech-to-text

Implement speech-to-text voice input in Blazor applications using Syncfusion SpeechToText component. ALWAYS use this when users need voice input, speech recognition, audio transcription, or implementing the SpeechToText component in Blazor. Trigger for Syncfusion.Blazor.Inputs, microphone input, voice-to-text conversion, language support, transcript binding, listening states, error handling, browser speech API, or any speech recognition requirements.

🇺🇸 | English | Translated

AI & Machine Learning | zainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, MiniMax with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

🇺🇸 | English | Translated

2 scripts/Checked