Search Results: speech-recognition

Found 13 Skills

digital-health-clinical-asr-setup

Stage 1 of Clinical ASR Flywheel. Use when bootstrapping a cycle: NVCF+MW disclosure, NVIDIA_API_KEY check, deps install, TTS+ASR smoke test.

Frontend Development | syncfusion/angular-ui-com...

syncfusion-angular-speech-to-text

Implement the Syncfusion Angular SpeechToText component. Use this skill for real-time speech-to-text conversion with text transcripts, custom button appearance and tooltips, recognition event handling, multiple language support with localization and RTL, error handling, and security best practices for microphone access and data transmission.

AI & Machine Learning | bytedance/agentkit-sample...

byted-las-asr-pro

ASR (Automatic Speech Recognition): enhanced speech-to-text built on Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when:
- Input is a video file (mp4/mov/mkv); auto-extracts audio track
- Audio needs denoising before recognition
- File exceeds 512MB or 5 hours (no size limit)
- Audio source is a TOS internal path (tos://bucket/key)
- Need structured JSON output with timestamped utterances and metadata
- Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering

Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.

1 scripts/Checked

AI & Machine Learning | tavus-engineering/tavus-s...

tavus-cvi-persona

Configure Tavus CVI personas with custom LLMs, TTS engines, perception, and turn-taking. Use when customizing AI behavior, bringing your own LLM, configuring voice/TTS, enabling vision with Raven, or tuning conversation flow with Sparrow.

AI & Machine Learning | zainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, MiniMax with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

2 scripts/Checked

AI & Machine Learning | qodex-ai/ai-agent-skills

voice-ai-integration

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice assistants, voice-controlled applications, audio interfaces, or hands-free AI systems.

4 scripts/Checked

Mobile Development | dpearson2699/swift-ios-sk...

speech-recognition

Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring on-device vs server-based recognition, handling authorization flows, or adopting the new SpeechAnalyzer API (iOS 26+) for modern async/await speech-to-text.

Frontend Development | daffy0208/ai-dev-standard...

voice-interface-builder

Expert in building voice interfaces, speech recognition, and text-to-speech systems

AI & Machine Learning | jrusso1020/video-understa...

video-understand

Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys, preferring full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.

3 scripts/Attention

Tools & Utilities | sugarforever/01coder-agen...

subtitle-correction

Correct subtitle files (.srt) generated from speech recognition. Use when the user uploads subtitle files and asks to correct, fix, or proofread subtitles, especially for technical content like programming tutorials, AI/ML courses, or any content with domain-specific terminology. Supports Chinese and English subtitles with intelligent error detection and correction while preserving exact timeline information.

1 scripts/Checked

AI & Machine Learning | cinience/alicloud-skills

alicloud-ai-audio-asr-realtime

Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.

1 scripts/Checked

Frontend Development | syncfusion/blazor-ui-comp...

syncfusion-blazor-speech-to-text

Implement speech-to-text voice input in Blazor applications using Syncfusion SpeechToText component. ALWAYS use this when users need voice input, speech recognition, audio transcription, or implementing the SpeechToText component in Blazor. Trigger for Syncfusion.Blazor.Inputs, microphone input, voice-to-text conversion, language support, transcript binding, listening states, error handling, browser speech API, or any speech recognition requirements.
