Search Results: asr

Found 58 Skills

alicloud-ai-audio-asr-test

Minimal non-realtime ASR smoke test for Model Studio Qwen ASR.

AI & Machine Learningcinience/alicloud-skills

alicloud-ai-audio-asr-realtime

Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningdaymade/claude-code-skill...

stepfun-asr

Transcribe audio with StepFun's stepaudio-2.5-asr — an SSE endpoint (NOT /v1/audio/transcriptions) with 32K context, ~85-101x RTF on long audio, and a single-call ceiling around 30 minutes (no client-side chunking). Use when transcribing Chinese / English audio with StepFun, when long-form recordings (5-30 min) need to land in one request, when migrating from step-asr / step-asr-1.1, or when hitting the misleading `model stepaudio-2.5-asr not supported` error (which actually means wrong endpoint). Triggers on 阶跃 ASR, StepFun ASR, stepaudio-2.5-asr, 转录, 语音识别, 长音频转写, 语音转文字. For TTS with the sibling stepaudio-2.5-tts model, use the stepfun-tts skill instead.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningdavila7/claude-code-templ...

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningcatoncat/vox-cli

vox

Vox single-entry voice orchestration skill. Used to complete environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution and task troubleshooting through natural language. It is used when users only describe the target without providing specific commands.

🇨🇳|ChineseTranslated

5 scripts/Attention

Tools & Utilitiesyoutube-transcript-dev/yo...

youtube-transcript-api

Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/trending-skills

type4me-macos-voice-input

MacOS voice input tool with local/cloud ASR engines, LLM text optimization, and fully local storage built in Swift

🇺🇸|EnglishTranslated

AI & Machine Learningjrusso1020/video-understa...

video-understand

Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.

🇺🇸|EnglishTranslated

3 scripts/Attention

Document Processingwdkns/wdkns-skills

subtitle-refine

Professional-level refinement and verification for Chinese SRT subtitles for launch. Used to clean ASR-based raw subtitles into a publishable version, only performing subtitle-level cleaning and correction without formal rewriting, summarization, or expansion; meanwhile, strictly maintaining synchronization with the original audio, splitting entries only within the original subtitle time range when necessary, outputting a complete clean SRT, and then using the accompanying verification script for final rule checks and timeline review. Suitable for tasks such as documentaries, interviews, oral broadcasts, screen recordings that require correcting recognition errors, deleting meaningless filler words, adding pause spaces, limiting single-entry word count, and avoiding accidental deletion of meaningful subtitles.

🇨🇳|ChineseTranslated

1 scripts/Attention

AI & Machine Learningnvidia/skills

nemotron-speech

Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.

🇺🇸|EnglishTranslated

1 scripts/Checked

Tools & Utilitieshyperpuncher/dotagents

chough

Fast ASR CLI tool for transcribing audio/video files. Use when user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.

🇺🇸|EnglishTranslated

AI & Machine Learningbytedance/agentkit-sample...

byted-voice-to-text

Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).

🇨🇳|ChineseTranslated

5 scripts/Attention