Search Results: asr

Found 34 Skills

vox

Vox is a single-entry voice orchestration skill. It handles environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use it when the user describes only the goal without providing specific commands.

🇨🇳 | Chinese | Translated

5 scripts/Attention

Tools & Utilities | youtube-transcript-dev/yo...

youtube-transcript-api

Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.

🇺🇸 | English | Translated

AI & Machine Learning | nvidia/skills

digital-health-clinical-asr-eval

Stage 3 of the Clinical ASR Flywheel. Scores a NeMo manifest and produces the five-section KER leaderboard (by-ipa_source diagnostic). Not for ASR auth (/riva-asr).

🇺🇸 | English | Translated

AI & Machine Learning | aradotso/trending-skills

type4me-macos-voice-input

macOS voice input tool built in Swift, with local and cloud ASR engines, LLM text optimization, and fully local storage.

🇺🇸 | English | Translated

AI & Machine Learning | aahl/skills

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

🇺🇸 | English | Translated

1 scripts/Checked

AI & Machine Learning | jrusso1020/video-understa...

video-understand

Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.

🇺🇸 | English | Translated

3 scripts/Attention

AI & Machine Learning | cinience/alicloud-skills

alicloud-ai-audio-asr

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

🇺🇸 | English | Translated

1 scripts/Checked

Document Processing | wdkns/wdkns-skills

subtitle-refine

Professional-grade refinement and verification of Chinese SRT subtitles for release. Cleans raw ASR-generated subtitles into a publishable version, performing only subtitle-level cleaning and correction, without formal rewriting, summarization, or expansion. It keeps strict synchronization with the original audio, splits entries only within the original subtitle's time range when necessary, outputs a complete clean SRT, and then runs the accompanying verification script for final rule checks and timeline review. Suitable for documentaries, interviews, spoken broadcasts, and screen recordings that require correcting recognition errors, deleting meaningless filler words, adding pause spaces, limiting per-entry word count, and avoiding accidental deletion of meaningful subtitles.

🇨🇳 | Chinese | Translated

1 scripts/Attention

AI & Machine Learning | nvidia/skills

nemotron-speech

Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.

🇺🇸 | English | Translated

1 scripts/Checked

Tools & Utilities | hyperpuncher/dotagents

chough

Fast ASR CLI tool for transcribing audio/video files. Use when the user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.

🇺🇸 | English | Translated

AI & Machine Learning | daymade/claude-code-skill...

asr-transcribe-to-text

Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or OpenAI-compatible endpoint). Extracts audio from video, sends to configurable ASR endpoint, outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字, or has a meeting recording, lecture, interview, or screen recording to transcribe.

🇺🇸 | English | Translated

1 scripts/Checked

AI & Machine Learning | bytedance/agentkit-sample...

byted-voice-to-text

Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).

🇨🇳 | Chinese | Translated

5 scripts/Attention
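
The two modes described for byted-voice-to-text suggest a simple selection rule. The sketch below is illustrative only and is not taken from the skill: `choose_asr_mode` and the constant names are hypothetical, "≤2h/100MB" is read as two limits that must both hold, "100MB" is assumed to mean 100 × 1024 × 1024 bytes, and Standard Edition is assumed to have no size limit because the listing states none.

```python
# Limits quoted in the listing. The byte interpretation of "100MB" is an assumption.
EXPRESS_MAX_SECONDS = 2 * 60 * 60          # Express Edition: up to 2 hours
EXPRESS_MAX_BYTES = 100 * 1024 * 1024      # Express Edition: up to 100 MB (assumed MiB)
STANDARD_MAX_SECONDS = 5 * 60 * 60         # Standard Edition: up to 5 hours


def choose_asr_mode(duration_seconds: float, size_bytes: int) -> str:
    """Return "express" or "standard" for an audio file (hypothetical helper).

    Prefers the synchronous Express mode when both of its limits are met,
    and falls back to the asynchronous Standard mode otherwise.
    """
    if duration_seconds < 0 or size_bytes < 0:
        raise ValueError("duration and size must be non-negative")
    if duration_seconds <= EXPRESS_MAX_SECONDS and size_bytes <= EXPRESS_MAX_BYTES:
        return "express"
    if duration_seconds <= STANDARD_MAX_SECONDS:
        return "standard"
    raise ValueError("audio exceeds the 5-hour limit stated for Standard Edition")


if __name__ == "__main__":
    # A 10-minute, 5 MB voice message fits the Express limits.
    print(choose_asr_mode(600, 5 * 1024 * 1024))
```

Preferring Express whenever it fits reflects the listing's description of it as the synchronous, fast-response mode; a real integration might instead let the caller choose.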