Search Results: asr

Found 58 Skills

AI & Machine Learning | daymade/claude-code-skill...

stepfun-tts

Generate Chinese / Japanese speech with StepFun's stepaudio-2.5-tts, a contextual TTS model that replaces step-tts-2's `voice_label` with a natural-language `instruction` (≤200 chars) plus inline `()` parentheses for intra-sentence prosody. Use when the user wants emotional / prosody control over voice synthesis (whisper, pause, stress, mood pivot mid-sentence), batch-generates game / app voice lines, migrates from `step-tts-2` (the `voice_label → instruction` breaking change), or hits StepFun's stricter 2.5-era censorship (死 "death" / 消失 "disappear" / political terms). Triggers on 阶跃 TTS (StepFun TTS), StepAudio 合成 (StepAudio synthesis), 语音合成 (speech synthesis), 配音 (dubbing), 文本转语音 (text to speech), TTS 升级 (TTS upgrade), 迁移 step-tts-2 (migrate from step-tts-2). For transcription with the sibling stepaudio-2.5-asr model, use the stepfun-asr skill instead.

🇺🇸 | English | Translated

2 scripts/Attention

AI & Machine Learning | qwencloud/qwencloud-ai

qwencloud-audio-tts

[QwenCloud] Synthesize speech from text with Qwen TTS models. TRIGGER when: user wants to convert text to speech, create voiceovers, generate audio narration, read text aloud, build TTS applications, mentions speech synthesis/voice generation/audio output from text, or explicitly invokes this skill by name (e.g. use qwencloud-audio-tts). DO NOT TRIGGER when: user wants speech recognition/ASR, text generation without audio, non-Qwen audio tasks.

🇺🇸 | English | Translated

4 scripts/Checked

AI & Machine Learning | nvidia/skills

nemotron-voice-agent-deploy

Deploy Nemotron Voice Agent on Workstation (x86), Jetson Thor, or Cloud NIMs. Real-time speech-to-speech using NVIDIA ASR, TTS, LLM with WebRTC/WebSocket transport.

🇺🇸 | English | Translated

Tools & Utilities | hkuds/cli-anything

cli-anything-videocaptioner

AI-powered video captioning — transcribe speech, optimize/translate subtitles, burn into video with beautiful customizable styles (ASS outline or rounded background). Free ASR and translation included.

🇺🇸 | English | Translated

AI & Machine Learning | tdimino/claude-code-minoa...

parakeet

Local speech-to-text via Handy app (push-to-talk) and NeMo CLI scripts. Parakeet V3: 25 languages, auto-detection, ~30x realtime on M4 Max, 6% WER. This skill should be used when transcribing audio files or dictating voice input.

🇺🇸 | English | Translated

3 scripts/Attention

Tools & Utilities | zrt-ai-lab/opencode-skill...

videocut-clip-oral

Transcribe spoken-word video and detect slips of the tongue. Generates a review draft and a checklist of deletion tasks. Trigger phrases: edit spoken video, process video, recognize slip-of-the-tongue.

🇨🇳 | Chinese | Translated

AI & Machine Learning | eyadsibai/ltk

multimodal-models

Use when the user mentions "CLIP", "Whisper", "Stable Diffusion", "SDXL", "speech-to-text", "text-to-image", "image generation", "transcription", "zero-shot classification", "image-text similarity", "inpainting", or "ControlNet".

🇺🇸 | English | Translated

Tools & Utilities | jiangjiax/li-skills

li-transcript

Use this skill when the user says phrases like "get transcript", "transcribe video", "extract script", "help me extract it", "what does this video say", "what did this blogger say", or directly provides a video link requesting content extraction. Even if the user only sends a video link without stating a request, proactively trigger this skill if the context involves benchmark analysis or content extraction. Calls video2text.py to obtain the raw transcript, uses AI to correct common speech recognition errors, identifies the author, and archives the result to the benchmark blogger directory. Do NOT trigger for: analyzing viral content patterns (use li-analyzer), recording own topic ideas (use li-recorder), writing own scripts (use li-writer).

🇨🇳 | Chinese | Translated

1 scripts/Checked

Automation | jianshuo/claude-skills

wjs-localizing-video

Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills: /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video, /wjs-burning-subtitles. Use when the user asks for full localization in one go ("帮我把这个西班牙语视频做成中文字幕+配音" ["help me turn this Spanish video into Chinese subtitles plus dubbing"], "translate and dub this video", "做完整的本地化" ["do the full localization"]). For any individual step (just transcribe, just translate, just dub, just burn), invoke the sub-skill directly; it's faster and the boundary is cleaner.

🇺🇸 | English | Translated

Tools & Utilities | ceeon/videocut-skills

videocut:安装

Environment setup: install dependencies, configure the API key, and verify the environment. Trigger words: install, environment preparation, initialization.

🇨🇳 | Chinese | Translated