Search Results: asr

Found 34 Skills

AI & Machine Learningcinience/alicloud-skills

aliyun-qwen-asr

Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

🇺🇸|EnglishTranslated

1 scripts/Checked

Tools & Utilitieszl007700/luma-cli

luma-subtitle

Create short-video subtitles with Luma / 拾光 / 拾光工具. Use ASR, segmentation, styling, and burn-in as composable steps; keep editorial decisions in the agent instructions.

🇺🇸|EnglishTranslated

AI & Machine Learningcinience/alicloud-skills

alicloud-ai-audio-asr-realtime-test

Minimal realtime ASR smoke test for Model Studio Qwen ASR Realtime.

🇺🇸|EnglishTranslated

AI & Machine Learningopenrouterteam/skills

openrouter-stt

Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.

🇺🇸|EnglishTranslated

Testing & QAcinience/alicloud-skills

alicloud-ai-audio-asr-test

Minimal non-realtime ASR smoke test for Model Studio Qwen ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

stepfun-asr

Transcribe audio with StepFun's stepaudio-2.5-asr — an SSE endpoint (NOT /v1/audio/transcriptions) with 32K context, ~85-101x RTF on long audio, and a single-call ceiling around 30 minutes (no client-side chunking). Use when transcribing Chinese / English audio with StepFun, when long-form recordings (5-30 min) need to land in one request, when migrating from step-asr / step-asr-1.1, or when hitting the misleading `model stepaudio-2.5-asr not supported` error (which actually means wrong endpoint). Triggers on 阶跃 ASR, StepFun ASR, stepaudio-2.5-asr, 转录, 语音识别, 长音频转写, 语音转文字. For TTS with the sibling stepaudio-2.5-tts model, use the stepfun-tts skill instead.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningnovitalabs/novita-skills

novita-ai

Novita AI: LLM, Image Generation & Editing, Video Generation, Audio (TTS/ASR), and GPU Cloud. Use this skill whenever the user wants to call Novita AI APIs — chat with LLMs (DeepSeek, Llama, Qwen), generate images (FLUX, Stable Diffusion, Seedream, Hunyuan Image), edit images (remove background, upscale, inpainting, img2img, outpainting, reimagine, merge face, replace background, remove text), generate videos (Kling, Wan, Hunyuan, Minimax Hailuo, Vidu, PixVerse, Seedance), do text-to-speech or speech-to-text (MiniMax TTS, GLM TTS, Fish Audio, ASR, voice cloning), run OpenAI-compatible batch jobs, manage GPU cloud instances and serverless endpoints, or check account balance and billing. Also trigger when the user mentions novita.ai, Novita AI, Novita API key, or wants to use any Novita platform service — even if they just say "generate an image" or "run an LLM" and Novita is available as a provider.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

digital-health-clinical-asr-finetune

Stage 4 of the Clinical ASR Flywheel. Use when priority KER is above 0.3 to run stock NeMo SFT on Parakeet TDT v2 and offline cycle N+1 re-eval. NOT for generic word boosting (use /finetune-asr).

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

transcript-fixer

Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.

🇺🇸|EnglishTranslated

51 scripts/Checked

AI & Machine Learningovachiever/droid-tings

elevenlabs-agents

Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication

🇺🇸|EnglishTranslated

5 scripts/Attention

AI & Machine Learningdaemon-blockint-tech/agen...

ai-adversarial-robustness-engineer

Adversarial robustness engineering for ML/AI—evasion, poisoning, extraction, membership-inference threat models; robust training, sanitization, detectors; ASR/certified evals; lab model attacks; data-pipeline integrity; production I/O guardrails (classical ML and LLM/multimodal). Use for adversarial examples, robustness suites, poison audits, deploy guardrails—not LLM app red team (ai-redteam), governance (ai-risk-governance), safety classifier R&D (ml-research-engineer-safeguards), safeguard serving (ml-infrastructure-engineer-safeguards), privacy research (privacy-research-engineer-safeguards), AppSec pentest (penetration-tester).

🇺🇸|EnglishTranslated