Found 34 Skills
Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).
Fast ASR CLI tool for transcribing audio/video files. Use when user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.
Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.
Minimal realtime ASR smoke test for Model Studio Qwen ASR Realtime.
Vox single-entry voice orchestration skill. Handles environment checks, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use when the user describes only the goal without providing specific commands.
macOS voice input tool built in Swift, with local and cloud ASR engines, LLM text optimization, and fully local storage.
Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.
Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.
Professional-grade refinement and verification of Chinese SRT subtitles before release. Use to clean raw ASR-generated subtitles into a publishable version. It performs only subtitle-level cleaning and correction, with no rewriting, summarization, or expansion. It keeps subtitles strictly synchronized with the original audio, splits entries only within the original subtitle's time range when necessary, outputs a complete clean SRT, and then runs the accompanying verification script for final rule checks and timeline review. Suitable for documentaries, interviews, oral broadcasts, and screen recordings that require correcting recognition errors, deleting meaningless filler words, adding pause spaces, limiting the word count per entry, and avoiding accidental deletion of meaningful subtitles.
Create short-video subtitles with Luma / 拾光 / 拾光工具. Use ASR, segmentation, styling, and burn-in as composable steps; keep editorial decisions in the agent instructions.
Minimal non-realtime ASR smoke test for Model Studio Qwen ASR.