Found 196 Skills
Use when designing custom voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from a voice prompt plus preview text before using the returned voice_id in TTS.
Build voice apps with Sinch Voice REST API. Use for phone calls, text-to-speech (TTS), IVR menus, DTMF input, conference calling, call recording, call forwarding, answering machine detection (AMD), SIP routing, WebSocket audio streaming, and SVAML call control.
Help users integrate Runway audio APIs (TTS, sound effects, voice isolation, dubbing).
Generate audio narration of blog posts using Google Gemini TTS. Supports summary narration, full article read-aloud, and two-speaker podcast/dialogue mode with 30 voice options. Outputs MP3 with HTML5 audio embed code. Works standalone via /blog audio or internally from blog-write. Falls back gracefully when API key is not configured. Use when user says "blog audio", "narrate blog", "audio version", "text to speech", "tts", "podcast mode", "read aloud", "audio narration", "voice", "narration", "generate audio".
Use this skill when the user wants to convert a Wang Jianshuo-style WeChat article (article.md) into a narrated short MP4 video, featuring TTS voiceover via Volcano Engine Volcano TTS, scene-specific HyperFrames CSS/GSAP animations, subtle sound effects (SFX), abstract watercolor backgrounds, and end-to-end pipeline rendering to a 1080×1920 portrait MP4 (30-90 seconds). Triggers: "把这篇文章做成视频" ("turn this article into a video"), "做一个解说视频" ("make a narrated video"), "讲解视频" ("explainer video"), "/wjs-converting-text-to-video".
Run provider-agnostic live voice conversations with VAD, silence boundaries, wake-word gating, STT, and TTS through the AgentOS speech runtime.
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Apply cognitive science and HCI research to design decisions. Use when you need the scientific 'why' behind usability, explaining user behavior, understanding perception/memory/attention limits, evaluating cognitive load, assessing mental model alignment, predicting performance with Fitts's/Hick's Law, or grounding interface decisions in research rather than opinion.
Generate audio replies using TTS. Trigger with "read it to me [URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response. Also responds to "speak", "say it", "voice reply".
Health check for TTS and Telegram bot subsystems. TRIGGERS - tts health, kokoro status, telegram bot check, tts diagnostics.
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when user asks for voice reply, audio, or TTS.
Use when the user wants to generate speech, voiceover, or text-to-audio. Converts text to AI voice via Giggle.pro TTS API. Triggers: generate speech, text-to-speech, TTS, voiceover, read this text aloud, synthesize speech.