Loading...
Loading...
Found 118 Skills
Use this skill for AI text-to-speech generation. Triggers include: "generate voice", "create audio", "text to speech", "TTS", "read this aloud", "generate narration", "create voiceover", "synthesize speech", "podcast audio", "dialogue audio", "multi-speaker", "audiobook" Supports Google Gemini TTS, ElevenLabs, and OpenAI TTS.
Convert Deckset-format markdown slides with speaker notes to presentation video with TTS narration. Use when user requests to create video from slides, generate presentation video, or convert slides to MP4 format.
Diagnose and resolve TTS and Telegram bot issues. TRIGGERS - tts not working, bot not responding, kokoro error, audio not playing, lock stuck, telegram bot troubleshoot, diagnose issue.
Configure TTS voices, speed, timeouts, queue depth, and bot settings. TRIGGERS - configure tts, change voice, tts speed, queue depth, tts timeout, bot config, tune settings, adjust parameters.
Health check for TTS and Telegram bot subsystems. TRIGGERS - tts health, kokoro status, telegram bot check, tts diagnostics.
Use when designing custom voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from a voice prompt plus preview text before using the returned voice_id in TTS.
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when user asks for voice reply, audio, or TTS.
Convert written documents to narrated video scripts with TTS audio and word-level timing. Use when preparing essays, blog posts, or articles for video narration. Outputs scene files, audio, and VTT with precise word timestamps. Keywords: narration, voiceover, TTS, scenes, audio, timing, video script, spoken.
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).
AI media generation CLI tool using Google's Imagen 4, Veo 3.1, and Gemini TTS. Use when the user wants to (1) generate images from text prompts, (2) edit existing images with AI, (3) explain image contents, (4) generate videos from text or images, (5) create narration/voice audio with character settings. Triggers on requests like "generate an image of...", "create a video...", "make a voice that says...", "edit this image to...", "describe this image".
Configure Tavus CVI personas with custom LLMs, TTS engines, perception, and turn-taking. Use when customizing AI behavior, bringing your own LLM, configuring voice/TTS, enabling vision with Raven, or tuning conversation flow with Sparrow.
ElevenLabs TTS integration for video narration. Use when generating voiceover audio, selecting voices, or building script-to-audio pipelines