Found 26 Skills
Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.
MiniMax TTS API - Text-to-Speech, Voice Cloning, Voice Design
Convert text into speech with Kokoro or Noiz, including simple and timeline-aligned modes.
Two-host podcast video for any URL or free-form topic — 1 minute, 4 acts × ~15s, native multi-shot dialogue, optional voice cloning for Host A. Use when the user asks to "make a podcast", "podcast about [thing]", "podcast review of [url]", "two-host explainer", "interview-style clip", "two people talking on camera", "I/me and X talk about Y", or "interview with [persona] about [topic]". Native audio is the deliverable; captions are skipped by default because podcast dialogue mistranscribes domain terms.
Vox single-entry voice orchestration skill. Handles environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use when the user describes only the goal without providing specific commands.
Generate speech (TTS), transcribe audio (STT), and clone voices using Google's GenAI and Cloud Speech SDKs. Supports Gemini-TTS, Chirp 3, and Instant Custom Voice.
Make generated speech feel companion-like with fillers, emotional tuning, and preset speaking styles.
Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI
Generate and manage persona-aware voice assets for short-form video production, including voice design, script-specific audio takes, and future reusable voice identities. Use this when persona registries and scripts already exist and you need local audio assets, voice manifests, and reviewable voice iterations without losing continuity across many videos.
Generate videos using Flyworks (a.k.a. HiFly) Digital Humans. Create talking photo videos from images, use public avatars with TTS, or clone voices for custom audio.
Use the Chanjing TTS API to synthesize speech from text, using a user-provided voice
Select and create the perfect AI voice for your content using ElevenLabs, Qwen3-TTS, and other platforms—matching voice characteristics to brand personality and audience. Use when: Choosing an AI voice for video narration; Creating a consistent brand voice across content; Cloning a voice for scalable production; Comparing voice synthesis platforms; Designing voice characteristics by description