Loading...
Loading...
Found 153 Skills
Convert Deckset-format markdown slides with speaker notes to presentation video with TTS narration. Use when user requests to create video from slides, generate presentation video, or convert slides to MP4 format.
Video creation skill. Combine images and audio to generate videos, supporting TTS dubbing, fade-in/fade-out transitions, subtitles, outro, and BGM. Triggered when users mention phrases like 'generate video', 'make video', 'educational video', 'image-to-video', 'create video account content', 'dubbed video', 'image-text integrated video', 'ancient poetry video', 'story video'. Includes the full workflow of image generation → dubbing → video synthesis, no need to call image-service separately.
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).
Build LiveKit Agent backends in TypeScript or JavaScript. Use this skill when creating voice AI agents, voice assistants, or any realtime AI application using LiveKit's Node.js Agents SDK (@livekit/agents-js). Covers AgentSession, Agent class, function tools with zod, STT/LLM/TTS models, turn detection, and realtime models.
AI media generation CLI tool using Google's Imagen 4, Veo 3.1, and Gemini TTS. Use when the user wants to (1) generate images from text prompts, (2) edit existing images with AI, (3) explain image contents, (4) generate videos from text or images, (5) create narration/voice audio with character settings. Triggers on requests like "generate an image of...", "create a video...", "make a voice that says...", "edit this image to...", "describe this image".
Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.
Fetches the latest news using news-aggregator-skill, formats it into a podcast script in Markdown format, and uses the tts skill to generate a podcast audio file. Use when the user asks to get the latest news and read it out as a podcast.
On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via FastAPI WebSocket server.
Audio generation skill — jingles, beds, voiceover, and sound effects. Routes music requests to Suno V5 / Udio / Lyria, speech to MiniMax TTS / FishAudio / ElevenLabs V3, and SFX to ElevenLabs SFX or AudioCraft. Output is one MP3/WAV file saved to the project folder.