Search Results: TTS

Found 156 Skills

video-podcast-maker

Use when the user gives a topic and wants an automated topic-driven narrated explainer, podcast, or knowledge-summary video (Bilibili / YouTube / Xiaohongshu / Douyin / WeChat Channels), or asks to learn visual design patterns from a reference video/image. Trigger when the user mentions creating a knowledge video, narrated explainer, video podcast, or talking-head topic video from a topic — even if they don't say "video podcast" explicitly. Do NOT trigger for generic video editing, trimming, format conversion, color grading, or non-narrative video tasks. Produces 4K video via research → script → TTS → Remotion → MP4 + BGM.

🇺🇸|EnglishTranslated

32 scripts/Attention

AI & Machine Learningframersai/agentos-skills

streaming-tts-openai

Low-latency streaming text-to-speech via OpenAI TTS API — adaptive sentence chunking, concurrent fetch pipelining, six voices.

🇺🇸|EnglishTranslated

AI & Machine Learningcinience/alicloud-skills

alicloud-ai-audio-tts

Generate human-like speech audio with Model Studio DashScope Qwen TTS (qwen3-tts-flash). Use when converting text to speech, producing voice lines for short drama/news videos, or documenting TTS request/response fields for DashScope.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningjianshuo/claude-skills

wjs-dubbing-video

Use when the user has a video + a target-language SRT and wants the video to actually speak that language — generates a time-aligned TTS voice dub. Routes by voice ID — Volcano (豆包) TTS for Chinese, edge-tts neural for any language. Defaults to one voice (single-speaker); opt-in multi-speaker via visual diarization. Outputs `*_<lang>_dub.mp4` with the dub audio in place of the original. Final mixing (audio bed + burn-in) is handed off to `/wjs-burning-subtitles`. Triggers — "配音", "中文配音", "Chinese dub", "voice over this", "dub the video", "TTS this SRT", "different voice for each speaker".

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learninghirokidaichi/ergon

ergon

AI media generation CLI tool using Google's Imagen 4, Veo 3.1, and Gemini TTS. Use when the user wants to (1) generate images from text prompts, (2) edit existing images with AI, (3) explain image contents, (4) generate videos from text or images, (5) create narration/voice audio with character settings. Triggers on requests like "generate an image of...", "create a video...", "make a voice that says...", "edit this image to...", "describe this image".

🇺🇸|EnglishTranslated

AI & Machine Learningema93sh/ai-voice-stack

ai-say

Local text-to-speech on Ubuntu using Kokoro TTS with fallbacks. Use when the user asks to speak text out loud, test audio output, switch Kokoro voices, or debug TTS playback issues. Triggers on "say this", "read aloud", "speak", "TTS", "voice test".

🇺🇸|EnglishTranslated

2 scripts/Attention

AI & Machine Learningsanjay3290/ai-skills

google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningcinience/alicloud-skills

alicloud-ai-audio-tts-voice-clone

Voice cloning workflows with Alibaba Cloud Model Studio Qwen TTS VC models. Use when creating cloned voices from sample audio and synthesizing text with cloned timbre.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningzainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, MiniMax with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningheygen-com/skills

text-to-speech

Generate speech audio from text using HeyGen's Starfish TTS model. Use when: (1) Generating standalone speech audio files from text, (2) Converting text to speech with voice selection, speed, and pitch control, (3) Creating audio for voiceovers, narration, or podcasts, (4) Working with HeyGen's /v1/audio endpoints, (5) Listing available TTS voices by language or gender.

🇺🇸|EnglishTranslated

AI & Machine Learningqianwen-ai/qianwen-ai

qianwen-audio-tts

[QianWen] Synthesize speech from text with Qwen TTS models. TRIGGER when: user wants to convert text to speech, create voiceovers, generate audio narration, read text aloud, build TTS applications, mentions speech synthesis/voice generation/audio output from text, or explicitly invokes this skill by name (e.g. use qianwen-audio-tts). DO NOT TRIGGER when: user wants speech recognition/ASR, text generation without audio, non-Qwen audio tasks.

🇺🇸|EnglishTranslated

4 scripts/Checked

Tools & Utilitiesterrylica/cc-skills

clean-component-removal

Remove TTS and Telegram sync components cleanly. TRIGGERS - uninstall tts, remove telegram bot, uninstall kokoro, clean tts, teardown, component removal.

🇺🇸|EnglishTranslated