Loading...
Loading...
Found 156 Skills
Expert skill for using MOSS-TTS-Nano, a 0.1B parameter multilingual real-time TTS model that runs on CPU with voice cloning support.
Submit a kid-facing English content PR to the football-english repo (zxkane/football-english). Use this whenever the user asks to "submit this week's quiz", "publish a weekly quiz", "submit a daily reading", "add today's reading", "publish a daily article", "open a daily PR", "open a quiz PR", or hands over essay paragraphs (with optional KET/PET points, vocabulary, audio, image, questions) and expects them to land in the kid-facing site. The repo publishes two kinds of content — weekly quizzes (essay + 3-20 graded questions, one per ISO week) and daily readings (essay + TTS audio + image + structured KET/PET points + optional 0-3 small questions, irregular cadence). This skill picks the right shape, writes the JSON, validates it, and opens the PR end-to-end. Trigger this skill even if the user doesn't use the words "quiz", "daily", or "PR" — any handover of essay paragraphs intended for this repo qualifies.
Translate and dub a video into another language with voice cloning and lip-sync, powered by HeyGen Video Translation. The presenter keeps their face, their voice is cloned into the target language, and lips re-sync to the new audio — viewers see the same person speaking natively. Use when: (1) localizing an existing video into one or more languages ("translate this video to Spanish", "make this in French and German", "dub this into Japanese", "I need this in 10 languages for a launch"), (2) the user has a finished video and wants the SAME presenter speaking another language (not a new presenter — that's heygen-video), (3) podcast / audio-only translation ("translate this podcast", "dub the audio but keep my video"), (4) high-stakes translations where the user wants to review/edit subtitles before final render (the proofreads workflow), (5) "translate my video", "dub this", "localize this clip", "make a multilingual version", "subtitle and dub". Returns the translated video URL (or audio file for audio-only mode), one per target language. Chain signal: if the user wants to CREATE a new video in another language (no source video exists yet), route to heygen-video and write the script in the target language — do not use heygen-translate. Use heygen-translate only when there is an existing source video to localize. NOT for: creating new videos from scratch (use heygen-video), avatar creation (use heygen-avatar), TTS-only synthesis (use heygen-video with audio-only output), or text-only translation.
Deploy Nemotron Voice Agent on Workstation (x86), Jetson Thor, or Cloud NIMs. Real-time speech-to-speech using NVIDIA ASR, TTS, LLM with WebRTC/WebSocket transport.
Production voice AI agents with sub-500ms latency. Groq LLM, Deepgram STT, Cartesia TTS, Twilio integration. No OpenAI. Use when: voice agent, phone bot, STT, TTS, Deepgram, Cartesia, Twilio, voice AI, speech to text, IVR, call center, voice latency.
Use this skill when building AI voice agents with the ElevenLabs Agents Platform. This skill covers the complete platform including agent configuration (system prompts, turn-taking, workflows), voice & language features (multi-voice, pronunciation, speed control), knowledge base (RAG), tools (client/server/MCP/system), SDKs (React, JavaScript, React Native, Swift, Widget), Scribe (real-time STT), WebRTC/WebSocket connections, testing & evaluation, analytics, privacy/compliance (GDPR/HIPAA/SOC 2), cost optimization, CLI workflows ("agents as code"), and DevOps integration. Prevents 17+ common errors including package deprecation, Android audio cutoff, CSP violations, missing dynamic variables, case-sensitive tool names, webhook authentication failures, and WebRTC configuration issues. Provides production-tested templates for React, Next.js, React Native, Swift, and Cloudflare Workers. Token savings: ~73% (22k → 6k tokens). Production tested. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Route Alibaba Cloud Model Studio requests to the right local skill (Qwen Image, Qwen Image Edit, Wan Video, Wan R2V, Qwen TTS and advanced TTS variants). Use when the user asks for Model Studio without specifying a capability.
Complete Google Gemini API reference for 2026. Use whenever writing code that calls Gemini models. Covers the google-genai SDK, Gemini 3/3.1 models, thought signatures, thinking config, Interactions API, File Search (managed RAG), Computer Use, URL Context, Nano Banana image gen, Live API, ephemeral tokens, TTS, Veo video gen, Lyria music gen, and all tools. ALWAYS prefer `from google import genai` over any legacy import. Use this skill for ANY Gemini API question, even simple ones.
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when user asks for voice reply, audio, or TTS.
Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-step), (2) EMO talking-head, (3) AA/AnimateAnyone full-body animation (three-step), (4) T2I text-to-image (Wan 2.x, default wan2.2-t2i-flash), (5) I2V image-to-video (Wan 2.x, default wan2.7-i2v-flash, supports T2I→I2V pipeline), (6) Qwen TTS (auto model/voice by scene, default qwen3-tts-vd-realtime-2026-01-15), (7) LingMou digital-human template video with random template, public-template copy, and script confirmation. Trigger when the user needs talking-head, portrait, full-body animation, text-to-image, text-to-video, or speech synthesis.
Run provider-agnostic live voice conversations with VAD, silence boundaries, wake-word gating, STT, and TTS through the AgentOS speech runtime.