Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.
npx skill4agent add pexoai/pexo-skills videoagent-audio-studio

| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | `eleven_multilingual_v2` | ~3s |
| Low-latency TTS (real-time) | `eleven_turbo_v2_5` | <1s |
| Background music | `cassetteai-music` | ~15s |
| Sound effect | | ~5s |
| Clone a voice from audio | | ~10s |
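The routing table above can be read as a simple lookup from request type to target. The sketch below is one hypothetical way to encode it in Python; the request-type keys and the `Route` class are illustrative names, not part of the skill, and model IDs are filled in only where this document names them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    target: str               # MCP tool or backend named in this document
    model_id: Optional[str]   # None where the document names no model ID
    typical_latency: str      # copied from the routing table


# Keys are hypothetical labels for the request types in the table.
ROUTES = {
    "narration": Route("text_to_speech", "eleven_multilingual_v2", "~3s"),
    "realtime_tts": Route("text_to_speech", "eleven_turbo_v2_5", "<1s"),
    "music": Route("cassetteai-music", None, "~15s"),
    "sound_effect": Route("text_to_sound_effects", None, "~5s"),
    "voice_clone": Route("voice_add", None, "~10s"),
}


def route_request(request_type: str) -> Route:
    """Return the route for a request type, or raise ValueError if unknown."""
    try:
        return ROUTES[request_type]
    except KeyError:
        raise ValueError(f"unknown request type: {request_type!r}") from None
```

Keeping the table as data rather than branching logic means a new request type only needs a new entry.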
bash {baseDir}/tools/start_server.sh

Use MCP tool: text_to_speech
text: "<the text to narrate>"
voice_id: "JBFqnCBsd6RMkjVDRZzb" # Default: "George" (professional, neutral)
model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency

Use MCP tool: text_to_sound_effects (via cassetteai-music on fal.ai)
prompt: "<music description, e.g. 'upbeat lo-fi hip hop, 90 seconds'>"
duration_seconds: <duration>

Use MCP tool: text_to_sound_effects
text: "<sound description>"
duration_seconds: <1-22>

Use MCP tool: voice_add
name: "<voice name>"
files: ["<audio_file_url>"]

→ Route to: text_to_speech
text: "Welcome to our product launch"
voice_id: "JBFqnCBsd6RMkjVDRZzb"
model_id: "eleven_multilingual_v2"

🎙️ Voiceover done! Listen here

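The `text_to_speech` arguments in the example above follow a fixed shape, so they can be assembled by a small helper. This is a minimal sketch: `build_tts_args` and its `low_latency` flag are hypothetical names, and only the voice ID and model IDs come from this document.

```python
DEFAULT_VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"    # "George", the default named in this document
QUALITY_MODEL = "eleven_multilingual_v2"     # default model in the template
LOW_LATENCY_MODEL = "eleven_turbo_v2_5"      # suggested for low latency


def build_tts_args(text: str,
                   voice_id: str = DEFAULT_VOICE_ID,
                   low_latency: bool = False) -> dict:
    """Build the argument dict for a text_to_speech tool call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "voice_id": voice_id,
        "model_id": LOW_LATENCY_MODEL if low_latency else QUALITY_MODEL,
    }
```

Calling `build_tts_args("Welcome to our product launch")` reproduces the three fields of the voiceover example.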
→ Route to: cassetteai-music (fal.ai)
prompt: "relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds"
duration_seconds: 60

🎵 Background music ready! Listen here

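The music and sound-effect templates use different field names (`prompt` for music, `text` for sound effects), and the sound-effect template bounds `duration_seconds` to 1-22. The sketch below validates both before a call is made; the function names are hypothetical, and the positive-duration check for music is an assumption, since the document states no limit for it.

```python
SFX_MIN_SECONDS = 1    # lower bound from the sound-effect template
SFX_MAX_SECONDS = 22   # upper bound from the sound-effect template


def build_music_args(prompt: str, duration_seconds: int) -> dict:
    """Arguments for a music request (assumes only that duration is positive)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return {"prompt": prompt, "duration_seconds": duration_seconds}


def build_sfx_args(text: str, duration_seconds: int) -> dict:
    """Arguments for a sound-effect request, enforcing the 1-22 second range."""
    if not SFX_MIN_SECONDS <= duration_seconds <= SFX_MAX_SECONDS:
        raise ValueError(
            f"duration_seconds must be between {SFX_MIN_SECONDS} and {SFX_MAX_SECONDS}"
        )
    return {"text": text, "duration_seconds": duration_seconds}
```

Rejecting an out-of-range duration locally avoids spending a request on a call that is likely to fail.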
→ Route to: text_to_sound_effects
text: "a futuristic sci-fi door sliding open with a hydraulic hiss"
duration_seconds: 3

Set `ELEVENLABS_API_KEY` in `~/.openclaw/openclaw.json`:

{
"skills": {
"entries": {
"videoagent-audio-studio": {
"enabled": true,
"env": {
"ELEVENLABS_API_KEY": "your_elevenlabs_key_here"
}
}
}
}
}

For music generation, also add `"FAL_KEY": "your_fal_key_here"` to the same `env` block.

Deploy the `proxy/` directory with Vercel:

cd proxy
npm install
vercel --prod

| Variable | Required For | Where to Get |
|---|---|---|
| `ELEVENLABS_API_KEY` | TTS, SFX, Voice Clone | elevenlabs.io/app/settings/api-keys |
| `FAL_KEY` | Music generation | fal.ai/dashboard/keys |
| | (Optional) Restrict access | Comma-separated list of allowed client keys |
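The optional access restriction is described as a comma-separated list of allowed client keys. As a rough illustration of how such a value might be interpreted, the sketch below parses the list and checks a client key against it. The function names are hypothetical, and treating an empty or unset list as "unrestricted" is an assumption based on the variable being optional, not documented proxy behavior.

```python
from typing import Optional, Set


def parse_allowed_keys(raw: Optional[str]) -> Set[str]:
    """Split a comma-separated string into a set of non-empty, trimmed keys."""
    if not raw:
        return set()
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_client_allowed(client_key: str, raw_allowed: Optional[str]) -> bool:
    """Return True if the client key is permitted by the allow-list."""
    allowed = parse_allowed_keys(raw_allowed)
    if not allowed:
        # Assumption: no list configured means access is not restricted.
        return True
    return client_key in allowed
```

Trimming whitespace around each entry makes values such as `"key-a, key-b"` behave the same as `"key-a,key-b"`.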
export AUDIOMIND_PROXY_URL="https://your-domain.com/api/audio"

The same value can be set in `~/.openclaw/openclaw.json`:

{
"skills": {
"entries": {
"videoagent-audio-studio": {
"env": {
"AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio"
}
}
}
}
}

| Model ID | Type | Provider | Notes |
|---|---|---|---|
| `eleven_multilingual_v2` | TTS | ElevenLabs | Best quality, supports 29 languages |
| `eleven_turbo_v2_5` | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |
| | TTS | ElevenLabs | English only, fastest |
| `cassetteai-music` | Music | fal.ai | Reliable, fast music generation |
| | SFX | ElevenLabs | High-quality sound effects (up to 22s) |
| | Clone | ElevenLabs | Clone any voice from a short audio sample |
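Because each model type in the table belongs to one provider, a missing API key can be detected before any request is sent. The sketch below assumes `ELEVENLABS_API_KEY` is the key for ElevenLabs models and `FAL_KEY` is the key for fal.ai models, as the configuration examples in this document suggest; `missing_keys` is a hypothetical helper, not part of the skill.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Provider for each model type, taken from the table above.
TYPE_PROVIDER = {
    "TTS": "ElevenLabs",
    "SFX": "ElevenLabs",
    "Clone": "ElevenLabs",
    "Music": "fal.ai",
}

# Assumed mapping from provider to the environment variable holding its key.
PROVIDER_KEY = {
    "ElevenLabs": "ELEVENLABS_API_KEY",
    "fal.ai": "FAL_KEY",
}


def missing_keys(model_types: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of unset variables needed for the given types."""
    env = os.environ if env is None else env
    needed = {PROVIDER_KEY[TYPE_PROVIDER[t]] for t in model_types}
    return sorted(name for name in needed if not env.get(name))
```

An empty result means every requested model type has its key configured; passing `env` explicitly keeps the check easy to test without touching the real environment.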