modelslab-audio-generation
ModelsLab Audio Generation
Generate high-quality audio including speech, music, voice conversion, sound effects, and dubbing using AI.
When to Use This Skill
- Convert text to natural-sounding speech (TTS)
- Transcribe speech to text
- Transform voice characteristics (speech-to-speech)
- Generate music from text prompts
- Create sound effects
- Dub audio into different languages
- Extend or inpaint songs
- Build voice assistants or audiobooks
Available APIs (v7)
Voice Endpoints
- Text to Speech: `POST https://modelslab.com/api/v7/voice/text-to-speech`
- Speech to Text: `POST https://modelslab.com/api/v7/voice/speech-to-text`
- Speech to Speech: `POST https://modelslab.com/api/v7/voice/speech-to-speech`
- Music Generation: `POST https://modelslab.com/api/v7/voice/music-gen`
- Sound Generation: `POST https://modelslab.com/api/v7/voice/sound-generation`
- Create Dubbing: `POST https://modelslab.com/api/v7/voice/create-dubbing`
- Song Extender: `POST https://modelslab.com/api/v7/voice/song-extender`
- Song Inpaint: `POST https://modelslab.com/api/v7/voice/song-inpaint`
- Fetch Result: `POST https://modelslab.com/api/v7/voice/fetch/{id}`
Note: v6 endpoints (`/api/v6/voice/text_to_speech`, etc.) still work, but v7 is the current version. Parameter names have changed in v7 (e.g., `text` is now `prompt`, `audio` is now `init_audio`).
Discovering Audio Models
```bash
# Search audio/voice models
modelslab models search --feature audio_gen

# Search by provider
modelslab models search --search "eleven"

# Get model details
modelslab models detail --id eleven_multilingual_v2
```
Audio Model IDs
| model_id | Name | Use With |
|---|---|---|
| `eleven_multilingual_v2` | ElevenLabs Multilingual v2 | text-to-speech |
| `eleven_english_sts_v2` | ElevenLabs Voice Changer | speech-to-speech |
| `scribe_v1` | ElevenLabs Scribe | speech-to-text |
| `eleven_sound_effect` | ElevenLabs Sound Effects | sound-generation |
| `music_v1` | ElevenLabs Music | music-gen |
| | Inworld TTS | text-to-speech |
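The endpoints and model IDs can be kept in one lookup so that calling code picks a URL and a default model together. This is a minimal sketch: `DEFAULT_MODELS` and `endpoint_for` are hypothetical names, and the model IDs are simply the defaults used in this document's examples, not an authoritative list.

```python
BASE_URL = "https://modelslab.com/api/v7/voice"

# Endpoint path -> default model_id, taken from the examples in this document.
DEFAULT_MODELS = {
    "text-to-speech": "eleven_multilingual_v2",
    "speech-to-text": "scribe_v1",
    "speech-to-speech": "eleven_english_sts_v2",
    "sound-generation": "eleven_sound_effect",
    "music-gen": "music_v1",
}

def endpoint_for(task):
    """Return (url, default_model_id) for a task, or raise ValueError."""
    if task not in DEFAULT_MODELS:
        raise ValueError(f"Unknown task: {task!r}")
    return f"{BASE_URL}/{task}", DEFAULT_MODELS[task]
```

Confirm the IDs with `modelslab models search` before relying on them.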
Text to Speech
```python
import requests
import time

def text_to_speech(text, api_key, voice_id="21m00Tcm4TlvDq8ikWAM", model_id="eleven_multilingual_v2"):
    """Convert text to speech.

    Args:
        text: The text to convert to speech
        api_key: Your ModelsLab API key
        voice_id: ElevenLabs voice ID (see Available ElevenLabs Voice IDs below)
        model_id: TTS model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/text-to-speech",
        json={
            "key": api_key,
            "prompt": text,  # v7 uses "prompt" not "text"
            "voice_id": voice_id,
            "model_id": model_id
        }
    )
    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        # Result is not ready yet; poll the fetch endpoint (defined below)
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(f"Error: {data.get('message', 'Unknown error')}")

# Usage
audio_url = text_to_speech(
    "Hello! Welcome to ModelsLab. This is a test of our text-to-speech API.",
    "your_api_key"
)
print(f"Audio URL: {audio_url}")
```
Speech to Text (Transcription)
```python
def speech_to_text(audio_url, api_key, model_id="scribe_v1"):
    """Transcribe speech from audio to text.

    Args:
        audio_url: URL of audio file (must be publicly accessible)
        api_key: Your ModelsLab API key
        model_id: STT model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-text",
        json={
            "key": api_key,
            "init_audio": audio_url,  # v7 uses "init_audio" not "audio"
            "model_id": model_id
        }
    )
    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(data.get("message"))

# Transcribe audio
result = speech_to_text(
    "https://example.com/speech.mp3",
    "your_api_key"
)
print(f"Transcription: {result}")
```
Speech to Speech (Voice Conversion)
```python
def speech_to_speech(audio_url, voice_id, api_key, model_id="eleven_english_sts_v2"):
    """Convert voice characteristics in audio.

    Args:
        audio_url: URL of the source audio
        voice_id: Target ElevenLabs voice ID
        api_key: Your ModelsLab API key
        model_id: Voice conversion model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-speech",
        json={
            "key": api_key,
            "init_audio": audio_url,
            "voice_id": voice_id,
            "model_id": model_id
        }
    )
    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Raise instead of silently returning None on an error response
        raise Exception(data.get("message", "Unknown error"))
```
Sound Effects Generation
```python
def generate_sound_effect(description, api_key, model_id="eleven_sound_effect"):
    """Generate a sound effect from a text description.

    Args:
        description: What sound to generate
        api_key: Your ModelsLab API key
        model_id: Sound effects model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/sound-generation",
        json={
            "key": api_key,
            "prompt": description,
            "model_id": model_id
        }
    )
    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Raise instead of silently returning None on an error response
        raise Exception(data.get("message", "Unknown error"))

# Generate door slam sound
sfx_url = generate_sound_effect(
    "Heavy wooden door slamming shut",
    "your_api_key"
)
```
Music Generation
```python
def generate_music(prompt, api_key, model_id="music_v1"):
    """Generate music from a text description.

    Args:
        prompt: Description of music style/mood
        api_key: Your ModelsLab API key
        model_id: Music generation model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/music-gen",
        json={
            "key": api_key,
            "prompt": prompt,
            "model_id": model_id
        }
    )
    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Raise instead of silently returning None on an error response
        raise Exception(data.get("message", "Unknown error"))

# Generate background music
music_url = generate_music(
    "Upbeat electronic music with a driving beat, perfect for a tech startup video",
    "your_api_key"
)
print(f"Music: {music_url}")
```
Polling for Async Results
```python
def poll_audio_result(request_id, api_key, timeout=300):
    """Poll for async audio generation results."""
    start_time = time.time()

    while time.time() - start_time < timeout:
        fetch = requests.post(
            f"https://modelslab.com/api/v7/voice/fetch/{request_id}",
            json={"key": api_key}
        )
        result = fetch.json()

        if result["status"] == "success":
            return result["output"][0]
        elif result["status"] == "failed":
            raise Exception(result.get("message", "Generation failed"))

        # Still processing: wait before asking again
        time.sleep(5)

    raise Exception("Timeout waiting for audio generation")
```
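The fixed five-second sleep above is simple; for long jobs, a delay that grows between attempts sends fewer requests. The following is a minimal sketch of that variant, not part of the ModelsLab API: `poll_with_backoff`, its `fetch` callable, and the delay values are all assumptions. Only the `success` and `failed` status values and the `output` and `message` fields come from the example above. Taking `fetch` as a callable keeps the loop testable without network access.

```python
import time

def poll_with_backoff(fetch, timeout=300, initial_delay=2.0, max_delay=30.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it reports success or failure, doubling the wait each time.

    fetch: zero-argument callable returning the parsed JSON dict of a fetch
    request (hypothetical hook wrapping the requests.post call shown above).
    """
    deadline = clock() + timeout
    delay = initial_delay
    while clock() < deadline:
        result = fetch()
        status = result.get("status")
        if status == "success":
            return result["output"][0]
        if status == "failed":
            raise RuntimeError(result.get("message", "Generation failed"))
        sleep(delay)
        delay = min(delay * 2, max_delay)  # cap the wait between attempts
    raise TimeoutError("Timeout waiting for audio generation")
```

In real use, `fetch` could be a lambda that posts `{"key": api_key}` to the fetch URL and returns `.json()`.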
Available ElevenLabs Voice IDs
| Voice ID | Name | Style |
|---|---|---|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Neutral, calm |
| | Domi | Confident |
| | Bella | Soft, warm |
| | Antoni | Well-rounded |
| | Elli | Young, clear |
| | Josh | Deep, warm |
| | Arnold | Strong |
| | Adam | Deep, narrative |
| | Sam | Dynamic |
Key Parameters
Text to Speech
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Text to convert to speech |
| `voice_id` | string | Yes | ElevenLabs voice identifier |
| `model_id` | string | Yes | TTS model (e.g., `eleven_multilingual_v2`) |
| | float | No | Voice variation |
| `webhook` | string | No | Async notification URL |
Speech to Text
| Parameter | Type | Required | Description |
|---|---|---|---|
| `init_audio` | string | Yes | URL of audio to transcribe |
| `model_id` | string | Yes | STT model (e.g., `scribe_v1`) |
Sound Generation
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Sound effect description |
| `model_id` | string | Yes | SFX model (e.g., `eleven_sound_effect`) |
v6 to v7 Parameter Changes
| v6 Parameter | v7 Parameter | Notes |
|---|---|---|
| `text` | `prompt` | TTS text input |
| `audio` | `init_audio` | STT/STS audio input |
| | | Voice-to-voice source |
| `model_id` (not required) | `model_id` | Now required on all endpoints |
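When porting v6 calls, the renames noted in this document (`text` to `prompt`, `audio` to `init_audio`) can be applied mechanically. The sketch below is illustrative only: `migrate_v6_payload` is a hypothetical helper, it covers just those two renames, and it does not add the `model_id` that v7 requires.

```python
# Renames stated in this document; all other keys are passed through unchanged.
V6_TO_V7 = {"text": "prompt", "audio": "init_audio"}

def migrate_v6_payload(payload):
    """Return a copy of a v6-style payload with keys renamed for v7."""
    migrated = {}
    for key, value in payload.items():
        new_key = V6_TO_V7.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"Conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated
```

The input dictionary is left untouched, so the original payload can still be sent to a v6 endpoint during a gradual migration.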
Best Practices
1. Use Correct Voice IDs
TTS requires valid ElevenLabs voice IDs (not generic names like "alloy").
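A cheap local check can catch generic names before a request is sent. The sketch below assumes ElevenLabs voice IDs are 20-character alphanumeric strings, which matches the default `21m00Tcm4TlvDq8ikWAM` used in this document but is a heuristic rather than a documented guarantee. `looks_like_voice_id` is a hypothetical helper.

```python
import re

# Assumption: voice IDs are exactly 20 ASCII letters or digits.
_VOICE_ID_RE = re.compile(r"[A-Za-z0-9]{20}")

def looks_like_voice_id(value):
    """Heuristic shape check; it cannot confirm that the voice actually exists."""
    return isinstance(value, str) and _VOICE_ID_RE.fullmatch(value) is not None
```

A value that passes may still be rejected by the API, so this complements error handling rather than replacing it.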
2. Ensure Audio Accessibility
Audio URLs for speech-to-text must be publicly accessible without redirects or authentication.
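Some URLs can be ruled out before any request is made, for example `localhost` links or private IP addresses that a remote service cannot reach. The following is a rough pre-check under that assumption; `is_plausibly_public_url` is a hypothetical helper that inspects only the URL text, so it cannot detect redirects or authentication prompts.

```python
import ipaddress
from urllib.parse import urlparse

def is_plausibly_public_url(url):
    """Return False for URLs that a remote service clearly cannot fetch."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL
    if parsed.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".local"):
        return False
    if parsed.username or parsed.password:
        return False  # embedded credentials imply authentication
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; assumed public, not verified
    return ip.is_global
```

A URL that passes should still be tested with a real request, since a hosting service may redirect or require a login.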
3. Use Webhooks for Long Operations
```python
payload = {
    "key": api_key,
    "prompt": "...",
    "model_id": "eleven_multilingual_v2",
    "webhook": "https://yourserver.com/webhook/audio",
    "track_id": "audio_001"
}
```
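On the receiving side, the server at the `webhook` URL has to parse the callback. This document does not describe the callback body, so the sketch below assumes it is JSON that mirrors the fetch response (`status`, `output`, `message`) and echoes back `track_id`. Both that shape and the `handle_webhook` name are assumptions to verify against a real callback.

```python
import json

def handle_webhook(body):
    """Parse a webhook request body and return (track_id, audio_url_or_None).

    Assumption: the body is JSON with "status", "output" and "track_id" fields.
    """
    event = json.loads(body)
    track_id = event.get("track_id")
    status = event.get("status")
    if status == "success":
        output = event.get("output") or []
        return track_id, (output[0] if output else None)
    if status == "failed":
        raise RuntimeError(event.get("message", "Generation failed"))
    return track_id, None  # still processing, or an unrecognised status
```

The function takes the raw body rather than a framework request object, so it can be called from any web framework's route handler.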
Error Handling
```python
try:
    audio = text_to_speech(text, api_key)
    print(f"Audio generated: {audio}")
except Exception as e:
    print(f"Audio generation failed: {e}")
```
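Catching a bare `Exception` also swallows network and JSON errors. One option is a dedicated exception class plus a helper that centralizes the status check repeated in each function above. This is a sketch only: `AudioGenerationError` and `extract_output` are hypothetical names, and the field names (`status`, `output`, `id`, `message`) are the ones used in this document's examples.

```python
class AudioGenerationError(Exception):
    """Raised when an API response reports an error (hypothetical class)."""

def extract_output(data):
    """Return ("done", url) or ("pending", request_id) from a response dict."""
    status = data.get("status")
    if status == "success":
        return "done", data["output"][0]
    if status == "processing":
        return "pending", data["id"]
    # Any other status is treated as an API-level failure.
    raise AudioGenerationError(data.get("message", "Unknown error"))
```

Callers can then write `except AudioGenerationError` for API failures and let connection errors from `requests` surface separately.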
Resources
- Audio API Docs: https://docs.modelslab.com/voice-cloning/overview
- Model Browser: https://modelslab.com/models
- Model Selection Guide: https://docs.modelslab.com/guides/model-selection
- Get API Key: https://modelslab.com/dashboard
Related Skills
- `modelslab-model-discovery` - Find and filter models
- `modelslab-video-generation` - Add audio to videos
- `modelslab-chat-generation` - Chat with LLM models
- `modelslab-webhooks` - Handle async audio generation