modelslab-audio-generation

ModelsLab Audio Generation

Generate high-quality audio including speech, music, voice conversion, sound effects, and dubbing using AI.

When to Use This Skill

  • Convert text to natural-sounding speech (TTS)
  • Transcribe speech to text
  • Transform voice characteristics (speech-to-speech)
  • Generate music from text prompts
  • Create sound effects
  • Dub audio into different languages
  • Extend or inpaint songs
  • Build voice assistants or audiobooks

Available APIs (v7)

Voice Endpoints

  • Text to Speech:
    POST https://modelslab.com/api/v7/voice/text-to-speech
  • Speech to Text:
    POST https://modelslab.com/api/v7/voice/speech-to-text
  • Speech to Speech:
    POST https://modelslab.com/api/v7/voice/speech-to-speech
  • Music Generation:
    POST https://modelslab.com/api/v7/voice/music-gen
  • Sound Generation:
    POST https://modelslab.com/api/v7/voice/sound-generation
  • Create Dubbing:
    POST https://modelslab.com/api/v7/voice/create-dubbing
  • Song Extender:
    POST https://modelslab.com/api/v7/voice/song-extender
  • Song Inpaint:
    POST https://modelslab.com/api/v7/voice/song-inpaint
  • Fetch Result:
    POST https://modelslab.com/api/v7/voice/fetch/{id}

Note: v6 endpoints (/api/v6/voice/text_to_speech, etc.) still work, but v7 is the current version. Parameter names have changed in v7 (e.g., text is now prompt, audio is now init_audio).

Discovering Audio Models

bash
# Search audio/voice models
modelslab models search --feature audio_gen

# Search by provider
modelslab models search --search "eleven"

# Get model details
modelslab models detail --id eleven_multilingual_v2

Audio Model IDs

| model_id | Name | Use With |
|---|---|---|
| eleven_multilingual_v2 | ElevenLabs Multilingual v2 | text-to-speech |
| eleven_english_sts_v2 | ElevenLabs Voice Changer | speech-to-speech |
| scribe_v1 | ElevenLabs Scribe | speech-to-text |
| eleven_sound_effect | ElevenLabs Sound Effects | sound-generation |
| music_v1 | ElevenLabs Music | music-gen |
| inworld-tts-1 | Inworld TTS | text-to-speech |
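
The model table can be captured as a small lookup so that each endpoint is paired with a compatible model. The sketch below is illustrative only: `MODELS_BY_ENDPOINT` and `endpoint_and_model` are hypothetical names, and treating the first listed model as the default for an endpoint is an assumption, not documented behavior.

```python
BASE_URL = "https://modelslab.com/api/v7/voice"

# Endpoint slug -> model IDs listed for it in this guide.
# Assumption: the first entry is used as the default.
MODELS_BY_ENDPOINT = {
    "text-to-speech": ["eleven_multilingual_v2", "inworld-tts-1"],
    "speech-to-speech": ["eleven_english_sts_v2"],
    "speech-to-text": ["scribe_v1"],
    "sound-generation": ["eleven_sound_effect"],
    "music-gen": ["music_v1"],
}


def endpoint_and_model(endpoint, model_id=None):
    """Return (url, model_id), rejecting models not listed for the endpoint."""
    if endpoint not in MODELS_BY_ENDPOINT:
        raise ValueError(f"Unknown endpoint: {endpoint}")
    allowed = MODELS_BY_ENDPOINT[endpoint]
    if model_id is None:
        model_id = allowed[0]
    elif model_id not in allowed:
        raise ValueError(f"{model_id} is not listed for {endpoint}")
    return f"{BASE_URL}/{endpoint}", model_id
```

Calling `endpoint_and_model("music-gen")` yields the music-gen URL together with `music_v1`, while a mismatched pair such as `("speech-to-text", "music_v1")` fails before any request is sent.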

Text to Speech

python
import requests
import time

def text_to_speech(text, api_key, voice_id="21m00Tcm4TlvDq8ikWAM", model_id="eleven_multilingual_v2"):
    """Convert text to speech.

    Args:
        text: The text to convert to speech
        api_key: Your ModelsLab API key
        voice_id: ElevenLabs voice ID (see Available Voices below)
        model_id: TTS model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/text-to-speech",
        json={
            "key": api_key,
            "prompt": text,             # v7 uses "prompt" not "text"
            "voice_id": voice_id,
            "model_id": model_id
        }
    )

    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(f"Error: {data.get('message', 'Unknown error')}")

# Usage
audio_url = text_to_speech(
    "Hello! Welcome to ModelsLab. This is a test of our text-to-speech API.",
    "your_api_key"
)
print(f"Audio URL: {audio_url}")

Speech to Text (Transcription)

python
def speech_to_text(audio_url, api_key, model_id="scribe_v1"):
    """Transcribe speech from audio to text.

    Args:
        audio_url: URL of audio file (must be publicly accessible)
        model_id: STT model to use
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-text",
        json={
            "key": api_key,
            "init_audio": audio_url,    # v7 uses "init_audio" not "audio"
            "model_id": model_id
        }
    )

    data = response.json()

    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        raise Exception(data.get("message"))

# Transcribe audio
result = speech_to_text(
    "https://example.com/speech.mp3",
    "your_api_key"
)
print(f"Transcription: {result}")

Speech to Speech (Voice Conversion)

python
def speech_to_speech(audio_url, voice_id, api_key, model_id="eleven_english_sts_v2"):
    """Convert voice characteristics in audio.

    Args:
        audio_url: URL of the source audio
        voice_id: Target ElevenLabs voice ID
        model_id: Voice conversion model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/speech-to-speech",
        json={
            "key": api_key,
            "init_audio": audio_url,
            "voice_id": voice_id,
            "model_id": model_id
        }
    )

    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Voice conversion failed"))

Sound Effects Generation

python
def generate_sound_effect(description, api_key, model_id="eleven_sound_effect"):
    """Generate a sound effect from a text description.

    Args:
        description: What sound to generate
        model_id: Sound effects model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/sound-generation",
        json={
            "key": api_key,
            "prompt": description,
            "model_id": model_id
        }
    )

    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Sound generation failed"))

# Generate door slam sound
sfx_url = generate_sound_effect(
    "Heavy wooden door slamming shut",
    "your_api_key"
)

Music Generation

python
def generate_music(prompt, api_key, model_id="music_v1"):
    """Generate music from a text description.

    Args:
        prompt: Description of music style/mood
        model_id: Music generation model
    """
    response = requests.post(
        "https://modelslab.com/api/v7/voice/music-gen",
        json={
            "key": api_key,
            "prompt": prompt,
            "model_id": model_id
        }
    )

    data = response.json()
    if data["status"] == "success":
        return data["output"][0]
    elif data["status"] == "processing":
        return poll_audio_result(data["id"], api_key)
    else:
        # Surface API errors instead of silently returning None
        raise Exception(data.get("message", "Music generation failed"))

# Generate background music
music_url = generate_music(
    "Upbeat electronic music with a driving beat, perfect for a tech startup video",
    "your_api_key"
)
print(f"Music: {music_url}")

Polling for Async Results

python
def poll_audio_result(request_id, api_key, timeout=300):
    """Poll for async audio generation results."""
    start_time = time.time()

    while time.time() - start_time < timeout:
        fetch = requests.post(
            f"https://modelslab.com/api/v7/voice/fetch/{request_id}",
            json={"key": api_key}
        )
        result = fetch.json()

        if result["status"] == "success":
            return result["output"][0]
        elif result["status"] == "failed":
            raise Exception(result.get("message", "Generation failed"))

        time.sleep(5)

    raise Exception("Timeout waiting for audio generation")
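
Every endpoint in this guide branches on the same status field, so that logic can live in one place. A minimal sketch, assuming only the three status values that appear in the examples here ("success", "processing", "failed") and treating anything else as an error; `interpret_status` is a hypothetical helper name, not part of the ModelsLab API.

```python
def interpret_status(data):
    """Return the first output URL, or None while the job is still processing.

    Raises RuntimeError for "failed" or any unrecognized status.
    """
    status = data.get("status")
    if status == "success":
        return data["output"][0]
    if status == "processing":
        return None
    raise RuntimeError(data.get("message", f"Unexpected status: {status!r}"))
```

A caller can loop on the fetch endpoint until this returns a non-None value, which keeps the success, pending, and failure handling identical across speech, music, and sound-effect requests.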

Available ElevenLabs Voice IDs

| Voice ID | Name | Style |
|---|---|---|
| 21m00Tcm4TlvDq8ikWAM | Rachel | Neutral, calm |
| AZnzlk1XvdvUeBnXmlld | Domi | Confident |
| EXAVITQu4vr4xnSDxMaL | Bella | Soft, warm |
| ErXwobaYiN019PkySvjV | Antoni | Well-rounded |
| MF3mGyEYCl7XYWbV9V6O | Elli | Young, clear |
| TxGEqnHWrfWFTfGW9XjX | Josh | Deep, warm |
| VR6AewLTigWG4xSOukaG | Arnold | Strong |
| pNInz6obpgDQGcFmaJgB | Adam | Deep, narrative |
| yoZ06aMxZJJ28mfd3POQ | Sam | Dynamic |
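
Because the API expects the opaque ID rather than the voice name, a name-to-ID lookup can make calling code easier to read. The sketch below simply restates the voice table; `VOICE_IDS` and `voice_id_for` are hypothetical names introduced for illustration.

```python
# Voice name -> ElevenLabs voice ID, copied from the voice table in this guide.
VOICE_IDS = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "antoni": "ErXwobaYiN019PkySvjV",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
    "arnold": "VR6AewLTigWG4xSOukaG",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "sam": "yoZ06aMxZJJ28mfd3POQ",
}


def voice_id_for(name):
    """Look up a voice ID by name (case-insensitive, surrounding spaces ignored)."""
    key = name.strip().lower()
    if key not in VOICE_IDS:
        raise KeyError(f"Unknown voice name: {name!r}")
    return VOICE_IDS[key]
```

For example, `voice_id_for("Rachel")` can be passed as the `voice_id` argument of `text_to_speech`.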

Key Parameters

Text to Speech

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Text to convert to speech |
| voice_id | string | Yes | ElevenLabs voice identifier |
| model_id | string | Yes | TTS model (e.g., eleven_multilingual_v2) |
| temperature | float | No | Voice variation |
| webhook | string | No | Async notification URL |
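
The required and optional text-to-speech fields can be enforced before a request is sent. This is a minimal sketch under the assumption that optional fields should simply be omitted when unset; `build_tts_payload` is a hypothetical helper, and the accepted range of `temperature` is not stated in this guide, so it is not validated here.

```python
def build_tts_payload(api_key, prompt, voice_id, model_id,
                      temperature=None, webhook=None):
    """Build a v7 text-to-speech request body from the documented parameters."""
    required = {
        "key": api_key,
        "prompt": prompt,
        "voice_id": voice_id,
        "model_id": model_id,
    }
    # Reject empty or missing required values up front.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("Missing required field(s): " + ", ".join(missing))

    payload = dict(required)
    # Optional fields are included only when the caller supplies them.
    if temperature is not None:
        payload["temperature"] = float(temperature)
    if webhook is not None:
        payload["webhook"] = webhook
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in the text-to-speech example.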

Speech to Text

| Parameter | Type | Required | Description |
|---|---|---|---|
| init_audio | string | Yes | URL of audio to transcribe |
| model_id | string | Yes | STT model (e.g., scribe_v1) |

Sound Generation

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Sound effect description |
| model_id | string | Yes | SFX model (e.g., eleven_sound_effect) |

v6 to v7 Parameter Changes

| v6 Parameter | v7 Parameter | Notes |
|---|---|---|
| text | prompt | TTS text input |
| audio | init_audio | STT/STS audio input |
| target_audio | init_audio | Voice-to-voice source |
| (not required) | model_id | Now required on all endpoints |
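
Existing v6 request bodies can be translated mechanically using the renames listed above. The following is a sketch, not an official migration tool: `migrate_v6_payload` is a hypothetical helper, only the three renames from the table are applied, and any other v6 field is assumed to carry over unchanged.

```python
# Renames taken from the v6 to v7 parameter table.
V6_TO_V7 = {
    "text": "prompt",
    "audio": "init_audio",
    "target_audio": "init_audio",
}


def migrate_v6_payload(v6_payload, model_id):
    """Return a v7-style copy of a v6 request body.

    model_id is added if absent, since v7 requires it on all endpoints.
    """
    v7 = {}
    for name, value in v6_payload.items():
        new_name = V6_TO_V7.get(name, name)
        # "audio" and "target_audio" both map to "init_audio"; refuse to guess
        # which one wins if they disagree.
        if new_name in v7 and v7[new_name] != value:
            raise ValueError(f"Conflicting values for {new_name}")
        v7[new_name] = value
    v7.setdefault("model_id", model_id)
    return v7
```

The input dictionary is left untouched, so the original v6 body remains available for comparison while testing the v7 request.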

Best Practices

1. Use Correct Voice IDs

TTS requires valid ElevenLabs voice IDs (not generic names like "alloy").

2. Ensure Audio Accessibility

Audio URLs for speech-to-text must be publicly accessible without redirects or authentication.
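
One way to catch an unsuitable URL before submitting it is to send a HEAD request with redirects disabled and inspect the status code. This is a rough sketch using only the Python standard library; `probe_audio_url` and `classify_status` are hypothetical helpers, and it assumes the host answers HEAD requests the same way it answers the GET that the API would issue, which is not guaranteed.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx responses become visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def classify_status(status):
    """Map an HTTP status code to a coarse accessibility verdict."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status in (401, 403):
        return "auth_required"
    return "error"


def probe_audio_url(url, timeout=10):
    """HEAD the URL without following redirects and classify the response."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)
```

Only a result of "ok" suggests the URL is directly reachable; "redirect" and "auth_required" correspond to the two cases this guideline warns about.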

3. Use Webhooks for Long Operations

python
payload = {
    "key": api_key,
    "prompt": "...",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",   # required for text-to-speech
    "model_id": "eleven_multilingual_v2",
    "webhook": "https://yourserver.com/webhook/audio",
    "track_id": "audio_001"
}

Error Handling

python
try:
    audio = text_to_speech(text, api_key)
    print(f"Audio generated: {audio}")
except Exception as e:
    print(f"Audio generation failed: {e}")
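
For transient failures such as network errors, the same try/except pattern can be wrapped in a retry loop with exponential backoff. A minimal sketch: `with_retries` is a hypothetical helper, and retrying on every exception is a simplification, since this guide does not say which API errors are safe to retry.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure.

    The last exception is re-raised if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
```

It can wrap any of the functions above, for example `with_retries(lambda: text_to_speech(text, api_key))`. The `sleep` parameter exists so the delay can be replaced in tests.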

Resources

Related Skills

  • modelslab-model-discovery - Find and filter models
  • modelslab-video-generation - Add audio to videos
  • modelslab-chat-generation - Chat with LLM models
  • modelslab-webhooks - Handle async audio generation