fal-audio
fal.ai Audio
Text-to-speech (TTS) and speech-to-text (STT) using state-of-the-art audio models on fal.ai.
How It Works
- The user provides text (for TTS) or an audio URL (for STT)
- The script selects an appropriate model
- The script sends the request to the fal.ai API
- The script returns an audio URL (TTS) or the transcription text (STT)
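The four steps above can be sketched in JavaScript as follows. This is a minimal sketch, not the skill's actual script: it assumes fal.ai's synchronous REST endpoint `https://fal.run/<model-id>`, an API key sent as `Authorization: Key <FAL_KEY>`, and the input field names `text` and `audio_url` that the MCP examples in this document use. `planRequest`, `buildHttpRequest`, and `callFal` are hypothetical names.

```javascript
// Hypothetical sketch of the TTS/STT request flow; not the skill's script.
// Assumes fal.ai's synchronous endpoint https://fal.run/<model-id> and an
// "Authorization: Key <FAL_KEY>" header.
const DEFAULT_MODELS = {
  tts: "fal-ai/minimax/speech-2.6-turbo", // text-to-speech.sh default
  stt: "fal-ai/whisper", // speech-to-text.sh default
};

// Steps 1-2: infer the task from the input and pick a model.
function planRequest({ text, audioUrl, model } = {}) {
  if (text) {
    return { task: "tts", modelId: model || DEFAULT_MODELS.tts, input: { text } };
  }
  if (audioUrl) {
    return {
      task: "stt",
      modelId: model || DEFAULT_MODELS.stt,
      input: { audio_url: audioUrl },
    };
  }
  throw new Error("Provide text (TTS) or an audio URL (STT)");
}

// Step 3 (offline part): describe the HTTP request without sending it.
function buildHttpRequest(plan, apiKey) {
  return {
    url: `https://fal.run/${plan.modelId}`,
    method: "POST",
    headers: {
      Authorization: `Key ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(plan.input),
  };
}

// Steps 3-4: send the request (global fetch, Node 18+) and return the parsed
// JSON, which should contain the audio URL (TTS) or transcript (STT).
async function callFal(plan, apiKey = process.env.FAL_KEY) {
  const { url, ...init } = buildHttpRequest(plan, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`fal.ai request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from the network call lets the model-selection logic be checked offline; the exact shape of the JSON that `callFal` returns depends on the model.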
Text-to-Speech Models
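| Model | Notes |
|---|---|
| `fal-ai/minimax/speech-2.6-hd` | Best quality |
| `fal-ai/minimax/speech-2.6-turbo` | Fast, good quality |
| `fal-ai/elevenlabs/eleven-v3` | Natural voices |
| `fal-ai/chatterbox/multilingual` | Multi-language, fast |
| | For video sync |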
Text-to-Music Models
| Model | Notes |
|---|---|
| `fal-ai/minimax-music/v2` | Best quality |
| | Fast |
| `fal-ai/lyria2` | Google's model |
| | Song generation |
| | Instrumental |
| | Short clips |
| | Background music |
Speech-to-Text Models
| Model | Features | Speed |
|---|---|---|
| `fal-ai/whisper` | Multi-language, timestamps | Fast |
| `fal-ai/elevenlabs/scribe` | Speaker diarization | Medium |
Usage
Text-to-Speech
```bash
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]
```

Arguments:
- `--text` - Text to convert to speech (required)
- `--model` - TTS model (defaults to `fal-ai/minimax/speech-2.6-turbo`)
- `--voice` - Voice ID or name (model-specific)

Examples:

```bash
# Basic TTS (fast, good quality)
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Hello, welcome to the future of AI."

# High quality with MiniMax HD
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "This is premium quality speech." \
  --model "fal-ai/minimax/speech-2.6-hd"

# Natural voices with ElevenLabs
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Natural sounding voice generation" \
  --model "fal-ai/elevenlabs/eleven-v3"

# Multi-language TTS
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Bonjour, bienvenue dans le futur." \
  --model "fal-ai/chatterbox/multilingual"
```
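To drive `text-to-speech.sh` from another program, the documented flags can be assembled into an argument list. A minimal sketch, assuming Node.js; `buildTtsArgs` and `runTts` are hypothetical helpers, and only the `--text`, `--model`, and `--voice` flags listed above are used.

```javascript
// Hypothetical helpers for calling text-to-speech.sh from Node.js.
const TTS_SCRIPT = "/mnt/skills/user/fal-audio/scripts/text-to-speech.sh";

// Builds the argument list; only the documented flags are emitted.
function buildTtsArgs({ text, model, voice } = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("--text is required");
  }
  const args = ["--text", text];
  // Optional flags are omitted so the script can apply its own defaults.
  if (model) args.push("--model", model);
  if (voice) args.push("--voice", voice);
  return args;
}

// Runs the script without a shell; callback receives (error, stdout, stderr).
function runTts(options, callback) {
  const { execFile } = require("child_process");
  execFile("bash", [TTS_SCRIPT, ...buildTtsArgs(options)], callback);
}
```

Passing the arguments as an array to `execFile` means text such as `It's 5 o'clock` reaches the script unchanged, with no shell quoting involved.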
Speech-to-Text
```bash
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]
```

Arguments:
- `--audio-url` - URL of the audio file to transcribe (required)
- `--model` - STT model (defaults to `fal-ai/whisper`)
- `--language` - Language code (optional; auto-detected if omitted)

Examples:

```bash
# Transcribe with Whisper
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/audio.mp3"

# Transcribe with speaker diarization
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/meeting.mp3" \
  --model "fal-ai/elevenlabs/scribe"

# Transcribe a specific language (Spanish)
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/spanish.mp3" \
  --language "es"
```
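An argument builder for `speech-to-text.sh` can also check the URL locally before anything is sent. A minimal sketch; `buildSttArgs` is hypothetical, and rejecting non-http(s) URLs is an assumption of this sketch rather than a documented rule of the script or of fal.ai.

```javascript
// Hypothetical argument builder for speech-to-text.sh.
function buildSttArgs({ audioUrl, model, language } = {}) {
  let parsed;
  try {
    parsed = new URL(audioUrl); // throws on malformed input
  } catch (e) {
    throw new Error(`--audio-url is not a valid URL: ${audioUrl}`);
  }
  // Assumption: only http(s) URLs are accepted in this sketch.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("--audio-url must be an http(s) URL");
  }
  const args = ["--audio-url", audioUrl];
  // Omitted flags fall back to the defaults: fal-ai/whisper, auto-detected language.
  if (model) args.push("--model", model);
  if (language) args.push("--language", language);
  return args;
}
```

Relax the protocol check if your setup accepts other URL forms.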
MCP Tool Alternative
Text-to-Speech
```javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/minimax/speech-2.6-turbo",
  input: {
    text: "Hello, welcome to the future of AI."
  }
})
```

Speech-to-Text
```javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/whisper",
  input: {
    audio_url: "https://example.com/audio.mp3"
  }
})
```

Output
Text-to-Speech Output
```
Generating speech...
Model: fal-ai/minimax/speech-2.6-turbo
Speech generated!
Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
```
Speech-to-Text Output
```
Transcribing audio...
Model: fal-ai/whisper
Transcription complete!
Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en
```
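If the scripts are called from code, their progress lines can be turned into structured data. A minimal sketch based only on the sample output above; the real scripts may print other fields or multi-line transcripts, and `parseScriptOutput` is a hypothetical helper.

```javascript
// Hypothetical parser for the "Label: value" lines in the sample output.
function parseScriptOutput(stdout) {
  const result = {};
  for (const line of stdout.split(/\r?\n/)) {
    // Status lines such as "Speech generated!" have no colon and are skipped.
    const match = line.match(/^([A-Za-z ]+):\s*(.+)$/);
    if (!match) continue;
    const key = match[1].trim().toLowerCase().replace(/\s+/g, "_");
    let value = match[2].trim();
    if (key === "text" && value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1); // drop the quotes printed around the transcript
    }
    result[key] = value;
    if (key === "duration") {
      result.duration_seconds = parseFloat(value); // "5.2s" -> 5.2
    }
  }
  return result;
}
```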
Present Results to User
For TTS:
Here's the generated speech:
[Download audio](https://v3.fal.media/files/.../speech.mp3)
• Duration: 5.2s | Model: MiniMax Speech 2.6 Turbo
For STT:
Here's the transcription:
"Hello, this is the transcribed text from the audio file."
• Duration: 12.5s | Language: English
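The templates above can be filled in programmatically. A minimal sketch; `formatTtsResult`, `formatSttResult`, and the small language-name table are hypothetical, and unknown language codes are shown as-is.

```javascript
// Hypothetical formatters for the result templates above.
// Small, illustrative code-to-name table; unknown codes are printed as-is.
const LANGUAGE_NAMES = new Map([
  ["en", "English"],
  ["es", "Spanish"],
  ["fr", "French"],
  ["zh", "Chinese"],
]);

function formatTtsResult({ audioUrl, durationSeconds, modelName }) {
  return [
    "Here's the generated speech:",
    `[Download audio](${audioUrl})`,
    `• Duration: ${durationSeconds}s | Model: ${modelName}`,
  ].join("\n");
}

function formatSttResult({ text, durationSeconds, languageCode }) {
  const language = LANGUAGE_NAMES.get(languageCode) || languageCode;
  return [
    "Here's the transcription:",
    `"${text}"`,
    `• Duration: ${durationSeconds}s | Language: ${language}`,
  ].join("\n");
}
```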
Model Selection Guide
Text-to-Speech
MiniMax Speech 2.6 HD (`fal-ai/minimax/speech-2.6-hd`)
- Best for: Premium quality requirements
- Quality: Highest
- Speed: Medium

MiniMax Speech 2.6 Turbo (`fal-ai/minimax/speech-2.6-turbo`)
- Best for: General use with good quality
- Quality: High
- Speed: Fast

ElevenLabs v3 (`fal-ai/elevenlabs/eleven-v3`)
- Best for: Natural, realistic voices
- Quality: High
- Features: Many voice options

Chatterbox Multilingual (`fal-ai/chatterbox/multilingual`)
- Best for: Multi-language support
- Quality: Good
- Speed: Fast
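The guide above can be encoded as a lookup from the caller's main requirement to a model ID. A minimal sketch; the keys `quality`, `general`, `natural`, and `multilingual` are hypothetical labels for the four entries, not options of the script.

```javascript
// Hypothetical lookup encoding the TTS guide above; the keys are illustrative
// labels, not options accepted by text-to-speech.sh.
const TTS_MODELS = {
  quality: "fal-ai/minimax/speech-2.6-hd", // highest quality, medium speed
  general: "fal-ai/minimax/speech-2.6-turbo", // high quality, fast (default)
  natural: "fal-ai/elevenlabs/eleven-v3", // natural voices, many voice options
  multilingual: "fal-ai/chatterbox/multilingual", // multi-language, fast
};

function pickTtsModel(need = "general") {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(TTS_MODELS, need)) {
    const options = Object.keys(TTS_MODELS).join(", ");
    throw new Error(`Unknown need "${need}"; expected one of: ${options}`);
  }
  return TTS_MODELS[need];
}
```

The returned ID can be passed as `--model`; for `general` it can also be omitted, since Turbo is the script's default.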
Text-to-Music
MiniMax Music v2 (`fal-ai/minimax-music/v2`)
- Best for: High-quality music generation
- Quality: Highest

Lyria2 (`fal-ai/lyria2`)
- Best for: Google's music model
- Quality: High
Speech-to-Text
Whisper (`fal-ai/whisper`)
- Best for: General transcription, timestamps
- Languages: 99+ languages
- Features: Word-level timestamps

ElevenLabs Scribe (`fal-ai/elevenlabs/scribe`)
- Best for: Multi-speaker recordings
- Features: Speaker diarization
- Quality: Professional-grade
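For speech-to-text the guide reduces to one question: does the recording have several speakers that need to be told apart? A minimal sketch with a hypothetical `pickSttModel` helper:

```javascript
// Hypothetical selector following the STT guide above: ElevenLabs Scribe for
// multi-speaker recordings (diarization), otherwise Whisper (script default).
function pickSttModel({ multipleSpeakers = false } = {}) {
  return multipleSpeakers ? "fal-ai/elevenlabs/scribe" : "fal-ai/whisper";
}
```

Because Whisper is the default, `--model` only needs to be passed to `speech-to-text.sh` when this returns Scribe.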
Troubleshooting
Empty Audio
`Error: Generated audio is empty`

Check that your text is not empty and contains valid content.
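One way to catch the empty-audio case before calling the API is a pre-flight check on the input text. This sketch assumes that text with no letters or digits (only whitespace, punctuation, or symbols) is what leads to empty audio; the script's actual validation rules are not documented here, and `hasSpeakableContent` is a hypothetical name.

```javascript
// Hypothetical pre-flight check for the empty-audio error. Assumption: text
// with no letters or digits in any script (only spaces, punctuation, or
// symbols) is likely to produce empty audio.
function hasSpeakableContent(text) {
  // \p{L} = any letter, \p{N} = any number; the "u" flag enables these.
  return typeof text === "string" && /[\p{L}\p{N}]/u.test(text);
}
```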
Unsupported Audio Format
`Error: Audio format not supported`

Supported formats: MP3, WAV, M4A, FLAC, OGG. Convert your audio to a supported format.
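A quick pre-flight check can compare the file extension in the audio URL against the formats listed above. A minimal sketch; `audioFormatStatus` is hypothetical, and because it only inspects the extension, a URL without one is reported as `unknown` instead of being rejected.

```javascript
// Hypothetical pre-flight check based on the supported formats listed above.
const SUPPORTED_FORMATS = ["mp3", "wav", "m4a", "flac", "ogg"];

// Returns "supported", "unsupported", or "unknown" (no file extension).
function audioFormatStatus(audioUrl) {
  const { pathname } = new URL(audioUrl); // ignores ?query and #fragment
  const match = pathname.match(/\.([A-Za-z0-9]+)$/);
  if (!match) return "unknown";
  return SUPPORTED_FORMATS.includes(match[1].toLowerCase()) ? "supported" : "unsupported";
}
```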
Language Detection Failed
`Warning: Could not detect language, defaulting to English`

Specify the language explicitly with the `--language` option (for example, `--language "es"`).
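When the warning appears, the transcription can be rerun with an explicit language. A minimal sketch, assuming the script's output is available as a string and the arguments are stored as flag/value pairs; `retryArgsWithLanguage` is a hypothetical helper.

```javascript
// Hypothetical retry helper for the language-detection warning above.
const LANGUAGE_WARNING = "Could not detect language";

// previousArgs is a flat list of flag/value pairs, e.g.
// ["--audio-url", "https://example.com/spanish.mp3"].
// Returns new arguments with an explicit --language, or null if no retry is needed.
function retryArgsWithLanguage(previousArgs, output, languageCode) {
  if (!output.includes(LANGUAGE_WARNING)) return null;
  const args = [];
  for (let i = 0; i < previousArgs.length; i += 2) {
    // Keep every pair except an earlier --language setting.
    if (previousArgs[i] !== "--language") {
      args.push(previousArgs[i], previousArgs[i + 1]);
    }
  }
  args.push("--language", languageCode);
  return args;
}
```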