fal-audio

fal.ai Audio

Text-to-speech and speech-to-text using state-of-the-art audio models on fal.ai.

How It Works

  1. User provides text (for TTS) or audio URL (for STT)
  2. Script selects appropriate model
  3. Sends request to fal.ai API
  4. Returns audio URL (TTS) or transcription text (STT)
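
Steps 1 and 2 can be sketched as a small shell function. This is only an illustration, not the scripts' actual code: `select_model` is a hypothetical name, and the two defaults are the ones this document lists for the TTS and STT scripts. The API request in steps 3 and 4 is left out.

```shell
# Hypothetical helper (not part of the fal-audio scripts): pick a model ID.
# $1 is the task ("tts" or "stt"); $2 is an optional user-supplied model ID.
select_model() {
  if [ -n "${2:-}" ]; then
    # An explicit model always wins over the default.
    printf '%s\n' "$2"
    return 0
  fi
  case ${1:-} in
    tts) printf '%s\n' "fal-ai/minimax/speech-2.6-turbo" ;;
    stt) printf '%s\n' "fal-ai/whisper" ;;
    *)   printf 'unknown task: %s\n' "${1:-}" >&2; return 1 ;;
  esac
}
```

For example, `select_model tts` prints the Turbo model, while `select_model tts fal-ai/minimax/speech-2.6-hd` prints the HD model that was passed in.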

Text-to-Speech Models

| Model | Notes |
|---|---|
| fal-ai/minimax/speech-2.6-hd | Best quality |
| fal-ai/minimax/speech-2.6-turbo | Fast, good quality |
| fal-ai/elevenlabs/eleven-v3 | Natural voices |
| fal-ai/chatterbox/multilingual | Multi-language, fast |
| fal-ai/kling-video/v1/tts | For video sync |

Text-to-Music Models

| Model | Notes |
|---|---|
| fal-ai/minimax-music/v2 | Best quality |
| fal-ai/minimax-music/v1.5 | Fast |
| fal-ai/lyria2 | Google's model |
| fal-ai/elevenlabs/music | Song generation |
| fal-ai/sonauto/v2 | Instrumental |
| fal-ai/ace-step | Short clips |
| fal-ai/beatoven | Background music |

Speech-to-Text Models

| Model | Features | Speed |
|---|---|---|
| fal-ai/whisper | Multi-language, timestamps | Fast |
| fal-ai/elevenlabs/scribe | Speaker diarization | Medium |

Usage

Text-to-Speech

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]
Arguments:
  • --text - Text to convert to speech (required)
  • --model - TTS model (defaults to fal-ai/minimax/speech-2.6-turbo)
  • --voice - Voice ID or name (model-specific)
Examples:

Basic TTS (fast, good quality)

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Hello, welcome to the future of AI."

High quality with MiniMax HD

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "This is premium quality speech." \
  --model "fal-ai/minimax/speech-2.6-hd"

Natural voices with ElevenLabs

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Natural sounding voice generation" \
  --model "fal-ai/elevenlabs/eleven-v3"

Multi-language TTS

bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Bonjour, bienvenue dans le futur." \
  --model "fal-ai/chatterbox/multilingual"

Speech-to-Text

bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]
Arguments:
  • --audio-url - URL of audio file to transcribe (required)
  • --model - STT model (defaults to fal-ai/whisper)
  • --language - Language code (optional, auto-detected)
Examples:

Transcribe with Whisper

bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/audio.mp3"

Transcribe with speaker diarization

bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/meeting.mp3" \
  --model "fal-ai/elevenlabs/scribe"

Transcribe specific language

bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/spanish.mp3" \
  --language "es"

MCP Tool Alternative

Text-to-Speech

javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/minimax/speech-2.6-turbo",
  input: {
    text: "Hello, welcome to the future of AI."
  }
})

Speech-to-Text

javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/whisper",
  input: {
    audio_url: "https://example.com/audio.mp3"
  }
})

Output

Text-to-Speech Output

Generating speech...
Model: fal-ai/minimax/speech-2.6-turbo

Speech generated!

Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
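
If a caller needs a single value from this output, such as the audio URL, one option is a small filter. The sketch below assumes the script keeps the `Label: value` layout shown in the sample and writes it to standard output; `extract_field` is a hypothetical name, not something the scripts provide.

```shell
# Hypothetical helper: read script output on stdin and print the value of the
# first "Label: value" line whose label matches $1 (e.g. "Audio URL").
# Labels are used as a sed pattern, so keep them to plain letters and spaces.
extract_field() {
  sed -n "s/^$1: //p" | head -n 1
}
```

Under those assumptions, piping the TTS script into `extract_field "Audio URL"` would print only the URL line's value, and `extract_field "Duration"` would print `5.2s` for the sample above.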

Speech-to-Text Output

Transcribing audio...
Model: fal-ai/whisper

Transcription complete!

Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en

Present Results to User

For TTS:

Here's the generated speech:

[Download audio](https://v3.fal.media/files/.../speech.mp3)

• Duration: 5.2s | Model: MiniMax Speech 2.6 Turbo

For STT:

Here's the transcription:

"Hello, this is the transcribed text from the audio file."

• Duration: 12.5s | Language: English
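
The STT summary above could be assembled from the transcription text, duration, and language code roughly as follows. Both function names are hypothetical, and the code-to-name table is a deliberately tiny assumption; codes it does not know are shown unchanged.

```shell
# Hypothetical helper: map a language code to a display name.
# Only three codes are listed here; anything else is printed as-is.
language_name() {
  case ${1:-} in
    en) echo "English" ;;
    es) echo "Spanish" ;;
    fr) echo "French" ;;
    *)  printf '%s\n' "${1:-}" ;;
  esac
}

# Hypothetical helper: build the user-facing summary.
# $1 = transcription text, $2 = duration (e.g. "12.5s"), $3 = language code.
format_stt_result() {
  printf "Here's the transcription:\n\n\"%s\"\n\n• Duration: %s | Language: %s\n" \
    "$1" "$2" "$(language_name "$3")"
}
```

Calling `format_stt_result "Hello, this is the transcribed text from the audio file." "12.5s" "en"` reproduces the layout shown above.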

Model Selection Guide

Text-to-Speech

MiniMax Speech 2.6 HD (fal-ai/minimax/speech-2.6-hd)
  • Best for: Premium quality requirements
  • Quality: Highest
  • Speed: Medium
MiniMax Speech 2.6 Turbo (fal-ai/minimax/speech-2.6-turbo)
  • Best for: General use with good quality
  • Quality: High
  • Speed: Fast
ElevenLabs v3 (fal-ai/elevenlabs/eleven-v3)
  • Best for: Natural, realistic voices
  • Quality: High
  • Features: Many voice options
Chatterbox Multilingual (fal-ai/chatterbox/multilingual)
  • Best for: Multi-language support
  • Quality: Good
  • Speed: Fast

Text-to-Music

MiniMax Music v2 (fal-ai/minimax-music/v2)
  • Best for: High quality music generation
  • Quality: Highest
Lyria2 (fal-ai/lyria2)
  • Best for: Google's music model
  • Quality: High

Speech-to-Text

Whisper (fal-ai/whisper)
  • Best for: General transcription, timestamps
  • Languages: 99+ languages
  • Features: Word-level timestamps
ElevenLabs Scribe (fal-ai/elevenlabs/scribe)
  • Best for: Multi-speaker recordings
  • Features: Speaker diarization
  • Quality: Professional-grade

Troubleshooting

Empty Audio

Error: Generated audio is empty

Check that your text is not empty and contains valid content.
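
A caller can catch this case before sending a request. The check below is a minimal sketch with a hypothetical name, `has_content`; it only tests for at least one non-whitespace character and says nothing about what a given model counts as valid content.

```shell
# Hypothetical pre-flight check: succeed only if $1 contains at least one
# non-whitespace character.
has_content() {
  [ -n "$(printf '%s' "${1:-}" | tr -d '[:space:]')" ]
}
```

For example, `has_content "$text" || echo "Text is empty" >&2` would warn before the TTS script is invoked.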

Unsupported Audio Format

Error: Audio format not supported

Supported formats: MP3, WAV, M4A, FLAC, OGG
Convert your audio to a supported format.
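
A quick way to spot this problem early is to compare the URL's file extension with the list above. This is a sketch under a stated assumption: the extension reflects the real format. It inspects the name only, never the file contents, and `is_supported_audio` is a hypothetical name.

```shell
# Hypothetical pre-flight check: succeed if the URL ends in one of the
# supported extensions listed above (MP3, WAV, M4A, FLAC, OGG).
# Note: sets the shell variables url, path, and ext as a side effect.
is_supported_audio() {
  url=${1:-}
  path=${url%%\?*}     # drop any query string
  path=${path%%#*}     # drop any fragment
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|m4a|flac|ogg) return 0 ;;
    *)                    return 1 ;;
  esac
}
```

With this, `is_supported_audio "https://example.com/audio.mp3"` succeeds, while a URL ending in `.aac` or with no extension fails.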

Language Detection Failed

Warning: Could not detect language, defaulting to English

Specify the language explicitly with the --language option.