dialogue-audio

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via the inference.sh CLI.

Quick Start

bash
curl -fsSL https://cli.inference.sh | sh && infsh login

Two-speaker conversation

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today." }'
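Because the --input value is wrapped in single quotes, a prompt cannot contain an apostrophe directly, and any double quote inside the prompt has to be escaped for JSON. The following is a minimal sketch of one way around that, assuming single-line prompts: keep the prompt in a shell variable and build the JSON with a small helper. The name build_payload is hypothetical and is not part of the infsh CLI.

```bash
#!/usr/bin/env bash
# build_payload: hypothetical helper (not part of infsh) that wraps a
# single-line prompt in the {"prompt": ...} JSON shape used in this guide.
build_payload() {
  local prompt=$1
  # Escape backslashes first, then double quotes, so the JSON stays valid.
  prompt=${prompt//\\/\\\\}
  prompt=${prompt//\"/\\\"}
  printf '{ "prompt": "%s" }' "$prompt"
}

payload=$(build_payload "[S1] It's ready. [S2] Say \"hello\" first.")
echo "$payload"

# To generate audio, pass the payload to the command shown above:
#   infsh app run falai/dia-tts --input "$payload"
```

Newlines and control characters are not escaped here, so this sketch only suits prompts written on one line.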

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

| Tag | Role | Voice |
| --- | --- | --- |
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |

Rules:
  • Always start each speaker turn with the tag
  • Tags must be uppercase: [S1] not [s1]
  • Maximum 2 speakers per generation
  • Each speaker maintains a consistent voice within a session
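
The mechanically checkable rules above can be sketched as a pre-flight check that runs before any generation call. This is an illustration only: check_tags is a hypothetical helper, and it cannot detect a turn that is missing its tag in the middle of a prompt.

```bash
#!/usr/bin/env bash
# check_tags: hypothetical pre-flight check mirroring the speaker-tag rules.
check_tags() {
  local prompt=$1
  local re_start='^\[S[12]\]'                 # prompt opens with [S1] or [S2]
  local re_lower='\[s[0-9]+\]'                # lowercase tag such as [s1]
  local re_extra='\[S([03-9]|[0-9]{2,})\]'    # any [S<n>] other than 1 or 2

  if [[ ! $prompt =~ $re_start ]]; then
    echo "error: prompt must start with [S1] or [S2]"
    return 1
  fi
  if [[ $prompt =~ $re_lower ]]; then
    echo "error: speaker tags must be uppercase"
    return 1
  fi
  if [[ $prompt =~ $re_extra ]]; then
    echo "error: only [S1] and [S2] are allowed"
    return 1
  fi
  echo "ok"
}

check_tags "[S1] Have you tried it? [S2] Not yet."
```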

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

| Punctuation | Effect | Example |
| --- | --- | --- |
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, questioning | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:
(laughs)      — laughter
(sighs)       — exasperation or relief
(clears throat) — attention-getting pause
(whispers)    — softer delivery
(gasps)       — surprise

Examples with Emotion

Excited conversation

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it." }'

Serious/thoughtful dialogue

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off." }'

Teaching/explaining

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something." }'

Pacing Control

Pause Hierarchy

| Technique | Pause Length | Use For |
| --- | --- | --- |
| Comma , | ~0.3 seconds | Between clauses, list items |
| Period . | ~0.5 seconds | Between sentences |
| Ellipsis ... | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
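
As a rough worked example of the arithmetic, the approximate values in the table can be summed to estimate how much silence a script adds. This is a sketch, not a measurement of what the model produces: estimate_pauses is a hypothetical helper, it counts one turn-taking gap per speaker change (tags minus one), and it ignores ? and ! because the table gives no values for them.

```bash
#!/usr/bin/env bash
# estimate_pauses: hypothetical helper that sums the approximate pause
# lengths from the table above for a given prompt. Prints seconds.
estimate_pauses() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    {
      ellipses += gsub(/\.\.\./, "")   # count and remove "..." before single periods
      periods  += gsub(/\./, "")
      commas   += gsub(/,/, "")
      tags     += gsub(/\[S[12]\]/, "")
    }
    END {
      gaps = (tags > 1) ? tags - 1 : 0   # one gap per change of speaker
      printf "%.1f\n", ellipses * 1.0 + periods * 0.5 + commas * 0.3 + gaps * 0.3
    }'
}

# 1 ellipsis (1.0) + 2 periods (1.0) + 1 comma (0.3) + 1 gap (0.3)
estimate_pauses "[S1] Hi. [S2] Well... yes, sure."
```

The result covers pauses only; spoken words add their own duration on top.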

Speed Control

  • Shorter sentences = faster perceived pace
  • Longer sentences with commas = measured, thoughtful pace
  • Questions followed by answers = engaging back-and-forth rhythm

Fast-paced, energetic

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync." }'

Slow, contemplative

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now." }'

Conversation Structure Patterns

Interview Format

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Explainer

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

Debate / Discussion

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'

Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder:

Merge with balanced audio

bash
infsh app run infsh/video-audio-merger --input '{ "video": "talking-head.mp4", "audio": "dialogue.mp3", "audio_volume": 1.0 }'

Adding Background/Music

Merge dialogue with background music

bash
infsh app run infsh/media-merger --input '{ "media": ["dialogue.mp3", "background-music.mp3"] }'

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

Segment 1: Introduction

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Welcome back to another episode..." }'

Segment 2: Main content

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] So let us dive into today'\''s topic..." }'

Segment 3: Wrap-up

bash
infsh app run falai/dia-tts --input '{ "prompt": "[S1] Great conversation today..." }'

Merge all segments

bash
infsh app run infsh/media-merger --input '{ "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"] }'
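
When the number of segments varies, the media list for the merge step can be assembled in a loop instead of typed by hand. A minimal sketch, assuming the segment audio has already been saved under simple file names with no quotes or backslashes in them; merge_payload is a hypothetical helper, not an infsh command.

```bash
#!/usr/bin/env bash
# merge_payload: hypothetical helper that builds the {"media": [...]} JSON
# for infsh/media-merger from any number of segment file names, in order.
merge_payload() {
  local list="" f
  for f in "$@"; do
    # Add ", " before every entry except the first.
    list+="${list:+, }\"$f\""
  done
  printf '{ "media": [%s] }' "$list"
}

payload=$(merge_payload segment1.mp3 segment2.mp3 segment3.mp3)
echo "$payload"

# Then merge with the command shown above:
#   infsh app run infsh/media-merger --input "$payload"
```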

Script Writing Tips

| Do | Don't |
| --- | --- |
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
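
The sentence-length guideline in the table can be checked mechanically before generating. The sketch below is a rough heuristic under stated assumptions: long_sentences is a hypothetical helper, sentences are split on . ! and ? only, and speaker tags and parenthetical cues are stripped before counting words.

```bash
#!/usr/bin/env bash
# long_sentences: hypothetical helper that lists every sentence with
# 15 or more words, prefixed by its word count. Prints nothing if all pass.
long_sentences() {
  printf '%s\n' "$1" |
    sed -E 's/\[S[12]\]//g; s/\([^)]*\)//g' |   # drop speaker tags and (cues)
    tr '.!?' '\n\n\n' |                         # one sentence per line
    awk 'NF >= 15 { sub(/^ +/, ""); print NF ": " $0 }'
}

long_sentences "[S1] Ready? [S2] Data comes in on one end, gets processed in the middle, and comes out transformed on the other side."
```

An empty result means every sentence is under the 15-word guideline; reading the script aloud is still the better test.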

Common Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with [S1] or [S2] |
| Formal written language | Sounds unnatural when spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use ... or scene breaks |
| All same energy level | Monotonous | Vary between high/low energy moments |
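
The first mistake in the table, a monologue longer than 3 sentences, is easy to spot automatically. A minimal sketch, assuming a single-line prompt: long_turns is a hypothetical helper that splits the prompt on speaker tags and counts sentence endings per turn, treating an ellipsis as a pause rather than the end of a sentence.

```bash
#!/usr/bin/env bash
# long_turns: hypothetical helper that reports each speaker turn
# containing more than 3 sentences. Prints nothing if all turns pass.
long_turns() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    {
      gsub(/\.\.\./, " ")                  # an ellipsis is a pause, not a sentence end
      n = split($0, turns, /\[S[12]\]/)    # turns[1] is the text before the first tag
      for (i = 2; i <= n; i++) {
        t = turns[i]
        count = gsub(/[.!?]+/, "", t)      # each run of . ! ? ends one sentence
        if (count > 3) print "turn " (i - 1) ": " count " sentences"
      }
    }'
}

long_turns "[S1] One. Two. Three. Four. [S2] Okay."
```

Any turn it reports is a candidate for breaking into a shorter exchange with a reaction from the other speaker.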

Related Skills

bash
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video
Browse all apps: infsh app list