Dialogue Audio

Create realistic two-speaker dialogue with Dia TTS via the inference.sh CLI.

Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
```

Speaker Tags

Dia TTS uses `[S1]` and `[S2]` to distinguish two speakers.

Tag	Role	Voice
`[S1]`	Speaker 1	Automatically assigned voice A
`[S2]`	Speaker 2	Automatically assigned voice B

Rules:

Always start each speaker turn with the tag
Tags must be uppercase: `[S1]`, not `[s1]`
Maximum 2 speakers per generation
Each speaker keeps a consistent voice within a session
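
A quick pre-flight check can catch tag mistakes before a generation is spent on them. The sketch below applies the rules above to a prompt string. `check_tags` is a hypothetical helper, not part of the infsh CLI or Dia TTS, and it assumes a single-line prompt.

```bash
# Hypothetical helper (not an infsh feature): checks a prompt against the
# tag rules and prints "ok" or the first problem it finds.
check_tags() {
  prompt=$1
  # Rule: tags must be uppercase.
  if printf '%s\n' "$prompt" | grep -Eq '\[s[0-9]+\]'; then
    echo "error: lowercase speaker tag"; return 1
  fi
  # Rule: at most two speakers, so any tag other than [S1]/[S2] is rejected.
  if printf '%s\n' "$prompt" | grep -Eq '\[S([03-9]|[0-9]{2,})\]'; then
    echo "error: only [S1] and [S2] are supported"; return 1
  fi
  # Rule: every turn starts with a tag, so the prompt must open with one.
  case $prompt in
    "[S1]"*|"[S2]"*) echo "ok" ;;
    *) echo "error: prompt must start with [S1] or [S2]"; return 1 ;;
  esac
}

check_tags "[S1] Ready? [S2] Ready."
check_tags "[s1] Ready? [S2] Ready." || true
```

The check only looks at tag syntax; it cannot tell whether the right speaker was given the right line.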

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

Punctuation	Effect	Example
`.`	Neutral, declarative, medium pause	"This is important."
`!`	Emphasis, excitement, energy	"This is amazing!"
`?`	Rising intonation, questioning	"Are you sure about that?"
`...`	Hesitation, trailing off, long pause	"I thought it would work... but it didn't."
`,`	Short breath pause	"First, we analyze. Then, we act."
`—` or `--`	Interruption or pivot	"I was going to say — never mind."

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

(laughs)      — laughter
(sighs)       — exasperation or relief
(clears throat) — attention-getting pause
(whispers)    — softer delivery
(gasps)       — surprise

Examples with Emotion

```bash
# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'
```

Pacing Control

Pause Hierarchy

Technique	Pause Length	Use For
Comma `,`	~0.3 seconds	Between clauses, list items
Period `.`	~0.5 seconds	Between sentences
Ellipsis `...`	~1.0 seconds	Dramatic pause, thinking, hesitation
New speaker tag	~0.3 seconds	Natural turn-taking gap
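
To get a rough feel for how much silence a script carries, the approximate figures in the table can be summed per punctuation mark. This is a back-of-the-envelope sketch, not a model of how Dia TTS actually times its output. `estimate_pauses` is a hypothetical helper, and it counts only the four rows listed above (so `?` and `!` contribute nothing).

```bash
# Hypothetical helper: sums the approximate pause lengths from the table.
estimate_pauses() {
  printf '%s\n' "$1" | awk '
    {
      e += gsub(/\.\.\./, "")      # count ellipses first so their dots are not counted as periods
      p += gsub(/\./, "")          # remaining periods
      c += gsub(/,/, "")           # commas
      t += gsub(/\[S[12]\]/, "")   # speaker tags
    }
    END {
      turns = (t > 0) ? t - 1 : 0  # the first tag opens the dialogue, it is not a turn change
      printf "%.1f\n", e * 1.0 + p * 0.5 + c * 0.3 + turns * 0.3
    }'
}

estimate_pauses "[S1] Well, I tried... It failed. [S2] Okay."   # prints 2.6
```

Here one ellipsis (1.0), two periods (1.0), one comma (0.3), and one turn change (0.3) add up to roughly 2.6 seconds of pause.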

Speed Control

Shorter sentences = faster perceived pace
Longer sentences with commas = measured, thoughtful pace
Questions followed by answers = engaging back-and-forth rhythm
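
Since sentence length drives perceived pace, average words per sentence is a simple proxy to compare two drafts. The sketch below is only an assumption about what is worth measuring; `avg_sentence_words` is a hypothetical helper that treats any run of `.`, `!`, or `?` as a sentence boundary and ignores speaker tags.

```bash
# Hypothetical helper: prints the average number of words per sentence.
avg_sentence_words() {
  printf '%s\n' "$1" | awk '
    {
      gsub(/\[S[12]\]/, " ")          # speaker tags are not spoken words
      n = split($0, parts, /[.!?]+/)  # sentence boundaries: runs of . ! ?
      for (i = 1; i <= n; i++) {
        w = split(parts[i], words, " ")
        if (w > 0) { sentences++; total += w }
      }
    }
    END { if (sentences > 0) printf "%.1f\n", total / sentences; else print "0.0" }'
}

avg_sentence_words "[S1] Ready? [S2] Ready. [S1] Let us go!"   # prints 1.7
```

A lower number suggests a punchier script; a higher one suggests a more measured delivery. Parenthetical cues such as `(laughs)` are counted as words here.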

```bash
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'
```

Conversation Structure Patterns

Interview Format

```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'
```

Tutorial / Explainer

```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'
```

Debate / Discussion

```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'
```

Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder:

```bash
# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'
```

Adding Background/Music

```bash
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'
```

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

```bash
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
# '\'' closes the single-quoted string, adds a literal apostrophe, and reopens it
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
```
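
When a long script already exists as one string, it can be cut into segments at turn boundaries so that no speaker is split mid-line. The sketch below is one way to do that, assuming turn count is an acceptable stand-in for the ~30 second guideline (audio duration cannot be measured from text alone). `split_turns` is a hypothetical helper, not an infsh feature.

```bash
# Hypothetical helper: prints one segment per line, each holding at most
# $2 speaker turns from the script in $1.
split_turns() {
  printf '%s\n' "$1" | awk -v per="$2" '
    {
      gsub(/ *\[S[12]\]/, "\n&")        # start each turn on its own line
      n = split($0, turns, "\n")
      seg = ""; count = 0
      for (i = 1; i <= n; i++) {
        if (turns[i] == "") continue
        sub(/^ +/, "", turns[i])        # drop the spaces kept in front of the tag
        seg = (seg == "") ? turns[i] : seg " " turns[i]
        if (++count == per) { print seg; seg = ""; count = 0 }
      }
      if (seg != "") print seg          # leftover turns form the last segment
    }'
}

split_turns "[S1] Welcome back. [S2] Glad to be here. [S1] Let us begin." 2
```

Each printed line can then be used as the `prompt` value of its own `falai/dia-tts` run, and the resulting files merged as shown above.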

Script Writing Tips

Do	Don't
Write how people talk	Write how people write
Short sentences (< 15 words)	Long academic sentences
Contractions ("can't", "won't")	Formal ("cannot", "will not")
Natural fillers ("So,", "Well,")	Every sentence perfectly formed
Vary sentence length	All sentences same length
Include reactions ("Exactly!", "Hmm.")	One-sided monologues
Read it aloud before generating	Assume it sounds right
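
Contractions bring apostrophes, and an apostrophe ends a single-quoted shell string, which is awkward with the `--input '{...}'` pattern used throughout this guide. One way around it, sketched below, is to keep the script in a double-quoted variable and build the JSON separately. `json_payload` is a hypothetical helper, not an infsh feature; it escapes only backslashes and double quotes, so it assumes a single-line prompt with no control characters.

```bash
# Hypothetical helper: wraps a prompt in the {"prompt": ...} JSON object.
json_payload() {
  # Escape backslashes first, then double quotes, to get a valid JSON string.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{ "prompt": "%s" }\n' "$escaped"
}

payload=$(json_payload "[S1] I can't wait. [S2] Me neither.")
echo "$payload"
# The payload can then be passed along as:
#   infsh app run falai/dia-tts --input "$payload"
```

For anything beyond simple one-line scripts, a real JSON tool is the safer choice for escaping.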

Common Mistakes

Mistake	Problem	Fix
Monologues longer than 3 sentences	Sounds like a lecture, not conversation	Break into exchanges
No emotional variation	Flat, robotic delivery	Use punctuation and non-speech cues
Missing speaker tags	Voices don't alternate	Start every turn with `[S1]` or `[S2]`
Formal written language	Sounds unnatural spoken	Use contractions, short sentences
No pauses between topics	Feels rushed	Use `...` or scene breaks
All same energy level	Monotonous	Vary between high/low energy moments
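
The first mistake in the table is easy to screen for mechanically. The sketch below flags any turn with more than three sentences. `flag_monologues` is a hypothetical helper, and its sentence count is crude: every run of `.`, `!`, or `?` counts as one sentence end, so a mid-sentence `...` inflates the number.

```bash
# Hypothetical helper: reports turns that run longer than three sentences.
flag_monologues() {
  printf '%s\n' "$1" | awk '
    {
      gsub(/ *\[S[12]\]/, "\n&")                 # start each turn on its own line
      n = split($0, turns, "\n")
      turn = 0
      for (i = 1; i <= n; i++) {
        if (turns[i] !~ /\[S[12]\]/) continue    # skip text that carries no tag
        turn++
        text = turns[i]
        sentences = gsub(/[.!?]+/, "", text)     # each run of . ! ? ends one sentence
        if (sentences > 3) printf "turn %d: %d sentences\n", turn, sentences
      }
    }'
}

flag_monologues "[S1] One. Two. Three. Four. [S2] Short."   # prints: turn 1: 4 sentences
```

No output means every turn is within the limit.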
