dialogue-audio
Dialogue Audio
Create realistic multi-speaker dialogue with Dia TTS via inference.sh CLI.
Quick Start
```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
```

Speaker Tags
Dia TTS uses `[S1]` and `[S2]` to distinguish two speakers.

| Tag | Role | Voice |
|---|---|---|
| `[S1]` | Speaker 1 | Automatically assigned voice A |
| `[S2]` | Speaker 2 | Automatically assigned voice B |

Rules:
- Always start each speaker turn with the tag
- Tags must be uppercase: `[S1]` not `[s1]`
- Maximum 2 speakers per generation
- Each speaker maintains a consistent voice within a session
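The tag rules above can be checked mechanically before a prompt is sent. The following is a minimal sketch, not part of the inference.sh CLI: `validate_dia_prompt` is a hypothetical helper name, and it only covers the tag rules listed in this section.

```bash
# Hypothetical helper (an assumption, not an infsh command):
# checks a prompt string against the speaker-tag rules.
validate_dia_prompt() {
  local prompt=$1

  # Rule: the prompt must open with a speaker tag.
  case $prompt in
    "[S1]"*|"[S2]"*) ;;
    *) echo "error: prompt must start with [S1] or [S2]" >&2; return 1 ;;
  esac

  # Rule: tags must be uppercase.
  if printf '%s' "$prompt" | grep -q '\[s[0-9]*]'; then
    echo "error: lowercase speaker tag found" >&2
    return 1
  fi

  # Rule: at most 2 speakers, so any tag other than [S1] or [S2] is rejected.
  if printf '%s' "$prompt" | grep -Eq '\[S([03-9]|[0-9]{2,})]'; then
    echo "error: only [S1] and [S2] are supported" >&2
    return 1
  fi

  echo "ok"
}
```

Calling `validate_dia_prompt '[S1] Hi. [S2] Hello.'` prints `ok`; a prompt containing `[s1]` or `[S3]` returns a non-zero status with a message on stderr.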
Emotion & Expression Control
Dia TTS interprets punctuation and non-speech cues for emotional delivery.
Punctuation Effects
| Punctuation | Effect | Example |
|---|---|---|
| Period `.` | Neutral, declarative, medium pause | "This is important." |
| Exclamation mark `!` | Emphasis, excitement, energy | "This is amazing!" |
| Question mark `?` | Rising intonation, questioning | "Are you sure about that?" |
| Ellipsis `...` | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| Comma `,` | Short breath pause | "First, we analyze. Then, we act." |
| Dash `—` | Interruption or pivot | "I was going to say — never mind." |
Non-Speech Sounds
Dia TTS supports parenthetical sound descriptions:
- `(laughs)`: laughter
- `(sighs)`: exasperation or relief
- `(clears throat)`: attention-getting pause
- `(whispers)`: softer delivery
- `(gasps)`: surprise

Examples with Emotion
```bash
# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'
```

Pacing Control
Pause Hierarchy
| Technique | Pause Length | Use For |
|---|---|---|
| Comma | ~0.3 seconds | Between clauses, list items |
| Period | ~0.5 seconds | Between sentences |
| Ellipsis | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
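The approximate figures in the table can be added up to get a rough feel for how much silence a script contains. This is a sketch under stated assumptions: `estimate_pause_seconds` is a hypothetical helper, it uses the table's approximate values as if they were exact, it counts one gap per change of turn (not for the opening tag), and it ignores `!` and `?` because the table gives no figure for them.

```bash
# Hypothetical helper: sums the approximate pause lengths from the table.
estimate_pause_seconds() {
  printf '%s' "$1" | awk '
    { text = text $0 " " }                     # collect all input lines
    END {
      ellipses = gsub(/\.\.\./, "", text)      # count "..." first, then remove it
      periods  = gsub(/\./, "", text)          # remaining single periods
      commas   = gsub(/,/, "", text)
      tags     = gsub(/\[S[12]\]/, "", text)
      gaps     = (tags > 0) ? tags - 1 : 0     # gaps between turns (assumption)
      printf "%.1f\n", commas * 0.3 + periods * 0.5 + ellipses * 1.0 + gaps * 0.3
    }'
}
```

For the prompt `[S1] First, we analyze. Then, we act. [S2] Maybe... okay.` the sketch reports 3.4 seconds: two commas (0.6), three periods (1.5), one ellipsis (1.0), and one turn change (0.3).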
Speed Control
- Shorter sentences = faster perceived pace
- Longer sentences with commas = measured, thoughtful pace
- Questions followed by answers = engaging back-and-forth rhythm
```bash
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'
```

Conversation Structure Patterns
Interview Format
```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'
```

Tutorial / Explainer
```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'
```

Debate / Discussion
```bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'
```

Post-Production Tips
Volume Normalization
Both speakers should be at consistent volume. If one is louder:
```bash
# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'
```

Adding Background/Music
```bash
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'
```

Segmenting Long Conversations
For conversations longer than ~30 seconds, generate in segments:
```bash
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
```
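When the number of segments varies, the merge input can be assembled from a file list instead of typed by hand. A minimal sketch, assuming the segments have already been saved locally under names like `segment1.mp3` (how the generated audio is saved is not covered here) and that file names contain no double quotes or backslashes; `build_merge_payload` is a hypothetical helper, not an infsh command.

```bash
# Hypothetical helper: prints the media-merger JSON input for the given files.
build_merge_payload() {
  local payload='{"media": [' sep='' file
  for file in "$@"; do
    payload="$payload$sep\"$file\""   # append each file as a quoted JSON string
    sep=', '
  done
  printf '%s]}\n' "$payload"
}

# Usage (not run here, requires the infsh CLI):
#   infsh app run infsh/media-merger --input "$(build_merge_payload segment1.mp3 segment2.mp3 segment3.mp3)"
```

The segments are merged in the order the arguments are given, so pass them in playback order.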
Script Writing Tips
| Do | Don't |
|---|---|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
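The table recommends contractions, but an apostrophe ends a single-quoted shell string, which is why the examples in this document spell out "I am" and "Let us". One way around this is to read the script from a quoted heredoc and build the JSON input from it. This is a sketch, not an infsh feature: `dia_payload` is a hypothetical helper, and it escapes only backslashes and double quotes, not tabs or other control characters.

```bash
# Hypothetical helper: reads a script on stdin and prints {"prompt": "..."}.
dia_payload() {
  local text
  # Join lines with spaces, escape \ and " for JSON, trim trailing spaces.
  text=$(tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/ *$//')
  printf '{"prompt": "%s"}\n' "$text"
}

# Usage (not run here, requires the infsh CLI):
#   infsh app run falai/dia-tts --input "$(dia_payload <<'EOF'
#   [S1] I can't believe it worked!
#   [S2] Told you it wouldn't take long.
#   EOF
#   )"
```

Because the heredoc delimiter is quoted (`'EOF'`), the shell passes the script through untouched, so contractions and dollar signs need no escaping.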
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with `[S1]` or `[S2]` |
| Formal written language | Sounds unnatural spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use an ellipsis (`...`) |
| All same energy level | Monotonous | Vary between high/low energy moments |
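The first mistake in the table, monologues longer than 3 sentences, is easy to spot automatically. A rough sketch, assuming a sentence ends at any run of `.`, `!` or `?`; this means a mid-sentence ellipsis is counted as a sentence end, so treat the result as a hint. `find_monologues` is a hypothetical helper name.

```bash
# Hypothetical helper: lists turns that contain more than 3 sentences.
find_monologues() {
  printf '%s' "$1" | tr '\n' ' ' | awk '
    {
      # Split the prompt into turns at each speaker tag;
      # turns[1] is whatever precedes the first tag, so start at 2.
      n = split($0, turns, /\[S[12]\]/)
      for (i = 2; i <= n; i++) {
        turn = turns[i]
        count = gsub(/[.!?]+/, "", turn)   # runs of end punctuation count once
        if (count > 3)
          printf "turn %d has %d sentences\n", i - 1, count
      }
    }'
}
```

For `[S1] One. Two. Three. Four. [S2] Short.` the sketch prints `turn 1 has 4 sentences`; a prompt with no long turns prints nothing.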
Related Skills
```bash
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video
```

Browse all apps: `infsh app list`