Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via the inference.sh CLI.

Quick Start

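bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. It cut my workflow in half. [S2] Okay, I will definitely try it today."
}'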

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.
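For the manual verification path, a checksum comparison might look like the sketch below. The function name `verify_sha256` is hypothetical, and the file name and checksum in the usage comment are placeholders rather than real release values; only the standard `sha256sum` and `shasum` tools are assumed.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX (hypothetical helper)
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
  file=$1
  expected=$2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    # Systems without sha256sum (for example macOS) usually ship shasum.
    actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  [ "$actual" = "$expected" ]
}

# Usage (placeholder values, substitute the published ones):
# verify_sha256 ./downloaded-binary "<published checksum>" && echo "checksum ok"
```

The comparison is case-sensitive, so a checksum published in uppercase hex would need lowercasing first.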

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

Tag	Role	Voice
[S1]	Speaker 1	Automatically assigned voice A
[S2]	Speaker 2	Automatically assigned voice B

Rules:

- Always start each speaker turn with the tag
- Tags must be uppercase: [S1] not [s1]
- Maximum 2 speakers per generation
- Each speaker maintains a consistent voice within a session
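The tag rules can be checked mechanically before a script is sent for generation. The sketch below is a hypothetical helper (`check_tags` is not part of the inference.sh CLI) that only inspects the prompt text with `grep`; it does not call any service.

```shell
#!/bin/sh
# check_tags PROMPT (hypothetical helper)
# Prints one line per rule violation; returns 0 if the prompt passes, 1 otherwise.
check_tags() {
  prompt=$1
  status=0

  # Rule: the script must open with a speaker tag.
  case $prompt in
    "[S1]"*|"[S2]"*) ;;
    *) echo "prompt must start with [S1] or [S2]"; status=1 ;;
  esac

  # Rule: tags must be uppercase.
  if printf '%s\n' "$prompt" | grep -q '\[s[0-9]*]'; then
    echo "lowercase tag found; use [S1] or [S2]"
    status=1
  fi

  # Rule: at most two speakers, so any tag other than S1 or S2 is rejected.
  if printf '%s\n' "$prompt" | grep -Eq '\[S([03-9]|[0-9]{2,})]'; then
    echo "only [S1] and [S2] are supported"
    status=1
  fi

  return $status
}

check_tags "[S1] Ready? [S2] Ready." && echo "tags ok"
```

Running the check first is cheaper than discovering a mistyped tag after audio has been generated.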

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

Punctuation	Effect	Example
.	Neutral, declarative, medium pause	"This is important."
!

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

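(laughs): laughter
(sighs): exasperation or relief
(clears throat): attention-getting pause
(whispers): softer delivery
(gasps): surprise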

Examples with Emotion

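bash
# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious / thoughtful conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but that would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Instructional / explanatory
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data goes in one end, gets processed in the middle, and comes out transformed on the other end. [S1] Like an assembly line? [S2] Exactly! Each step adds something new."
}'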
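Scripts that contain double quotes or backslashes are awkward to embed by hand in a JSON `--input` argument. One option is to build the payload with a small helper; `json_prompt` below is a hypothetical function, and the sketch assumes the prompt is a single line (newlines and control characters are not escaped).

```shell
#!/bin/sh
# json_prompt TEXT (hypothetical helper)
# Prints {"prompt": "TEXT"} with backslashes and double quotes escaped.
json_prompt() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

json_prompt '[S1] She said "wow". [S2] (laughs) Really?'
```

The result could then be passed along the lines of `infsh app run falai/dia-tts --input "$(json_prompt "$script")"`, where `script` is a shell variable holding the dialogue.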

Pacing Control

Pause Hierarchy

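Technique	Pause Length	Use For
Comma ,	~0.3 seconds	Between clauses, list items
Period .	~0.5 seconds	Between sentences
Ellipsis ...	~1.0 seconds	Dramatic pauses, thinking, hesitation
New speaker tag	~0.3 seconds	Natural turn-taking gap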
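To get a rough feel for how much silence a script contains, the approximate pause lengths given in this guide (comma 0.3 s, period 0.5 s, ellipsis 1.0 s, speaker change 0.3 s) can be summed. The helper name `estimate_pauses` is hypothetical, the figures are the guide's approximations rather than measured values, and `!` and `?` are ignored because no pause length is given for them.

```shell
#!/bin/sh
# estimate_pauses PROMPT (hypothetical helper)
# Prints the estimated total pause time in seconds, to one decimal place.
estimate_pauses() {
  printf '%s\n' "$1" | awk '
    {
      line = $0
      ellipses += gsub(/\.\.\./, "", line)  # count "..." before single periods
      periods  += gsub(/\./, "", line)
      commas   += gsub(/,/, "", line)
      tags     += gsub(/\[S[12]\]/, "", line)
    }
    END {
      changes = (tags > 1) ? tags - 1 : 0   # the first tag opens the script
      printf "%.1f\n", ellipses * 1.0 + periods * 0.5 + commas * 0.3 + changes * 0.3
    }
  '
}

estimate_pauses "[S1] I have been thinking... we need a change. [S2] Okay, tell me more."
```

For the sample line the sum is one ellipsis, two periods, one comma, and one speaker change.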

Speed Control

- Shorter sentences = faster perceived pace
- Longer sentences with commas = measured, thoughtful pace
- Questions followed by answers = engaging back-and-forth rhythm

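bash
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Here we go! Three features. Five minutes. [S2] Go! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... does not work now."
}'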

Conversation Structure Patterns

Interview Format

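bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'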

Tutorial / Explainer

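bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It opens your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you can run your first app."
}'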

Debate / Discussion

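bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better in the long run. [S1] True, but we need to ship something this quarter. [S2] Fair point... what if we start with A and plan a migration path to B? [S1] That could work. Let us build a prototype."
}'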

Post-Production Tips

Volume Normalization

Both speakers should be at consistent volume. If one is louder:

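bash
# Merge and balance audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'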

Adding Background/Music

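bash
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'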

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

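bash
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into the topic for today..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] That was a great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'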
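When a full script already exists, it can be cut into segments at speaker-tag boundaries instead of being rewritten by hand. The sketch below uses a hypothetical helper, `split_turns`, and treats a fixed number of turns per segment as a stand-in for the ~30 second guideline; the right turn count depends on how long each turn runs and is an assumption to tune.

```shell
#!/bin/sh
# split_turns PROMPT TURNS_PER_SEGMENT (hypothetical helper)
# Prints one segment per line, each holding at most TURNS_PER_SEGMENT turns.
split_turns() {
  printf '%s\n' "$1" | awk -v n="$2" '
    {
      text = $0
      count = 0
      segment = ""
      # Peel off one turn at a time: a tag plus everything up to the next tag.
      while (match(text, /\[S[12]\][^[]*/)) {
        turn = substr(text, RSTART, RLENGTH)
        text = substr(text, RSTART + RLENGTH)
        sub(/ +$/, "", turn)
        segment = (segment == "" ? turn : segment " " turn)
        if (++count == n) { print segment; segment = ""; count = 0 }
      }
      if (segment != "") print segment
    }
  '
}

split_turns "[S1] Welcome back. [S2] Glad to be here. [S1] Let us begin." 2
```

Each printed line can serve as the prompt for one generation, and the resulting audio files can then be merged in order.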

Script Writing Tips

Do	Don't
Write how people talk	Write how people write
Short sentences (< 15 words)
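The short-sentence guideline is easy to check automatically. The sketch below uses a hypothetical helper, `long_sentences`, that strips speaker tags and parenthetical cues, splits on `.`, `!`, and `?`, and reports any sentence with 15 or more words. Splitting on punctuation is a simplification, so abbreviations and decimals would be miscounted.

```shell
#!/bin/sh
# long_sentences PROMPT (hypothetical helper)
# Prints "<word count>: <sentence>" for each sentence of 15 or more words.
long_sentences() {
  printf '%s\n' "$1" \
    | sed -e 's/\[S[12]]//g' -e 's/([^)]*)//g' \
    | tr '.!?' '\n\n\n' \
    | awk 'NF >= 15 { $1 = $1; print NF ": " $0 }'
}

# Flags the long second turn and stays silent about the short first one.
long_sentences "[S1] How does it work? [S2] Data goes in one end, gets processed in the middle, and comes out transformed on the other end."
```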

Common Mistakes

Mistake	Problem	Fix
Monologues longer than 3 sentences	Sounds like a lecture, not conversation	Break into exchanges
No emotional variation
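The monologue rule (no more than 3 sentences per turn) can be checked in the same spirit. `long_turns` is a hypothetical helper; it counts each run of `.`, `!`, or `?` as one sentence end, so an ellipsis in the middle of a sentence is counted as a break, which makes the check slightly strict.

```shell
#!/bin/sh
# long_turns PROMPT (hypothetical helper)
# Prints "<sentence count>: <turn>" for each turn with more than 3 sentences.
long_turns() {
  printf '%s\n' "$1" | awk '
    {
      text = $0
      while (match(text, /\[S[12]\][^[]*/)) {
        turn = substr(text, RSTART, RLENGTH)
        text = substr(text, RSTART + RLENGTH)
        copy = turn
        # Each run of . ! or ? counts as one sentence end.
        sentences = gsub(/[.!?]+/, "", copy)
        if (sentences > 3) { sub(/ +$/, "", turn); print sentences ": " turn }
      }
    }
  '
}

long_turns "[S1] One. Two. Three. Four. [S2] Short."
```

Any turn it reports is a candidate for breaking into an exchange between the two speakers.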

Related Skills

CODEBLOCK10

Browse all apps: INLINECODE20

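Skill name: dialogue-audio
Description:

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via the inference.sh CLI.

Quick Start

bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. It cut my workflow in half. [S2] Okay, I will definitely try it today."
}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.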

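Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

Tag	Role	Voice
[S1]	Speaker 1	Automatically assigned voice A
[S2]	Speaker 2	Automatically assigned voice B

Rules:

- Always start each speaker turn with the tag
- Tags must be uppercase: [S1] not [s1]
- Maximum 2 speakers per generation
- Each speaker maintains a consistent voice within a session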

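Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

Punctuation	Effect	Example
.	Neutral, declarative, medium pause	"This is important."
!

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

(laughs): laughter
(sighs): exasperation or relief
(clears throat): attention-getting pause
(whispers): softer delivery
(gasps): surprise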

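Examples with Emotion

bash
# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious / thoughtful conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but that would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Instructional / explanatory
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data goes in one end, gets processed in the middle, and comes out transformed on the other end. [S1] Like an assembly line? [S2] Exactly! Each step adds something new."
}'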

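Pacing Control

Pause Hierarchy

Technique	Pause Length	Use For
Comma ,	~0.3 seconds	Between clauses, list items
Period .	~0.5 seconds	Between sentences
Ellipsis ...	~1.0 seconds	Dramatic pauses, thinking, hesitation
New speaker tag	~0.3 seconds	Natural turn-taking gap

Speed Control

- Shorter sentences = faster perceived pace
- Longer sentences with commas = measured, thoughtful pace
- Questions followed by answers = engaging back-and-forth rhythm

bash
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Here we go! Three features. Five minutes. [S2] Go! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... does not work now."
}'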

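Conversation Structure Patterns

Interview Format

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Explainer

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It opens your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you can run your first app."
}'

Debate / Discussion

bash
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better in the long run. [S1] True, but we need to ship something this quarter. [S2] Fair point... what if we start with A and plan a migration path to B? [S1] That could work. Let us build a prototype."
}'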

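Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder:

bash
# Merge and balance audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'

Adding Background/Music

bash
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'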

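Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

bash
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into the topic for today..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] That was a great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'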

Script Writing Tips

Do	Don't
Write how people talk	Write how people write
Short sentences (< 15 words)
