liveavatar-integrate

LiveAvatar Integration

LiveAvatar gives your product a human face — real-time, lip-synced video avatars that speak, react, and maintain eye contact. This skill assesses what you have, recommends the best integration path, and walks you through building it.

Step 1: Discover What the User Has

Before recommending a path, gather context. Check the codebase and conversation for signals. Do not ask questions the codebase already answers.

Signals to look for in the codebase

Scan for these automatically — do not ask the user if you can detect them:
| Signal | Where to look | What it means |
|---|---|---|
| OpenAI / Anthropic / LLM SDK imports | `package.json`, `requirements.txt`, imports | User has their own LLM |
| ElevenLabs / PlayHT / Deepgram TTS SDK | dependencies, imports | User has their own TTS |
| Deepgram / Whisper / AssemblyAI STT SDK | dependencies, imports | User has their own STT |
| LiveKit SDK (`livekit-server-sdk`, `@livekit/`) | dependencies | User has LiveKit infra |
| Agora SDK | dependencies | User has Agora infra |
| Pipecat imports | dependencies, imports | User has a Pipecat pipeline |
| ElevenLabs Agent / Conversational AI | dependencies, config | User has an ElevenLabs agent |
| `HEYGEN_API_KEY` / `LIVEAVATAR_API_KEY` | `.env`, config files | User already has an API key |
| Existing LiveAvatar code | imports, API calls to `api.liveavatar.com` | Existing integration (debug, not new setup) |
| No backend / static site | file structure (pure HTML/CSS/JS, no server) | Embed is the only option |
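
To make the scan concrete, here is a minimal sketch in TypeScript (Node) that checks `package.json` for some of these signal dependencies. The signal-to-conclusion mapping mirrors the table above; the exact regex patterns are illustrative assumptions (e.g., Deepgram can indicate TTS or STT, which a real scan would disambiguate from usage):

```typescript
// scan-signals.ts — minimal dependency scan for integration signals.
// Package-name patterns are illustrative assumptions; extend as needed.
import { readFileSync, existsSync } from "node:fs";

const SIGNALS: Record<string, RegExp> = {
  "their own LLM": /^(openai|@anthropic-ai\/sdk)/,
  "their own TTS or STT": /^(elevenlabs|@deepgram\/sdk|playht)/,
  "LiveKit infra": /^(livekit-server-sdk|@livekit\/)/,
  "Agora infra": /^agora/,
};

function scan(pkgPath = "package.json"): string[] {
  // No package.json at all is itself a signal: likely a static site (Embed).
  if (!existsSync(pkgPath)) return ["No package.json — check for a static site (Embed)"];
  const pkg = JSON.parse(readFileSync(pkgPath, "utf8"));
  const deps = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  return Object.entries(SIGNALS)
    .filter(([, pattern]) => deps.some((d) => pattern.test(d)))
    .map(([signal]) => `User has ${signal}`);
}

console.log(scan());
```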

Questions to ask (only what's still unknown)

If the codebase scan leaves gaps, ask the user. Frame as a concise checklist — do not ask these one at a time:
To recommend the best LiveAvatar integration for your setup, I need to know:

1. **What's the goal?** (e.g., customer support avatar, sales demo, onboarding guide, talking head on landing page)
2. **Do you have your own AI pipeline?** (STT, LLM, TTS — or any combination)
3. **Do you need programmatic control** over the conversation (events, interrupts, custom logic), or just an avatar on a page?

Skip any question the codebase or conversation already answered.

Step 2: Route to the Golden Pathway

Based on what you've gathered, match to ONE pathway. Always pick the simplest path that works. Do not offer multiple options — make the call.

Decision tree

Has NO backend OR just wants an avatar on a page?
  → EMBED

Has NO existing AI stack (no STT, no LLM, no TTS)?
  → FULL MODE (standard)

Has their OWN LLM but no STT/TTS?
  → FULL MODE + Custom LLM

Has their OWN LLM + their own ElevenLabs TTS?
  → FULL MODE + Custom LLM + Custom TTS

Needs explicit mic control (walkie-talkie style)?
  → FULL MODE + Push-to-Talk

Has a COMPLETE pipeline (STT + LLM + TTS)?
  → LITE MODE

Has an ElevenLabs Conversational AI agent?
  → LITE MODE + ElevenLabs Plugin

Has their own LiveKit or Agora infrastructure?
  → LITE MODE + BYO WebRTC
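
The same routing, expressed as code. This sketch encodes the tree above as a first-match-wins function; the flag names are hypothetical stand-ins for the Step 1 findings, and the ordering resolves overlaps by checking the complete-pipeline cases before the partial ones. Treat it as a sketch of the tree, not a substitute for judgment:

```typescript
// route.ts — the decision tree as a first-match-wins function.
// Flag names are hypothetical; they stand in for the Step 1 findings.
interface Findings {
  hasBackend: boolean;
  avatarOnPageOnly: boolean;     // just wants an avatar on a page
  hasSTT: boolean;
  hasLLM: boolean;
  hasTTS: boolean;               // owns a TTS, e.g. ElevenLabs
  needsPushToTalk: boolean;      // explicit walkie-talkie mic control
  hasElevenLabsAgent: boolean;   // ElevenLabs Conversational AI agent
  hasOwnWebRTC: boolean;         // own LiveKit or Agora infra
}

function route(f: Findings): string {
  if (!f.hasBackend || f.avatarOnPageOnly) return "EMBED";
  if (f.hasElevenLabsAgent) return "LITE + ElevenLabs Plugin";
  if (f.hasSTT && f.hasLLM && f.hasTTS)
    return f.hasOwnWebRTC ? "LITE + BYO WebRTC" : "LITE";
  if (f.needsPushToTalk) return "FULL + Push-to-Talk";
  if (f.hasLLM && f.hasTTS) return "FULL + Custom LLM + Custom TTS";
  if (f.hasLLM) return "FULL + Custom LLM";
  return "FULL (standard)";
}
```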

Golden pathways (pick one, then implement)

| Pathway | When | Implementation guide |
|---|---|---|
| Embed | No backend, or no custom logic needed | references/embed-guide.md |
| FULL standard | No existing AI stack | references/full-mode-guide.md |
| FULL + Custom LLM | Has own LLM, wants LiveAvatar's ASR + TTS | references/full-mode-guide.md (Custom LLM section) |
| FULL + Custom TTS | Has own ElevenLabs voice | references/full-mode-guide.md (Custom TTS section) |
| FULL + Push-to-Talk | Needs explicit mic control | references/full-mode-guide.md (Push-to-Talk section) |
| LITE standard | Has complete STT + LLM + TTS pipeline | references/lite-mode-guide.md |
| LITE + ElevenLabs Plugin | Has ElevenLabs Conversational AI agent | references/lite-mode-guide.md (ElevenLabs Plugin section) |
| LITE + BYO WebRTC | Has own LiveKit / Agora | references/lite-mode-guide.md (BYO WebRTC section) |

Step 3: Present the Recommendation

Once you've picked a pathway, tell the user what you recommend and why, in 2-3 sentences. Example:
Based on your setup, I recommend FULL Mode with Custom LLM. You already have an OpenAI integration for your LLM, so we'll plug that in and let LiveAvatar handle ASR, TTS, and video. This gets you a conversational avatar without rebuilding your audio pipeline.
Then proceed directly to implementation using the corresponding guide in `references/`.

Step 4: Implement

Read the appropriate reference guide and implement. Every guide follows the same structure:
  1. Prerequisites — what to create/gather before writing code
  2. Session lifecycle — step-by-step with curl commands and code
  3. Events — what to send and receive
  4. Add-ons — mode-specific optional features
  5. Sandbox testing — free testing before going live
  6. Gotchas — what breaks and how to avoid it

Principles that apply to ALL paths

**Backend / frontend split is non-negotiable.** `X-API-KEY` is a secret — backend only. The frontend only gets `livekit_client_token` (safe for browsers). If you see the API key in client code, stop and restructure.

**Context makes the avatar conversational.** In FULL Mode, no `context_id` means a silent avatar, and no error is thrown. Always create a context first, even a minimal `"You are a helpful assistant."`.

**FULL and LITE are completely different protocols.** FULL = LiveKit data channels (`avatar.*` / `user.*`). LITE = WebSocket (`agent.*` / `session.*`). Never mix them.

**Start with sandbox.** Set `is_sandbox: true` with avatar ID `dd73ea75-1218-4ef3-92ce-606d5f7fbc0a`. Sessions are free and last ~1 minute; swap in your production avatar when ready.
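
As a concrete illustration of the first and last principles, here is a minimal backend sketch, assuming a Node/Express server. The exact LiveAvatar endpoint path and payload fields beyond `X-API-KEY`, `is_sandbox`, and `livekit_client_token` are assumptions — confirm them against the reference guide for your pathway:

```typescript
// server.ts — keep X-API-KEY server-side; hand the browser only livekit_client_token.
// The session endpoint path and body fields are assumptions; see references/full-mode-guide.md.
import express from "express";

const app = express();
const API_KEY = process.env.LIVEAVATAR_API_KEY!; // secret: never ships to the client

app.post("/avatar-session", async (_req, res) => {
  const r = await fetch("https://api.liveavatar.com/v1/sessions", { // hypothetical path
    method: "POST",
    headers: { "X-API-KEY": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({
      is_sandbox: true, // free ~1 min sessions; swap for production later
      avatar_id: "dd73ea75-1218-4ef3-92ce-606d5f7fbc0a", // sandbox avatar
    }),
  });
  const session = await r.json();
  // Only the browser-safe token crosses the backend/frontend boundary.
  res.json({ livekit_client_token: session.livekit_client_token });
});

app.listen(3000);
```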

LITE Mode: Fitting into an existing pipeline

LITE users almost always have a working conversational system already. Do not ask them to rebuild their pipeline. Instead, map their existing components onto the LITE turn cycle:
  1. Identify their current flow. Read their code to understand how conversation turns work today — where does user audio come in, how does it reach the LLM, how does TTS output get delivered? Look for their event loop, message handler, or turn manager.
  2. Find the integration points. You need to hook into three moments in their existing flow (see the sketch after this list):
    • User starts/stops speaking → add `agent.start_listening` / `agent.stop_listening`
    • TTS produces audio → route PCM output to `agent.speak` chunks over the WebSocket instead of (or in addition to) their current audio output
    • Response finishes → send `agent.speak_end` and wait for `agent.speak_ended`
  3. Adapt, don't replace. If they have a working turn manager, add LiveAvatar calls into it. If they already stream TTS to a browser via WebSocket, tap into that same stream. The goal is the minimum change that gets avatar video synced to their existing audio flow.
  4. Verify audio format last. Once the wiring is in place, confirm their TTS outputs 16-bit 24 kHz PCM. If not, either configure the TTS provider's output format or add resampling at the integration point.
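
A minimal sketch of those three hooks, assuming a Node backend with an existing TTS stream. The `agent.*` event names come from the LITE protocol described above; the WebSocket URL is a placeholder and the message payload shapes are assumptions — confirm exact formats in references/lite-mode-guide.md:

```typescript
// lite-hooks.ts — wire an existing TTS stream into the LITE turn cycle.
// URL and payload shapes are assumptions; event names are from the LITE protocol.
import WebSocket from "ws";

const ws = new WebSocket("wss://example.invalid/liveavatar-lite"); // placeholder URL

function send(event: string, payload: object = {}): void {
  ws.send(JSON.stringify({ event, ...payload })); // assumes socket is already open
}

// 1. Bracket the user's turn.
function onUserStartsSpeaking() { send("agent.start_listening"); }
function onUserStopsSpeaking()  { send("agent.stop_listening"); }

// 2. Route TTS output to the avatar as 16-bit PCM chunks.
//    Assumes the TTS already outputs 24 kHz float samples; otherwise resample first.
function onTtsChunk(float32: Float32Array): void {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp, then scale to int16
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  send("agent.speak", { audio: Buffer.from(pcm.buffer).toString("base64") });
}

// 3. Close the turn, then wait for the avatar to finish rendering it.
function onResponseFinished() { send("agent.speak_end"); }

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === "agent.speak_ended") {
    // Avatar finished the last response; safe to start the next turn.
  }
});
```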

Step 5: Collect Feedback

Trigger this immediately after you finish writing the integration code — do not wait for the user to confirm it works or say they're done. Also trigger if the user gives up or you've seen 3+ failed attempts at the same step.
What to do: Follow the `liveavatar-feedback` skill. It will walk you through gathering context from the conversation, showing the user what you'd send, getting explicit consent, and submitting the feedback.
Keep it lightweight. One brief prompt at the end of your implementation message — don't push if the user declines or ignores it.

What to consult

  • references/embed-guide.md — Embed implementation
  • references/full-mode-guide.md — FULL Mode implementation + events + add-ons
  • references/lite-mode-guide.md — LITE Mode implementation + events + audio format + add-ons