liveavatar-integrate

LiveAvatar Integration
LiveAvatar gives your product a human face — real-time, lip-synced video avatars that speak, react, and maintain eye contact. This skill assesses what you have, recommends the best integration path, and walks you through building it.
Step 1: Discover What the User Has
Before recommending a path, gather context. Check the codebase and conversation for signals. Do not ask questions the codebase already answers.
Signals to look for in the codebase
Scan for these automatically — do not ask the user if you can detect them:
| Signal | Where to look | What it means |
|---|---|---|
| OpenAI / Anthropic / LLM SDK imports | dependencies, imports | User has their own LLM |
| ElevenLabs / PlayHT / Deepgram TTS SDK | dependencies, imports | User has their own TTS |
| Deepgram / Whisper / AssemblyAI STT SDK | dependencies, imports | User has their own STT |
| LiveKit SDK | dependencies | User has LiveKit infra |
| Agora SDK | dependencies | User has Agora infra |
| Pipecat imports | dependencies, imports | User has a Pipecat pipeline |
| ElevenLabs Agent / Conversational AI | dependencies, config | User has an ElevenLabs agent |
| LiveAvatar API key | env files, config | User already has an API key |
| Existing LiveAvatar code | imports, API calls to the LiveAvatar API | Existing integration (debug, not new setup) |
| No backend / static site | file structure (pure HTML/CSS/JS, no server) | Embed is the only option |
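As a sketch of what this scan can look like in practice, a small helper can map dependency names to signals. The package names and the shape of the table below are illustrative, not an exhaustive or authoritative signal list:

```typescript
// Sketch: map package.json dependency names to integration signals.
// The package-name-to-signal table is illustrative, not exhaustive.
const SIGNALS: Record<string, string> = {
  openai: "User has their own LLM",
  "@anthropic-ai/sdk": "User has their own LLM",
  elevenlabs: "User has their own TTS",
  "@deepgram/sdk": "User has their own STT",
  "livekit-client": "User has LiveKit infra",
  "agora-rtc-sdk-ng": "User has Agora infra",
};

// Given the dependency names from a package.json, return the
// deduplicated set of detected signals.
export function detectSignals(deps: string[]): string[] {
  const found = new Set<string>();
  for (const dep of deps) {
    const signal = SIGNALS[dep];
    if (signal) found.add(signal);
  }
  return [...found];
}
```

Run this over `dependencies` and `devDependencies`, then ask the user only about the gaps.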
Questions to ask (only what's still unknown)
If the codebase scan leaves gaps, ask the user. Frame as a concise checklist — do not ask these one at a time:
To recommend the best LiveAvatar integration for your setup, I need to know:
1. **What's the goal?** (e.g., customer support avatar, sales demo, onboarding guide, talking head on landing page)
2. **Do you have your own AI pipeline?** (STT, LLM, TTS — or any combination)
3. **Do you need programmatic control** over the conversation (events, interrupts, custom logic), or just an avatar on a page?

Skip any question the codebase or conversation already answered.
Step 2: Route to the Golden Pathway
Based on what you've gathered, match to ONE pathway. Always pick the simplest path that works. Do not offer multiple options — make the call.
Decision tree
Has NO backend OR just wants an avatar on a page?
→ EMBED
Has NO existing AI stack (no STT, no LLM, no TTS)?
→ FULL MODE (standard)
Has their OWN LLM but no STT/TTS?
→ FULL MODE + Custom LLM
Has their OWN LLM + their own ElevenLabs TTS?
→ FULL MODE + Custom LLM + Custom TTS
Needs explicit mic control (walkie-talkie style)?
→ FULL MODE + Push-to-Talk
Has a COMPLETE pipeline (STT + LLM + TTS)?
→ LITE MODE
Has an ElevenLabs Conversational AI agent?
→ LITE MODE + ElevenLabs Plugin
Has their own LiveKit or Agora infrastructure?
→ LITE MODE + BYO WebRTC

Golden pathways (pick one, then implement)
| Pathway | When | Implementation guide |
|---|---|---|
| Embed | No backend, or no custom logic needed | references/embed-guide.md |
| FULL standard | No existing AI stack | references/full-mode-guide.md |
| FULL + Custom LLM | Has own LLM, wants LiveAvatar's ASR + TTS | references/full-mode-guide.md (Custom LLM section) |
| FULL + Custom TTS | Has own ElevenLabs voice | references/full-mode-guide.md (Custom TTS section) |
| FULL + Push-to-Talk | Needs explicit mic control | references/full-mode-guide.md (Push-to-Talk section) |
| LITE standard | Has complete STT + LLM + TTS pipeline | references/lite-mode-guide.md |
| LITE + ElevenLabs Plugin | Has ElevenLabs Conversational AI agent | references/lite-mode-guide.md (ElevenLabs Plugin section) |
| LITE + BYO WebRTC | Has own LiveKit / Agora | references/lite-mode-guide.md (BYO WebRTC section) |
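The decision tree above can be sketched as a single routing function. The flag names below are assumptions for illustration (they would come from the Step 1 discovery); the returned pathway strings mirror the table:

```typescript
// Sketch of the decision tree as a routing function.
// Flag names are illustrative; values come from the Step 1 discovery.
interface Setup {
  hasBackend: boolean;
  avatarOnPageOnly: boolean;   // just wants an avatar on a page
  hasLLM: boolean;
  hasSTT: boolean;
  hasTTS: boolean;
  ttsIsElevenLabs: boolean;
  needsPushToTalk: boolean;    // walkie-talkie style mic control
  hasElevenLabsAgent: boolean; // ElevenLabs Conversational AI agent
  hasOwnWebRTC: boolean;       // LiveKit or Agora infra
}

export function routePathway(s: Setup): string {
  if (!s.hasBackend || s.avatarOnPageOnly) return "EMBED";
  // No AI stack at all -> FULL Mode handles everything.
  if (!s.hasSTT && !s.hasLLM && !s.hasTTS) {
    return s.needsPushToTalk ? "FULL MODE + Push-to-Talk" : "FULL MODE (standard)";
  }
  // Complete pipeline -> LITE Mode variants.
  if (s.hasSTT && s.hasLLM && s.hasTTS) {
    if (s.hasElevenLabsAgent) return "LITE MODE + ElevenLabs Plugin";
    if (s.hasOwnWebRTC) return "LITE MODE + BYO WebRTC";
    return "LITE MODE";
  }
  // Own LLM with a partial audio stack -> FULL Mode variants.
  if (s.needsPushToTalk) return "FULL MODE + Push-to-Talk";
  if (s.hasTTS && s.ttsIsElevenLabs) return "FULL MODE + Custom LLM + Custom TTS";
  return "FULL MODE + Custom LLM";
}
```

The point of encoding it this way is that the routing stays a single deterministic call: one setup in, one pathway out, never a menu of options.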
Step 3: Present the Recommendation
Once you've picked a pathway, tell the user what you recommend and why, in 2-3 sentences. Example:
Based on your setup, I recommend FULL Mode with Custom LLM. You already have an OpenAI integration for your LLM, so we'll plug that in and let LiveAvatar handle ASR, TTS, and video. This gets you a conversational avatar without rebuilding your audio pipeline.
Then proceed directly to implementation using the corresponding guide in references/.

Step 4: Implement
Read the appropriate reference guide and implement. Every guide follows the same structure:
- Prerequisites — what to create/gather before writing code
- Session lifecycle — step-by-step with curl commands and code
- Events — what to send and receive
- Add-ons — mode-specific optional features
- Sandbox testing — free testing before going live
- Gotchas — what breaks and how to avoid it
Principles that apply to ALL paths
Backend / frontend split is non-negotiable. X-API-KEY is a secret — backend only. Frontend only gets livekit_client_token (safe for browsers). If you see the API key in client code, stop and restructure.

Context makes the avatar conversational. In FULL Mode, no context_id = silent avatar. No error thrown. Always create a context first, even a minimal "You are a helpful assistant."

FULL and LITE are completely different protocols. FULL = LiveKit data channels (avatar.* / user.*). LITE = WebSocket (agent.* / session.*). Never mix them.

Start with sandbox. is_sandbox: true, avatar ID dd73ea75-1218-4ef3-92ce-606d5f7fbc0a. Free, ~1 min sessions. Swap to production avatar when ready.
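As an illustration of the sandbox principle, a minimal session-creation request body might look like the fragment below. Only is_sandbox and the sandbox avatar ID come from this guide; treat the overall shape as a placeholder and check the reference guides for the real schema:

```json
{
  "avatar_id": "dd73ea75-1218-4ef3-92ce-606d5f7fbc0a",
  "is_sandbox": true
}
```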
LITE Mode: Fitting into an existing pipeline
LITE users almost always have a working conversational system already. Do not ask them to rebuild their pipeline. Instead, map their existing components onto the LITE turn cycle:
- Identify their current flow. Read their code to understand how conversation turns work today — where does user audio come in, how does it reach the LLM, how does TTS output get delivered? Look for their event loop, message handler, or turn manager.
- Find the integration points. You need to hook into three moments in their existing flow:
  - User starts/stops speaking → add agent.start_listening / agent.stop_listening
  - TTS produces audio → route PCM output to agent.speak chunks over WebSocket instead of (or in addition to) their current audio output
  - Response finishes → send agent.speak_end and wait for agent.speak_ended
- Adapt, don't replace. If they have a working turn manager, add LiveAvatar calls into it. If they stream TTS to a browser via WebSocket already, tap into that same stream. The goal is the minimum change to get avatar video synced to their existing audio flow.
- Verify audio format last. Once the wiring is in place, confirm their TTS outputs PCM 16-bit 24KHz. If not, either configure the TTS provider's output format or add resampling at the integration point.
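A minimal sketch of those three hook points, assuming a generic send(event) wrapper around their existing WebSocket. The event names come from this guide; the class shape, payload keys, and chunk handling are illustrative assumptions:

```typescript
// Sketch: wiring an existing turn manager to the LITE turn cycle.
// Event names (agent.start_listening, agent.stop_listening, agent.speak,
// agent.speak_end) are from this guide; everything else is illustrative.
type Send = (event: { type: string; [k: string]: unknown }) => void;

export class LiteAvatarHooks {
  constructor(private send: Send) {}

  // Hook 1: user starts/stops speaking.
  userStartedSpeaking() { this.send({ type: "agent.start_listening" }); }
  userStoppedSpeaking() { this.send({ type: "agent.stop_listening" }); }

  // Hook 2: TTS produced audio. Chunks must already be PCM 16-bit 24KHz;
  // resample here if the TTS provider outputs anything else. The "audio"
  // payload key and raw-bytes encoding are assumptions; check the
  // lite-mode guide for the actual chunk format.
  onTtsChunk(pcmChunk: Uint8Array) {
    this.send({ type: "agent.speak", audio: pcmChunk });
  }

  // Hook 3: response finished. Send speak_end; the caller should then
  // wait for the agent.speak_ended event from the server.
  responseFinished() { this.send({ type: "agent.speak_end" }); }
}
```

The turn manager they already have keeps driving the conversation; these calls are additions at the three moments above, not a replacement loop.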
Step 5: Collect Feedback
Trigger this immediately after you finish writing the integration code — do not wait for the user to confirm it works or say they're done. Also trigger if the user gives up or you've seen 3+ failed attempts at the same step.
What to do: Follow the liveavatar-feedback skill. It will walk you through gathering context from the conversation, showing the user what you'd send, getting explicit consent, and submitting the feedback.

Keep it lightweight. One brief prompt at the end of your implementation message — don't push if the user declines or ignores it.
What to consult
- references/embed-guide.md — Embed implementation
- references/full-mode-guide.md — FULL Mode implementation + events + add-ons
- references/lite-mode-guide.md — LITE Mode implementation + events + audio format + add-ons