liveavatar-integrate

LiveAvatar Integration

LiveAvatar gives your product a human face — real-time, lip-synced video avatars that speak, react, and maintain eye contact. This skill assesses what you have, recommends the best integration path, and walks you through building it.

Step 1: Discover What the User Has

Before recommending a path, gather context. Check the codebase and conversation for signals. Do not ask questions the codebase already answers.

Signals to look for in the codebase

Scan for these automatically — do not ask the user if you can detect them:
| Signal | Where to look | What it means |
|---|---|---|
| OpenAI / Anthropic / LLM SDK imports | `package.json`, `requirements.txt`, imports | User has their own LLM |
| ElevenLabs / PlayHT / Deepgram TTS SDK | dependencies, imports | User has their own TTS |
| Deepgram / Whisper / AssemblyAI STT SDK | dependencies, imports | User has their own STT |
| LiveKit SDK (`livekit-server-sdk`, `@livekit/`) | dependencies | User has LiveKit infra |
| Agora SDK | dependencies | User has Agora infra |
| Pipecat imports | dependencies, imports | User has a Pipecat pipeline |
| ElevenLabs Agent / Conversational AI | dependencies, config | User has an ElevenLabs agent |
| `HEYGEN_API_KEY` / `LIVEAVATAR_API_KEY` | `.env`, config files | User already has an API key |
| Existing LiveAvatar code | imports, API calls to `api.liveavatar.com` | Existing integration (debug, not new setup) |
| No backend / static site | file structure (pure HTML/CSS/JS, no server) | Embed is the only option |
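
To make the scan concrete, here is a minimal sketch in TypeScript (Node) that checks `package.json` for some of these signal dependencies. The signal-to-conclusion mapping mirrors the table above; the exact regex patterns are illustrative assumptions (e.g., Deepgram can indicate TTS or STT, which a real scan would disambiguate from usage):

```typescript
// scan-signals.ts — minimal dependency scan for integration signals.
// Package-name patterns are illustrative assumptions; extend as needed.
import { readFileSync, existsSync } from "node:fs";

const SIGNALS: Record<string, RegExp> = {
  "their own LLM": /^(openai|@anthropic-ai\/sdk)/,
  "their own TTS or STT": /^(elevenlabs|@deepgram\/sdk|playht)/,
  "LiveKit infra": /^(livekit-server-sdk|@livekit\/)/,
  "Agora infra": /^agora/,
};

function scan(pkgPath = "package.json"): string[] {
  // No package.json at all is itself a signal: likely a static site (Embed).
  if (!existsSync(pkgPath)) return ["No package.json — check for a static site (Embed)"];
  const pkg = JSON.parse(readFileSync(pkgPath, "utf8"));
  const deps = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  return Object.entries(SIGNALS)
    .filter(([, pattern]) => deps.some((d) => pattern.test(d)))
    .map(([signal]) => `User has ${signal}`);
}

console.log(scan());
```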

Questions to ask (only what's still unknown)

If the codebase scan leaves gaps, ask the user. Frame as a concise checklist — do not ask these one at a time:
To recommend the best LiveAvatar integration for your setup, I need to know:

1. **What's the goal?** (e.g., customer support avatar, sales demo, onboarding guide, talking head on landing page)
2. **Do you have your own AI pipeline?** (STT, LLM, TTS — or any combination)
3. **Do you need programmatic control** over the conversation (events, interrupts, custom logic), or just an avatar on a page?

Skip any question the codebase or conversation already answered.

Step 2: Route to the Golden Pathway

Based on what you've gathered, match to ONE pathway. Always pick the simplest path that works. Do not offer multiple options — make the call.

Decision tree

Has NO backend OR just wants an avatar on a page?
  → EMBED

Has NO existing AI stack (no STT, no LLM, no TTS)?
  → FULL MODE (standard)

Has their OWN LLM but no STT/TTS?
  → FULL MODE + Custom LLM

Has their OWN LLM + their own ElevenLabs TTS?
  → FULL MODE + Custom LLM + Custom TTS

Needs explicit mic control (walkie-talkie style)?
  → FULL MODE + Push-to-Talk

Has a COMPLETE pipeline (STT + LLM + TTS)?
  → LITE MODE

Has an ElevenLabs Conversational AI agent?
  → LITE MODE + ElevenLabs Plugin

Has their own LiveKit or Agora infrastructure?
  → LITE MODE + BYO WebRTC
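
The same routing, expressed as code. This sketch encodes the tree above as a first-match-wins function; the flag names are hypothetical stand-ins for the Step 1 findings, and the ordering resolves overlaps by checking the complete-pipeline cases before the partial ones. Treat it as a sketch of the tree, not a substitute for judgment:

```typescript
// route.ts — the decision tree as a first-match-wins function.
// Flag names are hypothetical; they stand in for the Step 1 findings.
interface Findings {
  hasBackend: boolean;
  avatarOnPageOnly: boolean;     // just wants an avatar on a page
  hasSTT: boolean;
  hasLLM: boolean;
  hasTTS: boolean;               // owns a TTS, e.g. ElevenLabs
  needsPushToTalk: boolean;      // explicit walkie-talkie mic control
  hasElevenLabsAgent: boolean;   // ElevenLabs Conversational AI agent
  hasOwnWebRTC: boolean;         // own LiveKit or Agora infra
}

function route(f: Findings): string {
  if (!f.hasBackend || f.avatarOnPageOnly) return "EMBED";
  if (f.hasElevenLabsAgent) return "LITE + ElevenLabs Plugin";
  if (f.hasSTT && f.hasLLM && f.hasTTS)
    return f.hasOwnWebRTC ? "LITE + BYO WebRTC" : "LITE";
  if (f.needsPushToTalk) return "FULL + Push-to-Talk";
  if (f.hasLLM && f.hasTTS) return "FULL + Custom LLM + Custom TTS";
  if (f.hasLLM) return "FULL + Custom LLM";
  return "FULL (standard)";
}
```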

Golden pathways (pick one, then implement)

| Pathway | When | Implementation guide |
|---|---|---|
| Embed | No backend, or no custom logic needed | references/embed-guide.md |
| FULL standard | No existing AI stack | references/full-mode-guide.md |
| FULL + Custom LLM | Has own LLM, wants LiveAvatar's ASR + TTS | references/full-mode-guide.md (Custom LLM section) |
| FULL + Custom TTS | Has own ElevenLabs voice | references/full-mode-guide.md (Custom TTS section) |
| FULL + Push-to-Talk | Needs explicit mic control | references/full-mode-guide.md (Push-to-Talk section) |
| LITE standard | Has complete STT + LLM + TTS pipeline | references/lite-mode-guide.md |
| LITE + ElevenLabs Plugin | Has ElevenLabs Conversational AI agent | references/lite-mode-guide.md (ElevenLabs Plugin section) |
| LITE + BYO WebRTC | Has own LiveKit / Agora | references/lite-mode-guide.md (BYO WebRTC section) |

Step 3: Present the Recommendation

Once you've picked a pathway, tell the user what you recommend and why, in 2-3 sentences. Example:
Based on your setup, I recommend FULL Mode with Custom LLM. You already have an OpenAI integration for your LLM, so we'll plug that in and let LiveAvatar handle ASR, TTS, and video. This gets you a conversational avatar without rebuilding your audio pipeline.
Then proceed directly to implementation using the corresponding guide in `references/`.

Step 4: Implement

Read the appropriate reference guide and implement. Every guide follows the same structure:
  1. Prerequisites — what to create/gather before writing code
  2. Session lifecycle — step-by-step with curl commands and code
  3. Events — what to send and receive
  4. Add-ons — mode-specific optional features
  5. Sandbox testing — free testing before going live
  6. Gotchas — what breaks and how to avoid it

Principles that apply to ALL paths

**Backend / frontend split is non-negotiable.** `X-API-KEY` is a secret — backend only. The frontend only gets `livekit_client_token` (safe for browsers). If you see the API key in client code, stop and restructure.

**Context makes the avatar conversational.** In FULL Mode, no `context_id` means a silent avatar, and no error is thrown. Always create a context first, even a minimal `"You are a helpful assistant."`.

**FULL and LITE are completely different protocols.** FULL = LiveKit data channels (`avatar.*` / `user.*`). LITE = WebSocket (`agent.*` / `session.*`). Never mix them.

**Start with sandbox.** Set `is_sandbox: true` with avatar ID `dd73ea75-1218-4ef3-92ce-606d5f7fbc0a`. Sessions are free and last ~1 minute; swap in your production avatar when ready.
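
As a concrete illustration of the first and last principles, here is a minimal backend sketch, assuming a Node/Express server. The exact LiveAvatar endpoint path and payload fields beyond `X-API-KEY`, `is_sandbox`, and `livekit_client_token` are assumptions — confirm them against the reference guide for your pathway:

```typescript
// server.ts — keep X-API-KEY server-side; hand the browser only livekit_client_token.
// The session endpoint path and body fields are assumptions; see references/full-mode-guide.md.
import express from "express";

const app = express();
const API_KEY = process.env.LIVEAVATAR_API_KEY!; // secret: never ships to the client

app.post("/avatar-session", async (_req, res) => {
  const r = await fetch("https://api.liveavatar.com/v1/sessions", { // hypothetical path
    method: "POST",
    headers: { "X-API-KEY": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({
      is_sandbox: true, // free ~1 min sessions; swap for production later
      avatar_id: "dd73ea75-1218-4ef3-92ce-606d5f7fbc0a", // sandbox avatar
    }),
  });
  const session = await r.json();
  // Only the browser-safe token crosses the backend/frontend boundary.
  res.json({ livekit_client_token: session.livekit_client_token });
});

app.listen(3000);
```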

LITE Mode: Fitting into an existing pipeline

LITE users almost always have a working conversational system already. Do not ask them to rebuild their pipeline. Instead, map their existing components onto the LITE turn cycle:
  1. Identify their current flow. Read their code to understand how conversation turns work today — where does user audio come in, how does it reach the LLM, how does TTS output get delivered? Look for their event loop, message handler, or turn manager.
  2. Find the integration points. You need to hook into three moments in their existing flow (see the sketch after this list):
    • User starts/stops speaking → add `agent.start_listening` / `agent.stop_listening`
    • TTS produces audio → route PCM output to `agent.speak` chunks over the WebSocket instead of (or in addition to) their current audio output
    • Response finishes → send `agent.speak_end` and wait for `agent.speak_ended`
  3. Adapt, don't replace. If they have a working turn manager, add LiveAvatar calls into it. If they already stream TTS to a browser via WebSocket, tap into that same stream. The goal is the minimum change that gets avatar video synced to their existing audio flow.
  4. Verify audio format last. Once the wiring is in place, confirm their TTS outputs 16-bit 24 kHz PCM. If not, either configure the TTS provider's output format or add resampling at the integration point.
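
A minimal sketch of those three hooks, assuming a Node backend with an existing TTS stream. The `agent.*` event names come from the LITE protocol described above; the WebSocket URL is a placeholder and the message payload shapes are assumptions — confirm exact formats in references/lite-mode-guide.md:

```typescript
// lite-hooks.ts — wire an existing TTS stream into the LITE turn cycle.
// URL and payload shapes are assumptions; event names are from the LITE protocol.
import WebSocket from "ws";

const ws = new WebSocket("wss://example.invalid/liveavatar-lite"); // placeholder URL

function send(event: string, payload: object = {}): void {
  ws.send(JSON.stringify({ event, ...payload })); // assumes socket is already open
}

// 1. Bracket the user's turn.
function onUserStartsSpeaking() { send("agent.start_listening"); }
function onUserStopsSpeaking()  { send("agent.stop_listening"); }

// 2. Route TTS output to the avatar as 16-bit PCM chunks.
//    Assumes the TTS already outputs 24 kHz float samples; otherwise resample first.
function onTtsChunk(float32: Float32Array): void {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp, then scale to int16
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  send("agent.speak", { audio: Buffer.from(pcm.buffer).toString("base64") });
}

// 3. Close the turn, then wait for the avatar to finish rendering it.
function onResponseFinished() { send("agent.speak_end"); }

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === "agent.speak_ended") {
    // Avatar finished the last response; safe to start the next turn.
  }
});
```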

Step 5: Collect Feedback

Trigger this immediately after you finish writing the integration code — do not wait for the user to confirm it works or say they're done. Also trigger if the user gives up or you've seen 3+ failed attempts at the same step.
What to do: Follow the `liveavatar-feedback` skill. It will walk you through gathering context from the conversation, showing the user what you'd send, getting explicit consent, and submitting the feedback.
Keep it lightweight. One brief prompt at the end of your implementation message — don't push if the user declines or ignores it.

What to consult

  • references/embed-guide.md — Embed implementation
  • references/full-mode-guide.md — FULL Mode implementation + events + add-ons
  • references/lite-mode-guide.md — LITE Mode implementation + events + audio format + add-ons