Build a LiveAvatar integration end-to-end — assesses the user's existing stack, recommends the optimal path, and guides implementation. Use when: (1) Building a new LiveAvatar integration, (2) Adding a real-time avatar to an app or site, (3) Connecting LiveAvatar to an existing AI pipeline, (4) User mentions LiveAvatar, real-time avatar, interactive avatar, conversational avatar, or lip-sync avatar, (5) Deciding between Embed, FULL Mode, and LITE Mode, (6) Migrating from HeyGen Interactive Avatar to LiveAvatar.
```bash
npx skill4agent add heygen-com/liveavatar-agent-skills liveavatar-integrate
```

| Signal | Where to look | What it means |
|---|---|---|
| OpenAI / Anthropic / LLM SDK imports | dependencies, imports | User has their own LLM |
| ElevenLabs / PlayHT / Deepgram TTS SDK | dependencies, imports | User has their own TTS |
| Deepgram / Whisper / AssemblyAI STT SDK | dependencies, imports | User has their own STT |
| LiveKit SDK | dependencies | User has LiveKit infra |
| Agora SDK | dependencies | User has Agora infra |
| Pipecat imports | dependencies, imports | User has a Pipecat pipeline |
| ElevenLabs Agent / Conversational AI | dependencies, config | User has an ElevenLabs agent |
| LiveAvatar API key | env vars, config | User already has an API key |
| Existing LiveAvatar code | imports, API calls to the LiveAvatar API | Existing integration (debug, not new setup) |
| No backend / static site | file structure (pure HTML/CSS/JS, no server) | Embed is the only option |
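As a rough sketch of how these signals can be detected automatically, the check below scans `package.json` dependencies. This helper is hypothetical, not part of LiveAvatar; the package names are the common SDK packages for each provider, assumed for illustration.

```typescript
// Hypothetical helper: map package.json dependencies to the signals above.
import { readFileSync } from "node:fs";

const SIGNALS: Record<string, string> = {
  "openai": "own LLM",
  "@anthropic-ai/sdk": "own LLM",
  "elevenlabs": "own TTS",
  "@deepgram/sdk": "own STT",
  "livekit-client": "LiveKit infra",
  "agora-rtc-sdk-ng": "Agora infra",
};

function detectSignals(pkgPath = "package.json"): string[] {
  const pkg = JSON.parse(readFileSync(pkgPath, "utf8"));
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.keys(deps).flatMap((name) =>
    name in SIGNALS ? [`${name} → ${SIGNALS[name]}`] : []
  );
}

console.log(detectSignals());
```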
To recommend the best LiveAvatar integration for your setup, I need to know:
1. **What's the goal?** (e.g., customer support avatar, sales demo, onboarding guide, talking head on landing page)
2. **Do you have your own AI pipeline?** (STT, LLM, TTS — or any combination)
3. **Do you need programmatic control** over the conversation (events, interrupts, custom logic), or just an avatar on a page?

Then walk this decision tree:

Has NO backend OR just wants an avatar on a page?
→ EMBED
Has NO existing AI stack (no STT, no LLM, no TTS)?
→ FULL MODE (standard)
Has their OWN LLM but no STT/TTS?
→ FULL MODE + Custom LLM
Has their OWN LLM + their own ElevenLabs TTS?
→ FULL MODE + Custom LLM + Custom TTS
Needs explicit mic control (walkie-talkie style)?
→ FULL MODE + Push-to-Talk
Has a COMPLETE pipeline (STT + LLM + TTS)?
→ LITE MODE
Has an ElevenLabs Conversational AI agent?
→ LITE MODE + ElevenLabs Plugin
Has their own LiveKit or Agora infrastructure?
→ LITE MODE + BYO WebRTC

| Pathway | When | Implementation guide |
|---|---|---|
| Embed | No backend, or no custom logic needed | references/embed-guide.md |
| FULL standard | No existing AI stack | references/full-mode-guide.md |
| FULL + Custom LLM | Has own LLM, wants LiveAvatar's ASR + TTS | references/full-mode-guide.md (Custom LLM section) |
| FULL + Custom TTS | Has own ElevenLabs voice | references/full-mode-guide.md (Custom TTS section) |
| FULL + Push-to-Talk | Needs explicit mic control | references/full-mode-guide.md (Push-to-Talk section) |
| LITE standard | Has complete STT + LLM + TTS pipeline | references/lite-mode-guide.md |
| LITE + ElevenLabs Plugin | Has ElevenLabs Conversational AI agent | references/lite-mode-guide.md (ElevenLabs Plugin section) |
| LITE + BYO WebRTC | Has own LiveKit / Agora | references/lite-mode-guide.md (BYO WebRTC section) |
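The decision tree above can be transcribed as a first-match routing function. The `Stack` shape below is illustrative only, not a LiveAvatar type; in practice push-to-talk combines with the other FULL variants rather than excluding them.

```typescript
// The decision tree, as a first-match routing function (illustrative).
interface Stack {
  hasBackend: boolean;
  hasSTT: boolean;
  hasLLM: boolean;
  hasTTS: boolean;            // own TTS (e.g., an ElevenLabs voice)
  hasElevenLabsAgent: boolean;
  hasOwnWebRTC: boolean;      // own LiveKit or Agora infrastructure
  needsPushToTalk: boolean;
}

function recommendPathway(s: Stack): string {
  if (!s.hasBackend) return "Embed";
  // An ElevenLabs Conversational AI agent is itself a complete pipeline,
  // so check it before the generic complete-pipeline branch.
  if (s.hasElevenLabsAgent) return "LITE + ElevenLabs Plugin";
  if (s.hasSTT && s.hasLLM && s.hasTTS) {
    return s.hasOwnWebRTC ? "LITE + BYO WebRTC" : "LITE standard";
  }
  if (s.hasLLM && s.hasTTS) return "FULL + Custom LLM + Custom TTS";
  if (s.hasLLM) return "FULL + Custom LLM";
  if (s.needsPushToTalk) return "FULL + Push-to-Talk";
  return "FULL standard";
}
```

For example, a user with only an OpenAI integration (`hasLLM: true`, everything else false except `hasBackend`) routes to `"FULL + Custom LLM"`, which matches the recommendation below.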
Based on your setup, I recommend FULL Mode with Custom LLM. You already have an OpenAI integration for your LLM, so we'll plug that in and let LiveAvatar handle ASR, TTS, and video. This gets you a conversational avatar without rebuilding your audio pipeline.
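A minimal sketch of the user's side of that recommendation, assuming Custom LLM mode calls an HTTP endpoint you host: the route name and payload shape here are placeholders, and the actual request/response contract is in references/full-mode-guide.md. Only the OpenAI calls are real API usage.

```typescript
// Sketch: the "bring your own LLM" endpoint, with assumed route and payload.
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post("/llm", async (req, res) => {
  // Placeholder payload: assume the transcribed user turn arrives as `text`.
  const { text } = req.body;
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: text },
    ],
  });
  // LiveAvatar's ASR feeds this endpoint; its TTS speaks whatever we return.
  res.json({ reply: completion.choices[0].message.content });
});

app.listen(3000);
```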
For reference, identifiers that appear throughout the references/ guides: the `X-API-KEY` header, `livekit_client_token`, `context_id`, the prompt string `"You are a helpful assistant."`, the event namespaces `avatar.*`, `user.*`, `agent.*`, `session.*`, the flag `is_sandbox: true`, the ID `dd73ea75-1218-4ef3-92ce-606d5f7fbc0a`, the events `agent.start_listening`, `agent.stop_listening`, `agent.speak`, `agent.speak_end`, `agent.speak_ended`, and the `liveavatar-feedback` skill.
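As a sketch of how the `agent.*` events above might drive UI state: the event names come from this list, but the session object and its subscription API are assumptions, so the `EventEmitter` below is only a stand-in that lets the example run; check the mode guides for the real API.

```typescript
// Sketch: wiring UI state to the agent.* events (emitter shape assumed).
import { EventEmitter } from "node:events";

const session = new EventEmitter(); // stand-in for a real LiveAvatar session

session.on("agent.start_listening", () => setMic(true));
session.on("agent.stop_listening", () => setMic(false));
session.on("agent.speak", () => setAvatar("speaking"));
session.on("agent.speak_ended", () => setAvatar("idle"));

// Illustrative UI hooks; replace with your own rendering logic.
function setMic(listening: boolean) {
  console.log(listening ? "mic: listening" : "mic: idle");
}
function setAvatar(state: "speaking" | "idle") {
  console.log(`avatar: ${state}`);
}
```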