heygen-avatar

HeyGen Avatar Designer

Create and manage HeyGen avatars for anyone: the agent, the user, or named characters. Handles identity extraction, avatar generation, voice selection, and saves everything to `AVATAR-<NAME>.md` for consistent reuse.

Files & Paths

This skill reads and writes the following. No other files are accessed without explicit user instruction.

| Operation | Path | Purpose |
|---|---|---|
| Read | `SOUL.md`, `IDENTITY.md` | Extract identity details when creating an avatar for the agent |
| Read | `AVATAR-<NAME>.md` | Load existing avatar identity (for variant looks, voice updates) |
| Write | `AVATAR-<NAME>.md` | Save new avatar identity after creation |
| Write | `AVATAR-AGENT.md`, `AVATAR-USER.md` (symlinks) | Role aliases, see Phase 5 |
| Temp write | `/tmp/openclaw/uploads/` | Voice preview audio (downloaded for user playback, deleted after session) |
| Remote upload | HeyGen (via `heygen asset create` or MCP) | User-provided photos uploaded to HeyGen for digital-twin creation |

Assets are only uploaded to HeyGen when the user explicitly provides them.

Language Awareness

Detect the user's language from their first message. Store it as `user_language` (e.g., `en`, `ja`, `es`, `ko`, `zh`, `fr`, `de`, `pt`).
  1. Communicate with the user in their language. All questions, status updates, confirmations, and error messages should be in `user_language`.
  2. Voice design prompts and selection respect `user_language`. When designing or selecting a voice, specify the target language so the voice library returns voices that speak it.
  3. Technical directives stay in English: enum values (`Young Adult`, `Realistic`, `landscape`, etc.) are API-level and not translated.

UX Rules

  1. Be concise. No avatar IDs, group IDs, or raw API payloads in chat. Report the result (avatar created, ready to use) not the plumbing.
  2. No internal jargon. Never mention internal phase names ("Phase 0", "Phase 5 Symlink Maintenance") to the user. The user sees natural conversation: "Setting up your avatar…" not "Running Phase 2 avatar creation."
  3. One or two questions per phase. Don't batch-ask. Walk phases in order, ask the smallest set of questions needed to proceed.
  4. Read workspace files before asking. `SOUL.md`, `IDENTITY.md`, and `AVATAR-*.md` at the workspace root contain identity. Check them first. Only ask the user for what's genuinely missing.
  5. Don't narrate skill internals. Never say "let me read the workflow," "checking the reference files," "loading the avatar discovery guide." Read silently. The user sees questions and results, not internal navigation.
  6. Don't announce what you're about to do. Skip meta-commentary like "Creating the avatar now." Just do the work. If a step takes time, the next thing the user hears should be the result (or a checkpoint question).
  7. Never narrate transport choice. MCP vs CLI is internal. Pick the transport silently and never mention it. If both are unavailable, ask the user to configure one without explaining why.

Start Here (Critical)

Default target = the agent. The primary use of this skill is giving the agent a face + voice so it can present videos. Route to "user" only on explicit "my avatar" / "me" / "my photo" language. When in doubt, make the agent's avatar.
Do NOT batch-ask questions. No "give me a photo, voice preference, duration, target platform, tone, key message" all at once. Walk phases in order. Each phase asks at most one or two things at a time.
For agent avatars: read SOUL.md and IDENTITY.md first, then go straight to prompt-based creation. Do NOT ask the user for a photo or appearance details first. The agent's identity lives in those workspace files. Only ask the user for traits that are genuinely missing.
Prompt-based is the default creation path. Photo is opt-in, only relevant when the user explicitly wants a real-person digital twin of themselves. Agents and named characters almost always use prompt-based creation.

Before You Start (environment detection)

Try to read `SOUL.md` from the workspace root.
  • Found → OpenClaw environment. Skip this entire section and go straight to Phase 0. Workspace-native identity (SOUL.md, IDENTITY.md) will drive agent onboarding.
  • Not found → Claude Code environment, no workspace identity files. Still go to Phase 0 next — do NOT skip ahead to listing user avatars or asking the user for a photo.
⚠️ AVATAR file caveat: Ignore any AVATAR-*.md files found in the workspace that belong to a different person or agent (e.g., AVATAR-Eve.md when creating an avatar for Claude). Only use an AVATAR file if its name matches the subject you're creating for right now.
⚠️ Do NOT fetch HeyGen avatars yet. That's a Phase 0 sub-step (only after target detection). Fetching before Phase 0 causes the agent to frame the conversation around "your existing avatars" when the default should be creating one for the agent itself.

API Mode Detection

Mode selection is silent. Detect once at the start of the session, pick a mode, move on. Never narrate transport choice ("CLI is broken", "switching to MCP") — the user doesn't care how calls are made.
MCP (preferred): If HeyGen MCP tools are available (tools matching `mcp__heygen__*`), use them. MCP authenticates via OAuth (no API key needed) and runs against the user's existing HeyGen plan credits.
CLI fallback: If MCP tools are not available AND the `heygen` binary runs cleanly (`heygen --version` exits 0), use it. Auth: the `HEYGEN_API_KEY` env var OR `heygen auth login` (persists to `~/.heygen/credentials`). If the CLI is missing, errors on `--version`, or auth is not set, silently skip it; do not retry MCP.
Neither available: Only if MCP is unavailable AND the CLI doesn't work, tell the user once: "To use this skill, connect the HeyGen MCP server or install the HeyGen CLI: `curl -fsSL https://static.heygen.ai/cli/install.sh | bash`, then `heygen auth login`."
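
The selection order above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation. How an agent lists its available tool names is host-specific, so `pick_transport` simply takes a list of names. Treating "auth is set" as "`HEYGEN_API_KEY` is non-empty or `~/.heygen/credentials` exists" is also an approximation. Both helper names are hypothetical.

```python
import os
import shutil
import subprocess
from pathlib import Path


def cli_is_usable() -> bool:
    """True if `heygen --version` exits 0 and some form of auth is configured."""
    binary = shutil.which("heygen")
    if binary is None:
        return False  # CLI not installed
    try:
        result = subprocess.run([binary, "--version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False  # binary present but broken or hanging
    if result.returncode != 0:
        return False
    # Auth: HEYGEN_API_KEY env var, or credentials saved by `heygen auth login`.
    has_key = bool(os.environ.get("HEYGEN_API_KEY"))
    has_creds = (Path.home() / ".heygen" / "credentials").exists()
    return has_key or has_creds


def pick_transport(available_tools: list[str]) -> str:
    """Return "mcp", "cli", or "none" in the skill's order of preference."""
    if any(name.startswith("mcp__heygen__") for name in available_tools):
        return "mcp"  # preferred: OAuth, no API key
    if cli_is_usable():
        return "cli"
    return "none"  # show the install message once
```

Because detection happens once per session, cache the returned value instead of probing the CLI again before every call.
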
API: v3 only. Never call v1 or v2 endpoints.
Docs-first rule: Before calling any endpoint you're unsure about:
  • Index: `GET https://developers.heygen.com/llms.txt` (full sitemap)
  • Any page: append `.md` to the URL for clean markdown
  • Or run `heygen <noun> <verb> --help`
  • Read the spec, THEN build your request. Never guess field names.

Avatar File Convention

Every avatar gets one file: `AVATAR-<NAME>.md` at the workspace root.
AVATAR-EVE.md      ← agent      (named, canonical)
AVATAR-KEN.md      ← user       (named, canonical)
AVATAR-CLEO.md     ← character  (named, canonical)
The skill also maintains two role-based symlinks alongside the named files, for generic lookups by consumer skills (e.g., heygen-video) when the request doesn't carry a specific name ("make a video of yourself" → read the agent alias; "make a video of me" → read the user alias):
AVATAR-AGENT.md → AVATAR-<CURRENT-AGENT-NAME>.md   (symlink)
AVATAR-USER.md  → AVATAR-<CURRENT-USER-NAME>.md    (symlink)
Named files are the single source of truth; aliases are pointers and never drift. Phase 5 of the workflow maintains them. Named characters get NO role alias — they are referenced by name only.
Format:

```markdown
# Avatar: <Name>

## Appearance
- Age: <natural language>
- Gender: <natural language>
- Ethnicity: <natural language>
- Hair: <natural language>
- Build: <natural language>
- Features: <natural language>
- Style: <natural language>
- Reference: <optional workspace-relative path or URL>

## Voice
- Tone: <natural language>
- Accent: <natural language>
- Energy: <natural language>
- Think: <one-line analogy>

## HeyGen
- Group ID: <character identity anchor; THE stable reference, never changes>
- Voice ID: <matched or designed voice>
- Voice Name: <human-readable>
- Voice Designed: <true if custom-designed, false if picked from catalog>
- Voice Seed: <seed value used, if designed>
- Looks: landscape=<look_id>, portrait=<look_id>, square=<look_id>
- Last Synced: <ISO timestamp>
```

⚠️ look_ids are ephemeral: always resolve them fresh from group_id at runtime via `heygen avatar looks list --group-id <id>` (or MCP `list_avatar_looks`). Never hardcode a look_id as the primary avatar reference.

**Top sections** (Appearance, Voice) are portable natural language. Any platform can use them.
**HeyGen section** is runtime config with API IDs. Skills read this to make API calls.
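
Consumer skills such as heygen-video need the Group ID from this file. Below is a minimal Python sketch of reading it. It assumes the heading and `- Key: value` bullet layout of the format above. The exact heading level is also an assumption, so the parser accepts any number of `#`. `parse_heygen_section` and `group_id_for` are hypothetical helpers, not part of any HeyGen SDK.

```python
import re
from pathlib import Path

HEADING = re.compile(r"^#+\s*(.+?)\s*$")
BULLET = re.compile(r"^\s*[-*•]\s*([^:]+):\s*(.*)$")


def parse_heygen_section(markdown: str) -> dict[str, str]:
    """Collect `- Key: value` bullets that sit under the HeyGen heading."""
    fields: dict[str, str] = {}
    in_heygen = False
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            in_heygen = heading.group(1) == "HeyGen"
            continue
        bullet = BULLET.match(line) if in_heygen else None
        if bullet:
            # Split on the first colon only, so ISO timestamps stay intact.
            fields[bullet.group(1).strip()] = bullet.group(2).strip()
    return fields


def group_id_for(avatar_file: Path) -> str:
    """Return the stable Group ID, or "" if the HeyGen section is missing or empty."""
    fields = parse_heygen_section(avatar_file.read_text(encoding="utf-8"))
    return fields.get("Group ID", "")
```

Only the Group ID should be treated as durable. Look IDs read this way still need to be re-resolved with `heygen avatar looks list --group-id <id>`.
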

Skill Announcement

Start every invocation with:
🎭 Using: heygen-avatar — creating an avatar for [name]

Workflow

DO NOT batch-ask questions upfront. Walk phases in order. Each phase asks at most one thing at a time, and only if needed.

Phase 0 — Who Are We Creating?

See the Start Here block above for the default-to-agent rule. Only route to "user" or "named character" when the phrasing is unambiguous.
Routing signals (in priority order):
  1. User (explicit only): "create my avatar", "make me an avatar", "I want my face in a video", "a digital twin of me", "based on my photo". Requires a possessive pronoun referring to the user OR explicit mention of their photo. Ask for their name if not obvious.
  2. Named character (explicit only): "create an avatar called Cleo", "design a character named X", "build a presenter named Y" → use the given name.
  3. Agent (default): everything else, such as "create your avatar", "bring yourself to life", "set up an avatar", "let's make an avatar", "create an avatar", "design a presenter", "I want you to appear in videos", or any ambiguous phrasing. Read `IDENTITY.md` for the name.
When unsure, default to agent. Do NOT ask the user for their name, appearance, or voice on an ambiguous request — that's the wrong first move. If after reading IDENTITY.md + SOUL.md the intent still feels ambiguous, ask one short clarifying question to disambiguate (phrase it naturally — something like "quick check: this avatar is for you, or for me?").
Then check `AVATAR-<NAME>.md` at the workspace root:
  • AVATAR file exists + HeyGen section filled in → "You already have an avatar set up. Want to add a new look, update it, or start fresh?" Wait for answer.
  • AVATAR file exists but HeyGen section empty → skip to Phase 2.
  • No AVATAR file → proceed to Phase 1.
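
The three-way check above can be expressed as a rough Python sketch. It makes two assumptions. First, the name is upper-cased in the filename, as in `AVATAR-EVE.md`. Second, the HeyGen section counts as "filled in" when its Group ID holds a real value rather than a `<placeholder>`. `next_step` and its return labels are hypothetical.

```python
from pathlib import Path


def heygen_section_filled(text: str) -> bool:
    """Assumed rule: filled in = the HeyGen section has a Group ID that is not a <placeholder>."""
    in_heygen = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            in_heygen = stripped.lstrip("#").strip() == "HeyGen"
        elif in_heygen and stripped.lstrip("-*• ").startswith("Group ID:"):
            value = stripped.split(":", 1)[1].strip()
            return bool(value) and not value.startswith("<")
    return False


def next_step(workspace: Path, name: str) -> str:
    """Route per Phase 0: ask about the existing avatar, go to Phase 2, or go to Phase 1."""
    avatar_file = workspace / f"AVATAR-{name.upper()}.md"
    if not avatar_file.exists():
        return "phase1"  # no file yet: extract identity first
    if heygen_section_filled(avatar_file.read_text(encoding="utf-8")):
        return "ask_existing"  # offer: new look, update, or start fresh
    return "phase2"  # identity saved, avatar not created yet
```
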
Role alias staleness check. Before proceeding, also check whether the role alias for this target already points at the right named file:
  • For agent target: read `AVATAR-AGENT.md` (follow the symlink) and compare it to `AVATAR-<CURRENT-AGENT-NAME>.md`. If they differ (e.g., `AVATAR-AGENT.md` → `AVATAR-OLD-NAME.md` because the agent identity changed since the last run), re-link in Phase 5 even if no other changes are made. The named file is canonical, but the alias must match the current identity, not the historical one.
  • For user target: run the same check on `AVATAR-USER.md`.
  • For named character: no alias to check.
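
Here is a sketch of the staleness check, assuming a POSIX filesystem with symlink support. It compares resolved paths with `os.path.realpath`. As a result, a missing alias, a plain-file copy, or a link to an old name all count as stale and trigger a re-link in Phase 5. `alias_is_stale` is a hypothetical helper name.

```python
import os
from pathlib import Path


def alias_is_stale(workspace: Path, role: str, current_name: str) -> bool:
    """True if AVATAR-<ROLE>.md is missing or does not resolve to AVATAR-<CURRENT_NAME>.md."""
    alias = workspace / f"AVATAR-{role.upper()}.md"
    expected = workspace / f"AVATAR-{current_name.upper()}.md"
    if not alias.is_symlink() and not alias.exists():
        return True  # no alias yet; Phase 5 creates it
    # realpath follows the link, so an old target or a plain copy both compare unequal
    return os.path.realpath(alias) != os.path.realpath(expected)
```
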
Optional existing-avatar check (only useful on the user path, when the user might already have avatars in their HeyGen account). If the Phase 0 target is user AND no `AVATAR-<USER>.md` exists, list their HeyGen avatars first:
MCP: `list_avatar_groups(ownership=private)`
CLI: `heygen avatar list --ownership private`
If the list is non-empty, present the options and ask which to use or whether to create new. If empty, proceed to Phase 1. Skip this check entirely for agent and named-character targets — those live in AVATAR-*.md, not the HeyGen catalog.

Phase 1 — Identity Extraction

Order matters. Files first, questions second. Prompt-based creation is the default path — photo is an opt-in upgrade.
For the agent (Phase 0 target = agent):
  1. Read `SOUL.md`, `IDENTITY.md`, and any existing `AVATAR-<NAME>.md` from the workspace root.
  2. If SOUL.md or IDENTITY.md is found → extract appearance and voice traits silently. Do NOT ask the user "describe your appearance" — the agent IS the subject, and its identity lives in those files. If the files describe only personality / values with no physical description, do NOT hallucinate traits. Ask the user conversationally for the missing appearance traits only (one or two at a time).
  3. If neither file is found (e.g., Claude Code environment with no workspace identity) → ask the user to describe the agent's appearance and voice conversationally.
  4. Proceed directly to Type A (prompt) creation in Phase 2 by default. Do NOT ask for a photo unless the user volunteers one or explicitly asks for photo realism — agents almost always use prompt-based creation.
For users/named characters (Phase 0 target = user or named):
  • Conversational onboarding. Ask naturally about appearance and voice, one or two questions at a time, not a form. Communicate in `user_language`.
  • User path only: after the onboarding Q&A, run the Reference Photo Nudge below.
  • Named character path: skip the nudge, go straight to Type A (prompt) creation.
Write `AVATAR-<NAME>.md` with the Appearance and Voice sections filled in. Leave the HeyGen section empty until Phase 2 succeeds.

Reference Photo Nudge (user path only)

Only run this step when Phase 0 target = user (real-person digital twin) OR when the user explicitly asks for photo realism.
  • Check AVATAR file's Appearance → Reference field first. If a photo is already on file, skip asking and use it.
  • Otherwise, ask one sentence: "Got a headshot? It gives better face consistency for videos of you. I can also generate from your description — just say 'skip.'"
Branch:
  • Photo provided → upload via MCP `upload_asset` or `heygen asset create --file <path>`, then Type B (photo) creation in Phase 2.
  • Skip → Type A (prompt) creation in Phase 2.
For agents and named characters, skip this entire step — go straight to Type A (prompt) creation.

Phase 2 — Avatar Creation

📖 Full creation API surface (photo / prompt / digital twin), file input formats, identity field → enum mapping, response shape → references/avatar-creation.md
Two modes:
Mode 1 (new character, omit `avatar_group_id`): creates a brand-new character with its own group.
Mode 2 (new look, include `avatar_group_id`): adds a variation to an existing character. Read the Group ID from the AVATAR file.
Two creation types:
Type A — From prompt (AI-generated appearance):
MCP: `create_prompt_avatar(name=<name>, prompt=<appearance>, avatar_group_id=<optional>)`
CLI: `heygen avatar create -d '{"type":"prompt","name":"...","prompt":"...","avatar_group_id":"..."}'` (accepts inline JSON, a file path, or `-` for stdin)
Prompt limit is 1000 characters. Be descriptive — include style, features, expression, lighting. The API spec says 200 but the actual enforced limit is 1000.
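
On the CLI path, building the JSON with a serializer avoids hand-escaping quotes inside the prompt. The Python sketch below assumes the 1000-character limit stated above. It passes the payload to `subprocess` as a single argv element. The helper names are hypothetical.

```python
import json
from typing import Optional

PROMPT_LIMIT = 1000  # enforced limit noted above (the published spec says 200)


def prompt_avatar_payload(name: str, prompt: str, group_id: Optional[str] = None) -> str:
    """JSON body for `heygen avatar create -d` (Type A). Omit group_id for Mode 1."""
    if len(prompt) > PROMPT_LIMIT:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {PROMPT_LIMIT}")
    body = {"type": "prompt", "name": name, "prompt": prompt}
    if group_id:
        body["avatar_group_id"] = group_id  # Mode 2: new look on an existing character
    return json.dumps(body)


def cli_args(payload: str) -> list[str]:
    """argv for subprocess.run; the JSON is a single element, so no shell quoting is needed."""
    return ["heygen", "avatar", "create", "-d", payload]
```
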
Type B — From reference image:
MCP: `create_photo_avatar(name=<name>, file=<file_object>, avatar_group_id=<optional>)`
CLI: `heygen avatar create -d '{"type":"photo","name":"...","file":{"type":"url","url":"..."},"avatar_group_id":"..."}'`
File options for Type B:
  • `{ "type": "url", "url": "https://..." }`: public image URL
  • `{ "type": "asset_id", "asset_id": "<id>" }`: from `heygen asset create --file <path>`
  • `{ "type": "base64", "media_type": "image/png", "data": "<base64>" }`: inline
📖 When to use each (URL vs asset_id vs base64), upload routing, and edge cases → references/asset-routing.md
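
The following Python sketch shows one way to build the Type B `file` object. Its precedence is an assumption for illustration only: an uploaded asset ID first, then a public URL, then inline base64 for a local file. references/asset-routing.md is the authority on which form to choose. `photo_file_object` is a hypothetical helper, and the media type is guessed from the file extension with the standard `mimetypes` module.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def photo_file_object(source: str, asset_id: Optional[str] = None) -> dict:
    """Build the Type B `file` object: uploaded asset, public URL, or inline base64."""
    if asset_id:
        return {"type": "asset_id", "asset_id": asset_id}
    if source.startswith(("http://", "https://")):
        return {"type": "url", "url": source}
    path = Path(source)
    media_type, _ = mimetypes.guess_type(path.name)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"not an image: {source}")
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"type": "base64", "media_type": media_type, "data": data}
```
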
Response: Returns `avatar_item.id` (look ID) and `avatar_item.group_id` (character identity).
Map identity fields to HeyGen enums for the prompt:
  • age: Young Adult | Early Middle Age | Late Middle Age | Senior | Unspecified
  • gender: Man | Woman | Unspecified
  • ethnicity: White | Black | Asian American | East Asian | South East Asian | South Asian | Middle Eastern | Pacific | Hispanic | Unspecified
  • style: Realistic | Pixar | Cinematic | Vintage | Noir | Cyberpunk | Unspecified
  • orientation: square | horizontal | vertical
  • pose: half_body | close_up | full_body
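
Here is a small Python sketch of the enum mapping and the "Settings:" line shown to the user for approval. It only normalizes exact, case-insensitive matches and falls back to `Unspecified` where that option exists. Mapping free-form traits such as "late twenties" onto `Young Adult` remains the agent's judgment. The helper names are hypothetical.

```python
AVATAR_ENUMS = {
    "age": {"Young Adult", "Early Middle Age", "Late Middle Age", "Senior", "Unspecified"},
    "gender": {"Man", "Woman", "Unspecified"},
    "ethnicity": {"White", "Black", "Asian American", "East Asian", "South East Asian",
                  "South Asian", "Middle Eastern", "Pacific", "Hispanic", "Unspecified"},
    "style": {"Realistic", "Pixar", "Cinematic", "Vintage", "Noir", "Cyberpunk", "Unspecified"},
    "orientation": {"square", "horizontal", "vertical"},
    "pose": {"half_body", "close_up", "full_body"},
}


def normalize(field: str, value: str) -> str:
    """Match value to the field's enum case-insensitively; else Unspecified if allowed."""
    for option in AVATAR_ENUMS[field]:
        if option.lower() == value.strip().lower():
            return option
    if "Unspecified" in AVATAR_ENUMS[field]:
        return "Unspecified"
    raise ValueError(f"{value!r} is not a valid {field}")


def settings_line(age: str, gender: str, ethnicity: str, style: str) -> str:
    """The one-line summary shown next to the prompt before creation."""
    parts = [normalize("age", age), normalize("gender", gender),
             normalize("ethnicity", ethnicity), normalize("style", style)]
    return "Settings: " + " | ".join(parts)
```
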
Show the prompt to the user before creating:
Appearance: "[prompt]"
Settings: Young Adult | Woman | East Asian | Realistic
Look good? (yes / adjust / completely different)
STOP. Wait for the user to approve or adjust. Do NOT call the avatar creation API until the user confirms.

Phase 3 — Voice

Two paths: Design (describe what you want, get matched voices) or Browse (filter the catalog manually).
Ask whether they want voice design (describe what they want) or catalog browsing. Communicate in `user_language`.
Default to Design if the AVATAR file has a Voice section with personality traits.

Path A — Voice Design (preferred)

Find matching voices via semantic search using the Voice section from the AVATAR file. This searches HeyGen's full voice library. No new voices are generated and no quota is consumed.
Language matching: The voice design prompt should specify the target language from `user_language`. Example for Japanese:
"A calm, warm female voice. Professional but approachable. Japanese speaker."
This ensures semantic search returns voices in the correct language.
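
The sketch below builds the language-aware design prompt and the CLI call in Python. It assumes the code-to-language mapping shown, which covers only the codes listed under Language Awareness. The expected `--locale` value format is not specified here, so the sketch passes through whatever the caller supplies. `voice_design_prompt` and `design_voice_args` are hypothetical names.

```python
from typing import Optional

# Assumed mapping for the language codes listed under Language Awareness.
LANGUAGE_NAMES = {
    "en": "English", "ja": "Japanese", "es": "Spanish", "ko": "Korean",
    "zh": "Chinese", "fr": "French", "de": "German", "pt": "Portuguese",
}


def voice_design_prompt(description: str, user_language: str) -> str:
    """Append a '<Language> speaker.' hint, following the Japanese example above."""
    base = description.strip()
    if not base.endswith("."):
        base += "."
    language = LANGUAGE_NAMES.get(user_language)
    return f"{base} {language} speaker." if language else base


def design_voice_args(prompt: str, seed: int = 0, locale: Optional[str] = None) -> list[str]:
    """argv for `heygen voice create`; a different seed yields a different set of options."""
    args = ["heygen", "voice", "create", "--prompt", prompt, "--seed", str(seed)]
    if locale:
        args += ["--locale", locale]
    return args
```

When the user rejects all three options, call `design_voice_args` again with `seed + 1`, keeping the same prompt.
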
MCP: `design_voice(prompt=<voice description>, seed=0)`
CLI: `heygen voice create --prompt "..." --seed 0` (also accepts `--gender`, `--locale`)
Returns 3 voice options per seed. Present all 3 with inline audio previews:
  • Download each `preview_audio_url` to a temp path (any standard download method works; no HeyGen auth is needed, since these are public S3 URLs)
  • Send each as an audio attachment with `message(action:send, media:"<path>", caption:"Option <n>: <voice_name> — <gender>, <language>")` so it plays inline in Telegram/Discord
  • After all previews are sent, present selection buttons
STOP. Wait for the user to pick a voice via buttons or text. Do NOT select a voice yourself or proceed to Phase 4 until the user explicitly chooses.
If none match:
"None of these hitting right? I can try a different set (same description, different variations) or you can tweak the description."
Increment `seed` and call again. Different seeds give completely different voice options from the same prompt.
  • Clean up /tmp files after user picks
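
The preview download and cleanup steps can be sketched with the Python standard library alone, because the preview URLs need no HeyGen auth. This sketch assumes each voice is a dict with a `preview_audio_url` key. It also assumes the previews are MP3, which affects only the filename extension. Sending the attachment is left to the host's `message` tool. The helper names are hypothetical.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def download_previews(voices: list[dict], base_dir: Optional[str] = None) -> tuple[Path, list[Path]]:
    """Save each voice's preview_audio_url into a fresh temp dir; return (dir, files)."""
    if base_dir is not None:
        Path(base_dir).mkdir(parents=True, exist_ok=True)  # e.g. /tmp/openclaw/uploads/
    tmp_dir = Path(tempfile.mkdtemp(prefix="voice-previews-", dir=base_dir))
    files = []
    for n, voice in enumerate(voices, start=1):
        target = tmp_dir / f"option-{n}.mp3"  # .mp3 is an assumed extension
        # Public URLs, so no HeyGen auth header is attached.
        with urllib.request.urlopen(voice["preview_audio_url"], timeout=30) as resp:
            target.write_bytes(resp.read())
        files.append(target)
    return tmp_dir, files


def cleanup_previews(tmp_dir: Path) -> None:
    """Delete the previews after the user picks a voice."""
    shutil.rmtree(tmp_dir, ignore_errors=True)
```
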

Path B — Voice Browse (fallback)

Browse HeyGen's existing voice library:
MCP: `list_voices(type=private)`, then `list_voices(type=public, language=<lang>, gender=<gender>)`
CLI: `heygen voice list --type private` / `heygen voice list --type public --language <lang> --gender <gender>`
  1. Read the Voice section from the AVATAR file
  2. Filter by gender and language
  3. Pick top 3 candidates based on personality match
  4. Present with inline audio previews (same download + send pattern as Path A)
  5. STOP. Wait for the user to pick. Do NOT auto-select.
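
Steps 2 and 3 can be approximated with a crude Python stand-in. It assumes each catalog entry is a dict with `gender`, `language`, `name`, and `description` fields. These field names are hypothetical, so check the real `list_voices` response. "Personality match" is scored by counting Voice-section traits that appear in the name or description. In practice the agent should judge the match itself; this only illustrates the filter-then-rank shape.

```python
def top_candidates(voices: list[dict], gender: str, language: str,
                   traits: list[str], k: int = 3) -> list[dict]:
    """Filter by gender and language, then rank by how many traits each voice mentions."""
    def matches(voice: dict) -> bool:
        return (str(voice.get("gender", "")).lower() == gender.lower()
                and str(voice.get("language", "")).lower() == language.lower())

    def score(voice: dict) -> int:
        text = " ".join(str(voice.get(f, "")) for f in ("name", "description")).lower()
        return sum(1 for trait in traits if trait.lower() in text)

    pool = [v for v in voices if matches(v)]
    # sorted() is stable even with reverse=True, so ties keep the catalog's order
    return sorted(pool, key=score, reverse=True)[:k]
```
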

Phase 4 — Save to AVATAR File

Update the HeyGen section of `AVATAR-<NAME>.md` to match the canonical format:

```markdown
## HeyGen
- Group ID: <avatar_item.group_id; THE stable reference, never changes>
- Voice ID: <chosen voice_id>
- Voice Name: <voice name>
- Voice Designed: <true if custom-designed, false if picked from catalog>
- Voice Seed: <seed value used, if designed>
- Looks: <orientation>=<avatar_item.id> (e.g., landscape=<look_id>, portrait=<look_id>)
- Last Synced: <ISO timestamp>
```

⚠️ look_ids are ephemeral: always resolve them fresh from group_id at runtime via `heygen avatar looks list --group-id <id>` (or MCP `list_avatar_looks`). Never hardcode a look_id as the primary avatar reference.

Confirm the avatar is saved and that other skills (like heygen-video) will pick it up automatically. Communicate in `user_language`.
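
A Python sketch of rendering that block follows. It assumes a `## HeyGen` heading level with `- Key: value` bullets, and it writes the timestamp in UTC ISO 8601. Writing the block back into the file, including replacing any existing HeyGen section, is left out. `render_heygen_section` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def render_heygen_section(group_id: str, voice_id: str, voice_name: str,
                          voice_designed: bool, looks: dict[str, str],
                          voice_seed: Optional[int] = None) -> str:
    """Render the HeyGen block; looks maps orientation -> look_id (a cache, not a key)."""
    lines = [
        "## HeyGen",
        f"- Group ID: {group_id}",
        f"- Voice ID: {voice_id}",
        f"- Voice Name: {voice_name}",
        f"- Voice Designed: {str(voice_designed).lower()}",
    ]
    if voice_designed and voice_seed is not None:
        lines.append(f"- Voice Seed: {voice_seed}")
    lines.append("- Looks: " + ", ".join(f"{k}={v}" for k, v in looks.items()))
    synced = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines.append(f"- Last Synced: {synced}")
    return "\n".join(lines) + "\n"
```
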

Phase 5 — Maintain Role Alias

After writing the named `AVATAR-<NAME>.md`, create or update a role-based symlink alongside it so other skills can do generic lookups without resolving the agent or user name first.
Based on the Phase 0 target:
  • Agent target → symlink `AVATAR-AGENT.md` → `AVATAR-<NAME>.md`
  • User target → symlink `AVATAR-USER.md` → `AVATAR-<NAME>.md`
  • Named character → no role alias. Named characters are referenced by name only (e.g., `AVATAR-CLEO.md`); they are not the agent or the user.
Implementation (run from the workspace root, with a filesystem fallback):
The `cd` to the workspace root is mandatory: bare relative paths in `ln -s` resolve from the agent's current working directory, not from where SOUL.md lives. The `|| echo` clause handles filesystems that reject symlinks (Windows without developer mode, some cloud-mounted storage) without aborting Phase 5.

```bash
# Replace <NAME> with the avatar's name (e.g., AVATAR-EVE.md) before running.

# Agent
cd "$WORKSPACE_ROOT" && ln -sf AVATAR-<NAME>.md AVATAR-AGENT.md \
  || echo "role alias skipped: fs doesn't support symlinks"

# User
cd "$WORKSPACE_ROOT" && ln -sf AVATAR-<NAME>.md AVATAR-USER.md \
  || echo "role alias skipped: fs doesn't support symlinks"
```

Use a relative link target (just the filename, no path prefix) so the alias survives if the workspace is moved or copied.

`ln -sf` is unlink-then-symlink under the hood, not strictly atomic. That is fine for single-user workspaces; if concurrent agents ever write the same alias, expect interleaving and add explicit locking then.

**Why symlink, not copy:** it removes the duplicate-file drift class (content can never diverge between the named file and the alias). It does NOT remove staleness drift: if `IDENTITY.md` changes the agent name without re-running heygen-avatar, `AVATAR-AGENT.md` keeps pointing at the *old* named file. The Phase 0 mismatch-and-re-alias check handles this on the next invocation; until then, the alias is stale but still points somewhere valid, not broken.

**Multi-agent workspace caveat:** one role alias per workspace is last-writer-wins. If two agents ever share a workspace and both run heygen-avatar, only the most recent run's identity is reachable via `AVATAR-AGENT.md`. Named files for both still exist. We accept this limit: multi-agent shared workspaces are out of scope for v1.

Phase 6 — Test (Optional)

If the user wants to see their avatar in action:
MCP: `create_video_agent(avatar_id=<avatar_id>, voice_id=<voice_id>, prompt=<greeting>)`
CLI: `heygen video-agent create --avatar-id <id> --voice-id <id> --prompt "..." --wait`
Generate a natural greeting in the video language (from `user_language`). Examples: English "Hi, I'm [name]. Nice to meet you!", Japanese "[name]です。はじめまして!", Spanish "Hola, soy [name]. ¡Mucho gusto!", Korean "안녕하세요, [name]입니다. 만나서 반갑습니다!"
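
A sketch of choosing the greeting by `user_language` with the templates above. `greeting` is a hypothetical helper; fixed templates are a simplification of "generate a natural greeting", and falling back to English for other languages is an assumption.

```bash
#!/usr/bin/env bash

# Print a greeting for NAME in LANG (ISO code as stored in user_language).
greeting() {
  local lang="$1" name="$2"
  case "$lang" in
    ja) printf '%sです。はじめまして!\n' "$name" ;;
    es) printf 'Hola, soy %s. ¡Mucho gusto!\n' "$name" ;;
    ko) printf '안녕하세요, %s입니다. 만나서 반갑습니다!\n' "$name" ;;
    *)  printf "Hi, I'm %s. Nice to meet you!\n" "$name" ;;   # assumed fallback
  esac
}

greeting ja Eve
```

The result can then be passed as the prompt, for example `--prompt "$(greeting "$user_language" Eve)"` in the CLI command above.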

Iteration Flow

When the user wants to refine:
  • "Adjust the prompt" → Mode 2 with the existing group_id (keeps the character, adds a new look). Use Mode 1 only if they say "start completely over."
  • "Add a new look" / "different outfit" → Mode 2 with the existing group_id. Add the look to the Looks section of the AVATAR file.
  • "Try a different voice" → back to Phase 3.
  • "Start completely over" → Mode 1, new character. Overwrite the HeyGen section.
Default to Mode 2 (a new look under the same group). Only create a new group when the user explicitly wants a different character identity. This keeps the account clean and makes looks reusable across skills.
Each iteration updates the AVATAR file; the file is always the source of truth.
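
The iteration rules reduce to a small lookup. The intent labels (`adjust_prompt`, `new_look`, `new_voice`, `start_over`) are hypothetical names for the requests listed above; classifying free-form text into them is out of scope. Unknown intents fall through to the Mode 2 default.

```bash
#!/usr/bin/env bash

# Map a (pre-classified) refinement intent to the next step.
next_action() {
  case "$1" in
    adjust_prompt|new_look) echo "mode2:existing-group" ;;
    new_voice)              echo "phase3" ;;
    start_over)             echo "mode1:new-group" ;;
    *)                      echo "mode2:existing-group" ;;   # default: new look, same group
  esac
}

next_action new_look   # mode2:existing-group
```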

UX Rules

Be interactive at checkpoints, silent everywhere else. Stop and wait at avatar approval and voice selection. Between checkpoints, work silently — don't narrate reasoning or explain next steps. After voice pick: save + confirm in one message.

Video Producer Integration

`heygen-video` reads AVATAR files for `group_id` and `voice_id`. Resolution order:
  1. Named request ("Make a video with Eve") → read `AVATAR-EVE.md`.
  2. Agent self-reference ("make a video of yourself", "give us a video update") → read `AVATAR-AGENT.md` (symlink to the current agent's named file).
  3. User self-reference ("make a video of me", "my video update") → read `AVATAR-USER.md` (symlink to the current user's named file).
  4. No AVATAR file or symlink → fall back to stock avatars or ask the user.
The alias targets are resolved by the OS at read time, so consumer skills can simply `cat AVATAR-AGENT.md` and get whatever the current agent's avatar is.
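
A sketch of that resolution order as a shell function, run from the workspace root. `resolve_avatar` is hypothetical and `heygen-video`'s real resolver may differ; upper-casing the requested name to build `AVATAR-EVE.md` is an assumption based on the file naming above. Because `[ -e ]` follows symlinks, a dangling alias is treated like a missing one.

```bash
#!/usr/bin/env bash

# Print the AVATAR file to read for KIND (named|agent|user), or "fallback".
resolve_avatar() {
  local kind="$1" name="${2:-}" file=""
  case "$kind" in
    named) file="AVATAR-$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]').md" ;;
    agent) file="AVATAR-AGENT.md" ;;   # alias -> current agent's named file
    user)  file="AVATAR-USER.md" ;;    # alias -> current user's named file
  esac
  if [ -n "$file" ] && [ -e "$file" ]; then
    echo "$file"
  else
    echo "fallback"                   # stock avatar, or ask the user
    return 1
  fi
}

ws="$(mktemp -d)"
trap 'rm -rf "$ws"' EXIT
cd "$ws"
printf 'Eve\n' > AVATAR-EVE.md
ln -s AVATAR-EVE.md AVATAR-AGENT.md

r_named="$(resolve_avatar named Eve)"   # AVATAR-EVE.md
r_agent="$(resolve_avatar agent)"       # AVATAR-AGENT.md
r_user="$(resolve_avatar user)"         # fallback (no AVATAR-USER.md yet)
echo "$r_named $r_agent $r_user"
```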

Error Handling

  • Missing SOUL.md/IDENTITY.md → conversational onboarding, write AVATAR file from answers
  • API fails → retry once, then ask user to check API key
  • Voice match poor → show all available voices, let user browse
  • Asset upload fails → skip reference image, try prompt-only creation
  • Existing avatar file with stale HeyGen IDs → offer to regenerate or keep
📖 For known issues, retry patterns, broken voice previews, and the full error → action mapping, see references/troubleshooting.md
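
The "retry once, then ask the user to check the API key" rule as a minimal wrapper. `retry_once` is hypothetical and would wrap whichever heygen CLI call failed; it retries immediately with no backoff, which is an assumption since no delay is specified. `flaky` is a stand-in command for the demo.

```bash
#!/usr/bin/env bash

# Run a command; on failure run it once more; then give up with a hint.
retry_once() {
  "$@" && return 0
  "$@" && return 0
  echo "HeyGen API call failed twice; please check your API key." >&2
  return 1
}

# Demo: a stand-in command that fails on the first call only.
calls=0
flaky() { calls=$((calls + 1)); [ "$calls" -ge 2 ]; }

retry_once flaky; status_ok=$?                # succeeds on the retry
retry_once false 2>/dev/null; status_fail=$?  # fails twice, returns 1
echo "$calls $status_ok $status_fail"         # 2 0 1
```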