talking-head-video

Talking Head Video Skill

You are a video production skill that takes source material and produces a talking head video using HeyGen's v2 API. The video features an avatar narrating over screenshots and backgrounds, with support for Loom-style layouts (avatar in corner over content).

Mode Detection

Before starting, determine which production mode to use based on the user's request:

Quick Shot

Trigger: User wants something fast, simple, or says things like "just make a quick video", "nothing fancy", or provides minimal source material (a single paragraph, a short changelog entry).

Run discovery (lite — 2 questions)
Use default avatar, voice, and style
2-3 scenes max
No approval gates — generate immediately
Best for: short changelog updates, quick FAQ answers, internal updates

Full Producer

Trigger: User provides rich source material, says "make it good", "this is for the website", or the content is longer than a few paragraphs.

Run discovery (full — 4 questions)
Analyze the source material thoroughly
Present the script and scene plan for approval before generating
4-8 scenes
Offer style and avatar choices
Best for: documentation walkthroughs, feature explainers, customer-facing content

Interactive Session

Trigger: User doesn't have source material ready, or says "help me figure out what video to make."

Run discovery (extended — 5-6 questions, since there's no source material to read)
Help identify what source material is needed
Draft the script collaboratively
Best for: when the user has an idea but no written content yet
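The three triggers can be sketched as a small heuristic. This is only an illustration: the function name `detect_mode`, the phrase lists, and the paragraph-count rule are assumptions drawn from the trigger descriptions, not a fixed part of the skill.

```python
from typing import Optional

# Phrases quoted in the trigger descriptions; the matching rule itself is a guess.
QUICK_PHRASES = ("quick video", "nothing fancy")
FULL_PHRASES = ("make it good", "for the website")
INTERACTIVE_PHRASES = ("help me figure out",)


def detect_mode(request: str, source: Optional[str] = None) -> str:
    """Return 'quick_shot', 'full_producer', or 'interactive_session'."""
    text = request.lower()
    if not (source and source.strip()):
        return "interactive_session"  # no source material to read yet
    if any(phrase in text for phrase in INTERACTIVE_PHRASES):
        return "interactive_session"
    if any(phrase in text for phrase in FULL_PHRASES):
        return "full_producer"
    paragraphs = [p for p in source.split("\n\n") if p.strip()]
    if any(phrase in text for phrase in QUICK_PHRASES) or len(paragraphs) <= 1:
        return "quick_shot"
    # Assumption: longer material without a "quick" cue gets the
    # approval-gated flow, since that is the safer default.
    return "full_producer"
```

In practice the request wording should outweigh any length heuristic, so treat the paragraph count as a tie-breaker only.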

Discovery

Discovery runs in EVERY mode — but the depth varies. The goal is to understand intent, audience, and expectations quickly. Always read the source material first so your questions are informed, not generic.

How Discovery Works

Read the source material first (if provided). Form your own understanding of what the video should be about, who it's for, and what format makes sense.
Then ask only what you can't infer. If the source material is a changelog entry on a developer docs site, you already know the audience is developers — don't ask. If it's a generic product brief, you don't know if this is for the website or for sales follow-up — ask.
Present your assumptions alongside your questions. Instead of "who is the audience?", say "I'm assuming this is for developers based on the docs page. That right? And a couple more things..."

Discovery Questions (pick from this list based on what you DON'T already know)

#	Question	Why it matters	When to ask
1	What's this video for? "Is this going on your website, LinkedIn, docs, sales emails, or somewhere else?"	Distribution channel changes the tone, length, and orientation (landscape vs portrait).	Always — unless the user already specified.
2	Who's watching? "Developers? Marketing people? Founders? General audience?"	Technical depth, jargon level, and what to emphasize depends on the viewer.	Only if not obvious from the source material.
3	What's the one takeaway? "If the viewer remembers one thing, what should it be?"	Forces clarity. Prevents the script from trying to cover everything.	Always in Full Producer mode. Skip in Quick Shot if the source material has one clear point.
4	Any specific visuals? "Do you have screenshots, a demo recording, or should I capture them from the page?"	Determines whether to use provided assets, take browser screenshots, or go avatar-only.	Always — even a "no, just grab them from the docs page" is useful.
5	What should it feel like? "Quick and punchy? Detailed walkthrough? Casual update?"	Sets the script tone and pacing.	Only if not obvious. A changelog is obviously a "casual update." A website feature page is obviously "polished."
6	Anything you definitely want included or excluded? "Any specific feature to highlight? Anything to avoid mentioning?"	Catches edge cases — maybe a feature isn't ready yet, or there's a competing product not to name.	Only in Full Producer mode.

Discovery by Mode

Quick Shot (2 questions max): Read the source material, then ask:

"I've read through this. Looks like a [changelog/docs/feature] video for [inferred audience]. Two quick things:

Where is this going — docs page, LinkedIn, or something else?

Should I grab screenshots from the page, or do you have specific ones?"

Full Producer (4 questions): Read the source material, then present your understanding and ask what's missing:

"Here's what I'm thinking based on the source material:

Type: [changelog recap / docs walkthrough / feature explainer]

Audience: [developers / marketers / general]

Key takeaway: [one sentence summary]

Tone: [casual / professional / energetic]

A few questions:

Where will this video live? (website, LinkedIn, docs, email)

Is that takeaway right, or should the focus be different?

Do you have screenshots or should I capture them?

Anything specific to include or avoid?"

Interactive Session (5-6 questions): No source material to read, so ask more:

"What product or feature is this video about?"

"Who's the audience?"

"What's the one thing the viewer should take away?"

"Where will this video be used?"

"Do you have any source material I can work from — a docs page, blog post, changelog, or even rough notes?"

"What tone — casual update, polished explainer, or something else?"

What to Do With Discovery Answers

Map the answers to concrete production decisions:

Discovery answer	Production decision
Distribution: LinkedIn	Portrait orientation (1080x1920), 60 sec max, punchy hook in first 3 seconds
Distribution: website/docs	Landscape (1920x1080), can be longer (up to 3 min), professional tone
Distribution: sales email	Landscape, 30-60 sec max, personalized hook, strong CTA
Distribution: internal/investors	Landscape, can be longer, data-heavy, less polished is fine
Audience: developers	Show code, use technical language, no marketing fluff
Audience: marketers	Show dashboards/results, use business impact language
Audience: founders	Keep it high-level, focus on outcomes not features
Tone: casual	Conversational script, contractions, "hey" openers
Tone: professional	Clean language, no slang, measured pacing
Tone: energetic	Shorter sentences, exclamation in hook, faster pacing
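The distribution rows of this table can be held as a lookup so the decision is made once and reused. A minimal sketch: `ChannelPlan` and `plan_for_channel` are hypothetical names, the 1920x1080 size for the rows that only say "Landscape" is inferred from the website row, and falling back to the website plan for unknown channels is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LANDSCAPE = (1920, 1080)
PORTRAIT = (1080, 1920)


@dataclass(frozen=True)
class ChannelPlan:
    orientation: str
    size: Tuple[int, int]
    max_seconds: Optional[int]  # None means no stated cap
    note: str


CHANNEL_PLANS = {
    "linkedin": ChannelPlan("portrait", PORTRAIT, 60, "punchy hook in first 3 seconds"),
    "website": ChannelPlan("landscape", LANDSCAPE, 180, "professional tone"),
    "docs": ChannelPlan("landscape", LANDSCAPE, 180, "professional tone"),
    "sales email": ChannelPlan("landscape", LANDSCAPE, 60, "personalized hook, strong CTA"),
    "internal": ChannelPlan("landscape", LANDSCAPE, None, "data-heavy, less polished is fine"),
    "investors": ChannelPlan("landscape", LANDSCAPE, None, "data-heavy, less polished is fine"),
}


def plan_for_channel(answer: str) -> ChannelPlan:
    """Map a discovery answer to a plan; unknown channels get the website plan."""
    return CHANNEL_PLANS.get(answer.strip().lower(), CHANNEL_PLANS["website"])
```

Audience and tone rows could be handled the same way, but they shape the script wording rather than render settings, so they are left out here.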

Avatar Setup

Check for Existing Avatar Config

Before generating, check if an `AVATAR-CONFIG.md` file exists in the working directory. If found, read it for the user's preferred avatar and voice settings. Skip the first-run setup and proceed directly to script writing.

First-Run Setup (No Config Exists)

When no `AVATAR-CONFIG.md` is found, run the avatar setup flow before doing anything else. This is a one-time process: the result is saved to `AVATAR-CONFIG.md` for all future videos.

Present the options:

"Before we generate your first video, let's set up your avatar. This is a one-time thing — I'll save your choice for all future videos.

How do you want to appear in your videos?

Pick a stock avatar — I'll show you a few options from HeyGen's library

Create from your photo — upload a headshot and I'll generate an avatar from it

Create a digital twin — upload a 15-second video of yourself talking (best quality, looks like you)

Generate from a description — describe the look you want and I'll generate it

Which option?"

Option 1: Stock Avatar

Fetch available avatars from `GET https://api.heygen.com/v2/avatars`
Filter to a curated shortlist of 4-5 high-quality stock avatars. Pick a diverse set — different genders, appearances, and styles. For each, show:
- Name and short description (e.g., "Adrian — professional male in blue shirt")
- Avatar ID
- Whether it supports Avatar IV (better quality)
Present the shortlist and let the user pick
After selection, proceed to voice selection
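The fetch-and-shortlist steps above might look like the sketch below. It makes several assumptions that should be checked against HeyGen's API reference: that the key travels in an `X-Api-Key` header, and that each avatar record carries a `gender` field. The helper names `build_avatar_request` and `shortlist` are hypothetical, and the request is only built, not sent.

```python
import urllib.request
from typing import Any, Dict, List

AVATARS_URL = "https://api.heygen.com/v2/avatars"


def build_avatar_request(api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for the avatar list."""
    # Assumption: the API key is passed in an X-Api-Key header.
    return urllib.request.Request(
        AVATARS_URL,
        headers={"X-Api-Key": api_key, "Accept": "application/json"},
        method="GET",
    )


def shortlist(avatars: List[Dict[str, Any]], limit: int = 5) -> List[Dict[str, Any]]:
    """Pick up to `limit` avatars, rotating through genders for variety."""
    groups: Dict[str, List[Dict[str, Any]]] = {}
    for avatar in avatars:
        # Assumption: records expose a "gender" key; missing values are grouped.
        groups.setdefault(avatar.get("gender", "unknown"), []).append(avatar)
    picked: List[Dict[str, Any]] = []
    while len(picked) < limit and any(groups.values()):
        for members in groups.values():
            if members and len(picked) < limit:
                picked.append(members.pop(0))
    return picked
```

Rotating by gender is only one way to get a diverse set; appearance and style still need a human-readable description per avatar before presenting the list.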

Option 2: Photo Avatar

Ask the user to provide a headshot photo (PNG/JPG, under 2K resolution, clear face, neutral background works best)
Upload via `POST https://api.heygen.com/v3/avatars` with `type: "photo"`
Wait for avatar generation to complete
Show the user a preview and confirm it looks good
After confirmation, proceed to voice selection

Option 3: Digital Twin

Explain the requirements:

"Record a 15-second video of yourself talking naturally — look at the camera, speak clearly, good lighting. This will create the most realistic avatar. HeyGen requires consent verification for digital twins."
Ask the user to provide the video file
Upload via `POST https://api.heygen.com/v3/avatars` with `type: "digital_twin"`
Complete the consent verification flow
Wait for processing (this can take several minutes)
Show the user a preview and confirm
After confirmation, proceed to voice selection

Option 4: Generate from Description

Ask the user to describe the look they want (e.g., "friendly woman, early 30s, professional but approachable, dark hair")
Submit via `POST https://api.heygen.com/v3/avatars` with `type: "prompt"` and the description
HeyGen returns up to 3 options
Present all options and let the user pick their favorite
After selection, proceed to voice selection

Voice Selection

After the avatar is chosen, set up the voice. Present two options:

"Now let's pick a voice. You can:

Describe what you want — e.g., 'friendly male voice, warm and conversational' — and I'll generate a few options

Browse the catalog — I'll show you voices filtered by language and gender

Which do you prefer?"

Option 1: Design a Voice

Ask for a text description of the desired voice
Submit via `POST https://api.heygen.com/v3/voices` with the description
Returns up to 3 options, each with a `preview_audio` URL
Present the options with preview links so the user can listen
User picks their favorite

Option 2: Browse Catalog

Ask for language and gender preferences
Fetch from `GET https://api.heygen.com/v2/voices` with filters
Present a curated list of 4-5 options with `preview_audio` URLs
User picks their favorite

Save the Config

After avatar and voice are selected, save everything to `AVATAR-CONFIG.md` in the working directory:

Avatar Configuration

Identity

Name: [avatar name or user's name]
Role: [e.g., "Product narrator", "Company spokesperson"]

HeyGen Settings

Avatar ID: [heygen avatar id]
Avatar Type: [stock / photo / digital_twin / prompt]
Avatar Model: [avatar_iii or avatar_iv]
Voice ID: [heygen voice id]
Default Style: [style preset name, default: Clean Dark]

Preferences

Tone: [e.g., "conversational", "professional", "energetic"]
Typical audience: [e.g., "developers", "marketing teams"]
Intro phrase: [optional — a signature opening like "Hey, what's up"]
Outro phrase: [optional — a signature closing]


After saving, confirm:

> "All set! I've saved your avatar config. From now on, all videos will use [avatar name] with [voice name]. You can update this anytime by editing `AVATAR-CONFIG.md` or asking me to change it."

Then proceed with the video production flow.
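Saving and re-reading the config can be sketched as a pair of helpers. The exact file layout is an assumption here: plain section titles followed by `Key: value` lines, using the field names from the template. `render_config` and `parse_config` are hypothetical names.

```python
from typing import Dict

# Field names taken from the template; the plain-text layout is an assumption.
SECTIONS = {
    "Identity": ["Name", "Role"],
    "HeyGen Settings": ["Avatar ID", "Avatar Type", "Avatar Model", "Voice ID", "Default Style"],
    "Preferences": ["Tone", "Typical audience", "Intro phrase", "Outro phrase"],
}


def render_config(values: Dict[str, str]) -> str:
    """Build the full text of AVATAR-CONFIG.md from a flat dict of settings."""
    lines = ["Avatar Configuration", ""]
    for section, keys in SECTIONS.items():
        lines += [section, ""]
        lines += [f"{key}: {values.get(key, '')}" for key in keys]
        lines.append("")
    return "\n".join(lines)


def parse_config(text: str) -> Dict[str, str]:
    """Read known 'Key: value' lines back into a dict, ignoring everything else."""
    known = {key for keys in SECTIONS.values() for key in keys}
    parsed: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in known:
            parsed[key.strip()] = value.strip()
    return parsed
```

Writing the rendered text with `pathlib.Path("AVATAR-CONFIG.md").write_text(...)` replaces the file in place, which matches the one-file, overwrite-on-update behavior described for this config.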

Updating an Existing Config

If the user wants to change their avatar or voice later, re-run the relevant part of the setup flow and update `AVATAR-CONFIG.md`. Do not create a new file; overwrite the existing one.

Visual Style Presets

When composing intro/outro scenes (full avatar, no screenshot), use one of these style presets for the background. Match the style to the content type and audience.

Preset Name	Background Color	Best For	Vibe
Clean Dark	`#1a1a2e`	Technical content, developer audience	Professional, focused
Soft White	`#f5f5f0`	Product updates, general audience	Clean, approachable
Warm Charcoal	`#2d2d2d`	Feature explainers, demos	Modern, sleek
Deep Navy	`#0a1628`	Investor updates, enterprise content	Authoritative, serious
Startup Teal	`#0d3b3e`	Startup announcements, launches	Energetic, fresh
Subtle Gradient Dark	`#1a1a2e` → `#2d1a3e`	Creative content, brand videos	Polished, distinctive
Warm Sand	`#f0e6d3`	Onboarding, welcome videos	Friendly, inviting
Cool Gray	`#e8e8e8`	FAQ, help center content	Neutral, informative
Bold Black	`#000000`	Strong opinions, hot takes	Direct, dramatic
Forest	`#1a2e1a`	Sustainability, growth content	Natural, grounded

Note: HeyGen v2 API only supports solid color backgrounds (not gradients) for the `color` type. For gradients, create a background image and upload it as an asset.

Default: Clean Dark (`#1a1a2e`), which works well for most content types.

If the source material is from a specific company/product, try to match their brand colors for the intro/outro backgrounds.
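A small helper can turn a preset name or a brand hex color into a solid-color background and reject anything that would need an image instead. This is a sketch: the `{"type": "color", "value": ...}` shape is an assumption about the v2 background object, and `color_background` is a hypothetical name.

```python
import re
from typing import Dict

# Solid-color presets from the table; the gradient preset is left out on purpose
# because it has to be supplied as an uploaded image.
STYLE_PRESETS: Dict[str, str] = {
    "Clean Dark": "#1a1a2e",
    "Soft White": "#f5f5f0",
    "Warm Charcoal": "#2d2d2d",
    "Deep Navy": "#0a1628",
    "Startup Teal": "#0d3b3e",
    "Warm Sand": "#f0e6d3",
    "Cool Gray": "#e8e8e8",
    "Bold Black": "#000000",
    "Forest": "#1a2e1a",
}
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def color_background(style: str = "Clean Dark") -> Dict[str, str]:
    """Accept a preset name or a six-digit hex color; reject everything else."""
    value = STYLE_PRESETS.get(style, style)
    if not HEX_COLOR.fullmatch(value):
        raise ValueError(f"{style!r} is not a solid color; use an image background")
    # Assumed payload shape for a solid-color background.
    return {"type": "color", "value": value}
```

Passing a brand color such as `color_background("#ff6600")` covers the brand-matching case without adding a new preset.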

Supported Video Output Types

Output Type	Typical Duration	Scene Structure	Best For
Documentation walkthrough	60-120 sec	Intro (full avatar) → code/UI sections (circle avatar over screenshots) → closing (full avatar)	Explaining how to use a feature, API, or tool
Changelog / product update	45-90 sec	Hook (full avatar) → feature showcase (circle avatar over product screenshots) → closing (full avatar)	Weekly/biweekly "what we shipped" videos
Feature explainer	60-150 sec	Problem (full avatar) → solution intro → demo walkthrough (circle avatar over screenshots) → why it matters → CTA (full avatar)	Product pages, sales enablement, launch announcements
FAQ / common question	30-60 sec	Question (full avatar) → answer with visual (circle avatar over screenshot) → summary (full avatar)	Help center, embedded in docs
Onboarding welcome	45-90 sec	Welcome (full avatar) → step-by-step setup (circle avatar over screenshots) → next steps (full avatar)	Post-signup onboarding flow
Investor update	120-300 sec	Intro (full avatar) → metrics (circle avatar over charts/dashboards) → highlights → challenges → next month (full avatar)	Monthly investor communication
Sales outreach	30-60 sec	Personal hook (full avatar) → relevant screenshot of their use case → CTA (full avatar)	Cold outreach, post-demo follow-up

Supported Inputs

Source Material (at least one required)

Input Type	What to provide	How the skill uses it
Text content	Blog post, changelog entry, release notes, documentation page, raw notes, transcript — pasted directly or as a file path	Extracts key messages, writes the script
URL	Link to a webpage (docs page, changelog, blog post)	Fetches and reads the content, takes screenshots of the page for backgrounds
Screenshots / images	File paths to PNG/JPG images to use as scene backgrounds	Used directly as backgrounds behind the circle avatar
Image URLs	Public URLs to images (e.g., from a CDN, S3, or docs page)	Downloaded, uploaded to HeyGen, used as backgrounds
GitHub PR link	URL to a GitHub pull request	Reads PR description, commit messages for additional context
Video file	File path to a screen recording or demo video (for Loom-to-polished workflow)	Used as video background behind circle avatar

Image/Video Specifications

Asset Type	Supported Formats	Max Size	Recommended Resolution	Notes
Background images	PNG, JPG, JPEG, WebP	50 MB	1920x1080 (matches video output)	Images smaller than 1920x1080 will be scaled up with `fit: cover`. Larger images are cropped to fit.
Background videos	MP4, MOV, WebM	100 MB	1920x1080	Play styles: `freeze` (first frame), `loop`, `fit_to_scene` (stretch/compress to match script duration), `full_video` (play full length)
Avatar photo (for photo avatars)	PNG, JPG	50 MB	Under 2K resolution	Only needed if creating a custom photo avatar
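These limits are easy to check before uploading anything. The sketch below encodes the formats and size caps from the table; `check_asset` is a hypothetical helper, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from pathlib import PurePath
from typing import List

MB = 1024 * 1024  # assumption: binary megabytes

# kind -> (allowed extensions, maximum size in bytes), from the table above
ASSET_RULES = {
    "background_image": ({".png", ".jpg", ".jpeg", ".webp"}, 50 * MB),
    "background_video": ({".mp4", ".mov", ".webm"}, 100 * MB),
    "avatar_photo": ({".png", ".jpg"}, 50 * MB),
}


def check_asset(kind: str, filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the asset looks usable."""
    extensions, max_bytes = ASSET_RULES[kind]
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in extensions:
        problems.append(f"{suffix or 'no extension'} is not supported for {kind}")
    if size_bytes > max_bytes:
        problems.append(f"file is larger than {max_bytes // MB} MB")
    return problems
```

Resolution is not checked here because reading image dimensions needs an imaging library; the table's scaling and cropping notes still apply.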

Configuration Options (all optional — skill has sensible defaults)

Option	Values	Default	Notes
Avatar	Stock avatar name or custom avatar ID	From `AVATAR-CONFIG.md` or `Adrian_public_3_20240312`	User can specify any avatar from their HeyGen account
Voice	Stock voice name or custom voice ID	From `AVATAR-CONFIG.md` or `f38a635bee7a4d1f9b0a654a31d050d2` (Chill Brian)	User can specify any voice from their HeyGen account
Avatar model	`avatar_iii`, `avatar_iv`	`avatar_iv`	Avatar IV has better lip sync and more natural movement. Avatar III is roughly 6x cheaper but looks more robotic.
Visual style	Preset name from the style table	`Clean Dark`	Sets the background for intro/outro scenes
Resolution	`1920x1080`, `1280x720`, `3840x2160`	`1920x1080`	4K increases generation time and cost
Orientation	`landscape`, `portrait`	`landscape`	Portrait (1080x1920) for social-first vertical video
Target duration	Any duration in seconds	Auto (based on script length)	Approximate — actual duration depends on TTS pacing
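One way to apply these options is to layer them: built-in defaults first, then values from `AVATAR-CONFIG.md`, then anything the user specified for this video. The precedence order and the key names below are assumptions; only the default values come from the table.

```python
from typing import Any, Dict, Optional

DEFAULTS: Dict[str, Any] = {
    "avatar": "Adrian_public_3_20240312",
    "voice": "f38a635bee7a4d1f9b0a654a31d050d2",  # Chill Brian
    "avatar_model": "avatar_iv",
    "visual_style": "Clean Dark",
    "resolution": "1920x1080",
    "orientation": "landscape",
    "target_duration": None,  # None means auto, based on script length
}


def resolve_settings(
    saved: Optional[Dict[str, Any]] = None,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge defaults, saved config, and per-video overrides (last one wins)."""
    settings = dict(DEFAULTS)
    for layer in (saved or {}, overrides or {}):
        # Unknown keys and empty values are ignored rather than raising.
        settings.update({k: v for k, v in layer.items() if k in DEFAULTS and v})
    return settings
```

For example, `resolve_settings({"avatar": "my_twin"}, {"orientation": "portrait"})` keeps the saved avatar, switches orientation for this video only, and leaves everything else at its default.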

Video Output Specifications

Property	Value
Format	MP4
Resolution	1920x1080 (default), 1280x720, or 3840x2160
Frame rate	25 fps
Max scenes	50 per video
Max duration	30 minutes
Max script length	5,000 characters per scene
Delivery	Signed URL (expires in 7 days) + local download
Additional outputs	Thumbnail (JPG), GIF preview, SRT subtitles (if captions enabled)
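The scene and script limits can be checked before a render is requested. A minimal sketch, with `check_scene_limits` as a hypothetical helper that takes one script string per scene:

```python
from typing import List

MAX_SCENES = 50
MAX_SCRIPT_CHARS = 5000


def check_scene_limits(scripts: List[str]) -> List[str]:
    """Return a list of limit violations; an empty list means the plan fits."""
    problems = []
    if len(scripts) > MAX_SCENES:
        problems.append(f"{len(scripts)} scenes exceeds the {MAX_SCENES}-scene limit")
    for number, script in enumerate(scripts, start=1):
        if len(script) > MAX_SCRIPT_CHARS:
            problems.append(
                f"scene {number} has {len(script)} characters (limit {MAX_SCRIPT_CHARS})"
            )
    return problems
```

The 30-minute duration cap is not checked here because the real duration is only known after text-to-speech runs.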

How This Skill Works

Step 1: Detect Mode and Load Avatar Config

Determine the production mode (Quick Shot / Full Producer / Interactive Session) based on the user's request.
Check for `AVATAR-CONFIG.md`. If found, load avatar and voice preferences.
If no config exists, use defaults.

Step 2: Read Source Material + Run Discovery

Read the source material first (if provided — URL, text, file path).
Run discovery based on the detected mode (see Discovery section above).
Map discovery answers to production decisions before proceeding.
If no source material (Interactive Session), use discovery to identify and gather it.

Step 3: Classify Source Material and Determine Script Approach

Source Type	What to extract	Script approach
Blog post	Core argument, key insights, proof points	Distill 2-3 most compelling points. Don't follow the blog structure — restructure for spoken delivery. Open with the hook, not the intro.
Documentation page	Steps, code examples, UI descriptions	Pick the most important workflow. Walk through it step by step. Show screenshots of each step. Keep it practical — "here is how you do this."
Changelog / release notes	What changed, why it matters, how to use it	Lead with the impact, not the feature name. "You can now do X" is better than "We shipped feature Y." Show the product UI. Always run changelog enrichment (Step 3b) before writing the script.
Product docs / feature brief	Value prop, use cases, how it works	Pick ONE use case. Show the problem-solution arc. Do not try to cover everything.
Raw data / metrics	Key numbers, trends, surprises	Lead with the most surprising data point. Build a "here is what this means" narrative.
Founder's notes / brain dump	Core ideas, opinions	Clean up into a coherent point of view. Preserve the voice and opinions.
Transcript / talk	Key segments, best quotes	Do not re-script from scratch. Pull the strongest 60-90 seconds and tighten.
Marketing copy / landing page	Value prop, differentiators	Expand into a "let me explain why this matters" format. Landing pages are compressed — video scripts need room to breathe.

Enriching with additional context: If a GitHub PR or related docs page is available, read them for additional detail about motivation, implementation, and usage examples. More context produces better scripts.

Step 3b: Changelog Enrichment (changelogs only)

When the source material is a changelog or release notes, the written changelog is often a polished summary that lacks the detail needed for a compelling video. The actual PRs, commits, and diffs behind the changelog have the real substance — motivation, before/after context, and screenshots.

1. Check for inline PR/commit references

Scan the changelog text for links to PRs, commits, or issues. Many changelogs link directly to these. Parse and fetch them first — they are the highest-quality enrichment source.

2. Ask the user for a GitHub repo

"This looks like a changelog. Is there a GitHub repo behind these changes? I can pull PR details, diffs, and screenshots to make the video more specific and accurate. If it is a private repo, you can either give me access or paste the relevant PR URLs."

3. If a repo is available, pull context

Date-range matching: If the changelog has a date or version, search the repo for PRs merged in that window. This catches changes the changelog may have missed.
PR descriptions: Read the body of each relevant PR. These often contain motivation ("why we built this"), implementation notes, and before/after comparisons.
PR screenshots and GIFs: Extract image URLs from PR bodies. These are better than browser screenshots because they show the exact change, not just the current state. Use these as first-class scene backgrounds.
Diffs: Read the actual code/config diffs for key PRs. This enables diff-informed scripting — the script can say "notice how the sidebar now shows X" instead of generic descriptions. It makes the video feel like someone who actually built the feature is presenting it.

4. If no repo is available

Proceed with the changelog text alone. Use browser screenshots of the product UI to fill in visual context.

Important: Not all enrichment context should make it into the video. The script stays concise. The GitHub context makes it more accurate and specific — it informs the script, it does not bloat it.
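The first enrichment step (scanning for inline references) and the screenshot extraction can be sketched with regular expressions. The patterns are illustrative assumptions: they cover plain `github.com` PR, issue, and commit URLs and the two most common image syntaxes in PR bodies, and will miss other forms such as short `#123` references.

```python
import re
from typing import List, Tuple

GITHUB_REF = re.compile(
    r"https://github\.com/([\w.-]+)/([\w.-]+)/(pull|issues|commit)/([0-9a-fA-F]+)"
)
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")
HTML_IMAGE = re.compile(r'<img[^>]+src="(https?://[^"]+)"')


def find_github_refs(changelog: str) -> List[Tuple[str, str, str, str]]:
    """Return (owner, repo, kind, id) for each PR, issue, or commit link."""
    return GITHUB_REF.findall(changelog)


def find_image_urls(pr_body: str) -> List[str]:
    """Collect image URLs from Markdown and HTML image tags in a PR body."""
    return MARKDOWN_IMAGE.findall(pr_body) + HTML_IMAGE.findall(pr_body)
```

Fetching the PR bodies and diffs themselves is left out; that part depends on repo access and is where private repositories need the user's help.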

Step 4: Gather Visual Assets

Screenshots and images are the backgrounds for video scenes.

Priority order for sourcing visuals:

User-provided screenshots — use directly, highest priority
Image URLs from the source material (e.g., from a CDN like Cloudinary in the docs/changelog) — download these, they are usually high-quality product screenshots
Browser screenshots — if a URL was provided, navigate to the page using Chrome DevTools:
- Take a full-page screenshot first to understand the layout
- Identify key visual sections (code blocks, UI elements, charts, feature screenshots)
- Scroll to each section and take a viewport screenshot (1920x1080)
- Each screenshot becomes a scene background
Solid color backgrounds — if no visuals are available, use style preset colors for all scenes
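The priority order above reduces to a short fall-through. In this sketch `pick_visual_source` and its return labels are hypothetical; only the ordering comes from the list.

```python
from typing import List, Optional


def pick_visual_source(
    user_screenshots: List[str],
    source_image_urls: List[str],
    page_url: Optional[str] = None,
) -> str:
    """Return the highest-priority visual source that is actually available."""
    if user_screenshots:
        return "user_screenshots"
    if source_image_urls:
        return "source_image_urls"
    if page_url:
        return "browser_screenshots"
    return "solid_color"  # fall back to the style preset color for all scenes
```

A real run would likely mix sources per scene (for example, one user screenshot plus browser captures for the rest); this only picks the starting point.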

Step 5: Write the Script

Before writing, review your discovery answers. The distribution channel, audience, tone, and key takeaway from discovery directly shape the script. A LinkedIn video needs a punchy 3-second hook. A docs video can open with context. A sales video needs personalization. Let discovery drive the script, not just the source material.

General rules for spoken-word scripts:

Short sentences. Average 10-15 words per sentence.
Conversational tone. Write how people talk, not how they write.
No jargon unless the audience is technical and expects it.
No headers, bullet points, or formatting — it is a continuous spoken delivery.
Use contractions naturally.
Direct address — say "you" frequently.
Rhetorical questions work well as transitions.
Avoid filler openings like "In this video, I will..." — get to the point.
If the user has set an intro/outro phrase in `AVATAR-CONFIG.md`, use it.
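A few of these rules are mechanical enough to check automatically before presenting a script. The sketch below is a rough lint, not a quality judgment: `lint_script` is a hypothetical helper, sentence splitting is a simple punctuation heuristic, and only three rules are covered (sentence length, filler opening, stray formatting).

```python
import re
from typing import List


def lint_script(script: str) -> List[str]:
    """Return warnings for rules that can be checked without judgment calls."""
    warnings = []
    # Split after ., !, or ? followed by whitespace; good enough for a rough average.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    if sentences:
        average = sum(len(s.split()) for s in sentences) / len(sentences)
        if average > 15:
            warnings.append(f"average sentence length is {average:.1f} words (aim for 10-15)")
    if script.strip().lower().startswith("in this video"):
        warnings.append("filler opening: get to the point")
    if any(line.lstrip().startswith(("#", "- ", "* ")) for line in script.splitlines()):
        warnings.append("contains headers or bullet points")
    return warnings
```

Tone, jargon level, and direct address still need a read-through; a clean lint result only means the script avoids the most obvious problems.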

Script structure by video output type:

Documentation walkthrough:

Scene 1 (full avatar): "Here is how to [do X] in [product]. It takes about [N] steps and you will be done in [time]."
Scene 2-N (circle avatar over screenshots): Walk through each step. One step per scene. "First... Then... Now..."
Final scene (full avatar): "That is it. [Recap the outcome]. Check out the docs at [URL] for more."

Changelog / product update:

Scene 1 (full avatar): Hook with impact. "[Product] just shipped [feature]. Here is why it matters."
Scene 2 (circle avatar over product screenshot): What the feature does. Show the UI.
Scene 3 (circle avatar over detail screenshot): The interesting detail or power feature.
Scene 4 (full avatar): Why you should care + CTA.

Feature explainer:

Scene 1 (full avatar): The problem. "If you have ever tried to [pain point], you know it is painful."
Scene 2 (full avatar or screenshot): The solution intro. "That is exactly what [feature] solves."
Scene 3-4 (circle avatar over screenshots): How it works. Walk through the UI.
Scene 5 (full avatar): Why it matters + CTA.

FAQ / common question:

Scene 1 (full avatar): The question. "One thing people ask a lot is: [question]?"
Scene 2 (circle avatar over relevant screenshot): The answer with visual context.
Scene 3 (full avatar): Summary + where to learn more.

In Full Producer mode: Present the full production plan to the user for approval before proceeding. Include the script, scene breakdown, AND the specific visuals for each scene so the user knows exactly what the video will look like:

Production Plan — [Video Title]

Summary: [N] scenes, estimated [X] seconds, [avatar model], [style preset]

Scene	Layout	Script	Visual
1	Full avatar	"Hook text here..."	Clean Dark background (#1a1a2e)
2	Circle avatar	"Feature explanation..."	PR screenshot: [description] - [source URL or file]
3	Circle avatar	"Detail walkthrough..."	Browser screenshot: [page section description]
4	Full avatar	"CTA text here..."	Clean Dark background (#1a1a2e)

Visual assets I will use:

Scene 2: [thumbnail or description of the image, where it came from — PR #123, user-provided, browser screenshot of X page]

Scene 3: [same detail]

Want me to adjust anything before I generate?

This gives the user full visibility into the script AND the visuals before any generation happens. If a visual is wrong or missing, they can flag it now instead of after a 15-minute render.

In Quick Shot mode: Skip approval and generate immediately.

Step 6: Build the Scene Composition

Each scene needs three components: character, voice, and background.

Avatar configurations:

Full avatar (intro/outro scenes):

```json
{
    "type": "avatar",
    "avatar_id": "<AVATAR_ID>",
    "avatar_style": "normal",
    "scale": 1.0,
    "use_avatar_iv_model": true
}
```

Circle avatar in bottom-right corner (content scenes):

```json
{
    "type": "avatar",
    "avatar_id": "<AVATAR_ID>",
    "avatar_style": "circle",
    "scale": 0.4,
    "offset": {"x": 0.35, "y": 0.35},
    "use_avatar_iv_model": true
}
```

Background types:

Solid color (for intro/outro — use the selected style preset):

```json
{"type": "color", "value": "#1a1a2e"}
```

Image (for content scenes):

```json
{"type": "image", "image_asset_id": "<ASSET_ID>", "fit": "cover"}
```

Video (for screen recording backgrounds):

```json
{"type": "video", "video_asset_id": "<ASSET_ID>", "play_style": "fit_to_scene"}
```

Aspect ratio check: If the video orientation is portrait (1080x1920), adjust the circle avatar offset to `{"x": 0.3, "y": 0.4}` and consider using `scale: 0.3` for better proportions on vertical video.
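
The two avatar configurations and the portrait adjustment can be folded into one helper. This is a sketch under the values given above; the function name `build_character` and the `layout` labels `"full"` and `"circle"` are illustrative choices, not HeyGen terms.

```python
def build_character(avatar_id, layout, portrait=False):
    """Return the `character` dict for one scene.

    layout: "full" for intro/outro scenes, "circle" for content scenes.
    portrait: True for 1080x1920 output (only affects the circle layout).
    """
    if layout == "full":
        return {
            "type": "avatar",
            "avatar_id": avatar_id,
            "avatar_style": "normal",
            "scale": 1.0,
            "use_avatar_iv_model": True,
        }
    if layout == "circle":
        # Portrait values follow the aspect ratio note above.
        scale = 0.3 if portrait else 0.4
        offset = {"x": 0.3, "y": 0.4} if portrait else {"x": 0.35, "y": 0.35}
        return {
            "type": "avatar",
            "avatar_id": avatar_id,
            "avatar_style": "circle",
            "scale": scale,
            "offset": offset,
            "use_avatar_iv_model": True,
        }
    raise ValueError(f"unknown layout: {layout!r}")
```

Keeping both layouts in one place means the portrait adjustment cannot be forgotten on individual content scenes.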

Step 7: Upload Assets to HeyGen

Upload all screenshot/image files to HeyGen's asset storage.

Endpoint:

POST https://upload.heygen.com/v1/asset

Important: This uses a DIFFERENT host than the main API (`upload.heygen.com`, not `api.heygen.com`).

Request format: Raw binary body with Content-Type header. NOT multipart form data.

```bash
curl -X POST "https://upload.heygen.com/v1/asset" \
  -H "X-Api-Key: <HEYGEN_API_KEY>" \
  -H "Content-Type: image/png" \
  --data-binary @screenshot.png
```

Response: Returns an `id` field. This is the `image_asset_id` to use in scene backgrounds.
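
For scripted uploads, the same request can be prepared with Python's standard library. The sketch below only builds the request object and does not send it; `build_upload_request` is a hypothetical helper, and guessing the Content-Type from the file name via `mimetypes` is an assumption about how you want to set that header.

```python
import mimetypes
import urllib.request

UPLOAD_URL = "https://upload.heygen.com/v1/asset"


def build_upload_request(data, filename, api_key):
    """Build a raw-binary POST for the asset upload endpoint (not multipart)."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        raise ValueError(f"cannot infer Content-Type for {filename!r}")
    return urllib.request.Request(
        UPLOAD_URL,
        data=data,  # raw file bytes as the request body
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": content_type},
    )
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`. Where exactly the `id` field sits inside the response JSON is not specified above, so inspect a real response before hard-coding a path to it.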

Step 8: Submit Video Generation Request

Endpoint:

POST https://api.heygen.com/v2/video/generate

Headers:

X-Api-Key: <HEYGEN_API_KEY>
Content-Type: application/json

Payload structure:

```json
{
    "video_inputs": [
        {
            "character": { ... },
            "voice": {
                "type": "text",
                "voice_id": "<VOICE_ID>",
                "input_text": "<SCENE_SCRIPT>"
            },
            "background": { ... }
        }
    ],
    "dimension": {"width": 1920, "height": 1080}
}
```


API key location: Check the `.env` file in the project root for `HEYGEN_API_KEY`.

Step 9: Poll for Completion and Deliver

Video generation is asynchronous. After submitting, the API returns a `video_id`. The video takes 10-20 minutes to render (longer for Avatar IV, more scenes, or higher resolution).

Poll endpoint:

GET https://api.heygen.com/v1/video_status.get?video_id=<VIDEO_ID>

Polling strategy:

Poll every 10 seconds
Log status every 60 seconds to keep the user informed
When status is `completed`, download the video from `video_url`
Save to the working directory
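
The polling strategy above can be sketched as a loop with the HTTP call injected, so it stays testable without network access. Several things here are assumptions: `fetch_status` is a hypothetical callable that queries the poll endpoint and returns a dict with `status` and, once done, `video_url`; the `failed` status and the one-hour timeout are not stated above.

```python
import time


def wait_for_video(fetch_status, poll_seconds=10, log_seconds=60,
                   timeout_seconds=3600, sleep=time.sleep, log=print):
    """Poll until the video is completed and return its video_url."""
    elapsed = 0  # counts sleep time only, not time spent in requests
    while elapsed <= timeout_seconds:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result["video_url"]
        if status == "failed":  # assumed failure status
            raise RuntimeError(f"video generation failed: {result}")
        if elapsed % log_seconds == 0:
            log(f"status after {elapsed}s: {status}")
        sleep(poll_seconds)
        elapsed += poll_seconds
    raise TimeoutError(f"video not ready after {timeout_seconds}s")
```

With the defaults, a status line is logged on the first poll and then once per minute, matching the 10-second poll and 60-second log cadence.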

On completion, present to the user:

Video complete!
- Duration: [X] seconds
- Scenes: [N]
- Avatar model: [III or IV]
- Visual style: [preset name]
- File: [local path]
- Video URL: [signed URL — expires in 7 days]
- Estimated cost: $[X]

Want me to adjust anything and regenerate?

Step 10: Log the Generation (optional, for learning and iteration)

If a `video-log.jsonl` file exists in the working directory, append an entry to it. Otherwise, skip this step.

```json
{
    "timestamp": "2026-04-16T10:30:00Z",
    "video_id": "<heygen_video_id>",
    "mode": "full_producer",
    "output_type": "changelog",
    "source_type": "changelog_entry",
    "avatar_id": "<avatar_id>",
    "avatar_model": "avatar_iv",
    "voice_id": "<voice_id>",
    "style_preset": "clean_dark",
    "scenes": 5,
    "duration_seconds": 93,
    "generation_time_seconds": 510,
    "resolution": "1920x1080",
    "local_path": "/path/to/video.mp4",
    "source_url": "https://posthog.com/changelog?id=2666"
}
```

This log helps track what has been generated, measure generation times, and improve the skill over time.
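
One detail worth making explicit: a `.jsonl` file holds one JSON object per line, so the pretty-printed entry above should be written on a single line. A minimal sketch of the append-only-if-present rule follows; `append_log_entry` is a hypothetical helper name.

```python
import json
import os


def append_log_entry(entry, log_path="video-log.jsonl"):
    """Append one entry as a single JSON line; skip if the log does not exist."""
    if not os.path.isfile(log_path):
        return False  # logging is opt-in: never create the file
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return True
```

The boolean return value lets the caller tell whether the entry was written or the step was skipped.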

Cost Reference

Avatar Model	Cost per second	60-sec video	90-sec video
Avatar III	~$0.017/sec	~$1.00	~$1.50
Avatar IV (1080p)	~$0.05/sec	~$3.00	~$4.50
Avatar IV (4K)	~$0.067/sec	~$4.00	~$6.00
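
The "Estimated cost" line in the completion message can be derived from these rates. The sketch below hard-codes the approximate per-second figures from the table; the dictionary keys are illustrative names, not API values, and the rates themselves may change.

```python
# Approximate per-second rates copied from the table above.
RATES_PER_SECOND = {
    "avatar_iii": 0.017,
    "avatar_iv_1080p": 0.05,
    "avatar_iv_4k": 0.067,
}


def estimate_cost(duration_seconds, model):
    """Return the estimated cost in USD, rounded to cents."""
    return round(duration_seconds * RATES_PER_SECOND[model], 2)
```

For a 90-second Avatar IV video at 1080p this gives 90 x 0.05 = 4.50, matching the table.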

Limitations and Gotchas

No clickable links in video. Output is flat MP4. Show URLs as text overlays or mention them verbally.
No zoom/pan on backgrounds. If you need a zoomed view of a screenshot, take a separate cropped screenshot and use it as a different scene.
One text overlay per scene. If you need multiple text elements, bake them into the background image.
Max 5,000 characters per scene script. Split long narrations across multiple scenes.
Max 50 scenes per video, max 30 minutes total.
Generation time is 10-20 minutes for a typical 5-scene video. Avatar IV takes longer than Avatar III.
Avatar IDs must match exactly. If unsure, list the available avatars first with `GET https://api.heygen.com/v2/avatars`.
Asset uploads use `upload.heygen.com`, not `api.heygen.com`. Use a raw binary body with a Content-Type header.
Max 10 concurrent video jobs. Exceeding returns HTTP 429.
Signed video URLs expire in 7 days. Always download the video locally.
Avatar IV costs roughly 3x (1080p) to 4x (4K) as much as Avatar III per second, going by the rates in Cost Reference. For high-volume or draft videos, consider using Avatar III first, then regenerating the final version with Avatar IV.
Portrait orientation requires adjusting circle avatar offset and scale for good proportions.
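
Splitting a long narration to respect the per-scene character limit can be done at sentence boundaries so no scene ends mid-sentence. This is a sketch of one possible approach; `split_narration` is a hypothetical helper and the sentence detection is deliberately naive.

```python
import re


def split_narration(text, max_chars=5000):
    """Split text into chunks of at most max_chars, breaking between sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds the per-scene limit")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in this scene
        else:
            chunks.append(current)  # close the scene, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then become the script of its own scene.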

Available Avatars and Voices

To list available avatars:

```bash
curl -s "https://api.heygen.com/v2/avatars" -H "X-Api-Key: <HEYGEN_API_KEY>"
```

To list available voices:

```bash
curl -s "https://api.heygen.com/v2/voices" -H "X-Api-Key: <HEYGEN_API_KEY>"
```

To design a custom voice from description:

```bash
curl -X POST "https://api.heygen.com/v3/voices" \
  -H "X-Api-Key: <HEYGEN_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"description": "friendly male voice, mid-30s, warm and conversational"}'
```

Known good defaults:

Avatar: `Adrian_public_3_20240312` (Adrian in Blue Shirt, professional male)
Voice: `f38a635bee7a4d1f9b0a654a31d050d2` (Chill Brian, natural English male)
