talking-head-recut


Talking Head Recut


Talking Head Recut takes a local video that plays in full and layers a sequence of timed, designed graphic cards onto it (titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture), synced to what's being said. The agent designs the cards (timing + content) and writes each card's HTML directly in the conversation, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure: the overlays emerge from what the transcript actually says.

Confirm the route before you build. This skill packages an existing talking-head clip with designed graphic cards (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants plain captions / subtitles (the spoken words as text) → `/embedded-captions`; a single short unnarrated element (one logo sting / lower-third) → `/motion-graphics`. The clip plays untouched: re-timing, recoloring, reframing, reordering, or audio work is NLE editing and out of scope. Building from a URL / topic / PR → the creation workflows. Unsure overlays-vs-captions? Read `/hyperframes` first.

Graphic-packaging sibling of `embedded-captions`. Captions add the spoken words as a readable subtitle; this adds designed graphics on top of the playing video. Plain subtitles → `embedded-captions`. Build a video from scratch → the creation workflows (`product-launch-video` / `faceless-explainer` / …).

Inspectable intermediate files in the work directory:

- `metadata.json`: duration / width / height / fps
- `audio.mp3`: extracted audio
- `transcript.json`: a flat word array `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper)
- `storyboard.json`: lightweight card outline (the agent's plan)
- `public/cards/card-XX.html`: one HTML fragment per card
- `public/index.html`: final assembled composition
- `output.mp4`: rendered video

CLI Resolution

```bash
# hyperframes: transcription (local Whisper) + rendering the assembled HTML to MP4
npx hyperframes --help
```

This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`.
Transcription is local **Whisper** via `hyperframes transcribe`, with no third-party
service, API key, or rate-limited proxy.

Workflow


1. Check Environment

```bash
npx hyperframes doctor          # ffmpeg, headless browser, render deps
# confirm bundled assets:
ls "<SKILL_DIR>/assets/fonts" "<SKILL_DIR>/assets/vendor/gsap.min.js"
```

Required:

- `ffmpeg` / `ffprobe` (system)
- `<SKILL_DIR>/assets/fonts/*.woff2`, `<SKILL_DIR>/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9)

Transcription needs no key — `hyperframes transcribe` runs Whisper locally (Step 4).

Strongly recommended on macOS for `hyperframes render`:

```bash
export PRODUCER_BROWSER_GPU_MODE=hardware
```

2. Create a Work Directory


All artifacts live under `videos/<project-name>/`, the same convention as the other video workflows (`product-launch-video` / `faceless-explainer` / `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory.

```bash
VIDEO_PATH="/absolute/path/input.mp4"
WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')"
mkdir -p "$WORK_DIR"
```

3. Extract Audio and Metadata

```bash
# metadata: duration / width / height / fps
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate \
  -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json"

# audio
ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3"
```

Outputs: `metadata.json` (read `width`/`height`/`duration`; fps = the `r_frame_rate`
fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`.
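A minimal sketch of reading those values back, assuming the usual `ffprobe -of json` layout (a `streams` array plus a `format` object). The helper name `read_metadata` and the sample numbers are illustrative, not part of the skill.

```python
import json
from fractions import Fraction


def read_metadata(raw: str) -> dict:
    """Parse ffprobe JSON text into duration / width / height / fps."""
    probe = json.loads(raw)
    stream = probe["streams"][0]  # first video stream (-select_streams v:0)
    return {
        # ffprobe writes duration as a string, so convert it explicitly
        "duration": float(probe["format"]["duration"]),
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        # r_frame_rate is a fraction string such as "30000/1001"
        "fps": float(Fraction(stream["r_frame_rate"])),
    }


# Hypothetical sample shaped like the ffprobe output; the values are made up.
sample = (
    '{"streams": [{"width": 1080, "height": 1920, "r_frame_rate": "30000/1001"}],'
    ' "format": {"duration": "121.200000"}}'
)
meta = read_metadata(sample)
print(meta["width"], meta["height"], round(meta["fps"], 2), meta["duration"])
```

In practice the `raw` text would come from reading `$WORK_DIR/metadata.json`.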

4. Transcribe


```bash
npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en
```

Local Whisper: no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` + `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks.

Clamp to media duration. Whisper can return the final word's `end` a hair past the actual clip length. Clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video.
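The clamp can be sketched as below, assuming the storyboard shape shown in Step 6 (`composition.durationSeconds` plus a `cards` list with `endSec`). The function name `clamp_to_duration` is hypothetical.

```python
def clamp_to_duration(storyboard: dict, media_duration: float) -> dict:
    """Clamp the composition length and every card's endSec to the media duration."""
    comp = storyboard["composition"]
    comp["durationSeconds"] = min(comp["durationSeconds"], media_duration)
    for card in storyboard["cards"]:
        # Cards that already end inside the clip are left unchanged.
        card["endSec"] = min(card["endSec"], media_duration)
    return storyboard  # mutated in place and returned for convenience


# Illustrative input: the last card runs 0.17s past a 121.2s clip.
storyboard = {
    "composition": {"durationSeconds": 121.37},
    "cards": [
        {"id": "card-01", "startSec": 0.5, "endSec": 13.0},
        {"id": "card-02", "startSec": 110.0, "endSec": 121.37},
    ],
}
clamped = clamp_to_duration(storyboard, 121.2)
print(clamped["composition"]["durationSeconds"], clamped["cards"][-1]["endSec"])
```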

5. Correct Transcript


`transcript.json` is a flat array of word objects `[{ "text": "...", "start": s, "end": s }, …]` (no `segments` array, no `words` wrapper; the per-word key is `text`). Read it and fix obvious ASR errors:

- Homophones, product names, technical terms, punctuation
- Edit a word's `text` in place; preserve its `start` / `end` timestamps
- There is no pre-grouped `segments` array. Group words into sentences yourself (split at terminal punctuation / pauses) when you need segment-level chunks for card timing
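One possible way to do that grouping is sketched here. The 0.6s pause threshold and the function name `group_sentences` are assumptions chosen for illustration; tune the threshold to the speaker.

```python
def group_sentences(words, pause_sec=0.6):
    """Group flat word objects into sentences at terminal punctuation or long pauses."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        next_word = words[i + 1] if i + 1 < len(words) else None
        ends_sentence = word["text"].rstrip().endswith((".", "?", "!"))
        long_pause = next_word is not None and next_word["start"] - word["end"] >= pause_sec
        if ends_sentence or long_pause or next_word is None:
            sentences.append({
                "text": " ".join(w["text"].strip() for w in current),
                "start": current[0]["start"],  # first word's start
                "end": current[-1]["end"],     # last word's end
            })
            current = []
    return sentences


# Made-up words in the transcript.json shape.
words = [
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "there.", "start": 0.5, "end": 0.9},
    {"text": "Rates", "start": 1.0, "end": 1.3},
    {"text": "rose", "start": 2.5, "end": 2.8},
]
sentences = group_sentences(words)
for s in sentences:
    print(s["start"], s["end"], s["text"])
```

Word timestamps are never modified; each sentence simply borrows the first word's `start` and the last word's `end`.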

6. Draft a Lightweight Storyboard (in chat)


No CLI involved. Read `transcript.json` + `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact: no CLI command consumes it; it exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9:

```json
{
  "schemaVersion": 3,
  "composition": {
    "fps": 30,
    "width": 1080,
    "height": 1920,
    "durationSeconds": 121.2,
    "layout": "portrait",
    "themeId": "noir",
    "seed": 42
  },
  "videoTrack": {
    "sourcePath": "input-video.mp4",
    "startSec": 0,
    "endSec": 121.2,
    "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 }
  },
  "subtitles": { "enabled": false },
  "cards": [
    {
      "id": "card-01",
      "intent": "Hook with the speaker's anxious midnight question",
      "startSec": 0.5,
      "endSec": 13.0,
      "accentIndex": 0,
      "zone": "fullscreen",
      "contentHints": {
        "kicker": "AN HONEST QUESTION",
        "title": "The soul-searching question at 11 PM",
        "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'"
      }
    }
  ]
}
```
Required Card fields:

| field | type | purpose |
| --- | --- | --- |
| `id` | string | stable id used in card HTML & GSAP selectors |
| `intent` | string | natural-language description; fed to card synthesis |
| `startSec` / `endSec` | number | times in seconds (`endSec > startSec`) |
| `accentIndex` | 0 \| 1 \| 2 \| 3 \| 4 | which of the 5 theme accent colors this card pulls |
| `zone` | enum (see below) | where on the canvas the card lives |
| `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here |
| `archetype` (optional) | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default |
| `transition` (optional) | enum: `cut` \| `fade` \| `slide` \| `wipe` | declarative card-to-card transition |

Five `zone` values:

| zone | resolved bounds | when to use |
| --- | --- | --- |
| `fullscreen` | covers whole canvas | hero moments, big numbers, mantras |
| `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content |
| `lower-third` | bottom 30% band | annotation over visible video |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side |
| `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video |
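Since no CLI validates `storyboard.json`, a quick self-check before writing card HTML can catch slips early. This is only a sketch built from the two tables above; the function name `card_problems` and the message wording are hypothetical.

```python
ZONES = {"fullscreen", "whiteboard-area", "lower-third", "side-panel", "video-overlay"}
TRANSITIONS = {"cut", "fade", "slide", "wipe"}
REQUIRED = ("id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints")


def card_problems(card: dict) -> list:
    """Return human-readable problems; an empty list means the card looks valid."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in card]
    if problems:
        return problems  # the remaining checks need every required field
    if not card["endSec"] > card["startSec"]:
        problems.append("endSec must be greater than startSec")
    if card["accentIndex"] not in (0, 1, 2, 3, 4):
        problems.append("accentIndex must be 0-4")
    if card["zone"] not in ZONES:
        problems.append(f"unknown zone: {card['zone']}")
    # transition is optional, so only check it when present
    if "transition" in card and card["transition"] not in TRANSITIONS:
        problems.append(f"unknown transition: {card['transition']}")
    return problems


good = {
    "id": "card-01", "intent": "Hook", "startSec": 0.5, "endSec": 13.0,
    "accentIndex": 0, "zone": "fullscreen", "contentHints": {"title": "11 PM"},
}
bad = dict(good, endSec=0.2, zone="top-left", transition="zoom")
print(card_problems(good))
print(card_problems(bad))
```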
When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set once at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's `<script>` (see Step 9).

No prescribed card roles, no prescribed narrative arc. Cards emerge from what the video actually says: they could be all quotes or all data, and could open with a number or with a story. Let the transcript drive the rhythm.

How many takeaways? Auto-infer from duration + density. There is no fixed upper limit. Pick a base pace from the video duration, then adjust by information density. Only the floor is fixed: a minimum of 5 cards, so even short videos have rhythm.
Step 1: base pace by duration (the natural sec/card for medium density):

| video duration | base pace (sec per card) | rationale |
| --- | --- | --- |
| < 60s (short reel) | 6–8s | viewers expect fast cuts in short-form |
| 60s – 3 min | 8–12s | normal social pace |
| 3 – 10 min | 12–20s | give breathing room; each card carries more |
| 10 – 30 min | 20–35s | long-form lecture / interview rhythm |
| > 30 min | 30–60s | episodic, near-chapter feel |

Step 2: density multiplier (multiplies the base pace):

| signal in the transcript | multiplier | effect |
| --- | --- | --- |
| High density: many numbers, distinct claims, staccato pacing, list-like enumeration, every 1–2 sentences is a new idea | × 0.7 | cuts faster, more cards |
| Medium density: mixed flow with both data and narrative | × 1.0 | base pace |
| Low density: one extended story, repeated reframing, slow reflective pacing, single argument unfolding | × 1.5 | cuts slower, fewer cards |

Step 3: compute:

```
secPerCard = basePace × densityMultiplier
cardCount  = max(5, round(videoDurationSec / secPerCard))
```

Examples (note there is no upper clamp; long videos naturally produce more cards):

- 30s reel, single punchline (low density) → 7 × 1.5 = 10.5s/card → round(30 / 10.5) = 3 → raised to the 5-card floor
- 60s reflective monologue (low density) → 10 × 1.5 = 15s/card → 4 → raised to the 5-card floor
- 121s talking-head with rich data (high density) → 10 × 0.7 = 7s/card → 17 cards
- 5 min interview, mixed density → 16 × 1.0 = 16s/card → 19 cards
- 10 min deep-dive, high density → 16 × 0.7 = 11.2s/card → 54 cards
- 30 min lecture, medium density → 28 × 1.0 = 28s/card → 64 cards
- 1 hr podcast, low density → 45 × 1.5 = 67.5s/card → 53 cards
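The three steps can be sketched in a few lines. The function name `card_count` is illustrative, and `round` is assumed here to mean round-half-up, so the sketch avoids Python's built-in `round()`, which rounds halves to the nearest even integer.

```python
import math


def card_count(video_sec: float, base_pace: float, density_multiplier: float) -> int:
    """cardCount = max(5, round(videoDurationSec / (basePace x densityMultiplier)))."""
    sec_per_card = base_pace * density_multiplier
    # Round half up (assumption); built-in round() would round 2.5 down to 2.
    rounded = math.floor(video_sec / sec_per_card + 0.5)
    return max(5, rounded)  # the 5-card floor is the only fixed bound


for label, args in [
    ("30s reel, low density", (30, 7, 1.5)),
    ("121s talking head, high density", (121, 10, 0.7)),
    ("5 min interview, medium density", (300, 16, 1.0)),
    ("1 hr podcast, low density", (3600, 45, 1.5)),
]:
    print(label, "->", card_count(*args))
```

The base pace stays a judgment call inside the band for the video's duration; only the arithmetic is mechanical.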
When a card holds longer than ~15s, plan for a richer card (data block, multi-step reveal, several sub-points unfolding with staggered animations); a static one-liner gets boring past 8s. For long pieces where many cards exceed 30s, consider chunking the timeline into sub-compositions (one .html per chapter, mounted with `data-composition-src`) so the GSAP timeline per file stays manageable. See the `timeline_track_too_dense` HyperFrames lint warning.

`content` can be a plain string ("Title: annualized 5.69%\nNotes: ...") or any JSON shape that captures the data. The agent decides the shape per card.

Optional outro. This skill ships no fixed brand outro. If the user wants a closing card, design a neutral one yourself (wordmark + one-line tagline, ~1.5-2s, fade in -> short hold -> fade out), append it to `cards[]`, and extend `composition.durationSeconds` to its `endSec`. Otherwise end on the last content card.

7. Decide Render Strategy


Confirm Visual Direction with User (DO THIS FIRST)


Before you start designing cards or deciding bounds, ask the user to pick the output ratio, the layout, the style, and the card-density preset. Frames are auto-selected from the chosen layout × style combination (see "Auto-pick frame" table below). Before sending the question, precompute two things:

1. `recommendedRatio` from the source video's aspect ratio (`metadata.json` width / height):
   - `sourceAspect = width / height`
   - `sourceAspect ≥ 1.5` (≥ ~3:2 wide) → recommend `16:9`
   - `sourceAspect ≤ 0.7` (≤ ~9:13 tall) → recommend `9:16`
   - `0.7 < sourceAspect < 1.5` (near-square) → recommend `4:5`

   Mark the recommended option's label with " (recommended · matches source video X:Y)" so the user sees why it's recommended.
2. `autoCount` from Step 6 (`max(5, round(videoSec / (basePace × densityMultiplier)))`) so the "auto" option's label can show the concrete number.
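The ratio recommendation reduces to two threshold checks, sketched here with an illustrative function name (`recommended_ratio`):

```python
def recommended_ratio(width: int, height: int) -> str:
    """Map the source aspect ratio to the recommended output canvas ratio."""
    source_aspect = width / height
    if source_aspect >= 1.5:   # roughly 3:2 or wider
        return "16:9"
    if source_aspect <= 0.7:   # roughly 9:13 or taller
        return "9:16"
    return "4:5"               # near-square sources


for size in [(1920, 1080), (1080, 1920), (1080, 1080), (1440, 1080)]:
    print(size, "->", recommended_ratio(*size))
```

A 4:3 source (1440×1080, aspect ≈ 1.33) falls in the near-square band and therefore gets `4:5`.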
Environment compatibility: pick the best available question channel. Not every runtime exposes the same structured-question tool. Apply this order:

1. `AskUserQuestion` (Claude Code, Anthropic Console): use the structured 4-question call below.
2. Other native clarification tool (e.g. `ask_question`, `request_user_input`, IDE-specific prompt): use that tool with the same 4 question texts and option lists. Preserve the recommendation markers and the precomputed values.
3. No native tool (Codex CLI, plain text-only runtimes): ask directly in normal conversation. Use the plain-text template at the end of this section. Keep it to one message, 4 numbered questions (the global cap is 2–5 questions per round; we stay inside it).

Rules that apply to every channel:

- Ask at most 2–5 questions per round. Our 4 here fits.
- Even if missing info doesn't block rendering, ask once to confirm the parameters that materially affect the final output (ratio, layout, style, cardCount).
- If the user has already pre-approved defaults ("just use defaults", "no need to ask", "auto-pick everything") or asked you not to ask, skip the question entirely and use: `recommendedRatio`, `layout="stack"` (safest cross-ratio default), `style` chosen from transcript tone in the most neutral group (editorial/data), `autoCount`. Tell the user what you picked in one sentence and continue.

Channel A: native `AskUserQuestion`:
// Precompute before the call:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = integer (from Step 6)

AskUserQuestion({
  questions: [
    {
      question: "Output video aspect ratio (canvas):",
      header: "Aspect ratio",
      multiSelect: false,
      // Reorder so the recommended option appears FIRST (per AskUserQuestion convention).
      // Append " (recommended · matches source video W×H)" to the recommended option's label.
      options: [
        { label: "16:9 (1920×1080) landscape", description: "TV / YouTube / desktop playback. Most natural when the source video is already landscape; widest canvas." },
        { label: "9:16 (1080×1920) portrait", description: "TikTok / Reels / short-form mobile. Most natural for portrait source; native mobile experience." },
        { label: "4:5 (1080×1350) near-portrait", description: "Instagram feed / WeChat Moments. Best when source is near-square or you want to cover both platforms." }
      ]
    },
    {
      question: "Choose the overall layout: how should the video and cards coexist on the canvas?",
      header: "Layout",
      multiSelect: false,
      options: [
        { label: "side-by-side (split)",  description: "Video and card each take half the canvas. Most stable for interview / data side-by-side; clear visual separation." },
        { label: "top-bottom (stack)",    description: "Video on top (~52%), card below. Classic combo of speaker face + summary card; works well in portrait too." },
        { label: "picture-in-picture (pip)", description: "Card fills the canvas, video shrinks to a rounded corner window. Use when content is primary and speaker is secondary." },
        { label: "full-screen overlay (overlay)", description: "Video plays full-bleed, card floats as a glass layer on top. Strong cinematic / emotional feel." }
      ]
    },
    {
      question: "Choose the card visual style (style):",
      header: "Style group",
      multiSelect: false,
      // NOTE: these 3 groups intentionally match the columns of the frame
      // auto-pick matrix below, so picking a group resolves both the `style`
      // group AND the frame matrix column in one step. Memberships are mutually exclusive.
      options: [
        { label: "warm paper (warm-paper)", description: "academic notebook · editorial big-type · whiteboard hand-drawn · xhs social. Best for interview reflections, product launches, lifestyle, emotional stories." },
        { label: "clinical / cold (clinical)",   description: "audit magazine · swiss grid · terminal CLI · minimal modern. Best for financial analysis, investigative reports, technical tutorials, serious presentations." },
        { label: "experimental / avant-garde (experimental)", description: "geom color-clash geometry · spotlight dark-background. Best for short-form highlights, product launches, strong emotion, cinematic feel." }
      ]
    },
    {
      question: "Card count (takeaway pacing): how many cards to cut?",
      header: "Card count",
      multiSelect: false,
      options: [
        { label: "Auto (recommended) · approx N cards", description: "Inferred automatically from video duration and information density (see Step 6 rules). This run estimates approx N cards. Substitute the real N (your autoCount) into the label." },
        { label: "Fewer · approx round(N × 0.6) cards", description: "Sparser cuts, each card holds longer — suits reflective / slow-paced content." },
        { label: "More · approx round(N × 1.5) cards", description: "Tighter cuts, faster rhythm — suits staccato / data-dense / short-form highlight content." }
      ]
    }
  ]
})
About "Other": `AskUserQuestion` automatically adds an "Other" option to the card count question. The user can type a number directly (e.g. "8", "20") as the cardCount target. Parse the input as an integer: if parsing succeeds → use that value (minimum 5 as a floor); if parsing fails → fall back to "auto".

Channel B: plain-text fallback (Codex CLI, runtimes without a native question tool). Post this as one normal message, then wait for the reply. The numbered 1/2/3/4 format keeps the reply parseable:
I need to confirm four visual decisions with you before I start cutting cards:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080) — TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920) — TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350) — Instagram feed / works for both platforms
   ▸ My recommendation:  <recommendedRatio>  (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video & card coexist):
   A. split   side-by-side (50/50)
   B. stack   top-bottom (video top, card bottom)
   C. pip     picture-in-picture (card full canvas, video rounded corner window)
   D. overlay full-screen glass overlay (video full-bleed, card glass layer)

3) Card style group (maps to frame auto-pick matrix, pick 1 of 3):
   A. warm paper (warm-paper)      (academic / editorial / whiteboard / xhs)
   B. clinical / cold (clinical)   (audit / swiss / terminal / minimal)
   C. experimental (experimental)  (geom / spotlight)

4) Card count (takeaway pacing):
   A. Auto (recommended) — approx <autoCount> cards
   B. Fewer — approx round(<autoCount> × 0.6) cards
   C. More — approx round(<autoCount> × 1.5) cards
   D. Give me a specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language is fine.
If you want all recommended defaults, reply "default" / "auto" / "use all recommendations".
Parsing the plain-text reply:

- Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data / auto"`, full sentences, or `default`.
- If any answer is ambiguous → re-ask only the ambiguous ones (still inside the 2–5 cap).
- If the user says "default / auto / use all recommendations" → skip without re-asking.
After the user answers (any channel):

1. Resolve the output canvas from the ratio answer. These are the exact `storyboard.composition.width / height` values to write:

   | user choice | composition.width × height | storyboard.layout field |
   | --- | --- | --- |
   | 16:9 | 1920 × 1080 | `"landscape"` |
   | 9:16 | 1080 × 1920 | `"portrait"` |
   | 4:5 | 1080 × 1350 | `"portrait"` (schema treats 4:5 as portrait: height > width) |

   For 4:5 bounds: the `references/layouts/*.html` files only document landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350) derive bounds by proportional scaling from portrait: keep horizontal values, scale vertical values by `1350/1920` (≈ 0.703). Multiply by the exact ratio rather than the rounded 0.703, which can shift a value by a pixel. Example: `overlay` portrait card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.
2. Map the style group to a specific style by looking at the transcript tone. Pick the one that best fits, but stay inside the user's chosen group. If you're unsure between two specific styles inside the group, send a second `AskUserQuestion` with those 2–4 specific style options.
3. Resolve final cardCount from the density answer:

   | user choice | final cardCount |
   | --- | --- |
   | Auto (recommended) | the `autoCount` you already computed |
   | Fewer | `max(5, round(autoCount × 0.6))` |
   | More | `round(autoCount × 1.5)` (no upper clamp) |
   | Other = "<n>" (integer) | `max(5, parseInt(n))` |
   | Other = anything else | fall back to `autoCount` |
4. Auto-pick the video frame from this table (frames don't ask the user; they follow from layout × style):

   | layout | warm-paper styles (academic / whiteboard / editorial / xhs) | clinical styles (audit / swiss / terminal / minimal) | experimental styles (geom / spotlight) |
   | --- | --- | --- | --- |
   | `split` | `polaroid` | `hairline` | `clean` |
   | `stack` | `polaroid` | `hairline` | `clean` |
   | `pip` | `clean` (pip pill already has chrome) | `clean` | `clean` |
   | `overlay` | `clean` (full-bleed forbids deco frames) | `clean` | `clean` |
5. Tell the user what you chose in one sentence (ratio + canvas size, layout, specific style, frame, and final cardCount), then proceed with the rest of Step 7 (per-card layouts, motion patterns).
6. Record the five values (ratio / layout / style / frame / cardCount) in working memory (no schema field needed); you'll reference them while writing each card's HTML in Step 8 and while reading the matching `references/<dim>/<key>.html` for tokens and structure.

If the user picks an answer via "Other" with a free-text style name not in the 10-style library, treat it as a hint to design a fresh card visual yourself, but still anchor on the chosen layout's bounds.
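The resolution rules above (canvas size, 4:5 bounds scaling, final cardCount, frame pick) can be sketched as plain lookups. All names here are illustrative. Two assumptions are labeled in the code: `round` means round-half-up, and the vertical scale uses the exact `1350/1920` ratio, which reproduces the documented `{ y: 900, h: 397 }` example.

```python
import math

CANVAS = {  # ratio -> (width, height, layout field)
    "16:9": (1920, 1080, "landscape"),
    "9:16": (1080, 1920, "portrait"),
    "4:5": (1080, 1350, "portrait"),
}
_DECO = {"warm-paper": "polaroid", "clinical": "hairline", "experimental": "clean"}
_CLEAN = {"warm-paper": "clean", "clinical": "clean", "experimental": "clean"}
FRAMES = {"split": _DECO, "stack": _DECO, "pip": _CLEAN, "overlay": _CLEAN}


def round_half_up(x: float) -> int:
    """Assumption: the doc's round() rounds halves up (Python's round() does not)."""
    return math.floor(x + 0.5)


def scale_portrait_bounds_to_4x5(bounds: dict) -> dict:
    """Keep horizontal values; scale vertical values by the exact 1350/1920 ratio."""
    k = 1350 / 1920
    return {
        "x": bounds["x"],
        "y": round_half_up(bounds["y"] * k),
        "w": bounds["w"],
        "h": round_half_up(bounds["h"] * k),
    }


def final_card_count(choice: str, auto_count: int) -> int:
    """choice is 'auto', 'fewer', 'more', or the free text typed under Other."""
    if choice == "auto":
        return auto_count
    if choice == "fewer":
        return max(5, round_half_up(auto_count * 0.6))
    if choice == "more":
        return round_half_up(auto_count * 1.5)  # no upper clamp
    try:
        # int() is stricter than JavaScript's parseInt: "8.5" fails and falls back.
        return max(5, int(choice))
    except ValueError:
        return auto_count


print(CANVAS["4:5"], FRAMES["stack"]["clinical"])
print(scale_portrait_bounds_to_4x5({"x": 24, "y": 1280, "w": 1032, "h": 564}))
print([final_card_count(c, 17) for c in ("auto", "fewer", "more", "8", "3", "lots")])
```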
开始设计卡片或确定边界前,请用户选择输出比例、布局、风格和卡片密度预设。帧会根据所选布局×风格组合自动选择(见下方“自动选择帧”表格)。发送问题前,预先计算两件事
  1. recommendedRatio
    (根据源视频宽高比,即
    metadata.json
    中的width / height):
    • sourceAspect = width / height
    • sourceAspect ≥ 1.5
      (≥ ~3:2宽屏)→ 推荐**
      16:9
      **
    • sourceAspect ≤ 0.7
      (≤ ~9:13竖屏)→ 推荐**
      9:16
      **
    • 0.7 < sourceAspect < 1.5
      (接近正方形)→ 推荐**
      4:5
      **
    在推荐选项的标签后添加“(推荐 · 匹配源视频X:Y)”,让用户了解推荐理由。
  2. autoCount
    (来自步骤6的计算值,即
    max(5, round(视频时长秒数 / (基础节奏 × 密度乘数)))
    ),这样“自动”选项的标签可显示具体数字。
环境兼容性——选择最佳提问渠道。并非所有运行时都提供相同的结构化提问工具。遵循以下优先级:
  1. AskUserQuestion
    (Claude Code、Anthropic控制台)——使用下方的结构化4问题调用。
  2. 其他原生澄清工具(如
    ask_question
    request_user_input
    、IDE特定提示)——使用该工具,问题文本和选项列表保持一致。保留推荐标记和预先计算的值。
  3. 无原生工具(Codex CLI、纯文本运行时)——直接在对话中提问。使用本节末尾的纯文本模板。保持为一条消息,4个编号问题(全局每轮最多2–5个问题;此处符合要求)。
适用于所有渠道的规则:
  • 每轮最多提问2–5个问题。此处的4个问题符合要求。
  • 即使缺少信息不影响渲染,也要确认对最终输出有重大影响的参数(比例、布局、风格、卡片数量)。
  • 如果用户已预先批准默认值(“使用默认值”“无需提问”“自动选择所有选项”)或要求不要提问——完全跳过提问,使用:
    recommendedRatio
    layout="stack"
    (跨比例最安全的默认值)、根据字幕语气从最中性组(编辑/数据)选择
    style
    autoCount
    。用一句话告知用户你的选择并继续。
渠道A — 原生
AskUserQuestion
// 调用前预先计算:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = 整数(来自步骤6)

AskUserQuestion({
  questions: [
    {
      question: "输出视频宽高比(画布):",
      header: "宽高比",
      multiSelect: false,
      // 重新排序,让推荐选项排在最前面(遵循AskUserQuestion约定)。
      // 在推荐选项的标签后添加“(推荐 · 匹配源视频W×H)”。
      options: [
        { label: "16:9 (1920×1080) 横屏", description: "电视/YouTube/桌面播放。源视频为横屏时最自然;画布最宽。" },
        { label: "9:16 (1080×1920) 竖屏", description: "TikTok/Reels/短视频移动端。源视频为竖屏时最自然;原生移动端体验。" },
        { label: "4:5 (1080×1350) 近竖屏", description: "Instagram朋友圈/微信朋友圈。源视频接近正方形或需要覆盖多平台时最佳。" }
      ]
    },
    {
      question: "选择整体布局:视频和卡片如何在画布上共存?",
      header: "布局",
      multiSelect: false,
      options: [
        { label: "左右分栏(split)",  description: "视频和卡片各占画布一半。访谈/数据并列时最稳定;视觉分隔清晰。" },
        { label: "上下堆叠(stack)",    description: "视频在上(约52%),卡片在下。演讲者面部+摘要卡片的经典组合;竖屏也适用。" },
        { label: "画中画(pip)", description: "卡片填满画布,视频缩小为圆角窗口。内容为主、演讲者为辅时使用。" },
        { label: "全屏叠加(overlay)", description: "视频全屏播放,卡片作为玻璃层悬浮在上方。强烈的电影感/情感氛围。" }
      ]
    },
    {
      question: "选择卡片视觉风格(style):",
      header: "风格组",
      multiSelect: false,
      // 注意:这3组与下方的帧自动选择矩阵行完全匹配
      // 选择一组即可同时确定`style`组和帧矩阵列。各组互斥。
      options: [
        { label: "温暖纸张风(warm-paper)", description: "学术笔记本·大字体编辑风格·手绘白板·小红书社交风。访谈反思、产品发布、生活方式、情感故事最佳。" },
        { label: "冷峻专业风(clinical)",   description: "审计杂志·瑞士网格·终端CLI·极简现代风。财务分析、调查报告、技术教程、正式演示最佳。" },
        { label: experimental / avant-garde (experimental)", description: "几何撞色·暗背景聚光灯。短视频高光、产品发布、强烈情感、电影感内容最佳。" }
      ]
    },
    {
      question: "卡片数量(重点内容节奏):需要制作多少张卡片?",
      header: "卡片数量",
      multiSelect: false,
      options: [
        { label: "自动(推荐)· 约N张卡片", description: "根据视频时长和信息密度自动推断(见步骤6规则)。本次运行预计约N张卡片。将实际N值(你的autoCount)替换到标签中。" },
        { label: "更少· 约round(N × 0.6)张卡片", description: "切换更稀疏,每张卡片停留更长——适合反思/慢节奏内容。" },
        { label: "更多· 约round(N × 1.5)张卡片", description: "切换更紧凑,节奏更快——适合急促/数据密集/短视频高光内容。" }
      ]
    }
  ]
})
关于“其他”选项
AskUserQuestion
会自动在卡片数量问题中添加“其他”选项。用户可直接输入数字(如“8”“20”)作为卡片数量目标。将输入解析为整数:解析成功→使用该值(下限为5);解析失败→ fallback到“自动”。
渠道B — 纯文本 fallback(Codex CLI、无原生提问工具的运行时)。将以下内容作为一条普通消息发送,然后等待回复。使用1/2/3/4的项目符号格式让回复易于解析:
开始制作卡片前,我需要与你确认四个视觉决策:

1) 输出宽高比(画布):
   A. 16:9横屏(1920×1080)——电视/YouTube/桌面播放
   B. 9:16竖屏(1080×1920)——TikTok/Reels/短视频移动端
   C. 4:5近竖屏(1080×1350)——Instagram朋友圈/适配多平台
   ▸ 我的推荐: <recommendedRatio> (匹配源视频W×H = <sourceW>×<sourceH>)

2) 整体布局(视频与卡片如何共存):
   A. split 左右分栏(50/50)
   B. stack 上下堆叠(视频在上,卡片在下)
   C. pip 画中画(卡片填满画布,视频为圆角窗口)
   D. overlay 全屏玻璃叠加(视频全屏,卡片为玻璃层)

3) 卡片风格组(对应帧自动选择矩阵,3选1):
   A. warm paper(warm-paper)      (学术/编辑/白板/小红书)
   B. clinical / cold(clinical)   (审计/瑞士风格/终端/极简)
   C. experimental(experimental)  (几何/聚光灯)

4) 卡片数量(重点内容节奏):
   A. 自动(推荐)——约<autoCount>张卡片
   B. 更少——约round(<autoCount> × 0.6)张卡片
   C. 更多——约round(<autoCount> × 1.5)张卡片
   D. 指定具体数字(如“8”“20”)

回复格式:“1A 2C 3B 4A”或自然语言均可。
如果要使用所有推荐默认值,回复“default”/“auto”/“使用所有推荐选项”。
解析纯文本回复:
  • 接受松散格式:
    "1A 2C 3B 4A"
    "A C B A"
    "16:9 / pip / clinical / auto"
    、完整句子或
    default
  • 如果任何答案模糊→仅重新提问模糊的问题(仍保持在2–5个问题上限内)。
  • 如果用户回复“default / auto / 使用所有推荐选项”→跳过提问。
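The loose-format parsing described above could be sketched roughly as follows. This is a minimal sketch, not part of the skill: `parseChoiceReply` is a hypothetical helper name, it only recognizes the two compact formats and the "use defaults" shortcut, and it returns `null` for anything else so the agent can interpret the reply as natural language or re-ask.

```javascript
// Hypothetical helper (not shipped with the skill or any runtime).
// Returns { useDefaults, answers } or null when the reply needs manual interpretation.
function parseChoiceReply(reply) {
  const text = String(reply).trim();

  // Shortcut: accept all recommended defaults and skip further questions.
  if (/^(default|auto|使用所有推荐选项)$/i.test(text)) {
    return { useDefaults: true, answers: {} };
  }

  // Numbered form: "1A 2C 3B 4A" (any order, partial answers allowed).
  const numbered = [...text.matchAll(/\b([1-4])\s*([A-D])\b/gi)];
  if (numbered.length > 0) {
    const answers = {};
    for (const [, question, option] of numbered) {
      answers[question] = option.toUpperCase();
    }
    return { useDefaults: false, answers };
  }

  // Positional form: "A C B A" maps to questions 1..4 in order.
  const tokens = text.split(/\s+/);
  if (tokens.length === 4 && tokens.every((t) => /^[A-D]$/i.test(t))) {
    const answers = {};
    tokens.forEach((t, i) => {
      answers[String(i + 1)] = t.toUpperCase();
    });
    return { useDefaults: false, answers };
  }

  return null; // full sentences, "16:9 / pip / ..." and similar: interpret by hand
}
```

A partial result (for example only questions 1 and 3 answered) signals which questions still need to be re-asked.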
用户回复后(任何渠道):
  1. 根据宽高比答案解析输出画布——以下是要写入
    storyboard.composition.width / height
    的精确值:
    用户选择 | 合成文件宽×高 | storyboard.layout字段
    16:9 | 1920 × 1080 | "landscape"
    9:16 | 1080 × 1920 | "portrait"
    4:5 | 1080 × 1350 | "portrait"(schema将4:5视为竖屏:高度>宽度)
    对于
    references/layouts/*.html
    中的4:5边界——这些文件仅记录横屏(1920×1080)和竖屏(1080×1920)。对于4:5(1080×1350),需通过竖屏比例缩放推导边界:保持水平值不变,垂直值乘以
    1350/1920 ≈ 0.703
    。示例:竖屏
    overlay
    卡片 =
    { x: 24, y: 1280, w: 1032, h: 564 }
    → 4:5卡片 =
    { x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }
    =
    { x: 24, y: 900, w: 1032, h: 397 }
  2. 根据字幕语气将风格组映射到具体风格——选择最匹配的风格,但需在用户选择的组内。如果组内两个具体风格难以抉择,发送第二个
    AskUserQuestion
    ,提供这2–4个具体风格选项。
  3. 根据密度答案解析最终卡片数量
    用户选择 | 最终卡片数量
    自动(推荐) | 你已计算的 autoCount
    更少 | max(5, round(autoCount × 0.6))
    更多 | round(autoCount × 1.5)(无上限)
    其他 = "<n>"(整数) | max(5, parseInt(n))
    其他 = 其他内容 | fallback到 autoCount
  4. 根据下表自动选择视频帧(无需询问用户——由布局×风格决定):
    布局 | warm-paper风格(学术/白板/编辑/小红书) | clinical风格(审计/瑞士/终端/极简) | experimental风格(几何/聚光灯)
    split | polaroid | hairline | clean
    stack | polaroid | hairline | clean
    pip | clean(画中画已自带边框) | clean | clean
    overlay | clean(全屏视频不适合装饰性边框) | clean | clean
  5. 用一句话告知用户你的选择——宽高比(+画布尺寸)、布局、具体风格、帧、最终卡片数量——然后继续步骤7的剩余部分(每张卡片的布局、动画模式)。
  6. 将五个值(宽高比/布局/风格/帧/卡片数量)记录到工作内存中(无需schema字段);步骤8编写每张卡片的HTML和步骤9读取匹配的
    references/<dim>/<key>.html
    获取模板和结构时会用到这些值。
如果用户通过“其他”选项选择了10种风格库之外的自由文本风格名称,将其视为设计全新卡片视觉的提示,但仍需基于所选布局的边界。
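The post-reply resolution steps (canvas, final card count, frame auto-selection) are simple lookups, and could be sketched as below. This is an illustrative sketch only: the constant and function names are assumptions introduced here, while the numeric values and the layout × style matrix are copied from the tables above.

```javascript
// Values copied from the Step 7.0 tables; all identifiers are illustrative.
const CANVAS_BY_RATIO = {
  "16:9": { width: 1920, height: 1080, layout: "landscape" },
  "9:16": { width: 1080, height: 1920, layout: "portrait" },
  "4:5": { width: 1080, height: 1350, layout: "portrait" }, // taller than wide, so portrait
};

const CLEAN = { "warm-paper": "clean", clinical: "clean", experimental: "clean" };
const DECORATED = { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" };
const FRAME_BY_LAYOUT = { split: DECORATED, stack: DECORATED, pip: CLEAN, overlay: CLEAN };

// choice is "auto", "fewer", "more", or the free text typed into "Other".
function resolveCardCount(choice, autoCount) {
  if (choice === "auto") return autoCount;
  if (choice === "fewer") return Math.max(5, Math.round(autoCount * 0.6));
  if (choice === "more") return Math.round(autoCount * 1.5); // no upper cap
  const n = Number.parseInt(choice, 10);
  return Number.isNaN(n) ? autoCount : Math.max(5, n); // unparseable: fall back to auto
}

function resolveVisualChoices({ ratio, layout, styleGroup, countChoice, autoCount }) {
  return {
    canvas: CANVAS_BY_RATIO[ratio],
    layout,
    styleGroup,
    frame: FRAME_BY_LAYOUT[layout][styleGroup],
    cardCount: resolveCardCount(countChoice, autoCount),
  };
}
```

For example, `resolveVisualChoices({ ratio: "9:16", layout: "split", styleGroup: "clinical", countChoice: "fewer", autoCount: 12 })` yields a 1080×1920 canvas, the `hairline` frame, and 7 cards.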

Render Strategy Inputs

渲染策略输入

With ratio / layout / style / cardCount / frame locked from Step 7.0, the remaining per-card decisions are:
  • Source-video fit inside the GSAP target: video element has
    object-fit: cover
    and is clipped to
    #video-wrap
    's tween bounds. If you want NO cropping (e.g. portrait source on landscape canvas shouldn't get its top/bottom chopped), aim the tween at a rect that matches the source's aspect ratio and let surrounding canvas show through (or fill with the card / a backdrop).
  • card.zone
    per card
    : derive from your chosen composition layout (split → side-panel, stack → lower-third, pip → fullscreen, overlay → video-overlay), OR pick a different zone for one-off variants (fullscreen for hero / quote, whiteboard-area for dense data).
  • accentIndex
    per card
    : each card pulls one of the 5 theme accent colors. Vary across cards for rhythm; reuse the same index when two cards belong to the same narrative beat.
  • Motion vocabulary: pick 2–3 repeatable patterns from
    data-anim
    kinds (see the table later) and stick to them so the composition feels coherent.
Pick from these
themeId
palettes (use them as
--accent-N
/
--bg
/
--text
CSS variables in your composition
<style>
block):
themeId | accent palette (5 colors) | board bg | text
classic | #1971c2 #e03131 #2f9e44 #e8590c #9c36b5 | #FFF9E3 (paper) | #1e1e1e
noir | #4cc9f0 #f72585 #4ade80 #fb923c #a78bfa | #1a1a1a | #f1f1f1
mint | #0077b6 #d62828 #2d6a4f #e76f51 #7209b7 | #e8faf0 | #1b4332
craft | #bf5700 #d62728 #6c757d #e9b54a #3d5a80 | #f6efe1 | #2d2d2d
slate | #0ea5e9 #ef4444 #22c55e #f97316 #a855f7 | #1e293b | #f1f5f9
mono | #000 #555 #888 #aaa #ccc | #fff | #000
Available fonts (woff2 in
<SKILL_DIR>/assets/fonts/
, staged to work dir in Step 9):
Caveat
(handwriting),
LXGW WenKai TC
(Chinese hand-script),
Inter
(modern sans),
Virgil
(geometric hand). Reference via
@font-face
or
font-family
directly.
For inspiration on visual patterns,
<SKILL_DIR>/references/styles/
ships 10 self-contained reference cards (academic / editorial / minimal / spotlight / geom / whiteboard / audit / terminal / swiss / xhs) that you can copy as starting points — but do not feel constrained to match any of these. Each card is your own design.
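As a rough illustration of how a `themeId` palette turns into the `--accent-N` / `--bg` / `--text` variables, the sketch below generates a `:root` block from the palette table. The `THEMES` object and `themeToCss` function are hypothetical names introduced here; only two palettes are included, with values copied from the table.

```javascript
// Palette values copied from the themeId table; extend with the other rows as needed.
const THEMES = {
  classic: {
    accents: ["#1971c2", "#e03131", "#2f9e44", "#e8590c", "#9c36b5"],
    bg: "#FFF9E3",
    text: "#1e1e1e",
  },
  noir: {
    accents: ["#4cc9f0", "#f72585", "#4ade80", "#fb923c", "#a78bfa"],
    bg: "#1a1a1a",
    text: "#f1f1f1",
  },
};

// Builds the :root rule to paste into the composition <style> block.
function themeToCss(themeId) {
  const theme = THEMES[themeId];
  if (!theme) throw new Error("Unknown themeId: " + themeId);
  const lines = ["  --bg: " + theme.bg + ";", "  --text: " + theme.text + ";"];
  theme.accents.forEach((color, i) => {
    lines.push("  --accent-" + i + ": " + color + ";"); // accentIndex 0..4
  });
  return ":root {\n" + lines.join("\n") + "\n}";
}
```

A card with `accentIndex: 2` would then reference `var(--accent-2)` in its scoped styles.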
步骤7.0已锁定宽高比/布局/风格/卡片数量/帧,剩余每张卡片的决策包括:
  • 源视频在GSAP目标中的适配:视频元素设置
    object-fit: cover
    ,并被裁剪到
    #video-wrap
    的动画边界。如果不想裁剪(例如竖屏源视频在横屏画布上不希望顶部/底部被切掉),将动画目标设置为与源视频宽高比匹配的矩形,让周围画布显示(或填充卡片/背景)。
  • 每张卡片的
    card.zone
    :从所选合成布局推导(split→side-panel,stack→lower-third,pip→fullscreen,overlay→video-overlay),或为特殊变体选择不同区域(fullscreen用于核心/引用,whiteboard-area用于密集数据)。
  • 每张卡片的
    accentIndex
    :每张卡片使用5种主题强调色中的一种。在卡片间切换以形成节奏;属于同一叙事节拍的两张卡片可重复使用同一索引。
  • 动画词汇:从
    data-anim
    类型中选择2–3种可重复的模式(见下表)并保持一致,让合成文件感觉连贯。
从以下
themeId
调色板中选择(在合成文件
<style>
块中作为
--accent-N
/
--bg
/
--text
CSS变量使用):
themeId | 强调色调色板(5种颜色) | 背景色 | 文本色
classic | #1971c2 #e03131 #2f9e44 #e8590c #9c36b5 | #FFF9E3(纸张色) | #1e1e1e
noir | #4cc9f0 #f72585 #4ade80 #fb923c #a78bfa | #1a1a1a | #f1f1f1
mint | #0077b6 #d62828 #2d6a4f #e76f51 #7209b7 | #e8faf0 | #1b4332
craft | #bf5700 #d62728 #6c757d #e9b54a #3d5a80 | #f6efe1 | #2d2d2d
slate | #0ea5e9 #ef4444 #22c55e #f97316 #a855f7 | #1e293b | #f1f5f9
mono | #000 #555 #888 #aaa #ccc | #fff | #000
可用字体(
<SKILL_DIR>/assets/fonts/
中的woff2文件,步骤9会复制到工作目录):
Caveat
(手写体)、
LXGW WenKai TC
(中文手写体)、
Inter
(现代无衬线体)、
Virgil
(几何手写体)。可通过
@font-face
或直接使用
font-family
引用。
如需视觉模式灵感,
<SKILL_DIR>/references/styles/
包含10个独立的参考卡片(学术/编辑/极简/聚光灯/几何/白板/审计/终端/瑞士/小红书风格),可作为起点复制——但无需局限于这些模板。每张卡片都可自行设计。

Visual Design Library (<SKILL_DIR>/references/)

视觉设计库(<SKILL_DIR>/references/)

Beyond the composition-level
themeId
, the skill ships a richer reference library at
<SKILL_DIR>/references/
covering three orthogonal visual dimensions you can freely mix:
Style  ×  Layout  ×  VideoFrame
 (10)      (4)         (3)
dimension | keys | what it decides
style | academic, editorial, minimal, spotlight, geom, whiteboard, audit, terminal, swiss, xhs | the card's visual language: fonts, colors, ornament, layout-within-card
layout | split, stack, pip, overlay | how the source video and the card share the canvas
frame | clean, hairline, polaroid | the decorative chrome around the video element
Read
<SKILL_DIR>/references/DESIGN_INDEX.md
for the full matrix and a loose decision guide (interview / product launch / data analysis / social clip / technical tutorial / emotional story …). When you decide to use a specific style / layout / frame, Read the corresponding file:
  • references/styles/<key>.html
    — self-contained card fragment with that style's CSS tokens (colors, fonts, padding, ornament) and a placeholder takeaway. Copy the
    .card[data-card-id="ref-<key>"]
    style block, rename the data-card-id to your card's id, swap the placeholder content for the real takeaway, and you're done.
  • references/layouts/<key>.html
    — exact
    videoBounds
    +
    cardBounds
    for both landscape and portrait, with a copy-paste JSON snippet for
    storyboard.json
    's per-card
    layout
    field.
  • references/frames/<key>.html
    — decorative HTML to add as a sibling of
    #video-wrap
    , plus placement instructions for the composition CSS.
Pick
style × layout × frame
per card — you can change all three between cards as long as the transitions read smoothly. A common rhythm: open
editorial × overlay × clean
, switch to
audit × split × hairline
for the data card, close on
whiteboard × pip × polaroid
.
The 10 styles are skill-side design tokens, not composition-level themes — they don't need to be declared in
storyboard.composition
; they live inside each card's HTML. The
themeId
field can still pick a composition-level palette (table above) that controls page-body background and video border chrome.
除了合成层的
themeId
,此技能还提供更丰富的参考库,位于
<SKILL_DIR>/references/
,涵盖三个正交的视觉维度,可自由组合:
风格  ×  布局  ×  视频帧
 (10)      (4)         (3)
维度 | 取值 | 决定内容
风格 | academic, editorial, minimal, spotlight, geom, whiteboard, audit, terminal, swiss, xhs | 卡片的视觉语言:字体、颜色、装饰、卡片内布局
布局 | split, stack, pip, overlay | 源视频和卡片如何共享画布
帧 | clean, hairline, polaroid | 视频元素周围的装饰性边框
阅读
<SKILL_DIR>/references/DESIGN_INDEX.md
获取完整矩阵和宽松决策指南(访谈/产品发布/数据分析/社交视频/技术教程/情感故事……)。决定使用特定风格/布局/帧时,读取对应文件:
  • references/styles/<key>.html
    — 独立的卡片片段,包含该风格的CSS变量(颜色、字体、内边距、装饰)和占位内容。复制
    .card[data-card-id="ref-<key>"]
    样式块,将data-card-id重命名为你的卡片ID,替换占位内容为实际内容即可。
  • references/layouts/<key>.html
    — 横屏和竖屏的精确
    videoBounds
    +
    cardBounds
    ,包含可复制到
    storyboard.json
    每张卡片
    layout
    字段的JSON片段。
  • references/frames/<key>.html
    — 作为
    #video-wrap
    同级元素的装饰性HTML,以及合成文件CSS中的放置说明。
每张卡片可选择风格×布局×帧——只要过渡流畅,卡片间可全部更改。常见节奏:开场使用
editorial × overlay × clean
,数据卡片切换为
audit × split × hairline
,结尾使用
whiteboard × pip × polaroid
。
这10种风格是技能侧的设计变量,不是合成层主题——无需在
storyboard.composition
中声明;它们存在于每张卡片的HTML中。
themeId
字段仍可选择合成层调色板(见上表),控制页面背景和视频边框。

Layout Compositions (Card + Video)

布局合成(卡片+视频)

Two coordinated decisions per card define how it shares the canvas with the source video:
  • card.zone
    (declared in
    storyboard.json
    ) — one of the 5 schema values; resolve it into pixel bounds (per the table in Step 6) when you write the card-host wrapper's inline
    style
    in Step 9.
  • #video-wrap
    bounds at this card's time window
    (declared imperatively in the composition's GSAP timeline) — the agent tweens
    #video-wrap
    to a target rect for each layout transition.
Schema does NOT store per-card video bounds.
videoTrack.bounds
is one-time at composition level (defaults to full canvas). Video "moving" between cards is purely a GSAP animation authored in
index.html
. There is no
card.layout
field — earlier versions of this doc invented one; the real schema only has
card.zone
.
4 composition layouts (from
references/layouts/
) — each is a recipe pairing a
zone
with a
#video-wrap
tween target:
composition layout | recommended card.zone | GSAP target for #video-wrap (landscape 1920×1080) | GSAP target for #video-wrap (portrait 1080×1920) | when to use
split | side-panel | { left: 960, top: 0, width: 960, height: 1080 } | { left: 0, top: 960, width: 1080, height: 960 } (bottom half) | speaker + data side-by-side / 50:50 weight
stack | lower-third | { left: 14, top: 14, width: 1892, height: 548 } (top 52%) | { left: 0, top: 0, width: 1080, height: 844 } (top 44%) | speaker on top + summary card below
pip | fullscreen | { left: 1480, top: 760, width: 400, height: 300 } + add .framed class | { left: 690, top: 28, width: 360, height: 203 } + add .framed | content-heavy card + corner pip
overlay | video-overlay | { left: 0, top: 0, width: 1920, height: 1080 } (full-bleed) | { left: 0, top: 0, width: 1080, height: 1920 } | cinematic / dramatic / glass card on full video
For 4:5 (1080×1350), scale portrait y/h values by
1350/1920 ≈ 0.703
(see Step 7.0 Channel A / Channel B
recommendedRatio
resolution table).
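The 4:5 derivation can be sketched as a small helper. This is a minimal sketch assuming the `{ left, top, width, height }` rect shape used in the table above; `scalePortraitRectTo4x5` is a hypothetical name. It uses the exact ratio 1350/1920 rather than the rounded 0.703, so rounding does not drift by a pixel.

```javascript
// Scales a portrait (1080×1920) #video-wrap target to the 4:5 (1080×1350) canvas.
// Horizontal values are kept; vertical values are scaled by 1350/1920.
function scalePortraitRectTo4x5(rect) {
  const ratio = 1350 / 1920; // exactly 0.703125
  return {
    left: rect.left,
    top: Math.round(rect.top * ratio),
    width: rect.width,
    height: Math.round(rect.height * ratio),
  };
}

// Portrait "split" target from the table: video occupies the bottom half.
const split4x5 = scalePortraitRectTo4x5({ left: 0, top: 960, width: 1080, height: 960 });
// split4x5 is { left: 0, top: 675, width: 1080, height: 675 }
```

Because only the vertical axis is scaled, the aspect ratio of the target rect changes; with `object-fit: cover` the video is cropped accordingly, which may or may not be what a given card wants.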
Other zone values for one-off variants (still uses
card.zone
; no fake "layout" field):
zone | resolved bounds | common use
fullscreen | covers whole canvas | hero card, video tweens to hidden/pip
whiteboard-area | inset 40px margin (landscape) or bottom 45% (portrait) | dense data card, free margins
lower-third | bottom 30% band | talking-head annotation
side-panel | right 42% (landscape) or bottom 40% (portrait) | sidebar / "split" recipe
video-overlay | full canvas; expect transparent card root | glass overlay on full-bleed video
You can mix recipes per card — choose
card.zone
based on what suits the moment, then write the GSAP tween for
#video-wrap
between cards.
每张卡片的两个协同决策定义其与源视频共享画布的方式:
  • card.zone
    (在
    storyboard.json
    中声明)——5种schema取值之一;步骤9编写卡片容器的内联
    style
    时需将其解析为像素边界(见步骤6的表格)。
  • 此卡片时间窗口内的
    #video-wrap
    边界
    (在合成文件的GSAP时间线中声明)——Agent在每个布局过渡时将
    #video-wrap
    动画到目标矩形。
Schema不存储每张卡片的视频边界。
videoTrack.bounds
合成层一次性设置(默认全屏)。卡片间视频“移动”纯粹是
index.html
中编写的GSAP动画。没有
card.layout
字段——此文档早期版本曾提及,但实际schema只有
card.zone
。
4种合成布局(来自
references/layouts/
):每种布局是
zone
与
#video-wrap
动画目标的组合:
合成布局 | 推荐 card.zone | #video-wrap 的GSAP目标(横屏1920×1080) | #video-wrap 的GSAP目标(竖屏1080×1920) | 使用场景
split | side-panel | { left: 960, top: 0, width: 960, height: 1080 } | { left: 0, top: 960, width: 1080, height: 960 }(下半部分) | 演讲者+数据并列/权重50:50
stack | lower-third | { left: 14, top: 14, width: 1892, height: 548 }(上半部分52%) | { left: 0, top: 0, width: 1080, height: 844 }(上半部分44%) | 演讲者在上+摘要卡片在下
pip | fullscreen | { left: 1480, top: 760, width: 400, height: 300 } + 添加 .framed | { left: 690, top: 28, width: 360, height: 203 } + 添加 .framed | 内容密集卡片+角落画中画
overlay | video-overlay | { left: 0, top: 0, width: 1920, height: 1080 }(全屏) | { left: 0, top: 0, width: 1080, height: 1920 } | 电影感/戏剧性/玻璃卡片在全屏视频上
对于4:5(1080×1350),将竖屏的y/h值乘以
1350/1920 ≈ 0.703
(见步骤7.0渠道A/渠道B的
recommendedRatio
解析表格)。
特殊变体的其他zone取值(仍使用
card.zone
;无虚假“layout”字段):
zone | 解析后的边界 | 常见使用场景
fullscreen | 覆盖整个画布 | 核心卡片,视频动画到隐藏/画中画
whiteboard-area | 内边距40px(横屏)或底部45%(竖屏) | 密集数据卡片,自由边距
lower-third | 底部30%区域 | 访谈视频注释
side-panel | 右侧42%(横屏)或底部40%(竖屏) | 侧边栏/“split”布局
video-overlay | 整个画布;卡片根元素需透明 | 全屏视频上的玻璃叠加层
可在卡片间混合布局——根据当前场景选择
card.zone
,然后在卡片间编写
#video-wrap
的GSAP动画。

Storyboard Render Contract

故事板渲染约定

storyboard.json
is an agent-internal planning artifact — no CLI command parses it. It exists to keep your timing and content decisions explicit before you write each card's HTML. Stick to the v3-style shape below so the same outline drives the composition you assemble in Step 9.
Required structure (see Step 6 for the full example):
  • schemaVersion: 3
  • composition: { fps, width, height, durationSeconds, layout, themeId, seed }
    — note
    durationSeconds
    /
    fps
    /
    themeId
    /
    layout
    live inside
    composition
    , NOT at top level
  • videoTrack: { sourcePath, startSec, endSec, bounds? }
    — video bounds default to full canvas
  • subtitles: { enabled, ... }
  • cards[]
    , where each card has these 7 required fields:
    id
    ,
    intent
    ,
    startSec
    ,
    endSec
    ,
    accentIndex
    ,
    zone
    ,
    contentHints
Rules:
  • Card times stay inside
    composition.durationSeconds
    and should not overlap unless intentional (use
    data-track-index
    to control z-order when they do).
  • Visual details live in card HTML fragments (Step 8), NOT in
    contentHints
    .
    contentHints
    is your own structured prompt for designing the card; the rendered look is the HTML.
  • Keep the storyboard shape stable — even though nothing parses it, you read it back while authoring Step 8/9, and consistency keeps card IDs and timing in sync.
  • Agent-side decisions like "I picked overlay × geom × clean" do NOT belong in
    storyboard.json
    — keep them in working memory and use them when authoring card HTML + GSAP tweens.
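Since no CLI parses `storyboard.json`, a quick self-check before writing card HTML can catch drift early. The sketch below is one possible check under stated assumptions: `validateStoryboard` is a hypothetical helper, the field list mirrors the `cards[]` fields named above, and the zone list is taken from the zone table in the layout section.

```javascript
// Hypothetical self-check; nothing in the skill or hyperframes provides this function.
const REQUIRED_CARD_FIELDS = ["id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints"];
const ZONES = ["fullscreen", "whiteboard-area", "lower-third", "side-panel", "video-overlay"];

function validateStoryboard(storyboard) {
  const problems = [];
  const duration = storyboard.composition ? storyboard.composition.durationSeconds : undefined;
  if (typeof duration !== "number") {
    problems.push("composition.durationSeconds is missing");
  }

  // Sort a copy by start time so overlap checks only need to look at the next card.
  const cards = [...(storyboard.cards || [])].sort((a, b) => a.startSec - b.startSec);
  cards.forEach((card, i) => {
    const label = card.id || "cards[" + i + "]";
    for (const field of REQUIRED_CARD_FIELDS) {
      if (!(field in card)) problems.push(label + ": missing " + field);
    }
    if ("zone" in card && !ZONES.includes(card.zone)) {
      problems.push(label + ": unknown zone " + card.zone);
    }
    if (!(card.startSec >= 0 && card.startSec < card.endSec && card.endSec <= duration)) {
      problems.push(label + ": time window must sit inside 0.." + duration);
    }
    const next = cards[i + 1];
    if (next && next.startSec < card.endSec) {
      problems.push(label + " overlaps " + (next.id || "next card") + " (fine only if intentional)");
    }
  });
  return problems;
}
```

An empty array means the outline is internally consistent; overlap entries are warnings rather than errors, since intentional overlaps are allowed when `data-track-index` controls the z-order.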
Transparent card backgrounds for cards that share canvas with video. When the GSAP tween leaves video visible behind/beside the card (overlay recipe, pip recipe, or any
card.zone = 'lower-third' | 'video-overlay'
moment), the card's
.root
MUST NOT paint a full opaque background — otherwise it occludes the video. Two patterns:
css
/* Pattern A: transparent root, page body provides the cream backdrop */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* Pattern B: explicit per-card background ONLY for fullscreen cards */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}
For
side-panel
-zone cards (split recipe), the card-host is already only half the canvas, so an opaque card bg is fine — it only covers its half.
storyboard.json
是Agent内部的规划文件——没有CLI命令会解析它。它的作用是让你在编写每张卡片的HTML前明确时间和内容决策。保持以下v3风格结构,这样同一大纲可用于步骤9的合成文件整合。
必需结构(见步骤6的完整示例):
  • schemaVersion: 3
  • composition: { fps, width, height, durationSeconds, layout, themeId, seed }
    — 注意
    durationSeconds
    /
    fps
    /
    themeId
    /
    layout
    在**
    composition
    内部**,不在顶层
  • videoTrack: { sourcePath, startSec, endSec, bounds? }
    — 视频边界默认全屏
  • subtitles: { enabled, ... }
  • cards[]
    ,每张卡片包含7个必需字段:
    id
    ,
    intent
    ,
    startSec
    ,
    endSec
    ,
    accentIndex
    ,
    zone
    ,
    contentHints
规则:
  • 卡片时间需在
    composition.durationSeconds
    内,除非有意重叠(重叠时使用
    data-track-index
    控制层级)。
  • 视觉细节在卡片HTML片段中(步骤8),不在
    contentHints
    中。
    contentHints
    是你设计卡片的结构化提示;最终视觉效果由HTML决定。
  • 保持故事板结构稳定——即使没有工具解析它,步骤8/9编写时你会回看它,一致性可保持卡片ID和时间同步。
  • Agent侧的决策(如“我选择了overlay × geom × clean”)不属于
    storyboard.json
    ——记录到工作内存中,编写卡片HTML + GSAP动画时使用。
与视频共享画布的卡片需透明背景。当GSAP动画让视频在卡片后/旁可见时(overlay布局、pip布局,或任何
card.zone = 'lower-third' | 'video-overlay'
场景),卡片的
.root
不能设置完全不透明的背景——否则会遮挡视频。两种模式:
css
/* 模式A:透明根元素,页面背景提供米色背景 */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* 模式B:仅全屏卡片设置明确背景 */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}
对于
side-panel
区域的卡片(split布局),卡片容器仅占画布一半,因此不透明背景是可行的——仅覆盖其所在的一半。

8. Write Each Card's HTML

8. 编写每张卡片的HTML

Create
$WORK_DIR/public/cards/{card-id}.html
for each card. Each file contains a single rooted HTML fragment that follows this contract:
为每张卡片创建
$WORK_DIR/public/cards/{card-id}.html
。每个文件包含一个符合以下约定的根HTML片段:

Card HTML Contract

卡片HTML约定

html
<div class="card" data-card-id="{cardId}">
  <style>
    /* MUST: every rule starts with .card[data-card-id="{cardId}"] */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      id="card-01-title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>
Hard rules (
hyperframes
lint will reject violations):
  • Single root
    <div class="card" data-card-id="{cardId}">
  • Inline
    <style>
    rules MUST be prefixed with the scope selector above
  • No
    <script>
    tags
  • No external URLs in
    src=
    /
    href=
    (no CDN, no remote fonts)
  • No inline event handlers (
    onclick=
    etc.)
  • All assets via relative paths into the same
    public/
    directory
  • Colors via
    var(--accent-N)
    etc. for portability across themes
Animations are declared, not coded. Use
data-anim-*
attributes only; never write
<script>
to animate. You compile every
data-anim-*
declaration into the single master GSAP timeline in Step 9.
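Before running the real lint, the hard rules can be approximated with a few string checks. This is a rough, regex-based sketch and not the `hyperframes` lint itself: `lintCardFragment` is a hypothetical name, it does not parse HTML or CSS properly, and it would flag at-rules such as `@container` as unscoped selectors.

```javascript
// Hypothetical pre-check for a card fragment; approximates the hard rules above.
function lintCardFragment(html, cardId) {
  const problems = [];
  const scope = '.card[data-card-id="' + cardId + '"]';

  // Single root element with the expected class and card id.
  if (!html.trim().startsWith('<div class="card" data-card-id="' + cardId + '">')) {
    problems.push("root must be <div class=\"card\" data-card-id=\"" + cardId + "\">");
  }
  if (/<script\b/i.test(html)) problems.push("no <script> tags");
  if (/\b(?:src|href)\s*=\s*["']?(?:https?:)?\/\//i.test(html)) {
    problems.push("no external URLs in src= / href=");
  }
  if (/\son[a-z]+\s*=/i.test(html)) problems.push("no inline event handlers");

  // Every selector in every inline <style> block must start with the scope selector.
  for (const [, css] of html.matchAll(/<style[^>]*>([\s\S]*?)<\/style>/gi)) {
    const withoutComments = css.replace(/\/\*[\s\S]*?\*\//g, "");
    for (const match of withoutComments.matchAll(/([^{}]+)\{/g)) {
      const selectors = match[1].split(",").map((s) => s.trim()).filter(Boolean);
      for (const selector of selectors) {
        if (!selector.startsWith(scope)) problems.push("unscoped selector: " + selector);
      }
    }
  }
  return problems;
}
```

Treat a clean result as a hint only; the renderer's own lint remains the authority on what is accepted.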
html
<div class="card" data-card-id="{cardId}">
  <style>
    /* 必须:每个规则以.card[data-card-id="{cardId}"]开头 */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      id="card-01-title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>
硬性规则(
hyperframes
lint会拒绝违规内容):
  • 单个根元素
    <div class="card" data-card-id="{cardId}">
  • 内联
    <style>
    规则必须以上述范围选择器开头
  • 禁止
    <script>
    标签
  • 禁止
    src=
    /
    href=
    中使用外部URL
    (无CDN,无远程字体)
  • 禁止内联事件处理程序(如
    onclick=
    等)
    public/
    目录下的相对路径
  • 使用
    var(--accent-N)
    等变量实现跨主题可移植性
动画仅声明,无需编码。仅使用
data-anim-*
属性;永远不要编写
<script>
来实现动画。步骤9会将每个
data-anim-*
声明编译到单个主GSAP时间线中。

Card Sizing — Mobile-First in Portrait

卡片尺寸——竖屏优先

The 10
references/styles/*.html
are sized for a 1920×1080 landscape preview. When
storyboard.composition.layout = "portrait"
(1080×1920, the dominant case for social / mobile), scale every visual size up: phone screens are physically small, so the same pixel size reads smaller than it does on a landscape TV-style canvas.
token | landscape baseline | portrait target | scale
title (h1/h2 hero) | 64–96px | 88–132px | ×1.35
detail / body | 24–30px | 30–40px | ×1.30
kicker / chip label | 14–16px | 18–22px | ×1.30
timecode / meta | 12–14px | 16–18px | ×1.30
data block primary number | 48–60px | 64–88px | ×1.40
line-height multiplier | 1.05–1.5 | same | (don't scale)
Rule of thumb:
portraitPx = round(landscapePx × 1.3)
, then floor to a nearby 4px multiple for visual rhythm. Hero headlines may go up to ×1.4; small meta text stays at ×1.2 to avoid crowding.
Padding shrinks slightly in portrait — the card is narrower so big landscape padding (40–64px) eats too much width. Use 24–36px horizontal padding in portrait.
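The rule of thumb translates directly into a one-line helper. This is a minimal sketch with an illustrative name (`toPortraitPx`); it reads "floor to a nearby 4px multiple" literally as rounding down, which is an assumption about the intended rounding.

```javascript
// Scales a landscape pixel size for the portrait canvas, then snaps down to a 4px multiple.
// scale defaults to 1.3; pass 1.4 for hero headlines or 1.2 for small meta text.
function toPortraitPx(landscapePx, scale = 1.3) {
  const scaled = Math.round(landscapePx * scale);
  return Math.floor(scaled / 4) * 4;
}

const heroTitle = toPortraitPx(96, 1.4); // 132
const bodyText = toPortraitPx(30);       // 36
const metaText = toPortraitPx(14, 1.2);  // 16
```

Because of the floor step, some results land slightly below the portrait ranges in the table (for example 24px becomes 28px rather than 30px), so treat the table as the target and the helper as a starting point.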
If you're producing a single card that must work in both layouts, prefer a
@container
query on the card root over hard-coding sizes:
css
.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}
But for most cards, a single layout choice is fine — just pick the size table column that matches the storyboard's
layout
field.
10个
references/styles/*.html
以1920×1080横屏为预览尺寸。当
storyboard.layout = "portrait"
(1080×1920,社交/移动端主流)时,放大所有视觉尺寸——手机屏幕观看距离近,相同像素数在竖屏上比横屏电视画布上显得更小。
变量 | 横屏基准 | 竖屏目标 | 缩放比例
标题(h1/h2核心) | 64–96px | 88–132px | ×1.35
详情/正文 | 24–30px | 30–40px | ×1.30
副标题/标签 | 14–16px | 18–22px | ×1.30
时间码/元数据 | 12–14px | 16–18px | ×1.30
数据块主数字 | 48–60px | 64–88px | ×1.40
行高乘数 | 1.05–1.5 | 相同 | (不缩放)
经验法则:
竖屏像素 = round(横屏像素 × 1.3)
,然后向下取整到最近的4倍数以保证视觉节奏。核心标题可放大到×1.4;小元文本保持×1.2以避免拥挤。
竖屏中的内边距略微缩小——卡片更窄,横屏的大尺寸内边距(40–64px)会占用过多宽度。竖屏中使用24–36px的水平内边距。
如果要制作同时适配两种布局的单个卡片,优先在卡片根元素上使用
@container
查询,而非硬编码尺寸:
css
.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}
但大多数卡片只需选择一种布局——只需选择与故事板
layout
字段匹配的尺寸表格列即可。

Available
data-anim
Kinds

可用的
data-anim
类型

kind | use for | key params
fade-in | enter | at, duration, ease?
fade-out | exit | at, duration, ease?
slide-in | slide enter | at, duration, from=left|right|top|bottom, distance
kinetic-chars | per-char pop | at, duration, stagger, pattern=pop|fade; element needs <span class="char"> children
typewriter | per-char fade | same as kinetic-chars but slower default stagger
count-up | animate number | at, duration, from, to, format=.0f|.1f|.2f|,d
draw-path | SVG path reveal | at, duration; element should be a <path>
grow-y | bar height | at, duration, target-h (px); element starts at height:0
grow-x | bar width | at, duration, target-w (px); element starts at width:0
scale-pop | pop entrance | at, duration
blur-in | unfocused → focused | at, duration
mask-reveal | clip reveal | at, duration, direction=left|right|top|bottom
morph-to | tween any CSS | at, duration, props='{...JSON...}'
data-anim-at
is seconds relative to the card's startSec — when you compile each declaration into the GSAP timeline in Step 9, add the card's
startSec
to get the absolute time and quantize to 1/fps.
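The relative-to-absolute conversion is small enough to show in full. This is a sketch with an illustrative function name (`toTimelineTime`); it assumes "quantize to 1/fps" means rounding to the nearest frame boundary.

```javascript
// Converts a card-relative data-anim-at value into an absolute, frame-aligned time.
// animAt may be the raw attribute string (e.g. "0.3") or a number.
function toTimelineTime(cardStartSec, animAt, fps) {
  const absolute = cardStartSec + Number(animAt);
  return Math.round(absolute * fps) / fps; // snap to the nearest frame boundary
}

// card-01 starts at 1.0s; its title animation is declared with data-anim-at="0.3".
const titleAt = toTimelineTime(1.0, "0.3", 30); // 39 frames, i.e. 1.3s
```

The result is what goes into the position argument of the matching timeline call, for example the third argument of `tl.from(selector, vars, position)` in the composition script.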
类型 | 使用场景 | 关键参数
fade-in | 入场 | at, duration, ease?
fade-out | 退场 | at, duration, ease?
slide-in | 滑动入场 | at, duration, from=left|right|top|bottom, distance
kinetic-chars | 逐字符弹出 | at, duration, stagger, pattern=pop|fade;元素需包含 <span class="char"> 子元素
typewriter | 逐字符淡入 | 与kinetic-chars参数相同,但默认延迟更慢
count-up | 数字动画 | at, duration, from, to, format=.0f|.1f|.2f|,d
draw-path | SVG路径展示 | at, duration;元素应为 <path>
grow-y | 高度增长 | at, duration, target-h (px);元素初始 height:0
grow-x | 宽度增长 | at, duration, target-w (px);元素初始 width:0
scale-pop | 缩放入场 | at, duration
blur-in | 失焦到聚焦 | at, duration
mask-reveal | 遮罩展示 | at, duration, direction=left|right|top|bottom
morph-to | 任意CSS动画 | at, duration, props='{...JSON...}'
data-anim-at
是相对于卡片startSec的秒数:步骤9将每个声明编译到GSAP时间线时,需加上卡片的
startSec
得到绝对时间,并量化到1/fps。

9. Assemble the Composition HTML

9. 整合合成HTML

Stage the assets and write
$WORK_DIR/public/index.html
:
bash
# SKILL_DIR is injected by the host ("Base directory for this skill: …")
SKILL_DIR="<SKILL_DIR>"
mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# Stage the input video: RE-ENCODE with dense keyframes. Sources with a sparse GOP
# (keyframe interval > ~1s) freeze on seek in the renderer (a frozen frame under the
# overlays); -g / -keyint_min set to your composition fps make every frame seekable.
# (Set both to your fps; 30 shown, use 24/25/60 to match.)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"
准备资源并编写
$WORK_DIR/public/index.html
:
bash
# SKILL_DIR由宿主注入("此技能的基础目录:…")
SKILL_DIR="<SKILL_DIR>"
mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# 准备输入视频:重新编码为密集关键帧。关键帧间隔>~1秒的源视频在渲染器中寻址时
# 会出现冻结(叠加层下的帧冻结);-g / -keyint_min设置为合成文件帧率可让每一帧都可寻址。
# (两者都设置为你的帧率,示例为30;可使用24/25/60匹配源视频。)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"

Composition Template

合成文件模板

html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <style>
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "LXGW WenKai TC";
        src: url("fonts/LXGWWenKaiTC-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "Virgil";
        src: url("fonts/Virgil.woff2") format("woff2");
        font-display: block;
      }

      :root {
        /* Pick from the themeId palette table in Step 7 — example: classic */
        --bg: #fff9e3;
        --text: #1e1e1e;
        --accent-0: #1971c2;
        --accent-1: #e03131;
        --accent-2: #2f9e44;
        --accent-3: #e8590c;
        --accent-4: #9c36b5;
        --font-family: "Caveat", "LXGW WenKai TC", serif;
      }
      * {
        box-sizing: border-box;
      }
      /* Body font-family MUST list concrete font names (not just var(--font-family)) —
   the HyperFrames renderer's static analyzer doesn't expand CSS variables when
   resolving fonts, so a var-only chain triggers `font_family_without_font_face`
   lint and falls back to a generic. Use the concrete chain here; cards that
   want the theme font can still reference var(--font-family) internally. */
      html,
      body {
        margin: 0;
        padding: 0;
        width: 100%;
        height: 100%;
        overflow: hidden;
        background: #000;
        font-family: "Inter", "Caveat", "LXGW WenKai TC", ui-sans-serif, system-ui, sans-serif;
      }
      #stage {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }

      /* video-wrapper holds the source video. Its position / size are animated
   over time by the master timeline (one tween per layout transition). */
      .video-wrapper {
        position: absolute;
        left: 0;
        top: 0;
        width: 1920px;
        height: 1080px;
        overflow: hidden;
        border-radius: 0;
        box-shadow: none;
      }
      .video-wrapper video {
        width: 100%;
        height: 100%;
        object-fit: cover;
      }

      .card-host {
        position: absolute;
        pointer-events: none;
        overflow: hidden;
      }
      .card-host .card {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }
      .card-host .char {
        display: inline-block;
        visibility: visible;
      }

      /* Subtle drop shadow + rounded corners for non-fullscreen video framings */
      .video-wrapper.framed {
        border-radius: 16px;
        box-shadow: 0 12px 40px rgba(0, 0, 0, 0.35);
      }
    </style>
  </head>
  <body>
    <div
      id="stage"
      data-composition-id="talking-head-recut"
      data-start="0"
      data-duration="121.2"
      data-fps="30"
      data-width="1920"
      data-height="1080"
    >
      <!-- Layer 1: source video — initial position matches card-01's layout -->
      <div class="video-wrapper" id="video-wrap">
        <video
          id="bg-video"
          src="input-video.mp4"
          muted
          playsinline
          data-start="0"
          data-duration="121.2"
          data-track-index="1"
        ></video>
      </div>

      <!-- Layer 2: each card-host sits at the bounds dictated by its layout. -->
      <!-- IMPORTANT: every card-host MUST carry BOTH "card-host" and "clip" classes. -->
      <!--   - "card-host"  → our positioning + pointer-events styles                 -->
      <!--   - "clip"       → HyperFrames runtime uses this to enforce visibility     -->
      <!--                    only during data-start … data-start+data-duration.      -->
      <!--                    Without "clip" the host stays visible the whole video   -->
      <!--                    (lint: timed_element_missing_clip_class).               -->
      <!-- Example: card-01 with zone="fullscreen" → card-host covers (0,0,1920,1080) -->
      <div
        class="card-host clip"
        data-card-id="card-01"
        data-start="1.0000"
        data-duration="6.5000"
        data-track-index="2"
        style="left:0;top:0;width:1920px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- paste the contents of public/cards/card-01.html here -->
      </div>

      <!-- Example: card-02 with zone="side-panel" (split composition layout) → card on left half -->
      <div
        class="card-host clip"
        data-card-id="card-02"
        data-start="8.0000"
        data-duration="12.0000"
        data-track-index="2"
        style="left:0;top:0;width:960px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- card-02 HTML -->
      </div>

      <!-- ...one "card-host clip" per card with inline bounds matching resolveZoneBounds(card.zone)... -->

      <script src="vendor/gsap.min.js"></script>
      <script>
        (function () {
          // count-up formatter helper
          window.__fmt = function (v, fmt) {
            if (typeof fmt === "string" && /^\.[0-9]+f$/.test(fmt)) {
              return Number(v).toFixed(Number(fmt.slice(1, -1)));
            }
            if (fmt === ",d") return Math.round(v).toLocaleString();
            return String(Math.round(v));
          };

          const tl = window.gsap.timeline({ paused: true });

          // ── Card lifecycle (one block per card) ──
          // Example for card-01 [1.0, 7.5] with kinetic-chars at +0.3, grow-x at +0.65:

          // Enter (fade in over 0.4s)
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "visible" }, 1.0);
          tl.fromTo(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            1.0,
          );

          // Card-internal anims (compile each data-anim-* declaration here)
          tl.from(
            '.card[data-card-id="card-01"] #card-01-title .char',
            { opacity: 0, y: 8, scale: 0.8, duration: 0.5, ease: "power2.out", stagger: 0.04 },
            1.3,
          );
          tl.fromTo(
            '.card[data-card-id="card-01"] #card-01-line',
            { width: 0 },
            { width: 420, duration: 0.5, ease: "power2.out" },
            1.65,
          );

          // Exit (fade out over 0.35s, ending at endSec)
          tl.to(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0, duration: 0.35, ease: "power2.in" },
            7.15,
          );
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "hidden" }, 7.5);

          // ── Video framing transitions ──
          // When the next card uses a different composition layout, animate the
          // video-wrapper to its new bounds. Example: card-01 = fullscreen
          // (video hidden behind), card-02 = split composition (zone="side-panel"
          // → video on right, card on left).

          // Card-02 enters at 8.0s with the split composition. Start moving the video
          // to the right half at 7.5s, once card-01 is hidden; the 0.6s tween ends at
          // 8.1s, overlapping the first 0.1s of card-02's fade-in.
          tl.set("#video-wrap", { className: "video-wrapper framed" }, 7.5);
          tl.to(
            "#video-wrap",
            { left: 960, top: 0, width: 960, height: 1080, duration: 0.6, ease: "power2.inOut" },
            7.5,
          );

          // Card-02 enter — same pattern as card-01
          tl.set('.card-host[data-card-id="card-02"]', { visibility: "visible" }, 8.0);
          tl.fromTo(
            '.card-host[data-card-id="card-02"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            8.0,
          );
          // ...card-02 internal anims...

          // ── repeat for each card; if the NEXT card's layout differs,
          //    insert another tl.to('#video-wrap', ...) tween before its enter ──

          window.__timelines = window.__timelines || {};
          window.__timelines["talking-head-recut"] = tl;
        })();
      </script>
    </div>
  </body>
</html>
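
The per-card enter/exit pattern in the template above is mechanical, so it can be generated instead of hand-written. The following is a minimal sketch, assuming a hypothetical `cardLifecycle` helper and a card object shaped like `{ id, startSec, durationSec }` (both are illustrative names, not part of HyperFrames). It emits the same four host statements shown for card-01, as strings ready to paste into the timeline script.

```javascript
// Hypothetical helper: emits the host enter/exit statements for one card.
// The 0.4s enter and 0.35s exit defaults are copied from the template.
function cardLifecycle(card, enterDur = 0.4, exitDur = 0.35) {
  const sel = `'.card-host[data-card-id="${card.id}"]'`;
  const start = card.startSec;
  const end = card.startSec + card.durationSec;
  const t = (sec) => sec.toFixed(4); // 4 decimals, matching data-start
  return [
    `tl.set(${sel}, { visibility: "visible" }, ${t(start)});`,
    `tl.fromTo(${sel}, { opacity: 0 }, { opacity: 1, duration: ${enterDur}, ease: "power2.out" }, ${t(start)});`,
    // The exit tween starts early enough to finish exactly at the card's end.
    `tl.to(${sel}, { opacity: 0, duration: ${exitDur}, ease: "power2.in" }, ${t(end - exitDur)});`,
    `tl.set(${sel}, { visibility: "hidden" }, ${t(end)});`,
  ];
}

const lines = cardLifecycle({ id: "card-01", startSec: 1.0, durationSec: 6.5 });
console.log(lines.join("\n"));
```

For card-01 (start 1.0, duration 6.5) the exit tween lands at 7.15 and the final hide at 7.5, matching the hand-written block. Card-internal animations still have to be compiled separately.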

GSAP Statement Cheat Sheet

Compile each `data-anim` attribute into a GSAP statement. Times are absolute seconds = card.startSec + data-anim-at, quantized to 1/fps. The selector is `.card[data-card-id="X"] #elementId`.

| data-anim | GSAP statement template |
| --- | --- |
| fade-in | `tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);` |
| fade-out | `tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);` |
| slide-in (from=left, dist=80) | `tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);` |
| kinetic-chars (pop) | `tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);` |
| count-up | `(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();` |
| draw-path | `(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();` |
| grow-x (target-w=W) | `tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);` |
| grow-y (target-h=H) | `tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);` |
| scale-pop | `tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);` |
| mask-reveal (direction=left) | `tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);` |

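
To make the template substitution concrete, here is a minimal sketch of how three of the rows above (fade-in, grow-x, scale-pop) could be filled in. The `compileAnim` function and the declaration shape `{ anim, elementId, t, duration, targetW }` are hypothetical names chosen for illustration; `t` is assumed to be the absolute, already-quantized time.

```javascript
// Each entry mirrors one row of the cheat sheet; SEL, D, W and T are substituted.
const TEMPLATES = {
  "fade-in": (sel, d) =>
    `tl.fromTo(${sel}, { opacity: 0 }, { opacity: 1, duration: ${d.duration}, ease: 'power2.out' }, ${d.t});`,
  "grow-x": (sel, d) =>
    `tl.fromTo(${sel}, { width: 0 }, { width: ${d.targetW}, duration: ${d.duration}, ease: 'power2.out' }, ${d.t});`,
  "scale-pop": (sel, d) =>
    `tl.fromTo(${sel}, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: ${d.duration}, ease: 'back.out(1.6)' }, ${d.t});`,
};

// Hypothetical compiler for a single declaration (illustrative shape, see lead-in).
function compileAnim(cardId, decl) {
  const template = TEMPLATES[decl.anim];
  if (!template) throw new Error(`unsupported data-anim: ${decl.anim}`);
  // Selector form from the text: .card[data-card-id="X"] #elementId
  const sel = `'.card[data-card-id="${cardId}"] #${decl.elementId}'`;
  return template(sel, decl);
}

const stmt = compileAnim("card-01", {
  anim: "grow-x",
  elementId: "card-01-line",
  t: 1.65,
  duration: 0.5,
  targetW: 420,
});
console.log(stmt);
```

The remaining rows follow the same pattern; count-up and draw-path need the wrapping IIFE from the table because they read the DOM or tween a proxy object.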
Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is 1/30 ≈ 0.0333s; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal.
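
The quantization rule can be sketched as two small functions. The names `quantize` and `animTime` are illustrative assumptions; the arithmetic is the formula stated above.

```javascript
// Snap an absolute time (seconds) to the nearest frame boundary.
function quantize(absSec, fps) {
  return Math.round(absSec * fps) / fps;
}

// card.startSec + data-anim-at, quantized and formatted for a JS literal.
function animTime(cardStartSec, animAt, fps) {
  return quantize(cardStartSec + animAt, fps).toFixed(4);
}

console.log(animTime(1.0, 0.3, 30));  // 1.3 s is exactly frame 39 at 30 fps
console.log(animTime(1.0, 0.02, 30)); // 1.02 s snaps to frame 31 (31/30 s)
```

Times that already sit on a frame boundary pass through unchanged; anything else moves by at most half a frame.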

Video Framing Reference (per `layout` value)

The selector for the video container is `#video-wrap`. Animate its bounds between cards using `tl.to('#video-wrap', { ...bounds }, T)`. Set the initial bounds inline on the element to match card-01's layout. Pick a transition duration of 0.5–0.7s with `ease: 'power2.inOut'`.

Decorative frames (`clean` / `hairline` / `polaroid`) sit as a sibling of `#video-wrap` and follow it through layout transitions. See `references/frames/` for each frame's placement HTML, suggested CSS, and which layouts it pairs with. Quick rule: the `overlay` layout suppresses decorative frames (the full-bleed video clashes with chrome); PiP layouts already have their own pill treatment (border-radius + white ring + shadow), so add a decorative frame only on top of `split` / `stack`.

GSAP target lookup table for `#video-wrap` per composition layout (landscape 1920×1080; for portrait and 4:5 see `references/layouts/*.html`, which list all three ratios):

| composition layout | typical card.zone | `#video-wrap` GSAP target | extra css class |
| --- | --- | --- | --- |
| split | side-panel | `{ left: 960, top: 0, width: 960, height: 1080 }` | |
| stack | lower-third | `{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%) | |
| pip (bottom-right) | fullscreen | `{ left: 1480, top: 760, width: 400, height: 300 }` | pip-pill (border-radius + ring + shadow) |
| pip (top-left) | fullscreen | `{ left: 40, top: 40, width: 400, height: 300 }` | pip-pill |
| overlay (video full-bleed) | video-overlay | `{ left: 0, top: 0, width: 1920, height: 1080 }` (no change from default) | |
| hide video (pure-graphic moment) | fullscreen | `{ opacity: 0 }` (or move off-canvas) | |

To toggle the pip-pill chrome (border-radius + white ring + drop shadow) when entering or leaving a pip moment:
js
// Enter pip — add chrome
tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T);
tl.to(
  "#video-wrap",
  { left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" },
  T,
);

// Leave pip — back to clean full-bleed
tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT);
tl.to(
  "#video-wrap",
  { left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" },
  T_NEXT,
);
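
Deciding where a `#video-wrap` tween is needed can also be automated. The sketch below assumes a hypothetical `planVideoTransitions` helper and card objects shaped like `{ id, layout, startSec }`; the layout keys `pip-bottom-right` and `pip-top-left` are made-up names for the two pip rows. The bounds are the landscape values from the lookup table above. It emits one tween per layout change and nothing when consecutive cards share a layout.

```javascript
// Landscape 1920x1080 targets copied from the lookup table.
const VIDEO_BOUNDS = {
  split: { left: 960, top: 0, width: 960, height: 1080 },
  stack: { left: 14, top: 14, width: 1892, height: 548 },
  "pip-bottom-right": { left: 1480, top: 760, width: 400, height: 300 },
  "pip-top-left": { left: 40, top: 40, width: 400, height: 300 },
  overlay: { left: 0, top: 0, width: 1920, height: 1080 },
};

// Hypothetical planner: starts each tween `leadSec` before the next card enters.
function planVideoTransitions(cards, leadSec = 0.5, duration = 0.6) {
  const tweens = [];
  for (let i = 1; i < cards.length; i++) {
    const prev = cards[i - 1];
    const next = cards[i];
    if (next.layout === prev.layout) continue; // same framing, no tween
    const b = VIDEO_BOUNDS[next.layout];
    if (!b) throw new Error(`no bounds for layout: ${next.layout}`);
    const at = (next.startSec - leadSec).toFixed(4);
    tweens.push(
      `tl.to("#video-wrap", { left: ${b.left}, top: ${b.top}, width: ${b.width}, ` +
        `height: ${b.height}, duration: ${duration}, ease: "power2.inOut" }, ${at});`
    );
  }
  return tweens;
}

const tweens = planVideoTransitions([
  { id: "card-01", layout: "overlay", startSec: 1.0 },
  { id: "card-02", layout: "split", startSec: 8.0 },
  { id: "card-03", layout: "split", startSec: 22.0 },
]);
console.log(tweens.join("\n"));
```

This sketch only covers bounds; the matching `tl.set("#video-wrap", { className: ... })` call for pip-pill chrome would be added alongside each tween in the same way.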
Card-host bounds match the zone. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx;height:Hpx;..."`. For the `video-overlay` zone (overlay recipe), the card-host fills the full canvas; your CSS inside `.card .root` decides where the actual visible card sits.
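
As an illustration of writing resolved bounds into the host element, the following sketch builds the opening tag for a card-host. `cardHostStyle` and `cardHostOpenTag` are hypothetical helpers, and the bounds object is assumed to have already been resolved from the zone (here the left-half side-panel bounds used in the composition template).

```javascript
// Hypothetical helper: resolved zone bounds -> inline style string.
// Hosts start hidden (visibility:hidden;opacity:0), as in the template.
function cardHostStyle({ left, top, width, height }) {
  return (
    `left:${left}px;top:${top}px;width:${width}px;height:${height}px;` +
    "visibility:hidden;opacity:0;"
  );
}

// Hypothetical helper: full opening tag, always carrying both required classes.
function cardHostOpenTag(card, bounds) {
  return (
    `<div class="card-host clip" data-card-id="${card.id}" ` +
    `data-start="${card.startSec.toFixed(4)}" ` +
    `data-duration="${card.durationSec.toFixed(4)}" ` +
    `data-track-index="2" style="${cardHostStyle(bounds)}">`
  );
}

const tag = cardHostOpenTag(
  { id: "card-02", startSec: 8.0, durationSec: 12.0 },
  { left: 0, top: 0, width: 960, height: 1080 }
);
console.log(tag);
```

Generating the tag in one place keeps `class="card-host clip"` and the 4-decimal timing attributes consistent across every card.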

HyperFrames Layout / Animation QA Rules

  • Build each card's static hero frame first: the moment where the card is fully visible and readable.
  • Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap.
  • Confirm hidden video areas are clipped by the frame and not visible outside intended bounds.
  • Register one paused master timeline as `window.__timelines["talking-head-recut"]`.
  • Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls.
  • Do not use `Math.random()` or `Date.now()` in render paths.
  • Do not use `repeat: -1`; calculate finite repeats from the video duration.
  • Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion.
  • Animate wrappers such as `#video-wrap`, not the video element dimensions directly.
  • Avoid animating the same property on the same element from multiple timelines at the same time.
  • Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`.
  • Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes, e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`).
  • For body / global `font-family`, list concrete font names (`'Inter', 'Caveat', …`), not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded.
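
The "finite repeats" rule above comes down to a little arithmetic. The sketch below assumes a hypothetical `finiteRepeat` helper; it relies on GSAP's documented convention that `repeat` counts additional plays after the first, so N cycles means `repeat: N - 1`.

```javascript
// How many repeats keep a looping tween running from startSec until endSec
// (for example the video duration) without using repeat: -1.
function finiteRepeat(startSec, cycleSec, endSec) {
  const remaining = endSec - startSec;
  if (cycleSec <= 0 || remaining <= 0) return 0;
  // ceil: the last cycle may run past endSec rather than stop short of it.
  const cycles = Math.ceil(remaining / cycleSec);
  return Math.max(0, cycles - 1);
}

// A 1.2s pulse starting at 8.0s in a 121.2s video.
const repeat = finiteRepeat(8.0, 1.2, 121.2);
console.log(repeat);
```

The result goes straight into the tween vars, e.g. `{ repeat: repeat, yoyo: true }`. Using `Math.floor` instead would stop the loop before the end rather than slightly after it.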

10. Render to MP4

bash
cd "$WORK_DIR"
PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \
  --skill=talking-head-recut \
  -o output.mp4 \
  --fps 30
`hyperframes render <dir>` reads `<dir>/index.html` and produces the MP4. Setting the environment variable `PRODUCER_BROWSER_GPU_MODE=hardware` (or passing `--browser-gpu`) is strongly recommended on macOS; software-only Chrome rendering times out on most laptops.
For a sanity check before the full render, capture a single frame at a specific timestamp:
bash
npx hyperframes snapshot public --at 5    # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out)
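
To check every card rather than one timestamp, the snapshot command can be generated per card. This is a sketch under stated assumptions: `snapshotCommands` is a hypothetical helper, the card shape `{ id, startSec, durationSec }` is illustrative, and the midpoint of each card's window is assumed to be a reasonable stand-in for its hero frame. Times are rounded to whole seconds to match the `--at 5` form shown above; fractional values are not verified here.

```javascript
// Hypothetical helper: one snapshot command per card, at the window midpoint.
function snapshotCommands(cards, dir = "public") {
  return cards.map((card) => {
    const mid = card.startSec + card.durationSec / 2;
    // Whole seconds only, mirroring the documented `--at 5` usage.
    return `npx hyperframes snapshot ${dir} --at ${Math.round(mid)}`;
  });
}

const cmds = snapshotCommands([
  { id: "card-01", startSec: 1.0, durationSec: 6.5 },
  { id: "card-02", startSec: 8.0, durationSec: 12.0 },
]);
console.log(cmds.join("\n"));
```

Run the printed commands from the work directory one at a time and inspect each PNG before starting the full render.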

11. Report Results

Tell the user:
  • Work directory path
  • `storyboard.json` (the card outline you designed)
  • `public/cards/*.html` (one HTML per card)
  • `public/index.html` (the assembled composition)
  • `output.mp4` (the final video)
  • ASR provider used
  • Card count + how you chose them (in 1 sentence)
  • Any missing keys or quality caveats
Optional live preview (on request only). The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. Don't open it during the run. When the user asks, start a long-lived server after render and report the URL:
bash
(cd "$WORK_DIR/public" && npx hyperframes preview)   # or `npx hyperframes play` for a shareable link
Do not delete the work directory unless the user asks.