talking-head-recut


Talking Head Recut


Talking Head Recut takes a local video that plays in full and layers a sequence of timed, designed graphic cards onto it (titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture), synced to what's being said. The agent designs the cards (timing + content) and writes each card's HTML directly in the conversation, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure: the overlays emerge from what the transcript actually says.

Confirm the route before you build. This skill packages an existing talking-head clip with designed graphic cards (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants plain captions / subtitles (the spoken words as text) → `/embedded-captions`; a single short unnarrated element (one logo sting / lower-third) → `/motion-graphics`. The clip plays untouched: re-timing, recoloring, reframing, reordering, or audio editing is NLE work and out of scope. Building from a URL / topic / PR → the creation workflows. Unsure whether you need overlays or captions? Read `/hyperframes` first.

Graphic-packaging sibling of `embedded-captions`. Captions add the spoken words as a readable subtitle; this adds designed graphics on top of the playing video. Plain subtitles → `embedded-captions`. Build a video from scratch → the creation workflows (`product-launch-video` / `faceless-explainer` / …).

Inspectable intermediate files in the work directory:

- `metadata.json`: duration / width / height / fps
- `audio.mp3`: extracted audio
- `transcript.json`: a flat word array `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper)
- `storyboard.json`: lightweight card outline (the agent's plan)
- `public/cards/card-XX.html`: one HTML fragment per card
- `public/index.html`: final assembled composition
- `output.mp4`: rendered video


CLI Resolution

bash

# hyperframes: transcription (local Whisper) + rendering the assembled HTML to MP4
npx hyperframes --help

This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`. Transcription is local **Whisper** via `hyperframes transcribe`, with no third-party service, API key, or rate-limited proxy.

Workflow

工作流

1. Check Environment

bash

npx hyperframes doctor          # ffmpeg, headless browser, render deps

# confirm bundled assets:
ls "<SKILL_DIR>/assets/fonts" "<SKILL_DIR>/assets/vendor/gsap.min.js"

Required:

- `ffmpeg` / `ffprobe` (system)
- `<SKILL_DIR>/assets/fonts/*.woff2`, `<SKILL_DIR>/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9)

Transcription needs no key: `hyperframes transcribe` runs Whisper locally (Step 4).

Strongly recommended on macOS for `hyperframes render`:

```bash
export PRODUCER_BROWSER_GPU_MODE=hardware
```

2. Create a Work Directory

All artifacts live under `videos/<project-name>/`, the same convention as the other video workflows (`product-launch-video`, `faceless-explainer`, `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory.

bash

VIDEO_PATH="/absolute/path/input.mp4"
WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')"
mkdir -p "$WORK_DIR"


3. Extract Audio and Metadata

bash

# metadata: duration / width / height / fps
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate \
  -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json"

# audio
ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3"

Outputs: `metadata.json` (read `width`/`height`/`duration`; fps = the `r_frame_rate`
fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`.
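
Reading those values back can be sketched as below. This is a minimal illustration, not part of the skill: `evalFrameRate` and `readMetadata` are hypothetical helper names, and the sample object only assumes the usual ffprobe JSON layout (a `streams` array plus a `format` object whose `duration` is a string).

```javascript
// Hypothetical helper: evaluate an ffprobe rate string such as "30000/1001" or "25".
function evalFrameRate(rate) {
  const [num, den = "1"] = String(rate).split("/");
  const fps = Number(num) / Number(den);
  if (!Number.isFinite(fps) || fps <= 0) {
    throw new Error(`Unusable r_frame_rate: ${rate}`);
  }
  return fps;
}

// Hypothetical helper: flatten the probe JSON into the four values the storyboard needs.
function readMetadata(probe) {
  const stream = probe.streams[0];
  return {
    width: stream.width,
    height: stream.height,
    fps: evalFrameRate(stream.r_frame_rate),
    durationSec: Number(probe.format.duration), // duration arrives as a string
  };
}

// Illustrative sample shaped like metadata.json (values are made up).
const sample = {
  streams: [{ width: 1080, height: 1920, r_frame_rate: "30000/1001" }],
  format: { duration: "121.200000" },
};
const meta = readMetadata(sample);
console.log(meta.fps.toFixed(2), meta.durationSec); // 29.97 121.2
```

In a real run the input would come from `JSON.parse` on the contents of `metadata.json` rather than an inline object.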


4. Transcribe


bash

npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en

Local Whisper: no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` / `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks.

Clamp to media duration. Whisper can return the final word's `end` a hair past the actual clip length. Clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video.
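
The clamp can be sketched as a small pure function. `clampToDuration` is a hypothetical name; the storyboard shape follows the example in Step 6, and the numbers are illustrative.

```javascript
// Hypothetical helper: return a copy of the storyboard with every card endSec and
// composition.durationSeconds capped at the media duration from metadata.json.
function clampToDuration(storyboard, mediaDurationSec) {
  const clamp = (t) => Math.min(t, mediaDurationSec);
  return {
    ...storyboard,
    composition: {
      ...storyboard.composition,
      durationSeconds: clamp(storyboard.composition.durationSeconds),
    },
    cards: storyboard.cards.map((card) => ({ ...card, endSec: clamp(card.endSec) })),
  };
}

// Illustrative input: the last word ended 0.14s past a 121.2s clip.
const draft = {
  composition: { durationSeconds: 121.34 },
  cards: [
    { id: "card-01", startSec: 0.5, endSec: 13.0 },
    { id: "card-02", startSec: 110.0, endSec: 121.34 },
  ],
};
const clamped = clampToDuration(draft, 121.2);
console.log(clamped.composition.durationSeconds, clamped.cards[1].endSec); // 121.2 121.2
```

Returning a copy instead of mutating keeps the unclamped draft available for comparison.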


5. Correct Transcript

`transcript.json` is a flat array of word objects, `[{ "text": "...", "start": s, "end": s }, …]` (no `segments` array, no `words` wrapper; the per-word key is `text`). Read it and fix obvious ASR errors:

- Homophones, product names, technical terms, punctuation
- Edit a word's `text` in place; preserve its `start` / `end` timestamps
- There is no pre-grouped `segments` array. Group words into sentences yourself (split at terminal punctuation / pauses) when you need segment-level chunks for card timing
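
One way to do that grouping is sketched below. `groupIntoSentences` is a hypothetical helper, and the 0.6s pause threshold is an assumption chosen for illustration; the skill does not prescribe a value.

```javascript
// Hypothetical helper: group a flat word array into sentences.
// A sentence closes at terminal punctuation, or when the silence before the
// next word exceeds pauseSec (0.6 is an assumed default, tune per video).
function groupIntoSentences(words, pauseSec = 0.6) {
  const sentences = [];
  let current = [];
  words.forEach((word, i) => {
    current.push(word);
    const next = words[i + 1];
    const endsSentence = /[.!?。！？]["')\]]*$/.test(word.text.trim());
    const longPause = next !== undefined && next.start - word.end > pauseSec;
    if (endsSentence || longPause || next === undefined) {
      sentences.push({
        text: current.map((w) => w.text.trim()).join(" "),
        start: current[0].start,
        end: word.end,
      });
      current = [];
    }
  });
  return sentences;
}

// Illustrative words (not from a real transcript).
const words = [
  { text: "Rates", start: 0.0, end: 0.4 },
  { text: "fell.", start: 0.45, end: 0.9 },
  { text: "So", start: 1.0, end: 1.2 },
  { text: "what", start: 2.5, end: 2.8 },
  { text: "now?", start: 2.85, end: 3.2 },
];
const grouped = groupIntoSentences(words);
console.log(grouped.map((s) => s.text)); // [ 'Rates fell.', 'So', 'what now?' ]
```

Each sentence keeps the `start` of its first word and the `end` of its last, so card timing can snap to sentence edges without touching the per-word timestamps.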


6. Draft a Lightweight Storyboard (in chat)

No CLI involved. Read `transcript.json` + `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact: no CLI command consumes it; it exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9:

json

{
  "schemaVersion": 3,
  "composition": {
    "fps": 30,
    "width": 1080,
    "height": 1920,
    "durationSeconds": 121.2,
    "layout": "portrait",
    "themeId": "noir",
    "seed": 42
  },
  "videoTrack": {
    "sourcePath": "input-video.mp4",
    "startSec": 0,
    "endSec": 121.2,
    "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 }
  },
  "subtitles": { "enabled": false },
  "cards": [
    {
      "id": "card-01",
      "intent": "Hook with the speaker's anxious midnight question",
      "startSec": 0.5,
      "endSec": 13.0,
      "accentIndex": 0,
      "zone": "fullscreen",
      "contentHints": {
        "kicker": "AN HONEST QUESTION",
        "title": "The soul-searching question at 11 PM",
        "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'"
      }
    }
  ]
}

Card fields (required unless marked optional):

| field | type | purpose |
|---|---|---|
| `id` | string | stable id used in card HTML & GSAP selectors |
| `intent` | string | natural-language description; fed to card synthesis |
| `startSec` / `endSec` | number | times in seconds (endSec > startSec) |
| `accentIndex` | 0 \| 1 \| 2 \| 3 \| 4 | which of the 5 theme accent colors this card pulls |
| `zone` | enum (see below) | where on the canvas the card lives |
| `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here |
| `archetype` (optional) | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default |
| `transition` (optional) | enum: `cut` \| `fade` \| `slide` \| `wipe` | declarative card-to-card transition |

Five `zone` values:

| zone | resolved bounds | when to use |
|---|---|---|
| `fullscreen` | covers whole canvas | hero moments, big numbers, mantras |
| `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content |
| `lower-third` | bottom 30% band | annotation over visible video |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side |
| `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video |

When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set once at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's `<script>` (see Step 9).

No prescribed card roles, no prescribed narrative arc. Cards emerge from what the video actually says — could be all quotes or all data, could open with a number or with a story. Let the transcript drive the rhythm.

How many takeaways? — auto-infer from duration + density. No fixed upper limit. Pick a base pace from the video duration, then adjust by information density. Only floor is fixed: minimum 5 cards so even short videos have rhythm.

Step 1 — base pace by duration (the natural sec/card for medium density):

| video duration | base pace (sec per card) | rationale |
|---|---|---|
| < 60s (short reel) | 6–8s | viewers expect fast cuts in short-form |
| 60s – 3 min | 8–12s | normal social pace |
| 3 – 10 min | 12–20s | give breathing room; each card carries more |
| 10 – 30 min | 20–35s | long-form lecture / interview rhythm |
| > 30 min | 30–60s | episodic, near-chapter feel |

Step 2 — density multiplier (multiplies the base pace):

| signal in the transcript | multiplier | effect |
|---|---|---|
| High density: many numbers, distinct claims, staccato pacing, list-like enumeration, every 1–2 sentences is a new idea | × 0.7 | cuts faster, more cards |
| Medium density: mixed flow with both data and narrative | × 1.0 | base pace |
| Low density: one extended story, repeated reframing, slow reflective pacing, single argument unfolding | × 1.5 | cuts slower, fewer cards |

Step 3 — compute:

secPerCard = basePace × densityMultiplier
cardCount  = max(5, round(videoDurationSec / secPerCard))

Examples (notice — no upper clamp; long videos naturally produce more cards):

30s reel, single punchline (low density) → 7 × 1.5 = 10.5s/card → round(30/10.5)=3 → floor to 5 cards
60s reflective monologue (low density) → 10 × 1.5 = 15s/card → 4 → floor to 5 cards
121s talking-head with rich data (high density) → 10 × 0.7 = 7s/card → 17 cards
5 min interview, mixed density → 16 × 1.0 = 16s/card → 19 cards
10 min deep-dive, high density → 16 × 0.7 = 11.2s/card → 54 cards
30 min lecture, medium density → 28 × 1.0 = 28s/card → 64 cards
1 hr podcast, low density → 45 × 1.5 = 67.5s/card → 53 cards
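
The three steps can be sketched as one function. Two assumptions are baked in and should be treated as such: the skill gives a pace range per bucket, so the single values here are the ones its worked examples use (7, 10, 16, 28, 45), and a duration that sits exactly on a bucket edge falls into the lower bucket, as the 10 min and 30 min examples do. `basePaceFor` and `autoCardCount` are hypothetical names.

```javascript
// Assumed single pace per bucket, taken from the worked examples above.
function basePaceFor(durationSec) {
  if (durationSec < 60) return 7;
  if (durationSec <= 180) return 10;
  if (durationSec <= 600) return 16;
  if (durationSec <= 1800) return 28;
  return 45;
}

const DENSITY_MULTIPLIER = { high: 0.7, medium: 1.0, low: 1.5 };

// cardCount = max(5, round(videoDurationSec / (basePace × densityMultiplier)))
function autoCardCount(durationSec, density) {
  const multiplier = DENSITY_MULTIPLIER[density];
  if (multiplier === undefined) throw new Error(`Unknown density: ${density}`);
  const secPerCard = basePaceFor(durationSec) * multiplier;
  return Math.max(5, Math.round(durationSec / secPerCard));
}

console.log(autoCardCount(30, "low"));   // 5  (3 raised to the floor)
console.log(autoCardCount(121, "high")); // 17
console.log(autoCardCount(3600, "low")); // 53
```

The unrounded pace is carried into the division, so rounding happens only once, at the final card count.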

When a card holds longer than ~15s, plan for a richer card (data block, multi-step reveal, several sub-points unfolding with staggered animations); a static one-liner gets boring past 8s. For long pieces where many cards exceed 30s, consider chunking the timeline into sub-compositions (one .html per chapter, mounted with `data-composition-src`) so the GSAP timeline per file stays manageable; see the `timeline_track_too_dense` HyperFrames lint warning.

`content` can be a plain string ("Title: annualized 5.69%\nNotes: ...") or any JSON shape that captures the data. The agent decides the shape per card.

Optional outro. This skill ships no fixed brand outro. If the user wants a closing card, design a neutral one yourself (wordmark + one-line tagline, ~1.5-2s, fade in -> short hold -> fade out), append it to `cards[]`, and extend `composition.durationSeconds` to its `endSec`. Otherwise end on the last content card.


7. Decide Render Strategy

Confirm Visual Direction with User (DO THIS FIRST)

Before you start designing cards or deciding bounds, ask the user to pick the output ratio, the layout, the style, and the card-density preset. Frames are auto-selected from the chosen layout × style combination (see "Auto-pick frame" table below). Before sending the question, precompute two things:

- `recommendedRatio` from the source video's aspect ratio (`metadata.json` width / height):
  - `sourceAspect = width / height`
  - `sourceAspect ≥ 1.5` (≥ ~3:2 wide) → recommend 16:9
  - `sourceAspect ≤ 0.7` (≤ ~9:13 tall) → recommend 9:16
  - `0.7 < sourceAspect < 1.5` (near-square) → recommend 4:5

  Mark the recommended option's label with " (recommended · matches source video X:Y)" so the user sees why it's recommended.
- `autoCount` from Step 6 (`max(5, round(videoSec / (basePace × densityMultiplier)))`) so the "auto" option's label can show the concrete number.
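
The ratio recommendation is a three-way threshold check and can be sketched as follows. `recommendRatio` and `recommendedLabel` are hypothetical helper names; the thresholds are the ones listed above, and the label suffix uses the W×H form.

```javascript
// Hypothetical helper: map the source aspect ratio to the recommended output ratio.
function recommendRatio(width, height) {
  const sourceAspect = width / height;
  if (sourceAspect >= 1.5) return "16:9";
  if (sourceAspect <= 0.7) return "9:16";
  return "4:5"; // 0.7 < sourceAspect < 1.5, near-square
}

// Hypothetical helper: append the recommendation marker to an option label.
function recommendedLabel(baseLabel, width, height) {
  return `${baseLabel} (recommended · matches source video ${width}×${height})`;
}

console.log(recommendRatio(1920, 1080)); // 16:9
console.log(recommendRatio(1080, 1920)); // 9:16
console.log(recommendRatio(1080, 1080)); // 4:5
console.log(recommendedLabel("9:16 (1080×1920) portrait", 1080, 1920));
```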

Environment compatibility — pick the best available question channel. Not every runtime exposes the same structured-question tool. Apply this order:

1. `AskUserQuestion` (Claude Code, Anthropic Console): use the structured 4-question call below.
2. Other native clarification tool (e.g. `ask_question`, `request_user_input`, IDE-specific prompt): use that tool with the same 4 question texts and option lists. Preserve the recommendation markers and the precomputed values.
3. No native tool (Codex CLI, plain text-only runtimes): ask directly in normal conversation. Use the plain-text template at the end of this section. Keep it to one message, 4 numbered questions (the global cap is 2–5 questions per round; we stay inside it).

Rules that apply to every channel:

Ask at most 2–5 questions per round. Our 4 here fits.
Even if missing info doesn't block rendering, ask once to confirm the parameters that materially affect the final output (ratio, layout, style, cardCount).
If the user has already pre-approved defaults ("just use defaults", "no need to ask", "auto-pick everything") or asked you not to ask, skip the question entirely and use: `recommendedRatio`, `layout="stack"` (safest cross-ratio default), `style` chosen from transcript tone in the most neutral group (editorial/data), `autoCount`. Tell the user what you picked in one sentence and continue.

Channel A: native `AskUserQuestion`:

// Precompute before the call:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = integer (from Step 6)

AskUserQuestion({
  questions: [
    {
      question: "Output video aspect ratio (canvas):",
      header: "Aspect ratio",
      multiSelect: false,
      // Reorder so the recommended option appears FIRST (per AskUserQuestion convention).
      // Append " (recommended · matches source video W×H)" to the recommended option's label.
      options: [
        { label: "16:9 (1920×1080) landscape", description: "TV / YouTube / desktop playback. Most natural when the source video is already landscape; widest canvas." },
        { label: "9:16 (1080×1920) portrait", description: "TikTok / Reels / short-form mobile. Most natural for portrait source; native mobile experience." },
        { label: "4:5 (1080×1350) near-portrait", description: "Instagram feed / WeChat Moments. Best when source is near-square or you want to cover both platforms." }
      ]
    },
    {
      question: "Choose the overall layout: how should the video and cards coexist on the canvas?",
      header: "Layout",
      multiSelect: false,
      options: [
        { label: "side-by-side (split)",  description: "Video and card each take half the canvas. Most stable for interview / data side-by-side; clear visual separation." },
        { label: "top-bottom (stack)",    description: "Video on top (~52%), card below. Classic combo of speaker face + summary card; works well in portrait too." },
        { label: "picture-in-picture (pip)", description: "Card fills the canvas, video shrinks to a rounded corner window. Use when content is primary and speaker is secondary." },
        { label: "full-screen overlay (overlay)", description: "Video plays full-bleed, card floats as a glass layer on top. Strong cinematic / emotional feel." }
      ]
    },
    {
      question: "Choose the card visual style (style):",
      header: "Style group",
      multiSelect: false,
      // NOTE: these 3 groups intentionally match the frame auto-pick matrix
      // columns below, so picking a group resolves both the `style` group AND
      // the frame matrix column in one step. Memberships are mutually exclusive.
      options: [
        { label: "warm paper (warm-paper)", description: "academic notebook · editorial big-type · whiteboard hand-drawn · xhs social. Best for interview reflections, product launches, lifestyle, emotional stories." },
        { label: "clinical / cold (clinical)",   description: "audit magazine · swiss grid · terminal CLI · minimal modern. Best for financial analysis, investigative reports, technical tutorials, serious presentations." },
        { label: "experimental / avant-garde (experimental)", description: "geom color-clash geometry · spotlight dark-background. Best for short-form highlights, product launches, strong emotion, cinematic feel." }
      ]
    },
    {
      question: "Card count (takeaway pacing): how many cards to cut?",
      header: "Card count",
      multiSelect: false,
      options: [
        { label: "Auto (recommended) · approx N cards", description: "Inferred automatically from video duration and information density (see Step 6 rules). This run estimates approx N cards. Substitute the real N (your autoCount) into the label." },
        { label: "Fewer · approx round(N × 0.6) cards", description: "Sparser cuts, each card holds longer — suits reflective / slow-paced content." },
        { label: "More · approx round(N × 1.5) cards", description: "Tighter cuts, faster rhythm — suits staccato / data-dense / short-form highlight content." }
      ]
    }
  ]
})

About "Other": `AskUserQuestion` automatically adds an "Other" option to the card count question. The user can type a number directly (e.g. "8", "20") as the cardCount target. Parse the input as an integer: if parsing succeeds → use that value (minimum 5 as a floor); if parsing fails → fall back to "auto".

Channel B — plain-text fallback (Codex CLI, runtimes without a native question tool). Post this as one normal message, then wait for the reply. Bullet-style 1/2/3/4 keeps the reply parseable:

I need to confirm four visual decisions with you before I start cutting cards:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080) — TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920) — TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350) — Instagram feed / works for both platforms
   ▸ My recommendation:  <recommendedRatio>  (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video & card coexist):
   A. split   side-by-side (50/50)
   B. stack   top-bottom (video top, card bottom)
   C. pip     picture-in-picture (card full canvas, video rounded corner window)
   D. overlay full-screen glass overlay (video full-bleed, card glass layer)

3) Card style group (maps to frame auto-pick matrix, pick 1 of 3):
   A. warm paper (warm-paper)      (academic / editorial / whiteboard / xhs)
   B. clinical / cold (clinical)   (audit / swiss / terminal / minimal)
   C. experimental (experimental)  (geom / spotlight)

4) Card count (takeaway pacing):
   A. Auto (recommended) — approx <autoCount> cards
   B. Fewer — approx round(<autoCount> × 0.6) cards
   C. More — approx round(<autoCount> × 1.5) cards
   D. Give me a specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language is fine.
If you want all recommended defaults, reply "default" / "auto" / "use all recommendations".

Parsing the plain-text reply:

- Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data / auto"`, full sentences, or `default`
- If any answer is ambiguous → re-ask only the ambiguous ones (still inside the 2–5 cap).
- If the user says "default / auto / use all recommendations" → skip without re-asking.
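
A sketch of the compact-form parsing is shown below. It is deliberately narrow: it only recognizes the numbered form (`1A 2C 3B 4A`), four bare letters in question order, and the default keywords. Free-form sentences and value-style replies come back as ambiguous and are left to the agent's own reading. `parseReply` and `OPTION_COUNT` are hypothetical names.

```javascript
// Lettered options per question in the plain-text template (1: A-C, 2: A-D, 3: A-C, 4: A-D).
const OPTION_COUNT = { 1: 3, 2: 4, 3: 3, 4: 4 };

// Hypothetical helper: parse a compact reply into per-question letters.
function parseReply(reply) {
  const text = reply.trim().toLowerCase();
  if (["default", "auto", "use all recommendations"].includes(text)) {
    return { useDefaults: true, answers: {}, ambiguous: [] };
  }
  const answers = {};
  // Form 1: numbered pairs such as "1A 2C 3B 4A".
  for (const [, q, letter] of text.matchAll(/\b([1-4])\s*([a-d])\b/g)) {
    answers[q] = letter.toUpperCase();
  }
  // Form 2: exactly four bare letters in question order, such as "A C B A".
  const tokens = text.split(/\s+/);
  if (
    Object.keys(answers).length === 0 &&
    tokens.length === 4 &&
    tokens.every((t) => /^[a-d]$/.test(t))
  ) {
    tokens.forEach((t, i) => {
      answers[i + 1] = t.toUpperCase();
    });
  }
  // Missing or out-of-range answers are reported so only those get re-asked.
  const ambiguous = [1, 2, 3, 4].filter((q) => {
    const letter = answers[q];
    return !letter || letter.charCodeAt(0) - 64 > OPTION_COUNT[q];
  });
  ambiguous.forEach((q) => delete answers[q]);
  return { useDefaults: false, answers, ambiguous };
}

console.log(parseReply("1A 2C 3B 4A"));
console.log(parseReply("1A 2C").ambiguous); // [ 3, 4 ]
```

Returning the list of ambiguous question numbers maps directly onto the rule of re-asking only those questions.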

After the user answers (any channel):

Resolve the output canvas from the ratio answer. These are the exact `storyboard.composition.width / height` values to write:

| user choice | composition.width × height | storyboard.layout field |
|---|---|---|
| `16:9` | 1920 × 1080 | `"landscape"` |
| `9:16` | 1080 × 1920 | `"portrait"` |
| `4:5` | 1080 × 1350 | `"portrait"` (schema treats 4:5 as portrait, height > width) |

For 4:5 bounds inside `references/layouts/*.html`: those files only document landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350) derive bounds by proportional scaling from portrait: keep horizontal values, scale vertical values by `1350/1920 ≈ 0.703`. Round from the exact ratio rather than the truncated 0.703, which would give h = 396 instead of 397. Example: `overlay` portrait card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.

Map the style group to a specific style by looking at the transcript tone. Pick the one that best fits, but stay inside the user's chosen group. If you're unsure between two specific styles inside the group, send a second `AskUserQuestion` with those 2–4 specific style options.
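
The portrait-to-4:5 scaling can be sketched as below. `scalePortraitBoundsTo4x5` is a hypothetical helper; it multiplies by the exact ratio 1350/1920 (0.703125) so the rounding reproduces the 900 / 397 figures from the overlay example.

```javascript
const PORTRAIT_HEIGHT = 1920;
const FOUR_FIVE_HEIGHT = 1350;

// Hypothetical helper: keep horizontal values, scale vertical values by 1350/1920.
function scalePortraitBoundsTo4x5({ x, y, w, h }) {
  const scale = FOUR_FIVE_HEIGHT / PORTRAIT_HEIGHT; // 0.703125, exact in binary
  return { x, y: Math.round(y * scale), w, h: Math.round(h * scale) };
}

const overlay4x5 = scalePortraitBoundsTo4x5({ x: 24, y: 1280, w: 1032, h: 564 });
console.log(overlay4x5); // { x: 24, y: 900, w: 1032, h: 397 }
```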

Resolve final cardCount from the density answer:

| user choice | final cardCount |
|---|---|
| Auto (recommended) | the `autoCount` you already computed |
| Fewer | `max(5, round(autoCount × 0.6))` |
| More | `round(autoCount × 1.5)` (no upper clamp) |
| Other = "<n>" (integer) | `max(5, parseInt(n))` |
| Other = anything else | fall back to `autoCount` |
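
The table above can be sketched as a single resolver. `resolveCardCount` is a hypothetical name, and matching the choice by its leading word ("auto", "fewer", "more") is an assumption about how the option labels arrive.

```javascript
// Hypothetical helper: turn the card-count answer into the final cardCount.
function resolveCardCount(choice, autoCount) {
  const normalized = String(choice).trim().toLowerCase();
  if (normalized.startsWith("auto")) return autoCount;
  if (normalized.startsWith("fewer")) return Math.max(5, Math.round(autoCount * 0.6));
  if (normalized.startsWith("more")) return Math.round(autoCount * 1.5); // no upper clamp
  const n = parseInt(normalized, 10); // "Other" free text
  return Number.isNaN(n) ? autoCount : Math.max(5, n);
}

console.log(resolveCardCount("Auto (recommended)", 20)); // 20
console.log(resolveCardCount("Fewer", 20));              // 12
console.log(resolveCardCount("More", 20));               // 30
console.log(resolveCardCount("3", 20));                  // 5  (floor applies)
console.log(resolveCardCount("lots", 20));               // 20 (falls back to auto)
```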

Auto-pick the video frame from this table (frames don't ask the user — they follow from layout × style):

| layout | warm-paper styles (academic / whiteboard / editorial / xhs) | clinical styles (audit / swiss / terminal / minimal) | experimental styles (geom / spotlight) |
|---|---|---|---|
| `split` | `polaroid` | `hairline` | `clean` |
| `stack` | `polaroid` | `hairline` | `clean` |
| `pip` | `clean` (pip pill already has chrome) | `clean` | `clean` |
| `overlay` | `clean` (full-bleed forbids deco frames) | `clean` | `clean` |
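
Since the frame follows mechanically from layout × style group, the matrix can be sketched as a lookup table. `FRAME_BY_LAYOUT` and `pickFrame` are hypothetical names; the keys and values are copied from the table above.

```javascript
// Frame matrix: rows are layouts, columns are style groups.
const FRAME_BY_LAYOUT = {
  split:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  stack:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  pip:     { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
  overlay: { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
};

// Hypothetical helper: resolve the frame, failing loudly on unknown keys.
function pickFrame(layout, styleGroup) {
  const frame = FRAME_BY_LAYOUT[layout]?.[styleGroup];
  if (!frame) throw new Error(`No frame for layout=${layout}, style group=${styleGroup}`);
  return frame;
}

console.log(pickFrame("stack", "warm-paper")); // polaroid
console.log(pickFrame("split", "clinical"));   // hairline
console.log(pickFrame("pip", "warm-paper"));   // clean
```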

Tell the user what you chose in one sentence — ratio (+ canvas size), layout, specific style, frame, and final cardCount — then proceed with the rest of Step 7 (per-card layouts, motion patterns).
Record the five values (ratio / layout / style / frame / cardCount) in working memory (no schema field needed); you'll reference them while writing each card's HTML in Step 8 and while reading the matching `references/<dim>/<key>.html` for tokens and structure.

If the user picks an answer via "Other" with a free-text style name not in the 10-style library, treat it as a hint to design a fresh card visual yourself, but still anchor on the chosen layout's bounds.

开始设计卡片或确定边界前，请用户选择输出比例、布局、风格和卡片密度预设。帧会根据所选布局×风格组合自动选择（见下方“自动选择帧”表格）。发送问题前，预先计算两件事：

recommendedRatio
（根据源视频宽高比，即
```
metadata.json
```
中的width / height）：
- ```
sourceAspect = width / height
```
- ```
sourceAspect ≥ 1.5
```
  （≥ ~3:2宽屏）→ 推荐**
```
16:9
```
  **
- ```
sourceAspect ≤ 0.7
```
  （≤ ~9:13竖屏）→ 推荐**
```
9:16
```
  **
- ```
0.7 < sourceAspect < 1.5
```
  （接近正方形）→ 推荐**
```
4:5
```
  **
在推荐选项的标签后添加“（推荐 · 匹配源视频X:Y）”，让用户了解推荐理由。
autoCount
（来自步骤6的计算值，即
```
max(5, round(视频时长秒数 / (基础节奏 × 密度乘数)))
```
），这样“自动”选项的标签可显示具体数字。

环境兼容性——选择最佳提问渠道。并非所有运行时都提供相同的结构化提问工具。遵循以下优先级：

AskUserQuestion
（Claude Code、Anthropic控制台）——使用下方的结构化4问题调用。
其他原生澄清工具（如
```
ask_question
```
、
```
request_user_input
```
、IDE特定提示）——使用该工具，问题文本和选项列表保持一致。保留推荐标记和预先计算的值。
无原生工具（Codex CLI、纯文本运行时）——直接在对话中提问。使用本节末尾的纯文本模板。保持为一条消息，4个编号问题（全局每轮最多2–5个问题；此处符合要求）。

适用于所有渠道的规则：

每轮最多提问2–5个问题。此处的4个问题符合要求。
即使缺少信息不影响渲染，也要确认对最终输出有重大影响的参数（比例、布局、风格、卡片数量）。
如果用户已预先批准默认值（“使用默认值”“无需提问”“自动选择所有选项”）或要求不要提问——完全跳过提问，使用：
```
recommendedRatio
```
、
```
layout="stack"
```
（跨比例最安全的默认值）、根据字幕语气从最中性组（编辑/数据）选择
```
style
```
、
```
autoCount
```
。用一句话告知用户你的选择并继续。

渠道A — 原生
AskUserQuestion
：

// 调用前预先计算：
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = 整数（来自步骤6）

AskUserQuestion({
  questions: [
    {
      question: "输出视频宽高比（画布）：",
      header: "宽高比",
      multiSelect: false,
      // 重新排序，让推荐选项排在最前面（遵循AskUserQuestion约定）。
      // 在推荐选项的标签后添加“（推荐 · 匹配源视频W×H）”。
      options: [
        { label: "16:9 (1920×1080) 横屏", description: "电视/YouTube/桌面播放。源视频为横屏时最自然；画布最宽。" },
        { label: "9:16 (1080×1920) 竖屏", description: "TikTok/Reels/短视频移动端。源视频为竖屏时最自然；原生移动端体验。" },
        { label: "4:5 (1080×1350) 近竖屏", description: "Instagram朋友圈/微信朋友圈。源视频接近正方形或需要覆盖多平台时最佳。" }
      ]
    },
    {
      question: "选择整体布局：视频和卡片如何在画布上共存？",
      header: "布局",
      multiSelect: false,
      options: [
        { label: "左右分栏（split）",  description: "视频和卡片各占画布一半。访谈/数据并列时最稳定；视觉分隔清晰。" },
        { label: "上下堆叠（stack）",    description: "视频在上（约52%），卡片在下。演讲者面部+摘要卡片的经典组合；竖屏也适用。" },
        { label: "画中画（pip）", description: "卡片填满画布，视频缩小为圆角窗口。内容为主、演讲者为辅时使用。" },
        { label: "全屏叠加（overlay）", description: "视频全屏播放，卡片作为玻璃层悬浮在上方。强烈的电影感/情感氛围。" }
      ]
    },
    {
      question: "选择卡片视觉风格（style）：",
      header: "风格组",
      multiSelect: false,
      // 注意：这3组与下方的帧自动选择矩阵行完全匹配
      // 选择一组即可同时确定`style`组和帧矩阵列。各组互斥。
      options: [
        { label: "温暖纸张风（warm-paper）", description: "学术笔记本·大字体编辑风格·手绘白板·小红书社交风。访谈反思、产品发布、生活方式、情感故事最佳。" },
        { label: "冷峻专业风（clinical）",   description: "审计杂志·瑞士网格·终端CLI·极简现代风。财务分析、调查报告、技术教程、正式演示最佳。" },
        { label: experimental / avant-garde (experimental)", description: "几何撞色·暗背景聚光灯。短视频高光、产品发布、强烈情感、电影感内容最佳。" }
      ]
    },
    {
      question: "卡片数量（重点内容节奏）：需要制作多少张卡片？",
      header: "卡片数量",
      multiSelect: false,
      options: [
        { label: "自动（推荐）· 约N张卡片", description: "根据视频时长和信息密度自动推断（见步骤6规则）。本次运行预计约N张卡片。将实际N值（你的autoCount）替换到标签中。" },
        { label: "更少· 约round(N × 0.6)张卡片", description: "切换更稀疏，每张卡片停留更长——适合反思/慢节奏内容。" },
        { label: "更多· 约round(N × 1.5)张卡片", description: "切换更紧凑，节奏更快——适合急促/数据密集/短视频高光内容。" }
      ]
    }
  ]
})

关于“其他”选项 —

AskUserQuestion

会自动在卡片数量问题中添加“其他”选项。用户可直接输入数字（如“8”“20”）作为卡片数量目标。将输入解析为整数：解析成功→使用该值（下限为5）；解析失败→ fallback到“自动”。

渠道B — 纯文本 fallback（Codex CLI、无原生提问工具的运行时）。将以下内容作为一条普通消息发送，然后等待回复。使用1/2/3/4的项目符号格式让回复易于解析：

开始制作卡片前，我需要与你确认四个视觉决策：

1) 输出宽高比（画布）：
   A. 16:9横屏（1920×1080）——电视/YouTube/桌面播放
   B. 9:16竖屏（1080×1920）——TikTok/Reels/短视频移动端
   C. 4:5近竖屏（1080×1350）——Instagram朋友圈/适配多平台
   ▸ 我的推荐： <recommendedRatio> （匹配源视频W×H = <sourceW>×<sourceH>）

2) 整体布局（视频与卡片如何共存）：
   A. split 左右分栏（50/50）
   B. stack 上下堆叠（视频在上，卡片在下）
   C. pip 画中画（卡片填满画布，视频为圆角窗口）
   D. overlay 全屏玻璃叠加（视频全屏，卡片为玻璃层）

3) 卡片风格组（对应帧自动选择矩阵，3选1）：
   A. warm paper（warm-paper）      （学术/编辑/白板/小红书）
   B. clinical / cold（clinical）   （审计/瑞士风格/终端/极简）
   C. experimental（experimental）  （几何/聚光灯）

4) 卡片数量（重点内容节奏）：
   A. 自动（推荐）——约<autoCount>张卡片
   B. 更少——约round(<autoCount> × 0.6)张卡片
   C. 更多——约round(<autoCount> × 1.5)张卡片
   D. 指定具体数字（如“8”“20”）

回复格式：“1A 2C 3B 4A”或自然语言均可。
如果要使用所有推荐默认值，回复“default”/“auto”/“使用所有推荐选项”。

解析纯文本回复：

接受松散格式：`"1A 2C 3B 4A"`、`"A C B A"`、`"16:9 / pip / clinical / auto"`、完整句子或 `default`。

如果任何答案模糊→仅重新提问模糊的问题（仍保持在2–5个问题上限内）。
如果用户回复“default / auto / 使用所有推荐选项”→跳过提问。
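
The parsing rules above can be sketched as a small function. This is only an illustrative sketch: `parseReply` and its return shape are hypothetical names, not part of the skill, and natural-language answers such as "16:9 / pip / clinical / auto" are deliberately left in `unclear` so the agent can interpret them itself or re-ask.

```javascript
// Hypothetical helper (not part of the skill): parse a loose Channel B reply.
const DEFAULT_WORDS = ["default", "auto", "使用所有推荐选项"];

function parseReply(reply) {
  const text = reply.trim();
  if (DEFAULT_WORDS.includes(text.toLowerCase())) {
    return { useDefaults: true, answers: {}, unclear: [] };
  }
  const answers = {};
  // Numbered form: "1A", "2 c", "3:B". The lookahead avoids matching "1 apple".
  for (const m of text.matchAll(/([1-4])\s*[:.)]?\s*([A-Da-d])(?![A-Za-z])/g)) {
    answers[m[1]] = m[2].toUpperCase();
  }
  // Bare-letter form: exactly four isolated letters, taken in question order.
  if (Object.keys(answers).length === 0) {
    const letters = text.toUpperCase().match(/\b[A-D]\b/g) || [];
    if (letters.length === 4) {
      letters.forEach((letter, i) => { answers[String(i + 1)] = letter; });
    }
  }
  // Anything still missing should be re-asked (only the unclear questions).
  const unclear = ["1", "2", "3", "4"].filter((q) => !(q in answers));
  return { useDefaults: false, answers, unclear };
}
```

A reply such as `"1A 3B"` yields answers for questions 1 and 3 and lists questions 2 and 4 as unclear, which matches the rule of re-asking only the ambiguous questions.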

用户回复后（任何渠道）：

根据宽高比答案解析输出画布。以下是要写入 `storyboard.composition.width / height` 的精确值：

用户选择	合成文件宽×高	`storyboard.composition.layout` 字段
`16:9`	1920 × 1080	`"landscape"`
`9:16`	1080 × 1920	`"portrait"`
`4:5`	1080 × 1350	`"portrait"`（schema将4:5视为竖屏：高度>宽度）

`references/layouts/*.html` 仅记录横屏（1920×1080）和竖屏（1080×1920）的边界。对于4:5（1080×1350），需按竖屏比例缩放推导边界：水平值保持不变，垂直值乘以 `1350/1920 ≈ 0.703`（计算时使用精确比值，避免舍入偏差）。示例：竖屏 `overlay` 卡片 `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5卡片 `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`。

根据字幕语气将风格组映射到具体风格：在用户选择的组内挑选最匹配的风格。如果组内有多个具体风格难以抉择，发送第二个 `AskUserQuestion`，提供这2–4个具体风格选项。

根据密度答案解析最终卡片数量：

用户选择	最终卡片数量
自动（推荐）	你已计算的 `autoCount` 值
更少	`max(5, round(autoCount × 0.6))`
更多	`round(autoCount × 1.5)` （无上限）
其他 = "<n>"（整数）	`max(5, parseInt(n))`
其他 = 其他内容	fallback到 `autoCount`
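
The table above maps directly onto a small function. A minimal sketch, assuming the choice has already been normalized to one of the hypothetical keys `"auto"`, `"fewer"`, `"more"`, or `"other"` (these key names and `resolveCardCount` are illustrative, not part of the skill):

```javascript
// Hypothetical helper: resolve the final card count from the density answer.
function resolveCardCount(choice, autoCount, otherText) {
  switch (choice) {
    case "fewer":
      return Math.max(5, Math.round(autoCount * 0.6)); // floor of 5 cards
    case "more":
      return Math.round(autoCount * 1.5); // no upper limit
    case "other": {
      const n = Number.parseInt(otherText, 10);
      // Unparseable free text falls back to the automatic count.
      return Number.isNaN(n) ? autoCount : Math.max(5, n);
    }
    case "auto":
    default:
      return autoCount;
  }
}
```

With `autoCount = 12`, "fewer" gives 7 and "more" gives 18; an "other" reply of `"3"` is clamped up to 5.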

根据下表自动选择视频帧（无需询问用户——由布局×风格决定）：

布局	warm-paper风格（学术/白板/编辑/小红书）	clinical风格（审计/瑞士/终端/极简）	experimental风格（几何/聚光灯）
`split`	`polaroid`	`hairline`	`clean`
`stack`	`polaroid`	`hairline`	`clean`
`pip`	`clean` （画中画已自带边框）	`clean`	`clean`
`overlay`	`clean` （全屏视频不适合装饰性边框）	`clean`	`clean`
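
The frame matrix is a plain lookup, so it can be written down as data. A sketch, with `FRAME_MATRIX` and `pickFrame` as hypothetical names and the style-group keys taken from the option labels above:

```javascript
// Layout x style-group matrix from the table above, as a lookup object.
const FRAME_MATRIX = {
  split:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  stack:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  pip:     { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
  overlay: { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
};

// Hypothetical helper: pick the video frame without asking the user.
function pickFrame(layout, styleGroup) {
  const row = FRAME_MATRIX[layout];
  if (!row || !(styleGroup in row)) {
    throw new Error(`no frame for layout=${layout}, style=${styleGroup}`);
  }
  return row[styleGroup];
}
```

Keeping the matrix as data means a future layout or style group only needs a new row or column, not new branching logic.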

用一句话告知用户你的选择——宽高比（+画布尺寸）、布局、具体风格、帧、最终卡片数量——然后继续步骤7的剩余部分（每张卡片的布局、动画模式）。
将五个值（宽高比/布局/风格/帧/卡片数量）记录到工作内存中（无需schema字段）；步骤8编写每张卡片的HTML时，以及步骤9读取匹配的 `references/<dim>/<key>.html` 获取模板和结构时，会用到这些值。

如果用户通过“其他”选项选择了10种风格库之外的自由文本风格名称，将其视为设计全新卡片视觉的提示，但仍需基于所选布局的边界。

Render Strategy Inputs

渲染策略输入

With ratio / layout / style / cardCount / frame locked from Step 7.0, the remaining per-card decisions are:

Source-video fit inside the GSAP target: the video element has `object-fit: cover` and is clipped to the tween bounds of `#video-wrap`. If you want NO cropping (e.g. a portrait source on a landscape canvas shouldn't get its top/bottom chopped), aim the tween at a rect that matches the source's aspect ratio and let the surrounding canvas show through (or fill it with the card / a backdrop).
`card.zone` per card: derive it from your chosen composition layout (split → side-panel, stack → lower-third, pip → fullscreen, overlay → video-overlay), OR pick a different zone for one-off variants (fullscreen for hero / quote, whiteboard-area for dense data).
`accentIndex` per card: each card pulls one of the 5 theme accent colors. Vary across cards for rhythm; reuse the same index when two cards belong to the same narrative beat.
Motion vocabulary: pick 2–3 repeatable patterns from the `data-anim` kinds (see the table later) and stick to them so the composition feels coherent.

Pick from these `themeId` palettes (use them as the `--accent-N` / `--bg` / `--text` CSS variables in your composition `<style>` block):

themeId	accent palette (5 colors)	board bg	text
classic	`#1971c2 #e03131 #2f9e44 #e8590c #9c36b5`	`#FFF9E3` (paper)	`#1e1e1e`
noir	`#4cc9f0 #f72585 #4ade80 #fb923c #a78bfa`	`#1a1a1a`	`#f1f1f1`
mint	`#0077b6 #d62828 #2d6a4f #e76f51 #7209b7`	`#e8faf0`	`#1b4332`
craft	`#bf5700 #d62728 #6c757d #e9b54a #3d5a80`	`#f6efe1`	`#2d2d2d`
slate	`#0ea5e9 #ef4444 #22c55e #f97316 #a855f7`	`#1e293b`	`#f1f5f9`
mono	`#000 #555 #888 #aaa #ccc`	`#fff`	`#000`
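
To illustrate how a palette row becomes CSS variables, here is a minimal sketch that emits the `:root` block for a given `themeId`. The `THEMES` object copies the table above; `themeToCss` is a hypothetical helper name, and the 0-based `--accent-0` … `--accent-4` numbering is assumed from the composition template later in this document.

```javascript
// Palette table above, as data: five accents, board background, text color.
const THEMES = {
  classic: { accents: ["#1971c2", "#e03131", "#2f9e44", "#e8590c", "#9c36b5"], bg: "#FFF9E3", text: "#1e1e1e" },
  noir:    { accents: ["#4cc9f0", "#f72585", "#4ade80", "#fb923c", "#a78bfa"], bg: "#1a1a1a", text: "#f1f1f1" },
  mint:    { accents: ["#0077b6", "#d62828", "#2d6a4f", "#e76f51", "#7209b7"], bg: "#e8faf0", text: "#1b4332" },
  craft:   { accents: ["#bf5700", "#d62728", "#6c757d", "#e9b54a", "#3d5a80"], bg: "#f6efe1", text: "#2d2d2d" },
  slate:   { accents: ["#0ea5e9", "#ef4444", "#22c55e", "#f97316", "#a855f7"], bg: "#1e293b", text: "#f1f5f9" },
  mono:    { accents: ["#000", "#555", "#888", "#aaa", "#ccc"], bg: "#fff", text: "#000" },
};

// Hypothetical helper: build the :root variable block for one theme.
function themeToCss(themeId) {
  const theme = THEMES[themeId];
  if (!theme) throw new Error(`unknown themeId: ${themeId}`);
  const lines = [`  --bg: ${theme.bg};`, `  --text: ${theme.text};`];
  theme.accents.forEach((color, i) => lines.push(`  --accent-${i}: ${color};`));
  return `:root {\n${lines.join("\n")}\n}`;
}
```

Cards then reference `var(--accent-N)` with `N` equal to their `accentIndex`, so switching `themeId` recolors every card without touching card HTML.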

Available fonts (woff2 in `<SKILL_DIR>/assets/fonts/`, staged to the work dir in Step 9): `Caveat` (handwriting), `LXGW WenKai TC` (Chinese hand-script), `Inter` (modern sans), `Virgil` (geometric hand). Declare each with `@font-face`, then reference it by `font-family` directly.

For inspiration on visual patterns, `<SKILL_DIR>/references/styles/` ships 10 self-contained reference cards (academic / editorial / minimal / spotlight / geom / whiteboard / audit / terminal / swiss / xhs) that you can copy as starting points, but do not feel constrained to match any of these. Each card is your own design.

步骤7.0已锁定宽高比/布局/风格/卡片数量/帧，剩余每张卡片的决策包括：

源视频在GSAP目标中的适配：视频元素设置 `object-fit: cover`，并被裁剪到 `#video-wrap` 的动画边界。如果不想裁剪（例如竖屏源视频在横屏画布上不希望顶部/底部被切掉），将动画目标设置为与源视频宽高比匹配的矩形，让周围画布露出（或用卡片/背景填充）。
每张卡片的 `card.zone`：从所选合成布局推导（split→side-panel，stack→lower-third，pip→fullscreen，overlay→video-overlay），或为一次性变体选择不同区域（fullscreen用于核心/引用卡片，whiteboard-area用于密集数据）。
每张卡片的 `accentIndex`：每张卡片使用5种主题强调色中的一种。在卡片间变换以形成节奏；属于同一叙事节拍的两张卡片可重复使用同一索引。
动画词汇：从 `data-anim` 类型中选择2–3种可重复的模式（见后文表格）并保持一致，让合成文件整体连贯。

从以下 `themeId` 调色板中选择（在合成文件 `<style>` 块中作为 `--accent-N` / `--bg` / `--text` CSS变量使用）：

themeId	强调色调色板（5种颜色）	背景色	文本色
classic	`#1971c2 #e03131 #2f9e44 #e8590c #9c36b5`	`#FFF9E3` （纸张色）	`#1e1e1e`
noir	`#4cc9f0 #f72585 #4ade80 #fb923c #a78bfa`	`#1a1a1a`	`#f1f1f1`
mint	`#0077b6 #d62828 #2d6a4f #e76f51 #7209b7`	`#e8faf0`	`#1b4332`
craft	`#bf5700 #d62728 #6c757d #e9b54a #3d5a80`	`#f6efe1`	`#2d2d2d`
slate	`#0ea5e9 #ef4444 #22c55e #f97316 #a855f7`	`#1e293b`	`#f1f5f9`
mono	`#000 #555 #888 #aaa #ccc`	`#fff`	`#000`

可用字体（`<SKILL_DIR>/assets/fonts/` 中的woff2文件，步骤9会复制到工作目录）：`Caveat`（手写体）、`LXGW WenKai TC`（中文手写体）、`Inter`（现代无衬线体）、`Virgil`（几何手写体）。先用 `@font-face` 声明，再直接通过 `font-family` 引用。

如需视觉模式灵感，

<SKILL_DIR>/references/styles/

包含10个独立的参考卡片（学术/编辑/极简/聚光灯/几何/白板/审计/终端/瑞士/小红书风格），可作为起点复制——但无需局限于这些模板。每张卡片都可自行设计。

Visual Design Library (<SKILL_DIR>/references/)

视觉设计库（<SKILL_DIR>/references/）

Beyond the composition-level

themeId

, the skill ships a richer reference library at

<SKILL_DIR>/references/

covering three orthogonal visual dimensions you can freely mix:

Style  ×  Layout  ×  VideoFrame
 (10)      (4)         (3)

dimension	keys	what it decides
style	`academic` `editorial` `minimal` `spotlight` `geom` `whiteboard` `audit` `terminal` `swiss` `xhs`	the card's visual language — fonts, colors, ornament, layout-within-card
layout	`split` `stack` `pip` `overlay`	how the source video and the card share the canvas
frame	`clean` `hairline` `polaroid`	the decorative chrome around the video element

Read

<SKILL_DIR>/references/DESIGN_INDEX.md

for the full matrix and a loose decision guide (interview / product launch / data analysis / social clip / technical tutorial / emotional story …). When you decide to use a specific style / layout / frame, Read the corresponding file:

```
references/styles/<key>.html
```
— self-contained card fragment with that style's CSS tokens (colors, fonts, padding, ornament) and a placeholder takeaway. Copy the
```
.card[data-card-id="ref-<key>"]
```
style block, rename the data-card-id to your card's id, swap the placeholder content for the real takeaway, and you're done.
```
references/layouts/<key>.html
```
— exact
```
videoBounds
```
+
```
cardBounds
```
for both landscape and portrait, with a copy-paste JSON snippet for
```
storyboard.json
```
's per-card
```
layout
```
field.
```
references/frames/<key>.html
```
— decorative HTML to add as a sibling of
```
#video-wrap
```
, plus placement instructions for the composition CSS.

Pick `style × layout × frame` per card: you can change all three between cards as long as the transitions read smoothly. A common rhythm: open with `editorial × overlay × clean`, switch to `audit × split × hairline` for the data card, close on `whiteboard × pip × polaroid`.

The 10 styles are skill-side design tokens, not composition-level themes: they don't need to be declared in `storyboard.composition`; they live inside each card's HTML. The `themeId` field can still pick a composition-level palette (table above) that controls the page-body background and video border chrome.

除了合成层的

themeId

，此技能还提供更丰富的参考库，位于

<SKILL_DIR>/references/

，涵盖三个正交的视觉维度，可自由组合：

风格  ×  布局  ×  视频帧
 (10)      (4)         (3)

维度	取值	决定内容
风格	`academic` `editorial` `minimal` `spotlight` `geom` `whiteboard` `audit` `terminal` `swiss` `xhs`	卡片的视觉语言——字体、颜色、装饰、卡片内布局
布局	`split` `stack` `pip` `overlay`	源视频和卡片如何共享画布
帧	`clean` `hairline` `polaroid`	视频元素周围的装饰性边框

阅读

<SKILL_DIR>/references/DESIGN_INDEX.md

获取完整矩阵和宽松决策指南（访谈/产品发布/数据分析/社交视频/技术教程/情感故事……）。决定使用特定风格/布局/帧时，读取对应文件：

```
references/styles/<key>.html
```
— 独立的卡片片段，包含该风格的CSS变量（颜色、字体、内边距、装饰）和占位内容。复制
```
.card[data-card-id="ref-<key>"]
```
样式块，将data-card-id重命名为你的卡片ID，替换占位内容为实际内容即可。
```
references/layouts/<key>.html
```
— 横屏和竖屏的精确
```
videoBounds
```
+
```
cardBounds
```
，包含可复制到
```
storyboard.json
```
每张卡片
```
layout
```
字段的JSON片段。
```
references/frames/<key>.html
```
— 作为
```
#video-wrap
```
同级元素的装饰性HTML，以及合成文件CSS中的放置说明。

每张卡片可选择 `风格 × 布局 × 帧`：只要过渡流畅，卡片间三者都可更换。常见节奏：开场使用 `editorial × overlay × clean`，数据卡片切换为 `audit × split × hairline`，结尾使用 `whiteboard × pip × polaroid`。

这10种风格是技能侧的设计变量，不是合成层主题——无需在

storyboard.composition

中声明；它们存在于每张卡片的HTML中。

themeId

字段仍可选择合成层调色板（见上表），控制页面背景和视频边框。

Layout Compositions (Card + Video)

布局合成（卡片+视频）

Two coordinated decisions per card define how it shares the canvas with the source video:

card.zone
(declared in
```
storyboard.json
```
) — one of the 5 schema values; resolve it into pixel bounds (per the table in Step 6) when you write the card-host wrapper's inline
```
style
```
in Step 9.
#video-wrap
bounds at this card's time window (declared imperatively in the composition's GSAP timeline) — the agent tweens
```
#video-wrap
```
to a target rect for each layout transition.

The schema does NOT store per-card video bounds. `videoTrack.bounds` is set once at composition level (defaults to full canvas). Video "moving" between cards is purely a GSAP animation authored in `index.html`. There is no `card.layout` field (earlier versions of this doc invented one); the real schema only has `card.zone`.

4 composition layouts (from `references/layouts/`): each is a recipe pairing a `zone` with a `#video-wrap` tween target:

composition layout	recommended `card.zone`	GSAP target for `#video-wrap` (landscape 1920×1080)	GSAP target for `#video-wrap` (portrait 1080×1920)	when to use
`split`	`side-panel`	`{ left: 960, top: 0, width: 960, height: 1080 }`	`{ left: 0, top: 960, width: 1080, height: 960 }` (bottom half)	speaker + data side-by-side / 50:50 weight
`stack`	`lower-third`	`{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%)	`{ left: 0, top: 0, width: 1080, height: 844 }` (top 44%)	speaker on top + summary card below
`pip`	`fullscreen`	`{ left: 1480, top: 760, width: 400, height: 300 }` + add `.framed` class	`{ left: 690, top: 28, width: 360, height: 203 }` + add `.framed`	content-heavy card + corner pip
`overlay`	`video-overlay`	`{ left: 0, top: 0, width: 1920, height: 1080 }` (full-bleed)	`{ left: 0, top: 0, width: 1080, height: 1920 }`	cinematic / dramatic / glass card on full video

For 4:5 (1080×1350), scale portrait y/h values by

1350/1920 ≈ 0.703

(see Step 7.0 Channel A / Channel B

recommendedRatio

resolution table).
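
The 4:5 scaling rule can be sketched as a one-line transform over a portrait rect. `scalePortraitTo4x5` is a hypothetical helper name; it uses the exact ratio `1350/1920` rather than the rounded 0.703 so the results match the worked example in Step 7.0.

```javascript
// Exact vertical scale from portrait (1080x1920) to 4:5 (1080x1350).
const RATIO_4_5 = 1350 / 1920; // 0.703125

// Hypothetical helper: horizontal values stay, vertical values are scaled.
function scalePortraitTo4x5(rect) {
  return {
    left: rect.left,
    top: Math.round(rect.top * RATIO_4_5),
    width: rect.width,
    height: Math.round(rect.height * RATIO_4_5),
  };
}

// Portrait split target from the table above: bottom half of the canvas.
const split45 = scalePortraitTo4x5({ left: 0, top: 960, width: 1080, height: 960 });
console.log(split45); // { left: 0, top: 675, width: 1080, height: 675 }
```

The scaled split target still tiles the canvas exactly (675 + 675 = 1350), which is a quick sanity check for any derived 4:5 rect.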

Other zone values for one-off variants (still uses

card.zone

; no fake "layout" field):

`zone`	resolved bounds	common use
`fullscreen`	covers whole canvas	hero card, video tweens to hidden/pip
`whiteboard-area`	inset 40px margin (landscape) or bottom 45% (portrait)	dense data card, free margins
`lower-third`	bottom 30% band	talking-head annotation
`side-panel`	right 42% (landscape) or bottom 40% (portrait)	sidebar / "split" recipe
`video-overlay`	full canvas; expect transparent card root	glass overlay on full-bleed video

You can mix recipes per card — choose

card.zone

based on what suits the moment, then write the GSAP tween for

#video-wrap

between cards.
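
As an illustration of resolving `card.zone` into pixel bounds, the sketch below follows the percentages in the zone table above. The function name `resolveZoneBounds` appears in a comment of the composition template, but this body is an assumption: the authoritative pixel values are the ones in the Step 6 table and in `references/layouts/`, which may differ (for example, the split recipe uses a 50/50 division).

```javascript
// Assumed implementation: zone name -> card-host rect, per the zone table above.
function resolveZoneBounds(zone, width, height) {
  const landscape = width >= height;
  switch (zone) {
    case "fullscreen":
    case "video-overlay":
      return { left: 0, top: 0, width, height };
    case "whiteboard-area": {
      if (landscape) {
        // 40px inset on every side.
        return { left: 40, top: 40, width: width - 80, height: height - 80 };
      }
      const top = Math.round(height * 0.55); // bottom 45%
      return { left: 0, top, width, height: height - top };
    }
    case "lower-third": {
      const top = Math.round(height * 0.7); // bottom 30% band
      return { left: 0, top, width, height: height - top };
    }
    case "side-panel": {
      if (landscape) {
        const left = Math.round(width * 0.58); // right 42%
        return { left, top: 0, width: width - left, height };
      }
      const top = Math.round(height * 0.6); // bottom 40%
      return { left: 0, top, width, height: height - top };
    }
    default:
      throw new Error(`unknown zone: ${zone}`);
  }
}
```

The returned rect maps one-to-one onto the card-host wrapper's inline `style` (`left`, `top`, `width`, `height` in px).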

每张卡片的两个协同决策定义其与源视频共享画布的方式：

card.zone
（在
```
storyboard.json
```
中声明）——5种schema取值之一；步骤9编写卡片容器的内联
```
style
```
时需将其解析为像素边界（见步骤6的表格）。
此卡片时间窗口内的
#video-wrap
边界（在合成文件的GSAP时间线中声明）——Agent在每个布局过渡时将
```
#video-wrap
```
动画到目标矩形。

Schema不存储每张卡片的视频边界。

videoTrack.bounds

在合成层一次性设置（默认全屏）。卡片间视频“移动”纯粹是

index.html

中编写的GSAP动画。没有

card.layout

字段——此文档早期版本曾提及，但实际schema只有

card.zone

。

4种合成布局（来自

references/layouts/

）——每种布局是

zone

与

#video-wrap

动画目标的组合：

合成布局	推荐 `card.zone`	`#video-wrap` 的GSAP目标（横屏1920×1080）	`#video-wrap` 的GSAP目标（竖屏1080×1920）	使用场景
`split`	`side-panel`	`{ left: 960, top: 0, width: 960, height: 1080 }`	`{ left: 0, top: 960, width: 1080, height: 960 }` （下半部分）	演讲者+数据并列/权重50:50
`stack`	`lower-third`	`{ left: 14, top: 14, width: 1892, height: 548 }` （上半部分52%）	`{ left: 0, top: 0, width: 1080, height: 844 }` （上半部分44%）	演讲者在上+摘要卡片在下
`pip`	`fullscreen`	`{ left: 1480, top: 760, width: 400, height: 300 }` + 添加 `.framed` 类	`{ left: 690, top: 28, width: 360, height: 203 }` + 添加 `.framed`	内容密集卡片+角落画中画
`overlay`	`video-overlay`	`{ left: 0, top: 0, width: 1920, height: 1080 }` （全屏）	`{ left: 0, top: 0, width: 1080, height: 1920 }`	电影感/戏剧性/玻璃卡片在全屏视频上

对于4:5（1080×1350），将竖屏的y/h值乘以

1350/1920 ≈ 0.703

（见步骤7.0渠道A/渠道B的

recommendedRatio

解析表格）。

特殊变体的其他zone取值（仍使用

card.zone

；无虚假“layout”字段）：

`zone`	解析后的边界	常见使用场景
`fullscreen`	覆盖整个画布	核心卡片，视频动画到隐藏/画中画
`whiteboard-area`	内边距40px（横屏）或底部45%（竖屏）	密集数据卡片，自由边距
`lower-third`	底部30%区域	访谈视频注释
`side-panel`	右侧42%（横屏）或底部40%（竖屏）	侧边栏/“split”布局
`video-overlay`	整个画布；卡片根元素需透明	全屏视频上的玻璃叠加层

可在卡片间混合布局——根据当前场景选择

card.zone

，然后在卡片间编写

#video-wrap

的GSAP动画。

Storyboard Render Contract

故事板渲染约定

storyboard.json

is an agent-internal planning artifact — no CLI command parses it. It exists to keep your timing and content decisions explicit before you write each card's HTML. Stick to the v3-style shape below so the same outline drives the composition you assemble in Step 9.

Required structure (see Step 6 for the full example):

`schemaVersion: 3`
`composition: { fps, width, height, durationSeconds, layout, themeId, seed }` (note that `durationSeconds` / `fps` / `themeId` / `layout` live inside `composition`, NOT at top level)
`videoTrack: { sourcePath, startSec, endSec, bounds? }` (video bounds default to full canvas)
`subtitles: { enabled, ... }`
`cards[]`: each card carries the required fields `id` / `intent` / `startSec` / `endSec` / `accentIndex` / `zone` / `contentHints`

Rules:

Card times stay inside `composition.durationSeconds` and should not overlap unless intentional (use `data-track-index` to control z-order when they do).
Visual details live in card HTML fragments (Step 8), NOT in `contentHints`.
`contentHints` is your own structured prompt for designing the card; the rendered look is the HTML.
Keep the storyboard shape stable: even though nothing parses it, you read it back while authoring Step 8/9, and consistency keeps card IDs and timing in sync.
Agent-side decisions like "I picked overlay × geom × clean" do NOT belong in `storyboard.json`; keep them in working memory and use them when authoring card HTML + GSAP tweens.
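
Since nothing parses `storyboard.json`, a quick self-check of the timing rule can catch mistakes before Step 8. The sketch below is a hypothetical helper (`checkCardTimes` is not part of the skill); it only reports overlaps between neighbours in start order, which is enough to flag that some overlap exists so you can decide whether it is intentional.

```javascript
// Hypothetical self-check: card windows inside the composition, overlaps flagged.
function checkCardTimes(storyboard) {
  const problems = [];
  const duration = storyboard.composition.durationSeconds;
  const cards = [...storyboard.cards].sort((a, b) => a.startSec - b.startSec);
  cards.forEach((card, i) => {
    if (card.startSec < 0 || card.endSec > duration || card.endSec <= card.startSec) {
      problems.push(`${card.id}: window is empty or outside 0..${duration}s`);
    }
    const next = cards[i + 1];
    if (next && next.startSec < card.endSec) {
      problems.push(`${card.id} overlaps ${next.id}`);
    }
  });
  return problems;
}

const report = checkCardTimes({
  composition: { durationSeconds: 20 },
  cards: [
    { id: "card-01", startSec: 1.0, endSec: 7.5 },
    { id: "card-02", startSec: 7.0, endSec: 21.0 },
  ],
});
console.log(report.length); // 2
```

An empty result means every card window is well-formed and no adjacent windows overlap.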

Transparent card backgrounds for cards that share canvas with video. When the GSAP tween leaves video visible behind/beside the card (overlay recipe, pip recipe, or any `card.zone = 'lower-third' | 'video-overlay'` moment), the card's `.root` MUST NOT paint a full opaque background, otherwise it occludes the video. Two patterns:

css

/* Pattern A: transparent root, page body provides the cream backdrop */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* Pattern B: explicit per-card background ONLY for fullscreen cards */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}

For

side-panel

-zone cards (split recipe), the card-host is already only half the canvas, so an opaque card bg is fine — it only covers its half.

storyboard.json

是Agent内部的规划文件——没有CLI命令会解析它。它的作用是让你在编写每张卡片的HTML前明确时间和内容决策。保持以下v3风格结构，这样同一大纲可用于步骤9的合成文件整合。

必需结构（见步骤6的完整示例）：

`schemaVersion: 3`
`composition: { fps, width, height, durationSeconds, layout, themeId, seed }`（注意 `durationSeconds` / `fps` / `themeId` / `layout` 在 `composition` 内部，不在顶层）
`videoTrack: { sourcePath, startSec, endSec, bounds? }`（视频边界默认全屏）
`subtitles: { enabled, ... }`
`cards[]`：每张卡片包含必需字段 `id` / `intent` / `startSec` / `endSec` / `accentIndex` / `zone` / `contentHints`

规则：

卡片时间需在
```
composition.durationSeconds
```
内，除非有意重叠（重叠时使用
```
data-track-index
```
控制层级）。
视觉细节在卡片HTML片段中（步骤8），不在
```
contentHints
```
中。
```
contentHints
```
是你设计卡片的结构化提示；最终视觉效果由HTML决定。
保持故事板结构稳定——即使没有工具解析它，步骤8/9编写时你会回看它，一致性可保持卡片ID和时间同步。
Agent侧的决策（如“我选择了overlay × geom × clean”）不属于
```
storyboard.json
```
——记录到工作内存中，编写卡片HTML + GSAP动画时使用。

与视频共享画布的卡片需透明背景。当GSAP动画让视频在卡片后/旁可见时（overlay布局、pip布局，或任何

card.zone = 'lower-third' | 'video-overlay'

场景），卡片的

.root

不能设置完全不透明的背景——否则会遮挡视频。两种模式：

css

/* 模式A：透明根元素，页面背景提供米色背景 */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* 模式B：仅全屏卡片设置明确背景 */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}

对于

side-panel

区域的卡片（split布局），卡片容器仅占画布一半，因此不透明背景是可行的——仅覆盖其所在的一半。

8. Write Each Card's HTML

8. 编写每张卡片的HTML

Create `$WORK_DIR/public/cards/{card-id}.html` for each card. Each file contains a single rooted HTML fragment that follows this contract:

为每张卡片创建

$WORK_DIR/public/cards/{card-id}.html

。每个文件包含一个符合以下约定的根HTML片段：

Card HTML Contract

卡片HTML约定

html

<div class="card" data-card-id="{cardId}">
  <style>
    /* MUST: every rule starts with .card[data-card-id="{cardId}"] */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      class="title"
      id="card-01-title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>

Hard rules (`hyperframes` lint will reject violations):

Single root `<div class="card" data-card-id="{cardId}">`
Inline `<style>` rules MUST be prefixed with the scope selector above
No `<script>` tags
No external URLs in `src=` / `href=` (no CDN, no remote fonts)
No inline event handlers (`onclick=` etc.)
All assets via relative paths into the same `public/` directory
Colors via `var(--accent-N)` etc. for portability across themes
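
Before running the real linter, a rough pre-check of a card fragment against these rules can be sketched with a few regular expressions. This is NOT the `hyperframes` linter and does not reproduce its rule names; `checkCardFragment` and the problem labels are hypothetical, and the style check only understands flat (non-nested) rules.

```javascript
// Hypothetical pre-check for one card fragment; returns a list of problem labels.
function checkCardFragment(html, cardId) {
  const problems = [];
  const scope = `.card[data-card-id="${cardId}"]`;
  if (!html.trim().startsWith(`<div class="card" data-card-id="${cardId}">`)) {
    problems.push("root");
  }
  if (/<script\b/i.test(html)) problems.push("script");
  // src/href pointing at http(s):// or a protocol-relative // URL.
  if (/\b(?:src|href)\s*=\s*["']?\s*(?:https?:)?\/\//i.test(html)) {
    problems.push("external-url");
  }
  if (/\son[a-z]+\s*=/i.test(html)) problems.push("event-handler");
  // Every selector in every <style> block must start with the card scope.
  for (const m of html.matchAll(/<style>([\s\S]*?)<\/style>/gi)) {
    const css = m[1].replace(/\/\*[\s\S]*?\*\//g, ""); // drop comments
    for (const rule of css.matchAll(/([^{}]+)\{/g)) {
      const selectors = rule[1].split(",").map((s) => s.trim()).filter(Boolean);
      if (selectors.some((s) => !s.startsWith("@") && !s.startsWith(scope))) {
        problems.push("unscoped-style");
      }
    }
  }
  return problems;
}
```

An empty array means the fragment passed this rough check; the `hyperframes` lint remains the authority.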

Animations are declared, not coded. Use `data-anim-*` attributes only; never write `<script>` to animate. You compile every `data-anim-*` declaration into the single master GSAP timeline in Step 9.

html

<div class="card" data-card-id="{cardId}">
  <style>
    /* 必须：每个规则以.card[data-card-id="{cardId}"]开头 */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      class="title"
      id="card-01-title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>

硬性规则（`hyperframes` lint会拒绝违规内容）：

单个根元素 `<div class="card" data-card-id="{cardId}">`
内联 `<style>` 规则必须以上述范围选择器开头
禁止 `<script>` 标签
禁止在 `src=` / `href=` 中使用外部URL（无CDN，无远程字体）
禁止内联事件处理程序（如 `onclick=`）
所有资源使用同一 `public/` 目录下的相对路径
颜色使用 `var(--accent-N)` 等变量，以实现跨主题可移植性

动画只声明、不编码。仅使用 `data-anim-*` 属性；永远不要编写 `<script>` 来实现动画。步骤9会将每个 `data-anim-*` 声明编译到单个主GSAP时间线中。

Card Sizing — Mobile-First in Portrait

卡片尺寸——竖屏优先

The 10 `references/styles/*.html` cards are sized for a 1920×1080 landscape preview. When `storyboard.composition.layout = "portrait"` (1080×1920, the dominant case for social / mobile), scale every visual size up: viewers hold phones close, but the same pixel count still reads smaller than on a landscape TV-style canvas.

token	landscape baseline	portrait target	scale
title (h1/h2 hero)	64–96px	88–132px	×1.35
detail / body	24–30px	30–40px	×1.30
kicker / chip label	14–16px	18–22px	×1.30
timecode / meta	12–14px	16–18px	×1.30
data block primary number	48–60px	64–88px	×1.40
line-height multiplier	1.05–1.5	same	(don't scale)

Rule of thumb: `portraitPx = round(landscapePx × 1.3)`, then round down to a nearby multiple of 4px for visual rhythm. Hero headlines may go up to ×1.4; small meta text stays at ×1.2 to avoid crowding.

Padding shrinks slightly in portrait — the card is narrower so big landscape padding (40–64px) eats too much width. Use 24–36px horizontal padding in portrait.

If you're producing a single card that must work in both layouts, prefer container-relative sizing on the card root (`container-type` plus `cqi` units, or an `@container` rule) over hard-coding sizes:

css

.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}

But for most cards, a single layout choice is fine: just pick the size-table column that matches the storyboard's `layout` field.

10个 `references/styles/*.html` 参考卡片以1920×1080横屏为预览尺寸。当 `storyboard.composition.layout = "portrait"`（1080×1920，社交/移动端主流）时，放大所有视觉尺寸：手机虽然观看距离近，但相同像素数在竖屏上仍比在横屏电视画布上显得更小。

变量	横屏基准	竖屏目标	缩放比例
标题（h1/h2核心）	64–96px	88–132px	×1.35
详情/正文	24–30px	30–40px	×1.30
副标题/标签	14–16px	18–22px	×1.30
时间码/元数据	12–14px	16–18px	×1.30
数据块主数字	48–60px	64–88px	×1.40
行高乘数	1.05–1.5	相同	（不缩放）

经验法则：`竖屏像素 = round(横屏像素 × 1.3)`，然后向下取整到邻近的4的倍数以保证视觉节奏。核心标题可放大到×1.4；小号元数据文本保持×1.2以避免拥挤。

竖屏中的内边距略微缩小——卡片更窄，横屏的大尺寸内边距（40–64px）会占用过多宽度。竖屏中使用24–36px的水平内边距。

如果要制作同时适配两种布局的单个卡片，优先在卡片根元素上使用

@container

查询，而非硬编码尺寸：

css

.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}

但大多数卡片只需选择一种布局——只需选择与故事板

layout

字段匹配的尺寸表格列即可。

Available data-anim Kinds

可用的 data-anim 类型

kind	use for	key params
`fade-in`	enter	`at` , `duration` , `ease?`
`fade-out`	exit	`at` , `duration` , `ease?`
`slide-in`	slide enter	`at` , `duration` , `from=left\|right\|top\|bottom` , `distance`
`kinetic-chars`	per-char pop	`at` , `duration` , `stagger` , `pattern=pop\|fade` — element needs `<span class="char">` children
`typewriter`	per-char fade	same as kinetic-chars but slower default stagger
`count-up`	animate number	`at` , `duration` , `from` , `to` , `format=.0f\|.1f\|.2f\|,d`
`draw-path`	SVG path reveal	`at` , `duration` — element should be a `<path>`
`grow-y`	bar height	`at` , `duration` , `target-h` (px) — element starts `height:0`
`grow-x`	bar width	`at` , `duration` , `target-w` (px) — element starts `width:0`
`scale-pop`	pop entrance	`at` , `duration`
`blur-in`	unfocused → focused	`at` , `duration`
`mask-reveal`	clip reveal	`at` , `duration` , `direction=left\|right\|top\|bottom`
`morph-to`	tween any CSS	`at` , `duration` , `props='{...JSON...}'`

`data-anim-at` is in seconds relative to the card's `startSec`. When you compile each declaration into the GSAP timeline in Step 9, add the card's `startSec` to get the absolute time, then quantize to 1/fps.
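
The offset-plus-quantize step can be sketched as follows. `toTimelineTime` is a hypothetical helper name; rounding to the nearest frame is an assumption, since the text only says "quantize to 1/fps".

```javascript
// Hypothetical helper: card-relative data-anim-at -> absolute, frame-aligned seconds.
function toTimelineTime(cardStartSec, animAt, fps) {
  // Convert to a whole frame index first, then back to seconds.
  const frame = Math.round((cardStartSec + animAt) * fps);
  return frame / fps;
}

// card-01 starts at 1.0s; its title animates at +0.3s, i.e. frame 39 at 30 fps.
const titleAt = toTimelineTime(1.0, 0.3, 30);
console.log(titleAt.toFixed(4)); // "1.3000"
```

The result is what goes in the position argument of `tl.from(...)` / `tl.fromTo(...)` for that declaration, so every tween starts exactly on a renderable frame.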

类型	使用场景	关键参数
`fade-in`	入场	`at` , `duration` , `ease?`
`fade-out`	退场	`at` , `duration` , `ease?`
`slide-in`	滑动入场	`at` , `duration` , `from=left\|right\|top\|bottom` , `distance`
`kinetic-chars`	逐字符弹出	`at` , `duration` , `stagger` , `pattern=pop\|fade` — 元素需包含 `<span class="char">` 子元素
`typewriter`	逐字符淡入	与kinetic-chars参数相同，但默认延迟更慢
`count-up`	数字动画	`at` , `duration` , `from` , `to` , `format=.0f\|.1f\|.2f\|,d`
`draw-path`	SVG路径展示	`at` , `duration` — 元素应为 `<path>`
`grow-y`	高度增长	`at` , `duration` , `target-h` （px） — 元素初始 `height:0`
`grow-x`	宽度增长	`at` , `duration` , `target-w` （px） — 元素初始 `width:0`
`scale-pop`	缩放入场	`at` , `duration`
`blur-in`	失焦到聚焦	`at` , `duration`
`mask-reveal`	遮罩展示	`at` , `duration` , `direction=left\|right\|top\|bottom`
`morph-to`	任意CSS动画	`at` , `duration` , `props='{...JSON...}'`

`data-anim-at` 是相对于卡片 `startSec` 的秒数。步骤9将每个声明编译到GSAP时间线时，需加上卡片的 `startSec` 得到绝对时间，并量化到1/fps。

9. Assemble the Composition HTML

9. 整合合成HTML

Stage the assets and write `$WORK_DIR/public/index.html`:

bash

# SKILL_DIR is injected by the host ("Base directory for this skill: …")
SKILL_DIR="<SKILL_DIR>"

mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# Stage the input video: RE-ENCODE with dense keyframes. Sources with a sparse GOP
# (keyframe interval > ~1s) freeze on seek in the renderer (a frozen frame under the
# overlays); -g / -keyint_min set to your composition fps make every frame seekable.
# (Set both to your fps; 30 shown, use 24/25/60 to match.)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"

准备资源并编写 `$WORK_DIR/public/index.html`：

bash

# SKILL_DIR由宿主注入（"此技能的基础目录：…"）
SKILL_DIR="<SKILL_DIR>"

mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# 准备输入视频：重新编码为密集关键帧。关键帧间隔>~1秒的源视频在渲染器中寻址时会冻结
# （叠加层下的帧冻结）；将 -g / -keyint_min 设置为合成文件帧率可让每一帧都可寻址。
# （两者都设置为你的帧率；示例为30，可使用24/25/60匹配源视频。）
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"

Composition Template

合成文件模板

html

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <style>
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "LXGW WenKai TC";
        src: url("fonts/LXGWWenKaiTC-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "Virgil";
        src: url("fonts/Virgil.woff2") format("woff2");
        font-display: block;
      }

      :root {
        /* Pick from the themeId palette table in Step 7 — example: classic */
        --bg: #fff9e3;
        --text: #1e1e1e;
        --accent-0: #1971c2;
        --accent-1: #e03131;
        --accent-2: #2f9e44;
        --accent-3: #e8590c;
        --accent-4: #9c36b5;
        --font-family: "Caveat", "LXGW WenKai TC", serif;
      }
      * {
        box-sizing: border-box;
      }
      /* Body font-family MUST list concrete font names (not just var(--font-family)) —
   the HyperFrames renderer's static analyzer doesn't expand CSS variables when
   resolving fonts, so a var-only chain triggers `font_family_without_font_face`
   lint and falls back to a generic. Use the concrete chain here; cards that
   want the theme font can still reference var(--font-family) internally. */
      html,
      body {
        margin: 0;
        padding: 0;
        width: 100%;
        height: 100%;
        overflow: hidden;
        background: #000;
        font-family: "Inter", "Caveat", "LXGW WenKai TC", ui-sans-serif, system-ui, sans-serif;
      }
      #stage {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }

      /* video-wrapper holds the source video. Its position / size are animated
   over time by the master timeline (one tween per layout transition). */
      .video-wrapper {
        position: absolute;
        left: 0;
        top: 0;
        width: 1920px;
        height: 1080px;
        overflow: hidden;
        border-radius: 0;
        box-shadow: none;
      }
      .video-wrapper video {
        width: 100%;
        height: 100%;
        object-fit: cover;
      }

      .card-host {
        position: absolute;
        pointer-events: none;
        overflow: hidden;
      }
      .card-host .card {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }
      .card-host .char {
        display: inline-block;
        visibility: visible;
      }

      /* Subtle drop shadow + rounded corners for non-fullscreen video framings */
      .video-wrapper.framed {
        border-radius: 16px;
        box-shadow: 0 12px 40px rgba(0, 0, 0, 0.35);
      }
    </style>
  </head>
  <body>
    <div
      id="stage"
      data-composition-id="talking-head-recut"
      data-start="0"
      data-duration="121.2"
      data-fps="30"
      data-width="1920"
      data-height="1080"
    >
      <!-- Layer 1: source video — initial position matches card-01's layout -->
      <div class="video-wrapper" id="video-wrap">
        <video
          id="bg-video"
          src="input-video.mp4"
          muted
          playsinline
          data-start="0"
          data-duration="121.2"
          data-track-index="1"
        ></video>
      </div>

      <!-- Layer 2: each card-host sits at the bounds dictated by its layout. -->
      <!-- IMPORTANT: every card-host MUST carry BOTH "card-host" and "clip" classes. -->
      <!--   - "card-host"  → our positioning + pointer-events styles                 -->
      <!--   - "clip"       → HyperFrames runtime uses this to enforce visibility     -->
      <!--                    only during data-start … data-start+data-duration.      -->
      <!--                    Without "clip" the host stays visible the whole video   -->
      <!--                    (lint: timed_element_missing_clip_class).               -->
      <!-- Example: card-01 with zone="fullscreen" → card-host covers (0,0,1920,1080) -->
      <div
        class="card-host clip"
        data-card-id="card-01"
        data-start="1.0000"
        data-duration="6.5000"
        data-track-index="2"
        style="left:0;top:0;width:1920px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- paste the contents of public/cards/card-01.html here -->
      </div>

      <!-- Example: card-02 with zone="side-panel" (split composition layout) → card on left half -->
      <div
        class="card-host clip"
        data-card-id="card-02"
        data-start="8.0000"
        data-duration="12.0000"
        data-track-index="2"
        style="left:0;top:0;width:960px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- card-02 HTML -->
      </div>

      <!-- ...one "card-host clip" per card with inline bounds matching resolveZoneBounds(card.zone)... -->

      <script src="vendor/gsap.min.js"></script>
      <script>
        (function () {
          // count-up formatter helper
          window.__fmt = function (v, fmt) {
            if (typeof fmt === "string" && /^\.[0-9]+f$/.test(fmt)) {
              return Number(v).toFixed(Number(fmt.slice(1, -1)));
            }
            if (fmt === ",d") return Math.round(v).toLocaleString();
            return String(Math.round(v));
          };

          const tl = window.gsap.timeline({ paused: true });

          // ── Card lifecycle (one block per card) ──
          // Example for card-01 [1.0, 7.5] with kinetic-chars at +0.3, grow-x at +0.65:

          // Enter (fade in over 0.4s)
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "visible" }, 1.0);
          tl.fromTo(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            1.0,
          );

          // Card-internal anims (compile each data-anim-* declaration here)
          tl.from(
            '.card[data-card-id="card-01"] #card-01-title .char',
            { opacity: 0, y: 8, scale: 0.8, duration: 0.5, ease: "power2.out", stagger: 0.04 },
            1.3,
          );
          tl.fromTo(
            '.card[data-card-id="card-01"] #card-01-line',
            { width: 0 },
            { width: 420, duration: 0.5, ease: "power2.out" },
            1.65,
          );

          // Exit (fade out over 0.35s, ending at endSec)
          tl.to(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0, duration: 0.35, ease: "power2.in" },
            7.15,
          );
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "hidden" }, 7.5);

          // ── Video framing transitions ──
          // When the next card uses a different composition layout, animate the
          // video-wrapper to its new bounds. Example: card-01 = fullscreen
          // (video hidden behind), card-02 = split composition (zone="side-panel"
          // → video on right, card on left).

          // Card-02 enters at 8.0s with the split composition. Start moving the video
          // to the right half at 7.5s, when card-01 is hidden. The 0.6s tween ends at
          // 8.1s, so it overlaps the first 0.1s of card-02's fade-in.
          tl.set("#video-wrap", { className: "video-wrapper framed" }, 7.5);
          tl.to(
            "#video-wrap",
            { left: 960, top: 0, width: 960, height: 1080, duration: 0.6, ease: "power2.inOut" },
            7.5,
          );

          // Card-02 enter — same pattern as card-01
          tl.set('.card-host[data-card-id="card-02"]', { visibility: "visible" }, 8.0);
          tl.fromTo(
            '.card-host[data-card-id="card-02"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            8.0,
          );
          // ...card-02 internal anims...

          // ── repeat for each card; if the NEXT card's layout differs,
          //    insert another tl.to('#video-wrap', ...) tween before its enter ──

          window.__timelines = window.__timelines || {};
          window.__timelines["talking-head-recut"] = tl;
        })();
      </script>
    </div>
  </body>
</html>
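
The per-card enter/exit block in the template follows a fixed pattern, so it can be generated rather than hand-typed. The following is a minimal sketch under stated assumptions: the helper names (`cardLifecycle`, `lifecycleStatements`) are hypothetical and not part of HyperFrames or GSAP, and the 0.4s / 0.35s fade lengths are simply the values the template above uses.

```javascript
// Sketch only: helper names are illustrative, not part of HyperFrames or GSAP.
const ENTER_FADE_SEC = 0.4; // fade-in length used in the template
const EXIT_FADE_SEC = 0.35; // fade-out length used in the template

// Derive the four timeline positions for one card from its start and duration.
function cardLifecycle(startSec, durationSec) {
  const endSec = startSec + durationSec;
  return {
    showAt: startSec, // tl.set(..., { visibility: "visible" })
    fadeInAt: startSec, // tl.fromTo(..., opacity 0 -> 1)
    fadeOutAt: endSec - EXIT_FADE_SEC, // tl.to(..., { opacity: 0 })
    hideAt: endSec, // tl.set(..., { visibility: "hidden" })
  };
}

// Emit the four statements as source text, mirroring the card-01 block.
function lifecycleStatements(cardId, startSec, durationSec) {
  const sel = JSON.stringify(`.card-host[data-card-id="${cardId}"]`);
  const t = cardLifecycle(startSec, durationSec);
  const f = (n) => n.toFixed(4);
  return [
    `tl.set(${sel}, { visibility: "visible" }, ${f(t.showAt)});`,
    `tl.fromTo(${sel}, { opacity: 0 }, { opacity: 1, duration: ${ENTER_FADE_SEC}, ease: "power2.out" }, ${f(t.fadeInAt)});`,
    `tl.to(${sel}, { opacity: 0, duration: ${EXIT_FADE_SEC}, ease: "power2.in" }, ${f(t.fadeOutAt)});`,
    `tl.set(${sel}, { visibility: "hidden" }, ${f(t.hideAt)});`,
  ];
}
```

Calling `lifecycleStatements("card-01", 1.0, 6.5)` yields four lines whose time positions correspond to the 1.0 / 1.0 / 7.15 / 7.5 values in the card-01 example.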

GSAP Statement Cheat Sheet

Compile each `data-anim` attribute into a GSAP statement. Times are absolute seconds = `card.startSec + data-anim-at`, quantized to 1/fps. The selector is `.card[data-card-id="X"] #elementId`.

data-anim	GSAP statement template
`fade-in`	`tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);`
`fade-out`	`tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);`
`slide-in` (from=left, dist=80)	`tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);`
`kinetic-chars` (pop)	`tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);`
`count-up`	`(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();`
`draw-path`	`(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();`
`grow-x` (target-w=W)	`tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);`
`grow-y` (target-h=H)	`tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);`
`scale-pop`	`tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);`
`mask-reveal` (direction=left)	`tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);`

Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is `1/30 ≈ 0.0333s`; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal.
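
As an illustration of the timing rule, here is a small sketch that quantizes a card-relative offset and fills in the `fade-in` template from the table. The function names (`quantize`, `animTime`, `compileFadeIn`) are assumptions made for this example; only the formula and the statement template come from the text above.

```javascript
// Sketch only: function names are hypothetical; the formula is the one given above.

// Snap an absolute time in seconds to the nearest frame boundary.
function quantize(absSec, fps) {
  return Math.round(absSec * fps) / fps;
}

// Absolute, quantized time for an animation declared relative to its card,
// formatted with 4 decimals for use inside a JS literal.
function animTime(cardStartSec, animAt, fps) {
  return quantize(cardStartSec + animAt, fps).toFixed(4);
}

// Fill in the `fade-in` row of the cheat sheet for one element.
function compileFadeIn(cardId, elementId, cardStartSec, animAt, duration, fps) {
  const sel = JSON.stringify(`.card[data-card-id="${cardId}"] #${elementId}`);
  const t = animTime(cardStartSec, animAt, fps);
  return `tl.fromTo(${sel}, { opacity: 0 }, { opacity: 1, duration: ${duration}, ease: 'power2.out' }, ${t});`;
}
```

The other rows of the table could be handled the same way, with one small function per `data-anim` kind that substitutes `SEL`, `D`, and `T`.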

Video Framing Reference (per `layout` value)

The selector for the video container is `#video-wrap`. Animate its bounds between cards using `tl.to('#video-wrap', { ...bounds }, T)`. Initial bounds should be set inline on the element to match card-01's layout. Pick a transition duration of 0.5–0.7s with `ease: 'power2.inOut'`.

Decorative frames (`clean` / `hairline` / `polaroid`) sit as a sibling of `#video-wrap` and follow it through layout transitions. See `references/frames/` for each frame's placement HTML, suggested CSS, and which layouts it pairs with. Quick rule: the `overlay` layout suppresses decorative frames (the full-bleed video clashes with chrome); PiP layouts already have their own pill treatment (border-radius + white ring + shadow), so add a decorative frame only on top of `split` / `stack`.

GSAP target lookup table for `#video-wrap` per composition layout (landscape 1920×1080; for portrait and 4:5 see `references/layouts/*.html`, which list all three ratios):

composition layout	typical card.zone	`#video-wrap` GSAP target	extra css class
`split`	`side-panel`	`{ left: 960, top: 0, width: 960, height: 1080 }`	—
`stack`	`lower-third`	`{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%)	—
`pip` (bottom-right)	`fullscreen`	`{ left: 1480, top: 760, width: 400, height: 300 }`	`pip-pill` (border-radius + ring + shadow)
`pip` (top-left)	`fullscreen`	`{ left: 40, top: 40, width: 400, height: 300 }`	`pip-pill`
`overlay` (video full-bleed)	`video-overlay`	`{ left: 0, top: 0, width: 1920, height: 1080 }` (no change from default)	—
hide video (pure-graphic moment)	`fullscreen`	`{ opacity: 0 }` (or move off-canvas)	—

To toggle the pip-pill chrome (border-radius + white ring + drop shadow) when entering or leaving a pip moment:

// Enter pip — add chrome
tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T);
tl.to(
  "#video-wrap",
  { left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" },
  T,
);

// Leave pip — back to clean full-bleed
tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT);
tl.to(
  "#video-wrap",
  { left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" },
  T_NEXT,
);

Card-host bounds match the zone. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx;height:Hpx;..."`. For the `video-overlay` zone (overlay recipe), the card-host fills the full canvas; your CSS inside `.card .root` decides where the actual visible card sits.
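
The lookup table above can be held as plain data so each layout change becomes one call. This is a rough sketch, assuming landscape 1920×1080 only; the keys such as `pip-bottom-right` and the function `videoTransition` are hypothetical labels chosen for this example, while the numeric bounds and the `pip-pill` class are taken from the table.

```javascript
// Sketch only: keys and function name are illustrative; bounds come from the table above.
const VIDEO_TARGETS = {
  split: { left: 960, top: 0, width: 960, height: 1080 },
  stack: { left: 14, top: 14, width: 1892, height: 548 },
  "pip-bottom-right": { left: 1480, top: 760, width: 400, height: 300, cssClass: "pip-pill" },
  "pip-top-left": { left: 40, top: 40, width: 400, height: 300, cssClass: "pip-pill" },
  overlay: { left: 0, top: 0, width: 1920, height: 1080 },
};

// Describe the class swap and bounds tween for moving #video-wrap into a layout.
function videoTransition(layoutKey, atSec, duration = 0.6) {
  const entry = VIDEO_TARGETS[layoutKey];
  if (!entry) throw new Error(`unknown layout: ${layoutKey}`);
  const { cssClass, ...bounds } = entry;
  return {
    className: cssClass ? `video-wrapper ${cssClass}` : "video-wrapper",
    tween: { ...bounds, duration, ease: "power2.inOut" },
    at: atSec,
  };
}

// Applying it to a timeline `tl` (assumed to be the paused master timeline):
//   const step = videoTransition("pip-bottom-right", 20.5);
//   tl.set("#video-wrap", { className: step.className }, step.at);
//   tl.to("#video-wrap", step.tween, step.at);
```

Keeping the bounds in one object means portrait or 4:5 values could later be added as separate tables without touching the timeline code.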

HyperFrames Layout / Animation QA Rules

Build each card's static hero frame first: the moment where the card is fully visible and readable.
Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap.
Confirm hidden video areas are clipped by the frame and not visible outside intended bounds.

Register one paused master timeline as `window.__timelines["talking-head-recut"]`.

Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls.
Do not use `Math.random()` or `Date.now()` in render paths.
Do not use `repeat: -1`; calculate finite repeats from the video duration.
Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion.
Animate wrappers such as `#video-wrap`, not the video element dimensions directly.
Avoid animating the same property on the same element from multiple timelines at the same time.

Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`.

Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes, e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`).
For body / global `font-family`, list concrete font names (`'Inter', 'Caveat', …`), not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded.
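
For the rule against `repeat: -1`, a finite repeat count can be derived from how long the looping element stays on screen. The sketch below is one way to do that; `finiteRepeat` is a hypothetical helper, and it relies on GSAP's convention that `repeat: N` plays the tween N + 1 times in total (with no `repeatDelay` assumed).

```javascript
// Sketch only: `finiteRepeat` is an illustrative name, not a GSAP or HyperFrames API.

// How many repeats fit inside a visibility window without overrunning it.
// windowSec: seconds the element is visible; cycleSec: duration of one play.
function finiteRepeat(windowSec, cycleSec) {
  if (!(windowSec > 0) || !(cycleSec > 0)) return 0;
  // Small epsilon guards against float error when the window is an exact multiple.
  const plays = Math.floor(windowSec / cycleSec + 1e-9);
  // GSAP counts repeats after the first play, so subtract one.
  return Math.max(0, plays - 1);
}

// Usage on a timeline `tl` (assumed): a 2s pulse on a card visible for 6.5s.
//   tl.to(SEL, { scale: 1.05, duration: 2, repeat: finiteRepeat(6.5, 2), yoyo: true }, T);
```

Rounding down keeps the loop inside the card's `data-duration` window, so the last cycle is never cut off mid-motion by the exit fade.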

10. Render to MP4

bash

cd "$WORK_DIR"
PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \
  --skill=talking-head-recut \
  -o output.mp4 \
  --fps 30

`hyperframes render <dir>` reads `<dir>/index.html` and produces the MP4. Setting the environment variable `PRODUCER_BROWSER_GPU_MODE=hardware` (or passing `--browser-gpu`) is strongly recommended on macOS, because software-only Chrome rendering times out on most laptops.

For a sanity check before the full render, capture a single frame at a specific timestamp:

bash

npx hyperframes snapshot public --at 5    # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out)

11. Report Results

Tell the user:

Work directory path
`storyboard.json` (the card outline you designed)
`public/cards/*.html` (one HTML per card)
`public/index.html` (the assembled composition)
`output.mp4` (the final video)
ASR provider used
Card count + how you chose them (in 1 sentence)
Any missing keys or quality caveats

Optional live preview (on request only). The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. Don't open it during the run. When the user asks, start a long-lived server after render and report the URL:

bash

(cd "$WORK_DIR/public" && npx hyperframes preview)   # or `npx hyperframes play` for a shareable link

Do not delete the work directory unless the user asks.
