graphic-overlays

Graphic Overlays

Graphic Overlays takes a local video that plays in full and layers a sequence of timed, designed graphic cards onto it (titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture), synced to what's being said. The agent designs the cards (timing + content) and writes each card's HTML directly in the conversation, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure; the overlays emerge from what the transcript actually says.

Confirm the route before you build. This skill packages an existing talking-head clip with designed graphic cards (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants plain captions / subtitles (the spoken words as text) → `/embedded-captions`; a single short unnarrated element (one logo sting / lower-third) → `/motion-graphics`. The clip plays untouched: re-timing, recoloring, reframing, reordering, or audio changes are NLE editing and out of scope. Building from a URL / topic / PR → the creation workflows. Unsure overlays-vs-captions? Read `/hyperframes-read-first` first.

Graphic-packaging sibling of `embedded-captions`. Captions add the spoken words as a readable subtitle; this adds designed graphics on top of the playing video. Plain subtitles → `embedded-captions`. Build a video from scratch → the creation workflows (`product-launch-video` / `faceless-explainer` / …).

Inspectable intermediate files in the work directory:

- `metadata.json`: duration / width / height / fps
- `audio.mp3`: extracted audio
- `transcript.json`: a flat word array `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper)
- `storyboard.json`: lightweight card outline (the agent's plan)
- `public/cards/card-XX.html`: one HTML fragment per card
- `public/index.html`: final assembled composition
- `output.mp4`: rendered video

CLI Resolution

```bash
# hyperframes: transcription (local Whisper) + rendering the assembled HTML to MP4
npx hyperframes --help
```

This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`.
Transcription is local **Whisper** via `hyperframes transcribe`: no third-party
service, API key, or rate-limited proxy.

Workflow

1. Check Environment

```bash
npx hyperframes doctor          # ffmpeg, headless browser, render deps

# confirm bundled assets:
ls "<SKILL_DIR>/assets/fonts" "<SKILL_DIR>/assets/vendor/gsap.min.js"
```

Required:

- `ffmpeg` / `ffprobe` (system)
- `<SKILL_DIR>/assets/fonts/*.woff2`, `<SKILL_DIR>/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9)

Transcription needs no key: `hyperframes transcribe` runs Whisper locally (Step 4).

Strongly recommended on macOS for `hyperframes render`:

```bash
export PRODUCER_BROWSER_GPU_MODE=hardware
```
2. Create a Work Directory

All artifacts live under `videos/<project-name>/`, the same convention as the other video workflows (`product-launch-video`, `faceless-explainer`, `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory.

```bash
VIDEO_PATH="/absolute/path/input.mp4"
WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')"
mkdir -p "$WORK_DIR"
```
3. Extract Audio and Metadata

```bash
# metadata: duration / width / height / fps
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate \
  -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json"

# audio
ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3"
```

Outputs: `metadata.json` (read `width`/`height`/`duration`; fps = the `r_frame_rate`
fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`.
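
Evaluating the `r_frame_rate` fraction can be sketched as below. This is a minimal illustration, assuming `metadata.json` has the usual `ffprobe -of json` shape (`streams[0]` plus `format`); the helper names `parseFrameRate` and `readVideoInfo` are hypothetical and not part of hyperframes.

```javascript
// Hypothetical helper: turn ffprobe's "num/den" frame-rate string into a number.
function parseFrameRate(rFrameRate) {
  const [num, den = "1"] = String(rFrameRate).split("/");
  const fps = Number(num) / Number(den);
  if (!Number.isFinite(fps) || fps <= 0) {
    throw new Error(`Unparseable r_frame_rate: ${rFrameRate}`);
  }
  return fps;
}

// Hypothetical helper: pull the four values this workflow needs out of the
// parsed metadata.json (assumed shape: { streams: [...], format: {...} }).
function readVideoInfo(metadata) {
  const stream = metadata.streams[0];
  return {
    width: stream.width,
    height: stream.height,
    fps: parseFrameRate(stream.r_frame_rate),
    // ffprobe reports format.duration as a string, so convert it.
    durationSec: Number(metadata.format.duration),
  };
}
```

In practice the argument would come from `JSON.parse` on the contents of `metadata.json`.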

4. Transcribe

```bash
npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en
```

Local Whisper: no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` / `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks.

Clamp to media duration. Whisper can return the final word's `end` a hair past the actual clip length, so clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video.
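
The clamp can be sketched as a small pure function. This is an illustration only: `clampStoryboard` is a hypothetical name, the storyboard shape follows the Step 6 example, and the sketch does not account for an optional outro card that deliberately extends the composition.

```javascript
// Hypothetical helper: return a copy of the storyboard with every time value
// limited to [0, mediaDurationSec]. The input object is not mutated.
function clampStoryboard(storyboard, mediaDurationSec) {
  const clamp = (t) => Math.min(Math.max(t, 0), mediaDurationSec);
  return {
    ...storyboard,
    composition: {
      ...storyboard.composition,
      durationSeconds: clamp(storyboard.composition.durationSeconds),
    },
    cards: storyboard.cards.map((card) => ({
      ...card,
      startSec: clamp(card.startSec),
      endSec: clamp(card.endSec),
    })),
  };
}
```

Here `mediaDurationSec` would be the `duration` value read from `metadata.json`.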


5. Correct Transcript

`transcript.json` is a flat array of word objects, `[{ "text": "...", "start": s, "end": s }, …]` (no `segments` array, no `words` wrapper; the per-word key is **`text`**). Read it and fix obvious ASR errors:

- Homophones, product names, technical terms, punctuation
- Edit a word's `text` in place; preserve its `start` / `end` timestamps
- There is no pre-grouped `segments` array: group words into sentences yourself (split at terminal punctuation / pauses) when you need segment-level chunks for card timing
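
Grouping the flat word array into sentences might look like the sketch below. The function name `groupWordsIntoSentences` and the 0.8 s pause threshold are assumptions chosen for illustration; tune the threshold to the speaker's rhythm.

```javascript
// Hypothetical helper: split a flat [{ text, start, end }] word array into
// sentence chunks at terminal punctuation or at a long silence between words.
function groupWordsIntoSentences(words, pauseSec = 0.8) {
  const sentences = [];
  let current = [];
  words.forEach((word, i) => {
    current.push(word);
    const next = words[i + 1];
    // Terminal punctuation, optionally followed by a closing quote or bracket.
    const endsSentence = /[.!?]["')\]]*$/.test(word.text.trim());
    const longPause = next !== undefined && next.start - word.end >= pauseSec;
    if (endsSentence || longPause || next === undefined) {
      sentences.push({
        text: current.map((w) => w.text.trim()).join(" "),
        start: current[0].start,
        end: current[current.length - 1].end,
      });
      current = [];
    }
  });
  return sentences;
}
```

Each returned chunk keeps the first word's `start` and the last word's `end`, so the original timestamps stay untouched.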

6. Draft a Lightweight Storyboard (in chat)

No CLI involved. Read `transcript.json` and `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact: no CLI command consumes it; it exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9:

```json
{
  "schemaVersion": 3,
  "composition": {
    "fps": 30,
    "width": 1080,
    "height": 1920,
    "durationSeconds": 121.2,
    "layout": "portrait",
    "themeId": "noir",
    "seed": 42
  },
  "videoTrack": {
    "sourcePath": "input-video.mp4",
    "startSec": 0,
    "endSec": 121.2,
    "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 }
  },
  "subtitles": { "enabled": false },
  "cards": [
    {
      "id": "card-01",
      "intent": "Hook with the speaker's anxious midnight question",
      "startSec": 0.5,
      "endSec": 13.0,
      "accentIndex": 0,
      "zone": "fullscreen",
      "contentHints": {
        "kicker": "AN HONEST QUESTION",
        "title": "The soul-searching question at 11 PM",
        "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'"
      }
    }
  ]
}
```

Card fields (required unless marked optional):

| field | type | purpose |
| --- | --- | --- |
| `id` | string | stable id used in card HTML & GSAP selectors |
| `intent` | string | natural-language description; fed to card synthesis |
| `startSec` / `endSec` | number | times in seconds (endSec > startSec) |
| `accentIndex` | 0 \| 1 \| 2 \| 3 \| 4 | which of the 5 theme accent colors this card pulls |
| `zone` | enum (see below) | where on the canvas the card lives |
| `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here |
| `archetype` (optional) | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default |
| `transition` (optional) | enum: `cut` \| `fade` \| `slide` \| `wipe` | declarative card-to-card transition |

Five `zone` values:

| zone | resolved bounds | when to use |
| --- | --- | --- |
| `fullscreen` | covers whole canvas | hero moments, big numbers, mantras |
| `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content |
| `lower-third` | bottom 30% band | annotation over visible video |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side |
| `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video |
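
Since no CLI validates `storyboard.json`, a quick self-check of each card against the field and zone tables above can catch slips before any HTML is written. The sketch below is an assumption-laden illustration: `validateCard` is a hypothetical helper, and the rules are only those stated in the tables.

```javascript
const ZONES = ["fullscreen", "whiteboard-area", "lower-third", "side-panel", "video-overlay"];
const TRANSITIONS = ["cut", "fade", "slide", "wipe"];

// Hypothetical helper: return a list of problems found in one storyboard card
// (an empty list means the card matches the documented field rules).
function validateCard(card) {
  const errors = [];
  if (typeof card.id !== "string" || card.id === "") errors.push("id must be a non-empty string");
  if (typeof card.intent !== "string") errors.push("intent must be a string");
  if (typeof card.startSec !== "number" || typeof card.endSec !== "number") {
    errors.push("startSec and endSec must be numbers");
  } else if (!(card.endSec > card.startSec)) {
    errors.push("endSec must be greater than startSec");
  }
  if (![0, 1, 2, 3, 4].includes(card.accentIndex)) errors.push("accentIndex must be 0-4");
  if (!ZONES.includes(card.zone)) errors.push(`unknown zone: ${card.zone}`);
  if (typeof card.contentHints !== "object" || card.contentHints === null) {
    errors.push("contentHints must be an object");
  }
  if (card.transition !== undefined && !TRANSITIONS.includes(card.transition)) {
    errors.push(`unknown transition: ${card.transition}`);
  }
  return errors;
}
```

Running it over `storyboard.cards` and fixing anything it reports keeps the outline consistent with what Step 9 expects.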

When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set once at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's `<script>` (see Step 9).
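
One possible reading of the zone table as code is sketched below. It is not the skill's own resolver: `resolveZoneBounds` is a hypothetical name, bands are assumed to span the full canvas width or height, and `whiteboard-area` uses only the 40px-inset reading because the table does not say how the "45% of portrait height" variant is positioned.

```javascript
// Hypothetical helper: map a zone name to pixel bounds on a width x height canvas.
function resolveZoneBounds(zone, width, height) {
  const portrait = height > width;
  switch (zone) {
    case "fullscreen":
    case "video-overlay":
      return { x: 0, y: 0, width, height };
    case "whiteboard-area": {
      // Assumption: 40px inset on all sides for both orientations.
      const margin = 40;
      return { x: margin, y: margin, width: width - 2 * margin, height: height - 2 * margin };
    }
    case "lower-third": {
      const bandHeight = Math.round(height * 0.3);
      return { x: 0, y: height - bandHeight, width, height: bandHeight };
    }
    case "side-panel": {
      if (portrait) {
        const panelHeight = Math.round(height * 0.4);
        return { x: 0, y: height - panelHeight, width, height: panelHeight };
      }
      const panelWidth = Math.round(width * 0.42);
      return { x: width - panelWidth, y: 0, width: panelWidth, height };
    }
    default:
      throw new Error(`Unknown zone: ${zone}`);
  }
}
```

The layout reference files remain the source of truth for exact bounds; treat these numbers as a starting point.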

No prescribed card roles, no prescribed narrative arc. Cards emerge from what the video actually says — could be all quotes or all data, could open with a number or with a story. Let the transcript drive the rhythm.

How many takeaways? Auto-infer from duration + density. There is no fixed upper limit. Pick a base pace from the video duration, then adjust by information density. Only the floor is fixed: minimum 5 cards so even short videos have rhythm.

Step 1: base pace by duration (the natural sec/card for medium density):

| video duration | base pace (sec per card) | rationale |
| --- | --- | --- |
| < 60s (short reel) | 6–8s | viewers expect fast cuts in short-form |
| 60s – 3 min | 8–12s | normal social pace |
| 3 – 10 min | 12–20s | give breathing room; each card carries more |
| 10 – 30 min | 20–35s | long-form lecture / interview rhythm |
| > 30 min | 30–60s | episodic, near-chapter feel |

Step 2: density multiplier (multiplies the base pace):

| signal in the transcript | multiplier | effect |
| --- | --- | --- |
| High density: many numbers, distinct claims, staccato pacing, list-like enumeration, every 1–2 sentences is a new idea | × 0.7 | cuts faster, more cards |
| Medium density: mixed flow with both data and narrative | × 1.0 | base pace |
| Low density: one extended story, repeated reframing, slow reflective pacing, single argument unfolding | × 1.5 | cuts slower, fewer cards |

Step 3 — compute:

secPerCard = basePace × densityMultiplier
cardCount  = max(5, round(videoDurationSec / secPerCard))

Examples (notice — no upper clamp; long videos naturally produce more cards):

- 30s reel, single punchline (low density) → 7 × 1.5 = 10.5s/card → round(30/10.5) = 3 → floor to 5 cards
- 60s reflective monologue (low density) → 10 × 1.5 = 15s/card → 4 → floor to 5 cards
- 121s talking-head with rich data (high density) → 10 × 0.7 = 7s/card → 17 cards
- 5 min interview, mixed density → 16 × 1.0 = 16s/card → 19 cards
- 10 min deep-dive, high density → 16 × 0.7 = 11.2s/card → 54 cards
- 30 min lecture, medium density → 28 × 1.0 = 28s/card → 64 cards
- 1 hr podcast, low density → 45 × 1.5 = 67.5s/card → 53 cards
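
The Step 3 formula can be sketched directly in code. The names `computeCardCount` and `DENSITY_MULTIPLIER` are hypothetical; the base pace is passed in because picking a value inside each duration band is a judgment call.

```javascript
// Multipliers from the density table.
const DENSITY_MULTIPLIER = { high: 0.7, medium: 1.0, low: 1.5 };

// Hypothetical helper: cardCount = max(5, round(duration / (basePace × multiplier))).
function computeCardCount(videoDurationSec, basePaceSec, density) {
  const multiplier = DENSITY_MULTIPLIER[density];
  if (multiplier === undefined) {
    throw new Error(`Unknown density: ${density}`);
  }
  const secPerCard = basePaceSec * multiplier;
  // Only the floor of 5 is fixed; there is deliberately no upper clamp.
  return Math.max(5, Math.round(videoDurationSec / secPerCard));
}
```

For instance, `computeCardCount(121, 10, "high")` gives 17 and `computeCardCount(30, 7, "low")` is lifted to the floor of 5.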

When a card holds longer than ~15s, plan for a richer card (data block, multi-step reveal, several sub-points unfolding with staggered animations); a static one-liner gets boring past 8s. For long pieces where many cards exceed 30s, consider chunking the timeline into sub-compositions (one .html per chapter, mounted with `data-composition-src`) so the GSAP timeline per file stays manageable; see the `timeline_track_too_dense` HyperFrames lint warning.

`content` can be a plain string ("Title: annualized 5.69%\nNotes: ...") or any JSON shape that captures the data. The agent decides the shape per card.

Optional outro. This skill ships no fixed brand outro. If the user wants a closing card, design a neutral one yourself (wordmark + one-line tagline, ~1.5-2s, fade in -> short hold -> fade out), append it to `cards[]`, and extend `composition.durationSeconds` to its `endSec`. Otherwise end on the last content card.

7. Decide Render Strategy

Confirm Visual Direction with User (DO THIS FIRST)

Before you start designing cards or deciding bounds, ask the user to pick the output ratio, the layout, the style, and the card-density preset. Frames are auto-selected from the chosen layout × style combination (see "Auto-pick frame" table below). Before sending the question, precompute two things:

- `recommendedRatio` from the source video's aspect ratio (`metadata.json` width / height):
  - `sourceAspect = width / height`
  - `sourceAspect ≥ 1.5` (≥ ~3:2 wide) → recommend 16:9
  - `sourceAspect ≤ 0.7` (≤ ~9:13 tall) → recommend 9:16
  - `0.7 < sourceAspect < 1.5` (near-square) → recommend 4:5

  Mark the recommended option's label with " (recommended · matches source video X:Y)" so the user sees why it's recommended.
- `autoCount` from Step 6 (`max(5, round(videoSec / (basePace × densityMultiplier)))`) so the "auto" option's label can show the concrete number.
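
The ratio recommendation is a three-way threshold check, sketched below with a hypothetical `recommendRatio` helper that takes the `width` and `height` read from `metadata.json`.

```javascript
// Hypothetical helper: pick the recommended output ratio from the source dimensions.
function recommendRatio(width, height) {
  const sourceAspect = width / height;
  if (sourceAspect >= 1.5) return "16:9"; // wide sources
  if (sourceAspect <= 0.7) return "9:16"; // tall sources
  return "4:5";                           // near-square sources
}
```

A 1920×1080 source yields `"16:9"`, a 1080×1920 source yields `"9:16"`, and a square 1080×1080 source falls through to `"4:5"`.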

Environment compatibility — pick the best available question channel. Not every runtime exposes the same structured-question tool. Apply this order:

1. `AskUserQuestion` (Claude Code, Anthropic Console): use the structured 4-question call below.
2. Other native clarification tool (e.g. `ask_question`, `request_user_input`, IDE-specific prompt): use that tool with the same 4 question texts and option lists. Preserve the recommendation markers and the precomputed values.
3. No native tool (Codex CLI, plain text-only runtimes): ask directly in normal conversation. Use the plain-text template under Channel B below. Keep it to one message, 4 numbered questions (the global cap is 2–5 questions per round; we stay inside it).

Rules that apply to every channel:

- Ask at most 2–5 questions per round. Our 4 here fit.
- Even if missing info doesn't block rendering, ask once to confirm the parameters that materially affect the final output (ratio, layout, style, cardCount).
- If the user has already pre-approved defaults ("just use defaults", "no need to ask", "auto-pick everything") or asked you not to ask, skip the question entirely and use: `recommendedRatio`, `layout="stack"` (safest cross-ratio default), a `style` chosen from transcript tone in the most neutral group (editorial/data), and `autoCount`. Tell the user what you picked in one sentence and continue.

Channel A: native `AskUserQuestion`:

```js
// Precompute before the call:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = integer (from Step 6)

AskUserQuestion({
  questions: [
    {
      question: "Output video aspect ratio (canvas):",
      header: "Aspect ratio",
      multiSelect: false,
      // Reorder so the recommended option appears FIRST (per AskUserQuestion convention).
      // Append " (recommended · matches source video W×H)" to the recommended option's label.
      options: [
        { label: "16:9 (1920×1080) landscape", description: "TV / YouTube / desktop playback. Most natural when the source video is already landscape; widest canvas." },
        { label: "9:16 (1080×1920) portrait", description: "TikTok / Reels / short-form mobile. Most natural for portrait source; native mobile experience." },
        { label: "4:5 (1080×1350) near-portrait", description: "Instagram feed / WeChat Moments. Best when source is near-square or you want to cover both platforms." }
      ]
    },
    {
      question: "Choose the overall layout: how should the video and cards coexist on the canvas?",
      header: "Layout",
      multiSelect: false,
      options: [
        { label: "side-by-side (split)",  description: "Video and card each take half the canvas. Most stable for interview / data side-by-side; clear visual separation." },
        { label: "top-bottom (stack)",    description: "Video on top (~52%), card below. Classic combo of speaker face + summary card; works well in portrait too." },
        { label: "picture-in-picture (pip)", description: "Card fills the canvas, video shrinks to a rounded corner window. Use when content is primary and speaker is secondary." },
        { label: "full-screen overlay (overlay)", description: "Video plays full-bleed, card floats as a glass layer on top. Strong cinematic / emotional feel." }
      ]
    },
    {
      question: "Choose the card visual style (style):",
      header: "Style group",
      multiSelect: false,
      // NOTE: these 3 groups intentionally match the columns of the frame
      // auto-pick matrix below, so picking a group resolves both the `style`
      // group AND the frame matrix column in one step. Memberships are
      // mutually exclusive.
      options: [
        { label: "warm paper (warm-paper)", description: "academic notebook · editorial big-type · whiteboard hand-drawn · xhs social. Best for interview reflections, product launches, lifestyle, emotional stories." },
        { label: "clinical / cold (clinical)",   description: "audit magazine · swiss grid · terminal CLI · minimal modern. Best for financial analysis, investigative reports, technical tutorials, serious presentations." },
        { label: "experimental / avant-garde (experimental)", description: "geom color-clash geometry · spotlight dark-background. Best for short-form highlights, product launches, strong emotion, cinematic feel." }
      ]
    },
    {
      question: "Card count (takeaway pacing): how many cards to cut?",
      header: "Card count",
      multiSelect: false,
      options: [
        // Substitute the real N (your autoCount) into each label before sending.
        { label: "Auto (recommended) · approx N cards", description: "Inferred automatically from video duration and information density (see Step 6 rules). This run estimates approx N cards." },
        { label: "Fewer · approx round(N × 0.6) cards", description: "Sparser cuts, each card holds longer; suits reflective / slow-paced content." },
        { label: "More · approx round(N × 1.5) cards", description: "Tighter cuts, faster rhythm; suits staccato / data-dense / short-form highlight content." }
      ]
    }
  ]
})
```

About "Other": `AskUserQuestion` automatically adds an "Other" option to the card count question. The user can type a number directly (e.g. "8", "20") as the cardCount target. Parse the input as an integer: if parsing succeeds → use that value (minimum 5 as a floor); if parsing fails → fall back to "auto".

Channel B — plain-text fallback (Codex CLI, runtimes without a native question tool). Post this as one normal message, then wait for the reply. Bullet-style 1/2/3/4 keeps the reply parseable:

I need to confirm four visual decisions with you before I start cutting cards:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080) — TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920) — TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350) — Instagram feed / works for both platforms
   ▸ My recommendation:  <recommendedRatio>  (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video & card coexist):
   A. split   side-by-side (50/50)
   B. stack   top-bottom (video top, card bottom)
   C. pip     picture-in-picture (card full canvas, video rounded corner window)
   D. overlay full-screen glass overlay (video full-bleed, card glass layer)

3) Card style group (maps to frame auto-pick matrix, pick 1 of 3):
   A. warm paper (warm-paper)      (academic / editorial / whiteboard / xhs)
   B. clinical / cold (clinical)   (audit / swiss / terminal / minimal)
   C. experimental (experimental)  (geom / spotlight)

4) Card count (takeaway pacing):
   A. Auto (recommended) — approx <autoCount> cards
   B. Fewer — approx round(<autoCount> × 0.6) cards
   C. More — approx round(<autoCount> × 1.5) cards
   D. Give me a specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language is fine.
If you want all recommended defaults, reply "default" / "auto" / "use all recommendations".

Parsing the plain-text reply:

- Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data / auto"`, full sentences, or `default`.
- If any answer is ambiguous → re-ask only the ambiguous ones (still inside the 2–5 cap).
- If the user says "default / auto / use all recommendations" → skip without re-asking.
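
For the compact letter formats only, parsing might be sketched as below. Everything here is illustrative: `parseCompactReply`, `OPTIONS`, and `QUESTION_KEYS` are hypothetical names, and the sketch returns `null` for anything it cannot read (natural-language replies, out-of-range letters), which is the cue to interpret the reply yourself or re-ask.

```javascript
// Letter-to-value maps taken from the plain-text template.
const OPTIONS = {
  ratio: { A: "16:9", B: "9:16", C: "4:5" },
  layout: { A: "split", B: "stack", C: "pip", D: "overlay" },
  styleGroup: { A: "warm-paper", B: "clinical", C: "experimental" },
  cardCount: { A: "auto", B: "fewer", C: "more", D: "custom" },
};
const QUESTION_KEYS = ["ratio", "layout", "styleGroup", "cardCount"];

// Hypothetical helper: parse "1A 2C 3B 4A" or "A C B A"; null means "not parseable".
function parseCompactReply(reply) {
  const text = reply.trim();
  if (/^(default|auto|use all recommendations)$/i.test(text)) {
    return { useDefaults: true };
  }
  const tokens = text.toUpperCase().split(/[\s,]+/).filter(Boolean);
  if (tokens.length !== QUESTION_KEYS.length) return null;
  const answers = {};
  for (let i = 0; i < tokens.length; i++) {
    const match = /^([1-4])?([A-D])$/.exec(tokens[i]);
    if (!match) return null;
    // A leading digit names the question; otherwise position decides.
    const key = QUESTION_KEYS[match[1] ? Number(match[1]) - 1 : i];
    const value = OPTIONS[key][match[2]];
    if (value === undefined || key in answers) return null;
    answers[key] = value;
  }
  return answers;
}
```

A `cardCount` of `"custom"` (option D) still needs the follow-up number from the user.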

After the user answers (any channel):

Resolve the output canvas from the ratio answer. These are the exact `storyboard.composition.width / height` values to write:

| user choice | composition.width × height | storyboard.layout field |
| --- | --- | --- |
| `16:9` | 1920 × 1080 | `"landscape"` |
| `9:16` | 1080 × 1920 | `"portrait"` |
| `4:5` | 1080 × 1350 | `"portrait"` (schema treats 4:5 as portrait: height > width) |

For 4:5 bounds inside `references/layouts/*.html`: those files only document landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350) derive bounds by proportional scaling from portrait: keep horizontal values, scale vertical values by `1350/1920 ≈ 0.703`. Use the exact ratio when rounding, since the truncated 0.703 can land one pixel off. Example: `overlay` portrait card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.

Map the style group to a specific style by looking at the transcript tone: pick the one that best fits, but stay inside the user's chosen group. If you're unsure between two specific styles inside the group, send a second `AskUserQuestion` with those 2–4 specific style options.
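
The proportional scaling from portrait bounds to 4:5 bounds can be sketched as follows. `scalePortraitBoundsTo4x5` is a hypothetical helper using the `{ x, y, w, h }` shape from the example in this step.

```javascript
// Hypothetical helper: derive 4:5 (1080×1350) bounds from portrait (1080×1920)
// bounds. Horizontal values are kept; vertical values scale by 1350/1920.
function scalePortraitBoundsTo4x5(bounds) {
  const scale = 1350 / 1920; // exact ratio, about 0.703
  return {
    x: bounds.x,
    y: Math.round(bounds.y * scale),
    w: bounds.w,
    h: Math.round(bounds.h * scale),
  };
}
```

Applied to the portrait `overlay` card `{ x: 24, y: 1280, w: 1032, h: 564 }`, it returns `{ x: 24, y: 900, w: 1032, h: 397 }`.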

Resolve final cardCount from the density answer:

| user choice | final cardCount |
| --- | --- |
| Auto (recommended) | the `autoCount` you already computed |
| Fewer | `max(5, round(autoCount × 0.6))` |
| More | `round(autoCount × 1.5)` (no upper clamp) |
| Other = "<n>" (integer) | `max(5, parseInt(n))` |
| Other = anything else | fall back to `autoCount` |
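
The table above translates into a small resolver, sketched here with a hypothetical `resolveCardCount` helper. It assumes the answer has already been normalized to `"auto"`, `"fewer"`, `"more"`, or the user's free-text "Other" input.

```javascript
// Hypothetical helper: turn the card-count answer into the final cardCount.
function resolveCardCount(choice, autoCount) {
  switch (choice.trim().toLowerCase()) {
    case "auto":
      return autoCount;
    case "fewer":
      return Math.max(5, Math.round(autoCount * 0.6));
    case "more":
      return Math.round(autoCount * 1.5); // no upper clamp
    default: {
      // "Other": accept an integer with a floor of 5, else fall back to auto.
      const n = Number.parseInt(choice, 10);
      return Number.isNaN(n) ? autoCount : Math.max(5, n);
    }
  }
}
```

With `autoCount = 17`, "Fewer" resolves to 10 and "More" to 26; a typed "3" is lifted to the floor of 5.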

Auto-pick the video frame from this table (frames don't ask the user — they follow from layout × style):

| layout | warm-paper styles (academic / whiteboard / editorial / xhs) | clinical styles (audit / swiss / terminal / minimal) | experimental styles (geom / spotlight) |
| --- | --- | --- | --- |
| `split` | `polaroid` | `hairline` | `clean` |
| `stack` | `polaroid` | `hairline` | `clean` |
| `pip` | `clean` (pip pill already has chrome) | `clean` | `clean` |
| `overlay` | `clean` (full-bleed forbids deco frames) | `clean` | `clean` |
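
The frame matrix is a plain two-key lookup. The sketch below mirrors the table; `FRAME_MATRIX` and `pickFrame` are hypothetical names.

```javascript
// Frame per layout (row) and style group (column), copied from the table above.
const FRAME_MATRIX = {
  split:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  stack:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  pip:     { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
  overlay: { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
};

// Hypothetical helper: look up the frame, failing loudly on unknown keys.
function pickFrame(layout, styleGroup) {
  const frame = FRAME_MATRIX[layout]?.[styleGroup];
  if (!frame) {
    throw new Error(`No frame for layout "${layout}" and style group "${styleGroup}"`);
  }
  return frame;
}
```

For example, `pickFrame("stack", "clinical")` returns `"hairline"`, while every `pip` and `overlay` combination returns `"clean"`.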

Tell the user what you chose in one sentence — ratio (+ canvas size), layout, specific style, frame, and final cardCount — then proceed with the rest of Step 7 (per-card layouts, motion patterns).
Record the five values (ratio / layout / style / frame / cardCount) in working memory (no schema field needed); you'll reference them while writing each card's HTML in Step 8 and while reading the matching `references/<dim>/<key>.html` for tokens and structure.

If the user picks an answer via "Other" with a free-text style name not in the 10-style library, treat it as a hint to design a fresh card visual yourself, but still anchor on the chosen layout's bounds.

在开始设计卡片或确定边界前，请用户选择输出比例、布局、风格和卡片密度预设。画面会根据所选布局×风格组合自动选择（见下文“自动选择画面”表格）。发送问题前，预先计算两件事：

recommendedRatio
（根据源视频的宽高比，即
```
metadata.json
```
中的width/height）：
- ```
sourceAspect = width / height
```
- ```
sourceAspect ≥ 1.5
```
  （≥ ~3:2宽屏）→ 推荐**
```
16:9
```
  **
- ```
sourceAspect ≤ 0.7
```
  （≤ ~9:13竖屏）→ 推荐**
```
9:16
```
  **
- ```
0.7 < sourceAspect < 1.5
```
  （接近正方形）→ 推荐**
```
4:5
```
  **
在推荐选项的标签后添加“（推荐 · 匹配源视频X:Y）”，让用户了解推荐原因。
autoCount
（来自步骤6的计算值，即
```
max(5, round(视频时长秒数 / (基础节奏 × 密度乘数)))
```
），以便“自动”选项的标签可以显示具体数值。

环境兼容性——选择最佳的提问方式。并非所有运行时都支持相同的结构化提问工具。按以下顺序选择：

AskUserQuestion
（Claude Code、Anthropic控制台）——使用以下结构化4问题调用。
其他原生澄清工具（如
```
ask_question
```
、
```
request_user_input
```
、IDE特定提示）——使用该工具并保留相同的4个问题文本和选项列表。保留推荐标记和预先计算的值。
无原生工具（Codex CLI、纯文本运行时）——直接在对话中提问。使用本节末尾的纯文本模板。保持为一条消息，4个编号问题（全局限制为每轮2–5个问题；此处符合要求）。

适用于所有渠道的规则：

每轮最多提问2–5个问题。此处的4个问题符合要求。
即使缺少信息不会阻碍渲染，也要询问一次以确认对最终输出有重大影响的参数（比例、布局、风格、卡片数量）。
如果用户已预先批准默认值（“使用默认值即可”“无需询问”“自动选择所有选项”）或要求不要提问——完全跳过提问，使用：
```
recommendedRatio
```
、
```
layout="stack"
```
（最安全的跨比例默认值）、根据字幕语气选择最中性组（编辑/数据）的
```
style
```
、
```
autoCount
```
。用一句话告知用户你的选择，然后继续。

渠道A — 原生
AskUserQuestion
：

// 调用前预先计算：
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = 整数（来自步骤6）

AskUserQuestion({
  questions: [
    {
      question: "输出视频宽高比（画布）：",
      header: "宽高比",
      multiSelect: false,
      // 重新排序，让推荐选项排在第一位（遵循AskUserQuestion惯例）。
      // 在推荐选项的标签后添加“（推荐 · 匹配源视频W×H）”。
      options: [
        { label: "16:9 (1920×1080) 横屏", description: "电视/YouTube/桌面播放。最适合已为横屏的源视频；画布最宽。" },
        { label: "9:16 (1080×1920) 竖屏", description: "TikTok/Reels/短视频移动端。最适合竖屏源视频；原生移动端体验。" },
        { label: "4:5 (1080×1350) 近竖屏", description: "Instagram动态/微信朋友圈。最适合接近正方形的源视频，或需要覆盖多平台的场景。" }
      ]
    },
    {
      question: "选择整体布局：视频和卡片如何在画布上共存？",
      header: "布局",
      multiSelect: false,
      options: [
        { label: "side-by-side (split) 分屏",  description: "视频和卡片各占画布一半。最适合访谈/数据并列展示；视觉分隔清晰。" },
        { label: "top-bottom (stack) 上下堆叠",    description: "视频在上（约52%），卡片在下。经典的主讲人画面+摘要卡片组合；也适用于竖屏。" },
        { label: "picture-in-picture (pip) 画中画", description: "卡片填满画布，视频缩小为圆角窗口。适合内容为主、主讲人为辅的场景。" },
        { label: "full-screen overlay (overlay) 全屏叠加", description: "视频全屏播放，卡片作为玻璃层悬浮在上方。具有强烈的电影感/情感氛围。" }
      ]
    },
    {
      question: "选择卡片视觉风格（style）：",
      header: "风格组",
      multiSelect: false,
      // 注意：这3组与下方的画面自动选择矩阵行完全匹配
      // 因此选择一组即可同时确定`style`组和画面矩阵列。各组互斥。
      options: [
        { label: "warm paper (warm-paper) 暖纸风", description: "学术笔记本·大字体编辑风格·手绘白板·小红书社交风。最适合访谈反思、产品发布、生活方式、情感故事。" },
        { label: "clinical / cold (clinical) 冷峻风",   description: "审计杂志·瑞士网格·终端CLI·极简现代风。最适合财务分析、调查报道、技术教程、严肃演示。" },
        { label: "experimental / avant-garde (experimental) 实验风", description: "几何撞色·暗背景聚光灯。最适合短视频高光、产品发布、强烈情感、电影感内容。" }
      ]
    },
    {
      question: "卡片数量（要点节奏）：需要制作多少张卡片？",
      header: "卡片数量",
      multiSelect: false,
      options: [
        { label: "Auto (推荐) · 约N张卡片", description: "根据视频时长和信息密度自动推断（见步骤6规则）。本次运行估计约N张卡片。将实际N值（你的autoCount）替换到标签中。" },
        { label: "Fewer · 约round(N × 0.6)张卡片", description: "切换更稀疏，每张卡片停留更长时间——适合反思/慢节奏内容。" },
        { label: "More · 约round(N × 1.5)张卡片", description: "切换更紧凑，节奏更快——适合急促/高密度数据/短视频高光内容。" }
      ]
    }
  ]
})

关于“其他”选项——

AskUserQuestion

会自动在卡片数量问题中添加“其他”选项。用户可以直接输入数字（如“8”“20”）作为卡片数量目标。将输入解析为整数：如果解析成功→使用该值（下限为5）；如果解析失败→回退到“自动”选项。

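Channel B: plain-text fallback (Codex CLI, or any runtime without a native question tool). Send the following as one ordinary message, then wait for the reply. Keep the numbered 1/2/3/4 format so the reply is easy to parse:

Before I start building cards, I need to confirm four visual decisions with you:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080): TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920): TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350): Instagram feed / multi-platform
   ▸ My recommendation: <recommendedRatio> (matches the source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how the video and cards share the canvas):
   A. split: side by side (50/50)
   B. stack: top-bottom (video on top, card below)
   C. pip: picture-in-picture (card fills the canvas, video in a rounded-corner window)
   D. overlay: full-screen glass overlay (video full screen, card as a glass layer)

3) Card style group (maps to the frame auto-pick matrix; pick 1 of 3):
   A. warm paper (warm-paper): academic / editorial / whiteboard / Xiaohongshu
   B. clinical / cold (clinical): audit / Swiss / terminal / minimal
   C. experimental (experimental): geometric / spotlight

4) Card count (takeaway pacing):
   A. Auto (recommended): about <autoCount> cards
   B. Fewer: about round(<autoCount> × 0.6) cards
   C. More: about round(<autoCount> × 1.5) cards
   D. A specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language both work.
To take every recommended default, reply "default" / "auto" / "use all recommended options".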

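Parsing the plain-text reply:

Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data style / auto"`, a full sentence, or `default`.

If any answer is ambiguous, re-ask only the ambiguous question (still within the 2–5 question limit).
If the user replies "default / auto / use all recommended options", skip the questions.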
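
The two compact reply forms above can be parsed mechanically. The following is a minimal sketch, assuming the option letters map as in the Channel B message; `parseReply`, `OPTIONS`, and the returned `ambiguous` list are hypothetical names, and free-form sentences are deliberately left unparsed so they fall through to a re-ask.

```javascript
// Hypothetical helper: parse a Channel B reply such as "1A 2C 3B 4A" or "A C B A".
// Returns null for "default"-style replies; unparsed answers are listed in `ambiguous`.
const OPTIONS = {
  ratio: { A: "16:9", B: "9:16", C: "4:5" },
  layout: { A: "split", B: "stack", C: "pip", D: "overlay" },
  style: { A: "warm-paper", B: "clinical", C: "experimental" },
  count: { A: "auto", B: "fewer", C: "more", D: "custom" },
};
const KEYS = ["ratio", "layout", "style", "count"];

function parseReply(reply) {
  const text = reply.trim();
  if (/^(default|auto|use all recommended options)$/i.test(text)) return null;

  const result = {};
  // Numbered form: "1A 2C 3B 4A" (any order, any separators).
  const numbered = [...text.matchAll(/([1-4])\s*([A-D])\b/gi)];
  if (numbered.length > 0) {
    for (const [, num, letter] of numbered) {
      const key = KEYS[Number(num) - 1];
      result[key] = OPTIONS[key][letter.toUpperCase()];
    }
  } else {
    // Positional form: "A C B A" (first letter answers question 1, and so on).
    const letters = text.split(/[\s,/]+/).filter((t) => /^[A-D]$/i.test(t));
    letters.slice(0, 4).forEach((letter, i) => {
      result[KEYS[i]] = OPTIONS[KEYS[i]][letter.toUpperCase()];
    });
  }
  // Anything still undefined needs a targeted re-ask.
  const ambiguous = KEYS.filter((k) => result[k] === undefined);
  return { ...result, ambiguous };
}
```

A reply such as `"1B 3A"` leaves `layout` and `count` in `ambiguous`, which matches the rule of re-asking only the unclear questions.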

After the user replies (either channel):

Resolve the output canvas from the aspect-ratio choice. These are the exact values to write into `storyboard.composition.width` / `height`:

User choice	Composition W×H	`storyboard.composition.layout` field
`16:9`	1920 × 1080	`"landscape"`
`9:16`	1080 × 1920	`"portrait"`
`4:5`	1080 × 1350	`"portrait"` (because height > width)

4:5 bounds in `references/layouts/*.html`: those files document bounds only for landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350), derive the bounds by scaling the portrait values: keep horizontal values unchanged and scale vertical values by `1350/1920` (≈ 0.703; use the exact fraction so the rounding is stable). Example: portrait `overlay` card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.
Map the style group to a concrete style based on the tone of the transcript: pick the best match, but stay inside the group the user chose. If you cannot decide between concrete styles in the group, send a second `AskUserQuestion` listing those 2–4 concrete styles.

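Resolve the final card count from the density choice:

User choice	Final card count
Auto (recommended)	the `autoCount` you already computed
Fewer	`max(5, round(autoCount × 0.6))`
More	`round(autoCount × 1.5)` (no upper bound)
Other = "<n>" (integer)	`max(5, parseInt(n))`
Other = anything else	fall back to `autoCount`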
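
The table above translates directly into a small function. This is a sketch only; the function name and the lowercase `choice` strings are assumptions made for illustration, not part of any schema.

```javascript
// Hypothetical helper mirroring the card-count table above.
// `choice` is one of "auto" | "fewer" | "more" | "other" (illustrative names).
function resolveCardCount(choice, autoCount, otherText = "") {
  switch (choice) {
    case "fewer":
      return Math.max(5, Math.round(autoCount * 0.6));
    case "more":
      return Math.round(autoCount * 1.5); // no upper bound
    case "other": {
      const n = Number.parseInt(otherText, 10);
      // Non-numeric input falls back to the auto estimate.
      return Number.isNaN(n) ? autoCount : Math.max(5, n);
    }
    default:
      return autoCount; // "auto"
  }
}
```

With `autoCount = 12`, "fewer" gives 7 and "more" gives 18; an "Other" reply of `"3"` is raised to the floor of 5.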

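Auto-pick the video frame from the table below (do not ask the user; it is determined by layout × style):

Layout	warm-paper (academic / whiteboard / editorial / xhs)	clinical (audit / Swiss / terminal / minimal)	experimental (geometric / spotlight)
`split`	`polaroid`	`hairline`	`clean`
`stack`	`polaroid`	`hairline`	`clean`
`pip`	`clean` (the PiP window already has its own border)	`clean`	`clean`
`overlay`	`clean` (full-bleed video does not suit decorative chrome)	`clean`	`clean`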
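
As a lookup, the matrix above might be sketched like this. `FRAME_MATRIX` and `pickFrame` are hypothetical names, and the fallback to `clean` for an unknown combination is an assumption rather than something the skill specifies.

```javascript
// Frame matrix copied from the table above: FRAME_MATRIX[layout][styleGroup].
const FRAME_MATRIX = {
  split: { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  stack: { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  pip: { "warm-paper": "clean", clinical: "clean", experimental: "clean" },
  overlay: { "warm-paper": "clean", clinical: "clean", experimental: "clean" },
};

// Hypothetical helper; falls back to "clean" for unknown combinations (assumption).
function pickFrame(layout, styleGroup) {
  return FRAME_MATRIX[layout]?.[styleGroup] ?? "clean";
}
```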

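Tell the user your choices in one sentence (aspect ratio plus canvas size, layout, concrete style, frame, final card count), then continue with the rest of Step 7 (per-card layout, animation patterns).
Keep the five values (ratio / layout / style / frame / card count) in working memory (no schema field is needed); you refer back to them when writing each card's HTML in Step 8 and when reading the matching `references/<dim>/<key>.html` for tokens and structure in Step 9.

If the user used the "Other" option to enter a free-text style name outside the 10-style library, treat it as a prompt to design an entirely new card look, still anchored to the bounds of the chosen layout.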

Render Strategy Inputs


With ratio / layout / style / cardCount / frame locked from Step 7.0, the remaining per-card decisions are:

Source-video fit inside the GSAP target: the video element has `object-fit: cover` and is clipped to `#video-wrap`'s tween bounds. If you want NO cropping (e.g. a portrait source on a landscape canvas shouldn't get its top/bottom chopped), aim the tween at a rect that matches the source's aspect ratio and let the surrounding canvas show through (or fill it with the card / a backdrop).
`card.zone` per card: derive from your chosen composition layout (split → side-panel, stack → lower-third, pip → fullscreen, overlay → video-overlay), OR pick a different zone for one-off variants (fullscreen for hero / quote, whiteboard-area for dense data).
`accentIndex` per card: each card pulls one of the 5 theme accent colors. Vary across cards for rhythm; reuse the same index when two cards belong to the same narrative beat.
Motion vocabulary: pick 2–3 repeatable patterns from the `data-anim` kinds (see the table later) and stick to them so the composition feels coherent.
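
For the no-cropping case in the first item, the tween target is the largest rect with the source's aspect ratio that fits inside the available bounds. A minimal sketch, assuming whole-pixel rounding and centering (both are illustrative choices, and `containRect` is a hypothetical name):

```javascript
// Hypothetical helper: largest rect with the source's aspect ratio that fits
// inside `bounds`, centered. Values are rounded to whole pixels.
function containRect(sourceW, sourceH, bounds) {
  const scale = Math.min(bounds.width / sourceW, bounds.height / sourceH);
  const width = Math.round(sourceW * scale);
  const height = Math.round(sourceH * scale);
  return {
    left: bounds.left + Math.round((bounds.width - width) / 2),
    top: bounds.top + Math.round((bounds.height - height) / 2),
    width,
    height,
  };
}
```

The returned object uses the same `left` / `top` / `width` / `height` keys as the `#video-wrap` tween targets, so it can be spread into the tween vars.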

Pick from these `themeId` palettes (use them as `--accent-N` / `--bg` / `--text` CSS variables in your composition `<style>` block):

themeId	accent palette (5 colors)	board bg	text
classic	`#1971c2 #e03131 #2f9e44 #e8590c #9c36b5`	`#FFF9E3` (paper)	`#1e1e1e`
noir	`#4cc9f0 #f72585 #4ade80 #fb923c #a78bfa`	`#1a1a1a`	`#f1f1f1`
mint	`#0077b6 #d62828 #2d6a4f #e76f51 #7209b7`	`#e8faf0`	`#1b4332`
craft	`#bf5700 #d62728 #6c757d #e9b54a #3d5a80`	`#f6efe1`	`#2d2d2d`
slate	`#0ea5e9 #ef4444 #22c55e #f97316 #a855f7`	`#1e293b`	`#f1f5f9`
mono	`#000 #555 #888 #aaa #ccc`	`#fff`	`#000`
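
One way to turn a palette row into the `:root` variables is sketched below. Only two rows are copied from the table; `PALETTES` and `themeCss` are hypothetical names, and the output format is an assumption that mirrors the variable names listed above.

```javascript
// Two rows copied from the palette table above; add the others the same way.
const PALETTES = {
  classic: { accents: ["#1971c2", "#e03131", "#2f9e44", "#e8590c", "#9c36b5"], bg: "#FFF9E3", text: "#1e1e1e" },
  noir: { accents: ["#4cc9f0", "#f72585", "#4ade80", "#fb923c", "#a78bfa"], bg: "#1a1a1a", text: "#f1f1f1" },
};

// Hypothetical helper: emit a :root block for the composition <style>.
function themeCss(themeId) {
  const palette = PALETTES[themeId];
  if (!palette) throw new Error(`unknown themeId: ${themeId}`);
  const lines = [
    `  --bg: ${palette.bg};`,
    `  --text: ${palette.text};`,
    ...palette.accents.map((color, i) => `  --accent-${i}: ${color};`),
  ];
  return `:root {\n${lines.join("\n")}\n}`;
}
```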

Available fonts (woff2 in `<SKILL_DIR>/assets/fonts/`, staged to the work dir in Step 9): `Caveat` (handwriting), `LXGW WenKai TC` (Chinese hand-script), `Inter` (modern sans), `Virgil` (geometric hand). Reference them via `@font-face`, or directly by `font-family`.

For inspiration on visual patterns, `<SKILL_DIR>/references/styles/` ships 10 self-contained reference cards (academic / editorial / minimal / spotlight / geom / whiteboard / audit / terminal / swiss / xhs) that you can copy as starting points, but do not feel constrained to match any of these. Each card is your own design.


Visual Design Library (<SKILL_DIR>/references/)


Beyond the composition-level `themeId`, the skill ships a richer reference library at `<SKILL_DIR>/references/` covering three orthogonal visual dimensions you can freely mix:

Style  ×  Layout  ×  VideoFrame
 (10)      (4)         (3)

dimension	keys	what it decides
style	`academic` `editorial` `minimal` `spotlight` `geom` `whiteboard` `audit` `terminal` `swiss` `xhs`	the card's visual language — fonts, colors, ornament, layout-within-card
layout	`split` `stack` `pip` `overlay`	how the source video and the card share the canvas
frame	`clean` `hairline` `polaroid`	the decorative chrome around the video element

Read `<SKILL_DIR>/references/DESIGN_INDEX.md` for the full matrix and a loose decision guide (interview / product launch / data analysis / social clip / technical tutorial / emotional story …). When you decide to use a specific style / layout / frame, Read the corresponding file:

`references/styles/<key>.html`: self-contained card fragment with that style's CSS tokens (colors, fonts, padding, ornament) and a placeholder takeaway. Copy the `.card[data-card-id="ref-<key>"]` style block, rename the data-card-id to your card's id, swap the placeholder content for the real takeaway, and you're done.
`references/layouts/<key>.html`: exact `videoBounds` + `cardBounds` for both landscape and portrait, with a copy-paste JSON snippet for `storyboard.json`'s per-card `layout` field.
`references/frames/<key>.html`: decorative HTML to add as a sibling of `#video-wrap`, plus placement instructions for the composition CSS.

Pick `style × layout × frame` per card. You can change all three between cards as long as the transitions read smoothly. A common rhythm: open `editorial × overlay × clean`, switch to `audit × split × hairline` for the data card, close on `whiteboard × pip × polaroid`.

The 10 styles are skill-side design tokens, not composition-level themes. They don't need to be declared in `storyboard.composition`; they live inside each card's HTML. The `themeId` field can still pick a composition-level palette (table above) that controls page-body background and video border chrome.


Layout Compositions (Card + Video)


Two coordinated decisions per card define how it shares the canvas with the source video:

`card.zone` (declared in `storyboard.json`): one of the 5 schema values; resolve it into pixel bounds (per the table in Step 6) when you write the card-host wrapper's inline `style` in Step 9.
`#video-wrap` bounds at this card's time window (declared imperatively in the composition's GSAP timeline): the agent tweens `#video-wrap` to a target rect for each layout transition.

Schema does NOT store per-card video bounds. `videoTrack.bounds` is one-time at composition level (defaults to full canvas). Video "moving" between cards is purely a GSAP animation authored in `index.html`. There is no `card.layout` field; earlier versions of this doc invented one, and the real schema only has `card.zone`.

4 composition layouts (from `references/layouts/`), each a recipe pairing a `zone` with a `#video-wrap` tween target:

composition layout	recommended `card.zone`	GSAP target for `#video-wrap` (landscape 1920×1080)	GSAP target for `#video-wrap` (portrait 1080×1920)	when to use
`split`	`side-panel`	`{ left: 960, top: 0, width: 960, height: 1080 }`	`{ left: 0, top: 960, width: 1080, height: 960 }` (bottom half)	speaker + data side-by-side / 50:50 weight
`stack`	`lower-third`	`{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%)	`{ left: 0, top: 0, width: 1080, height: 844 }` (top 44%)	speaker on top + summary card below
`pip`	`fullscreen`	`{ left: 1480, top: 760, width: 400, height: 300 }` + add `.framed` class	`{ left: 690, top: 28, width: 360, height: 203 }` + add `.framed`	content-heavy card + corner pip
`overlay`	`video-overlay`	`{ left: 0, top: 0, width: 1920, height: 1080 }` (full-bleed)	`{ left: 0, top: 0, width: 1080, height: 1920 }`	cinematic / dramatic / glass card on full video

For 4:5 (1080×1350), scale portrait y/h values by `1350/1920 ≈ 0.703` (see the Step 7.0 Channel A / Channel B `recommendedRatio` resolution table).
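
The 4:5 derivation can be sketched as a one-line scale on the vertical values. `portraitTo45` is a hypothetical name, and using the exact fraction `1350/1920` instead of the rounded 0.703 is a choice made here so that rounding stays stable.

```javascript
// Hypothetical helper: derive 4:5 (1080×1350) bounds from portrait (1080×1920) bounds.
// Horizontal values are kept; vertical values are scaled by the exact ratio 1350/1920.
function portraitTo45(bounds) {
  const scaleY = (v) => Math.round((v * 1350) / 1920);
  return {
    left: bounds.left,
    top: scaleY(bounds.top),
    width: bounds.width,
    height: scaleY(bounds.height),
  };
}
```

Applied to the portrait `split` target `{ left: 0, top: 960, width: 1080, height: 960 }`, this yields `top: 675` and `height: 675`, which together fill the 1350px canvas height.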

Other zone values for one-off variants (still uses `card.zone`; no fake "layout" field):

`zone`	resolved bounds	common use
`fullscreen`	covers whole canvas	hero card, video tweens to hidden/pip
`whiteboard-area`	inset 40px margin (landscape) or bottom 45% (portrait)	dense data card, free margins
`lower-third`	bottom 30% band	talking-head annotation
`side-panel`	right 42% (landscape) or bottom 40% (portrait)	sidebar / "split" recipe
`video-overlay`	full canvas; expect transparent card root	glass overlay on full-bleed video

You can mix recipes per card: choose `card.zone` based on what suits the moment, then write the GSAP tween for `#video-wrap` between cards.
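
To keep the recipes in one place while authoring the timeline, the landscape rows of the layout table can be held in a lookup. This is a sketch: `LANDSCAPE_RECIPES` and `videoTween` are hypothetical names, and the default `duration` and `ease` are assumptions borrowed from the composition template's example tween.

```javascript
// Recipe table copied from the layout table above (landscape 1920×1080 targets only).
const LANDSCAPE_RECIPES = {
  split: { zone: "side-panel", video: { left: 960, top: 0, width: 960, height: 1080 } },
  stack: { zone: "lower-third", video: { left: 14, top: 14, width: 1892, height: 548 } },
  pip: { zone: "fullscreen", video: { left: 1480, top: 760, width: 400, height: 300 } },
  overlay: { zone: "video-overlay", video: { left: 0, top: 0, width: 1920, height: 1080 } },
};

// Hypothetical helper: build the arguments for tl.to("#video-wrap", vars, at).
function videoTween(layout, at, duration = 0.6) {
  const recipe = LANDSCAPE_RECIPES[layout];
  if (!recipe) throw new Error(`unknown layout: ${layout}`);
  return {
    zone: recipe.zone,
    vars: { ...recipe.video, duration, ease: "power2.inOut" },
    at,
  };
}
```

Usage inside the master timeline would then look like `const t = videoTween("split", 7.5); tl.to("#video-wrap", t.vars, t.at);`.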


Storyboard Render Contract


`storyboard.json` is an agent-internal planning artifact; no CLI command parses it. It exists to keep your timing and content decisions explicit before you write each card's HTML. Stick to the v3-style shape below so the same outline drives the composition you assemble in Step 9.

Required structure (see Step 6 for the full example):

`schemaVersion: 3`
`composition: { fps, width, height, durationSeconds, layout, themeId, seed }` (note that `durationSeconds`, `fps`, `themeId`, and `layout` live inside `composition`, NOT at top level)
`videoTrack: { sourcePath, startSec, endSec, bounds? }` (video bounds default to full canvas)
`subtitles: { enabled, ... }`
`cards[]`: each card has the following required fields: `id`, `intent`, `startSec`, `endSec`, `accentIndex`, `zone`, `contentHints`

Rules:

Card times stay inside `composition.durationSeconds` and should not overlap unless intentional (use `data-track-index` to control z-order when they do).
Visual details live in card HTML fragments (Step 8), NOT in `contentHints`.
`contentHints` is your own structured prompt for designing the card; the rendered look is the HTML.
Keep the storyboard shape stable: even though nothing parses it, you read it back while authoring Step 8/9, and consistency keeps card IDs and timing in sync.
Agent-side decisions like "I picked overlay × geom × clean" do NOT belong in `storyboard.json`; keep them in working memory and use them when authoring card HTML + GSAP tweens.
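
Since nothing parses `storyboard.json`, a quick self-check before Step 8 can catch timing slips. The sketch below is not part of any CLI; `checkStoryboard` is a hypothetical helper that only covers the card-field and timing rules listed above.

```javascript
const REQUIRED_CARD_FIELDS = ["id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints"];

// Hypothetical helper: returns a list of human-readable problems (empty when clean).
function checkStoryboard(storyboard) {
  const problems = [];
  const duration = storyboard.composition?.durationSeconds;
  if (typeof duration !== "number") problems.push("composition.durationSeconds is missing");

  // Sort by start time so overlap checks only need to look at the next card.
  const cards = [...(storyboard.cards ?? [])].sort((a, b) => a.startSec - b.startSec);
  cards.forEach((card, i) => {
    const label = card.id ?? "cards[" + i + "]";
    for (const field of REQUIRED_CARD_FIELDS) {
      if (card[field] === undefined) problems.push(`${label}: missing ${field}`);
    }
    if (card.startSec < 0 || card.endSec > duration || card.startSec >= card.endSec) {
      problems.push(`${label}: time window is outside 0..${duration} or empty`);
    }
    const next = cards[i + 1];
    if (next && next.startSec < card.endSec) {
      problems.push(`${label} overlaps ${next.id} (fine only if intentional)`);
    }
  });
  return problems;
}
```

Overlaps are reported rather than rejected, because the rules allow intentional overlap with `data-track-index` controlling z-order.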

Transparent card backgrounds for cards that share canvas with video. When the GSAP tween leaves video visible behind/beside the card (overlay recipe, pip recipe, or any `card.zone = 'lower-third' | 'video-overlay'` moment), the card's `.root` MUST NOT paint a full opaque background; otherwise it occludes the video. Two patterns:

css

/* Pattern A: transparent root, page body provides the cream backdrop */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* Pattern B: explicit per-card background ONLY for fullscreen cards */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}

For `side-panel`-zone cards (split recipe), the card-host is already only half the canvas, so an opaque card bg is fine; it only covers its half.


8. Write Each Card's HTML


Create `$WORK_DIR/public/cards/{card-id}.html` for each card. Each file contains a single rooted HTML fragment that follows this contract:


Card HTML Contract


html

<div class="card" data-card-id="{cardId}">
  <style>
    /* MUST: every rule starts with .card[data-card-id="{cardId}"] */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      id="card-01-title"
      class="title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>

Hard rules (`hyperframes` lint will reject violations):

Single root `<div class="card" data-card-id="{cardId}">`
Inline `<style>` rules MUST be prefixed with the scope selector above
No `<script>` tags
No external URLs in `src=` / `href=` (no CDN, no remote fonts)
No inline event handlers (`onclick=` etc.)
All assets via relative paths into the same `public/` directory
Colors via `var(--accent-N)` etc. for portability across themes

Animations are declared, not coded. Use `data-anim-*` attributes only; never write `<script>` to animate. You compile every `data-anim-*` declaration into the single master GSAP timeline in Step 9.
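
Before handing a fragment to the real lint, a rough string-level pre-check can catch the most common violations of the hard rules. This is only a sketch and is not the `hyperframes` lint: `precheckCard` is a hypothetical helper, its regexes are approximations that can produce false positives on unusual text content, and it does not verify style-rule scoping.

```javascript
// Hypothetical pre-check for a card fragment; returns a list of problems (empty when clean).
function precheckCard(html) {
  const problems = [];
  if (!/^\s*<div class="card" data-card-id="[^"]+">/.test(html)) {
    problems.push("fragment does not start with the .card root");
  }
  if (/<script\b/i.test(html)) {
    problems.push("<script> tag found");
  }
  // Matches src/href values that start with http://, https://, or a protocol-relative //.
  if (/\s(?:src|href)\s*=\s*["']?(?:https?:)?\/\//i.test(html)) {
    problems.push("external URL in src/href");
  }
  // Matches attributes such as onclick= or onload=.
  if (/\son[a-z]+\s*=/i.test(html)) {
    problems.push("inline event handler found");
  }
  return problems;
}
```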


Card Sizing — Mobile-First in Portrait


The 10 `references/styles/*.html` cards are sized for a 1920×1080 landscape preview. When `storyboard.composition.layout = "portrait"` (1080×1920, the dominant case for social / mobile), scale every visual size up: phones are held close, and the same pixel count reads smaller than on a landscape TV-style canvas.

token	landscape baseline	portrait target	scale
title (h1/h2 hero)	64–96px	88–132px	×1.35
detail / body	24–30px	30–40px	×1.30
kicker / chip label	14–16px	18–22px	×1.30
timecode / meta	12–14px	16–18px	×1.30
data block primary number	48–60px	64–88px	×1.40
line-height multiplier	1.05–1.5	same	(don't scale)

Rule of thumb: `portraitPx = round(landscapePx × 1.3)`, then floor to a nearby 4px multiple for visual rhythm. Hero headlines may go up to ×1.4; small meta text stays at ×1.2 to avoid crowding.

Padding shrinks slightly in portrait — the card is narrower so big landscape padding (40–64px) eats too much width. Use 24–36px horizontal padding in portrait.

If you're producing a single card that must work in both layouts, prefer container-relative sizing (`container-type` on the card root plus `cqi` units) over hard-coding sizes:

css

.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}

But for most cards, a single layout choice is fine; just pick the size-table column that matches the storyboard's `composition.layout` field.


Available data-anim Kinds


kind	use for	key params
`fade-in`	enter	`at`, `duration`, `ease?`
`fade-out`	exit	`at`, `duration`, `ease?`
`slide-in`	slide enter	`at`, `duration`, `from=left|right|top|bottom`, `distance`
`kinetic-chars`	per-char pop	`at`, `duration`, `stagger`, `pattern=pop|fade`; element needs `<span class="char">` children
`typewriter`	per-char fade	same as kinetic-chars but slower default stagger
`count-up`	animate number	`at`, `duration`, `from`, `to`, `format=.0f|.1f|.2f|,d`
`draw-path`	SVG path reveal	`at`, `duration`; element should be a `<path>`
`grow-y`	bar height	`at`, `duration`, `target-h` (px); element starts `height:0`
`grow-x`	bar width	`at`, `duration`, `target-w` (px); element starts `width:0`
`scale-pop`	pop entrance	`at`, `duration`
`blur-in`	unfocused → focused	`at`, `duration`
`mask-reveal`	clip reveal	`at`, `duration`, `direction=left|right|top|bottom`
`morph-to`	tween any CSS	`at`, `duration`, `props='{...JSON...}'`

`data-anim-at` is seconds relative to the card's `startSec`. When you compile each declaration into the GSAP timeline in Step 9, add the card's `startSec` to get the absolute time and quantize to 1/fps.
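
The conversion from a card-relative `data-anim-at` to an absolute, frame-aligned time can be sketched as follows. `absoluteTime` is a hypothetical helper, and rounding to the nearest frame (rather than flooring) is an assumption.

```javascript
// Hypothetical helper: card-relative data-anim-at → absolute timeline seconds,
// snapped to the nearest frame boundary (a multiple of 1/fps).
function absoluteTime(cardStartSec, animAt, fps) {
  const frame = Math.round((cardStartSec + animAt) * fps);
  return frame / fps;
}
```

For a card starting at 1.0s with `data-anim-at="0.3"` at 30 fps, this lands on frame 39, i.e. 1.3s, which is the position used for the title tween in the composition template.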


9. Assemble the Composition HTML


Stage the assets and write `$WORK_DIR/public/index.html`:

bash

# SKILL_DIR is injected by the host ("Base directory for this skill: …")
SKILL_DIR="<SKILL_DIR>"

mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# Stage the input video: RE-ENCODE with dense keyframes. Sources with a sparse GOP
# (keyframe interval > ~1s) freeze on seek in the renderer (a frozen frame under the
# overlays); -g / -keyint_min set to your composition fps make every frame seekable.
# (Set both to your fps; 30 shown; use 24/25/60 to match.)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"

Composition Template

合成文件模板

html

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <style>
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "LXGW WenKai TC";
        src: url("fonts/LXGWWenKaiTC-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "Virgil";
        src: url("fonts/Virgil.woff2") format("woff2");
        font-display: block;
      }

      :root {
        /* Pick from the themeId palette table in Step 7 — example: classic */
        --bg: #fff9e3;
        --text: #1e1e1e;
        --accent-0: #1971c2;
        --accent-1: #e03131;
        --accent-2: #2f9e44;
        --accent-3: #e8590c;
        --accent-4: #9c36b5;
        --font-family: "Caveat", "LXGW WenKai TC", serif;
      }
      * {
        box-sizing: border-box;
      }
      /* Body font-family MUST list concrete font names (not just var(--font-family)) —
   the HyperFrames renderer's static analyzer doesn't expand CSS variables when
   resolving fonts, so a var-only chain triggers `font_family_without_font_face`
   lint and falls back to a generic. Use the concrete chain here; cards that
   want the theme font can still reference var(--font-family) internally. */
      html,
      body {
        margin: 0;
        padding: 0;
        width: 100%;
        height: 100%;
        overflow: hidden;
        background: #000;
        font-family: "Inter", "Caveat", "LXGW WenKai TC", ui-sans-serif, system-ui, sans-serif;
      }
      #stage {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }

      /* video-wrapper holds the source video. Its position / size are animated
   over time by the master timeline (one tween per layout transition). */
      .video-wrapper {
        position: absolute;
        left: 0;
        top: 0;
        width: 1920px;
        height: 1080px;
        overflow: hidden;
        border-radius: 0;
        box-shadow: none;
      }
      .video-wrapper video {
        width: 100%;
        height: 100%;
        object-fit: cover;
      }

      .card-host {
        position: absolute;
        pointer-events: none;
        overflow: hidden;
      }
      .card-host .card {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }
      .card-host .char {
        display: inline-block;
        visibility: visible;
      }

      /* Subtle drop shadow + rounded corners for non-fullscreen video framings */
      .video-wrapper.framed {
        border-radius: 16px;
        box-shadow: 0 12px 40px rgba(0, 0, 0, 0.35);
      }
    </style>
  </head>
  <body>
    <div
      id="stage"
      data-composition-id="graphic-overlays"
      data-start="0"
      data-duration="121.2"
      data-fps="30"
      data-width="1920"
      data-height="1080"
    >
      <!-- Layer 1: source video — initial position matches card-01's layout -->
      <div class="video-wrapper" id="video-wrap">
        <video
          id="bg-video"
          src="input-video.mp4"
          muted
          playsinline
          data-start="0"
          data-duration="121.2"
          data-track-index="1"
        ></video>
      </div>

      <!-- Layer 2: each card-host sits at the bounds dictated by its layout. -->
      <!-- IMPORTANT: every card-host MUST carry BOTH "card-host" and "clip" classes. -->
      <!--   - "card-host"  → our positioning + pointer-events styles                 -->
      <!--   - "clip"       → HyperFrames runtime uses this to enforce visibility     -->
      <!--                    only during data-start … data-start+data-duration.      -->
      <!--                    Without "clip" the host stays visible the whole video   -->
      <!--                    (lint: timed_element_missing_clip_class).               -->
      <!-- Example: card-01 with zone="fullscreen" → card-host covers (0,0,1920,1080) -->
      <div
        class="card-host clip"
        data-card-id="card-01"
        data-start="1.0000"
        data-duration="6.5000"
        data-track-index="2"
        style="left:0;top:0;width:1920px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- paste the contents of public/cards/card-01.html here -->
      </div>

      <!-- Example: card-02 with zone="side-panel" (split composition layout) → card on left half -->
      <div
        class="card-host clip"
        data-card-id="card-02"
        data-start="8.0000"
        data-duration="12.0000"
        data-track-index="2"
        style="left:0;top:0;width:960px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- card-02 HTML -->
      </div>

      <!-- ...one "card-host clip" per card with inline bounds matching resolveZoneBounds(card.zone)... -->

      <script src="vendor/gsap.min.js"></script>
      <script>
        (function () {
          // count-up formatter helper
          window.__fmt = function (v, fmt) {
            if (typeof fmt === "string" && /^\.[0-9]+f$/.test(fmt)) {
              return Number(v).toFixed(Number(fmt.slice(1, -1)));
            }
            if (fmt === ",d") return Math.round(v).toLocaleString();
            return String(Math.round(v));
          };

          const tl = window.gsap.timeline({ paused: true });

          // ── Card lifecycle (one block per card) ──
          // Example for card-01 [1.0, 7.5] with kinetic-chars at +0.3, grow-x at +0.65:

          // Enter (fade in over 0.4s)
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "visible" }, 1.0);
          tl.fromTo(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            1.0,
          );

          // Card-internal anims (compile each data-anim-* declaration here)
          tl.from(
            '.card[data-card-id="card-01"] #card-01-title .char',
            { opacity: 0, y: 8, scale: 0.8, duration: 0.5, ease: "power2.out", stagger: 0.04 },
            1.3,
          );
          tl.fromTo(
            '.card[data-card-id="card-01"] #card-01-line',
            { width: 0 },
            { width: 420, duration: 0.5, ease: "power2.out" },
            1.65,
          );

          // Exit (fade out over 0.35s, ending at endSec)
          tl.to(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0, duration: 0.35, ease: "power2.in" },
            7.15,
          );
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "hidden" }, 7.5);

          // ── Video framing transitions ──
          // When the next card uses a different composition layout, animate the
          // video-wrapper to its new bounds. Example: card-01 = fullscreen
          // (video hidden behind), card-02 = split composition (zone="side-panel"
          // → video on right, card on left).

          // Card-02 enters at 8.0s with the split composition. Start moving the video
          // to the right half at 7.5s, once card-01 is hidden; the 0.6s tween finishes
          // at 8.1s, overlapping the first 0.1s of card-02's fade-in.
          tl.set("#video-wrap", { className: "video-wrapper framed" }, 7.5);
          tl.to(
            "#video-wrap",
            { left: 960, top: 0, width: 960, height: 1080, duration: 0.6, ease: "power2.inOut" },
            7.5,
          );

          // Card-02 enter — same pattern as card-01
          tl.set('.card-host[data-card-id="card-02"]', { visibility: "visible" }, 8.0);
          tl.fromTo(
            '.card-host[data-card-id="card-02"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            8.0,
          );
          // ...card-02 internal anims...

          // ── repeat for each card; if the NEXT card's layout differs,
          //    insert another tl.to('#video-wrap', ...) tween before its enter ──

          window.__timelines = window.__timelines || {};
          window.__timelines["graphic-overlays"] = tl;
        })();
      </script>
    </div>
  </body>
</html>
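The per-card enter/exit block in the template follows a fixed arithmetic: the enter starts at the card's start time, and the exit tween starts at the end time minus the fade-out length so that it finishes exactly at the end. A minimal sketch of that arithmetic, assuming a hypothetical card object with `id`, `startSec`, and `endSec` fields (the function names are illustrative and not part of HyperFrames or GSAP):

```javascript
// Hypothetical helper: emits the four lifecycle statements for one card-host.
// The default fade lengths (0.4s in, 0.35s out) mirror the template above.
function cardLifecycle(card, fadeIn = 0.4, fadeOut = 0.35) {
  const sel = `.card-host[data-card-id="${card.id}"]`;
  const start = card.startSec.toFixed(4);
  const end = card.endSec.toFixed(4);
  // The exit tween must END at endSec, so it starts fadeOut seconds earlier.
  const exitAt = (card.endSec - fadeOut).toFixed(4);
  return [
    `tl.set('${sel}', { visibility: "visible" }, ${start});`,
    `tl.fromTo('${sel}', { opacity: 0 }, { opacity: 1, duration: ${fadeIn}, ease: "power2.out" }, ${start});`,
    `tl.to('${sel}', { opacity: 0, duration: ${fadeOut}, ease: "power2.in" }, ${exitAt});`,
    `tl.set('${sel}', { visibility: "hidden" }, ${end});`,
  ];
}

// Hypothetical helper: the value to write into the host's data-duration attribute.
function hostDuration(card) {
  return (card.endSec - card.startSec).toFixed(4);
}

const card01 = { id: "card-01", startSec: 1.0, endSec: 7.5 };
console.log(cardLifecycle(card01).join("\n"));
console.log(`data-duration="${hostDuration(card01)}"`);
```

Generating the statements as text keeps the assembled `index.html` fully static, which matches the requirement that the timeline be built synchronously at page load.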


GSAP Statement Cheat Sheet


Compile each `data-anim` attribute into a GSAP statement. Times are absolute seconds = card.startSec + data-anim-at, quantized to 1/fps. Selector is `.card[data-card-id="X"] #elementId`.

| data-anim | GSAP statement template |
| --- | --- |
| `fade-in` | `tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);` |
| `fade-out` | `tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);` |
| `slide-in` (from=left, dist=80) | `tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);` |
| `kinetic-chars` (pop) | `tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);` |
| `count-up` | `(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();` |
| `draw-path` | `(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();` |
| `grow-x` (target-w=W) | `tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);` |
| `grow-y` (target-h=H) | `tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);` |
| `scale-pop` | `tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);` |
| `mask-reveal` (direction=left) | `tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);` |

Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is `1/30 ≈ 0.0333s`; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal.
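As an illustration of the quantization rule, the sketch below compiles one `fade-in` declaration into its statement text. Only the formula `T = Math.round(absSec * fps) / fps` and the `fade-in` template come from the cheat sheet; the function names and argument order are assumptions made for this example.

```javascript
// Snap an absolute time to the nearest frame boundary.
function quantizeTime(absSec, fps) {
  return Math.round(absSec * fps) / fps;
}

// Absolute time for an animation = card start + its data-anim-at offset,
// quantized and printed with 4 decimals for use inside the JS literal.
function animTime(cardStartSec, animAt, fps) {
  return quantizeTime(cardStartSec + animAt, fps).toFixed(4);
}

// Hypothetical compiler for a single `fade-in` declaration.
function compileFadeIn(cardId, elementId, cardStartSec, animAt, duration, fps) {
  const sel = `.card[data-card-id="${cardId}"] #${elementId}`;
  const t = animTime(cardStartSec, animAt, fps);
  return `tl.fromTo('${sel}', { opacity: 0 }, { opacity: 1, duration: ${duration}, ease: 'power2.out' }, ${t});`;
}

// 1.0 + 0.32 = 1.32s lands between frames 39 and 40 at 30fps; it snaps to frame 40.
console.log(animTime(1.0, 0.32, 30));
console.log(compileFadeIn("card-01", "card-01-title", 1.0, 0.3, 0.5, 30));
```

Quantizing before formatting means every statement time falls on a frame the renderer actually seeks to.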


Video Framing Reference (per `layout` value)


The selector for the video container is `#video-wrap`. Animate its bounds between cards using `tl.to('#video-wrap', { ...bounds }, T)`. Initial bounds should be set inline on the element to match card-01's layout. Pick a transition duration of 0.5–0.7s with `ease: 'power2.inOut'`.

Decorative frames (`clean`, `hairline`, `polaroid`) sit as a sibling of `#video-wrap` and follow it through layout transitions. See `references/frames/` for each frame's placement HTML, suggested CSS, and which layouts it pairs with. Quick rule: the `overlay` layout suppresses decorative frames (the full-bleed video clashes with chrome); PiP layouts already have their own pill treatment (border-radius + white ring + shadow), so add a decorative frame only on top of `split` or `stack`.

GSAP target lookup table for `#video-wrap` per composition layout (landscape 1920×1080; for portrait and 4:5 see `references/layouts/*.html`, which list all three ratios):

| composition layout | typical card.zone | `#video-wrap` GSAP target | extra CSS class |
| --- | --- | --- | --- |
| `split` | `side-panel` | `{ left: 960, top: 0, width: 960, height: 1080 }` | none |
| `stack` | `lower-third` | `{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%) | none |
| `pip` (bottom-right) | `fullscreen` | `{ left: 1480, top: 760, width: 400, height: 300 }` | `pip-pill` (border-radius + ring + shadow) |
| `pip` (top-left) | `fullscreen` | `{ left: 40, top: 40, width: 400, height: 300 }` | `pip-pill` |
| `overlay` (video full-bleed) | `video-overlay` | `{ left: 0, top: 0, width: 1920, height: 1080 }` (no change from default) | none |
| hide video (pure-graphic moment) | `fullscreen` | `{ opacity: 0 }` (or move off-canvas) | none |

To toggle the pip-pill chrome (border-radius + white ring + drop shadow) when entering or leaving a pip moment:

// Enter pip — add chrome
tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T);
tl.to(
  "#video-wrap",
  { left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" },
  T,
);

// Leave pip — back to clean full-bleed
tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT);
tl.to(
  "#video-wrap",
  { left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" },
  T_NEXT,
);

Card-host bounds match the zone. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx;height:Hpx;..."`. For the `video-overlay` zone (overlay recipe), the card-host fills the full canvas; your CSS inside `.card .root` decides where the actual visible card sits.
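One way to keep video transitions consistent is to hold the landscape lookup table in a single object and emit the `tl.set` / `tl.to` pair from it. The sketch below does that. The bounds are copied from the table above, while the key names (`pip-bottom-right`, `pip-top-left`) and the `videoTransition` function are assumptions made for this example.

```javascript
// Landscape (1920x1080) targets for #video-wrap, copied from the lookup table.
const VIDEO_BOUNDS = {
  split: { left: 960, top: 0, width: 960, height: 1080 },
  stack: { left: 14, top: 14, width: 1892, height: 548 },
  "pip-bottom-right": { left: 1480, top: 760, width: 400, height: 300 },
  "pip-top-left": { left: 40, top: 40, width: 400, height: 300 },
  overlay: { left: 0, top: 0, width: 1920, height: 1080 },
};

// Hypothetical helper: returns the two statements for one layout transition.
function videoTransition(layoutKey, t, duration = 0.6) {
  const bounds = VIDEO_BOUNDS[layoutKey];
  if (!bounds) throw new Error(`unknown layout: ${layoutKey}`);
  // Only pip layouts carry the pip-pill chrome; every other layout resets to the base class.
  const className = layoutKey.startsWith("pip") ? "video-wrapper pip-pill" : "video-wrapper";
  const target = { ...bounds, duration, ease: "power2.inOut" };
  return [
    `tl.set("#video-wrap", { className: "${className}" }, ${t});`,
    `tl.to("#video-wrap", ${JSON.stringify(target)}, ${t});`,
  ];
}

console.log(videoTransition("pip-bottom-right", 20).join("\n"));
console.log(videoTransition("overlay", 32).join("\n"));
```

`JSON.stringify` produces a quoted-key object, which is still a valid JavaScript object literal when pasted into the timeline script.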


HyperFrames Layout / Animation QA Rules


Build each card's static hero frame first: the moment where the card is fully visible and readable.
Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap.
Confirm hidden video areas are clipped by the frame and not visible outside intended bounds.
Register one paused master timeline as `window.__timelines["graphic-overlays"]`.
Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls.
Do not use `Math.random()` or `Date.now()` in render paths.
Do not use `repeat: -1`; calculate finite repeats from the video duration.
Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion.
Animate wrappers such as `#video-wrap`, not the video element dimensions directly.
Avoid animating the same property on the same element from multiple timelines at the same time.
Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`.
Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes, e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`).
For body / global `font-family`, list concrete font names (`'Inter', 'Caveat', …`), not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded.
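For the rule against `repeat: -1`, a finite count can be derived from how much of the video remains after the looping animation starts. The sketch below shows one way to do that, assuming a looping tween whose single iteration lasts `cycleSec`. The function name and the `#pulse` selector in the comment are hypothetical. In GSAP, `repeat: N` plays the animation N + 1 times in total, which is why the result subtracts one.

```javascript
// Number of repeats that fit between startSec and the end of the video.
function finiteRepeat(videoDurationSec, startSec, cycleSec) {
  if (cycleSec <= 0) throw new RangeError("cycleSec must be positive");
  // Whole iterations that fit in the remaining time.
  const plays = Math.floor((videoDurationSec - startSec) / cycleSec);
  // GSAP counts repeats AFTER the first iteration, so subtract one (never below 0).
  return Math.max(0, plays - 1);
}

const repeat = finiteRepeat(121.2, 10, 2);
console.log(repeat);
// Possible use inside the master timeline (selector is illustrative only):
// tl.to("#pulse", { scale: 1.05, duration: 2, yoyo: true, repeat: repeat }, 10);
```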


10. Render to MP4


bash

cd "$WORK_DIR"
PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \
  -o output.mp4 \
  --fps 30

`hyperframes render <dir>` reads `<dir>/index.html` and produces the MP4. Setting the environment variable `PRODUCER_BROWSER_GPU_MODE=hardware` (or passing `--browser-gpu`) is strongly recommended on macOS, because software-only Chrome rendering times out on most laptops.

For a sanity check before the full render, capture a single frame at a specific timestamp:

bash

npx hyperframes snapshot public --at 5    # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out)


11. Report Results


Tell the user:

Work directory path
`storyboard.json` (the card outline you designed)
`public/cards/*.html` (one HTML per card)
`public/index.html` (the assembled composition)
`output.mp4` (the final video)
ASR provider used
Card count + how you chose them (in 1 sentence)
Any missing keys or quality caveats

Optional live preview (on request only). The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. Don't open it during the run. When the user asks, start a long-lived server after render and report the URL:

bash

(cd "$WORK_DIR/public" && npx hyperframes preview)   # or `npx hyperframes play` for a shareable link

Do not delete the work directory unless the user asks.

