graphic-overlays


Graphic Overlays

Graphic Overlays takes a local video that plays in full and layers a sequence of timed, designed graphic cards onto it (titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture), synced to what's being said. The agent designs the cards (timing + content) and writes each card's HTML directly in the conversation, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure: the overlays emerge from what the transcript actually says.

Confirm the route before you build. This skill packages an existing talking-head clip with designed graphic cards (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants plain captions / subtitles (the spoken words as text) → `/embedded-captions`; a single short unnarrated element (one logo sting / lower-third) → `/motion-graphics`. The clip plays untouched: re-timing, recoloring, reframing, reordering, or audio work is NLE editing and out of scope. Building from a URL / topic / PR → the creation workflows. Unsure overlays-vs-captions? Read `/hyperframes-read-first` first.

Graphic-packaging sibling of `embedded-captions`. Captions add the spoken words as a readable subtitle; this adds designed graphics on top of the playing video. Plain subtitles → `embedded-captions`. Build a video from scratch → the creation workflows (`product-launch-video` / `faceless-explainer` / …).
Inspectable intermediate files in the work directory:
  • `metadata.json`: duration / width / height / fps
  • `audio.mp3`: extracted audio
  • `transcript.json`: a flat word array `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper)
  • `storyboard.json`: lightweight card outline (the agent's plan)
  • `public/cards/card-XX.html`: one HTML fragment per card
  • `public/index.html`: final assembled composition
  • `output.mp4`: rendered video

CLI Resolution

```bash
# hyperframes: transcription (local Whisper) + rendering the assembled HTML to MP4
npx hyperframes --help
```

This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`.
Transcription is local **Whisper** via `hyperframes transcribe`: no third-party
service, API key, or rate-limited proxy.

Workflow

1. Check Environment

```bash
npx hyperframes doctor          # ffmpeg, headless browser, render deps

# confirm bundled assets:
ls "<SKILL_DIR>/assets/fonts" "<SKILL_DIR>/assets/vendor/gsap.min.js"
```

Required:

- `ffmpeg` / `ffprobe` (system)
- `<SKILL_DIR>/assets/fonts/*.woff2`, `<SKILL_DIR>/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9)

Transcription needs no key: `hyperframes transcribe` runs Whisper locally (Step 4).

Strongly recommended on macOS for `hyperframes render`:

```bash
export PRODUCER_BROWSER_GPU_MODE=hardware
```

2. Create a Work Directory

All artifacts live under `videos/<project-name>/`, the same convention as the other video workflows (`product-launch-video` / `faceless-explainer` / `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory.

```bash
VIDEO_PATH="/absolute/path/input.mp4"
WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')"
mkdir -p "$WORK_DIR"
```

3. Extract Audio and Metadata

```bash
# metadata: duration / width / height / fps
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate \
  -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json"

# audio
ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3"
```

Outputs: `metadata.json` (read `width`/`height`/`duration`; fps = the `r_frame_rate`
fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`.
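
Reading those values back can be sketched as follows. This assumes `metadata.json` has ffprobe's usual JSON layout (a `streams` array plus a `format` object, with `duration` stored as a string); the helper name `read_metadata` and the sample values are illustrative only, not part of hyperframes.

```python
import json
from fractions import Fraction


def read_metadata(raw: dict) -> dict:
    """Pull width / height / fps / duration out of an ffprobe-style JSON dict."""
    stream = raw["streams"][0]
    # r_frame_rate is a fraction string such as "30000/1001"; evaluate it.
    fps = float(Fraction(stream["r_frame_rate"]))
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "fps": round(fps, 2),
        "duration": float(raw["format"]["duration"]),
    }


# Hypothetical input shaped like ffprobe's `-of json` output (made-up values).
sample = json.loads(
    '{"streams": [{"width": 1080, "height": 1920, "r_frame_rate": "30000/1001"}],'
    ' "format": {"duration": "121.200000"}}'
)
meta = read_metadata(sample)
print(meta)
```

In a real run you would pass `json.load(open(f"{work_dir}/metadata.json"))` instead of the inline sample.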

4. Transcribe

```bash
npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en
```

Local Whisper: no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` + `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks.

Clamp to media duration. Whisper can return the final word's `end` a hair past the actual clip length. Clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video.
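
The clamp can be sketched as below, assuming the storyboard shape used in Step 6. The function name `clamp_to_duration` is hypothetical, and the sketch does not handle a deliberately added outro that extends past the clip.

```python
def clamp_to_duration(storyboard: dict, media_duration: float) -> dict:
    """Return a copy of the storyboard with no time value past the clip end."""
    composition = dict(storyboard["composition"])
    composition["durationSeconds"] = min(composition["durationSeconds"], media_duration)

    cards = []
    for card in storyboard["cards"]:
        clamped = dict(card)  # shallow copy so the input stays untouched
        clamped["endSec"] = min(card["endSec"], media_duration)
        cards.append(clamped)

    return {**storyboard, "composition": composition, "cards": cards}


# Made-up example: Whisper's last word ends 0.17s past the real clip length.
draft = {
    "composition": {"durationSeconds": 121.37},
    "cards": [{"id": "card-17", "startSec": 112.0, "endSec": 121.37}],
}
fixed = clamp_to_duration(draft, media_duration=121.2)
print(fixed["composition"]["durationSeconds"], fixed["cards"][0]["endSec"])
```

A card whose `startSec` is itself at or past the media duration would still need manual attention, since clamping alone would leave it with `endSec <= startSec`.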

5. Correct Transcript

`transcript.json` is a flat array of word objects `[{ "text": "...", "start": s, "end": s }, …]` (no `segments` array, no `words` wrapper; the per-word key is `text`). Read it and fix obvious ASR errors:
  • Homophones, product names, technical terms, punctuation
  • Edit a word's `text` in place; preserve its `start` / `end` timestamps
  • There is no pre-grouped `segments` array: group words into sentences yourself (split at terminal punctuation / pauses) when you need segment-level chunks for card timing
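
One way to do that grouping is sketched below. The 0.8s pause threshold and the name `group_sentences` are assumptions chosen for illustration; tune the threshold to the speaker's pacing.

```python
TERMINAL = (".", "?", "!")


def group_sentences(words: list[dict], max_pause: float = 0.8) -> list[dict]:
    """Group a flat word list into sentence chunks with start/end times."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        ends_sentence = word["text"].strip().endswith(TERMINAL)
        # A long silence before the next word also closes the chunk.
        long_pause = not is_last and words[i + 1]["start"] - word["end"] > max_pause
        if is_last or ends_sentence or long_pause:
            sentences.append({
                "text": " ".join(w["text"].strip() for w in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current = []
    return sentences


# Made-up words in the flat transcript.json shape.
words = [
    {"text": "Rates", "start": 0.0, "end": 0.4},
    {"text": "fell.", "start": 0.4, "end": 0.9},
    {"text": "Why", "start": 1.0, "end": 1.2},
    {"text": "now", "start": 1.2, "end": 1.5},
    {"text": "Because", "start": 2.6, "end": 3.0},
]
chunks = group_sentences(words)
for chunk in chunks:
    print(chunk)
```

Each chunk keeps the first word's `start` and the last word's `end`, so card timings can snap to sentence boundaries without touching the per-word timestamps.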

6. Draft a Lightweight Storyboard (in chat)

No CLI involved. Read `transcript.json` + `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact: no CLI command consumes it; it exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9:

```json
{
  "schemaVersion": 3,
  "composition": {
    "fps": 30,
    "width": 1080,
    "height": 1920,
    "durationSeconds": 121.2,
    "layout": "portrait",
    "themeId": "noir",
    "seed": 42
  },
  "videoTrack": {
    "sourcePath": "input-video.mp4",
    "startSec": 0,
    "endSec": 121.2,
    "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 }
  },
  "subtitles": { "enabled": false },
  "cards": [
    {
      "id": "card-01",
      "intent": "Hook with the speaker's anxious midnight question",
      "startSec": 0.5,
      "endSec": 13.0,
      "accentIndex": 0,
      "zone": "fullscreen",
      "contentHints": {
        "kicker": "AN HONEST QUESTION",
        "title": "The soul-searching question at 11 PM",
        "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'"
      }
    }
  ]
}
```
Required Card fields:

| field | type | purpose |
| --- | --- | --- |
| `id` | string | stable id used in card HTML & GSAP selectors |
| `intent` | string | natural-language description; fed to card synthesis |
| `startSec` / `endSec` | number | times in seconds (endSec > startSec) |
| `accentIndex` | 0 \| 1 \| 2 \| 3 \| 4 | which of the 5 theme accent colors this card pulls |
| `zone` | enum (see below) | where on the canvas the card lives |
| `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here |
| `archetype` (optional) | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default |
| `transition` (optional) | enum: `cut` \| `fade` \| `slide` \| `wipe` | declarative card-to-card transition |
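
Because no CLI validates `storyboard.json`, a quick self-check before writing card HTML can catch slips. The sketch below encodes only the constraints stated in the field table plus the five `zone` names this document lists; `card_problems` is a hypothetical helper, not a hyperframes API.

```python
REQUIRED = ("id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints")
ZONES = {"fullscreen", "whiteboard-area", "lower-third", "side-panel", "video-overlay"}
TRANSITIONS = {"cut", "fade", "slide", "wipe"}


def card_problems(card: dict) -> list[str]:
    """Return human-readable problems for one storyboard card (empty = OK)."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in card]
    if problems:
        return problems  # later checks need every required key present
    if not card["endSec"] > card["startSec"]:
        problems.append("endSec must be greater than startSec")
    if card["accentIndex"] not in (0, 1, 2, 3, 4):
        problems.append("accentIndex must be 0-4")
    if card["zone"] not in ZONES:
        problems.append(f"unknown zone: {card['zone']}")
    if not isinstance(card["contentHints"], dict):
        problems.append("contentHints must be an object")
    # transition is optional; only check it when present.
    if "transition" in card and card["transition"] not in TRANSITIONS:
        problems.append(f"unknown transition: {card['transition']}")
    return problems


good = {"id": "card-01", "intent": "hook", "startSec": 0.5, "endSec": 13.0,
        "accentIndex": 0, "zone": "fullscreen", "contentHints": {"title": "..."}}
bad = {**good, "endSec": 0.2, "zone": "corner"}
print(card_problems(good))
print(card_problems(bad))
```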
Five `zone` values:

| zone | resolved bounds | when to use |
| --- | --- | --- |
| `fullscreen` | covers whole canvas | hero moments, big numbers, mantras |
| `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content |
| `lower-third` | bottom 30% band | annotation over visible video |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side |
| `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video |

When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set once at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's `<script>` (see Step 9).
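
A minimal sketch of that zone-to-bounds resolution follows. It is one reading of the table, not the skill's reference implementation: `resolve_zone` is a hypothetical name, and `whiteboard-area` implements only the 40px inset because the table does not say where the portrait "45% of height" band sits.

```python
def resolve_zone(zone: str, width: int, height: int) -> dict:
    """Turn a zone name into pixel bounds on a width x height canvas."""
    landscape = width >= height
    if zone in ("fullscreen", "video-overlay"):
        return {"x": 0, "y": 0, "w": width, "h": height}
    if zone == "whiteboard-area":
        margin = 40  # assumption: uniform inset; portrait 45% variant not modeled
        return {"x": margin, "y": margin, "w": width - 2 * margin, "h": height - 2 * margin}
    if zone == "lower-third":
        band = round(height * 0.30)
        return {"x": 0, "y": height - band, "w": width, "h": band}
    if zone == "side-panel":
        if landscape:
            panel = round(width * 0.42)
            return {"x": width - panel, "y": 0, "w": panel, "h": height}
        panel = round(height * 0.40)
        return {"x": 0, "y": height - panel, "w": width, "h": panel}
    raise ValueError(f"unknown zone: {zone}")


print(resolve_zone("lower-third", 1080, 1920))
print(resolve_zone("side-panel", 1920, 1080))
```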
No prescribed card roles, no prescribed narrative arc. Cards emerge from what the video actually says: could be all quotes or all data, could open with a number or with a story. Let the transcript drive the rhythm.

How many takeaways? Auto-infer from duration + density. No fixed upper limit. Pick a base pace from the video duration, then adjust by information density. Only the floor is fixed: minimum 5 cards so even short videos have rhythm.

Step 1: base pace by duration (the natural sec/card for medium density):

| video duration | base pace (sec per card) | rationale |
| --- | --- | --- |
| < 60s (short reel) | 6–8s | viewers expect fast cuts in short-form |
| 60s – 3 min | 8–12s | normal social pace |
| 3 – 10 min | 12–20s | give breathing room; each card carries more |
| 10 – 30 min | 20–35s | long-form lecture / interview rhythm |
| > 30 min | 30–60s | episodic, near-chapter feel |

Step 2: density multiplier (multiplies the base pace):

| signal in the transcript | multiplier | effect |
| --- | --- | --- |
| High density: many numbers, distinct claims, staccato pacing, list-like enumeration, every 1–2 sentences is a new idea | × 0.7 | cuts faster, more cards |
| Medium density: mixed flow with both data and narrative | × 1.0 | base pace |
| Low density: one extended story, repeated reframing, slow reflective pacing, single argument unfolding | × 1.5 | cuts slower, fewer cards |

Step 3: compute:

secPerCard = basePace × densityMultiplier
cardCount  = max(5, round(videoDurationSec / secPerCard))

Examples (notice: no upper clamp; long videos naturally produce more cards):
  • 30s reel, single punchline (low density) → 7 × 1.5 = 10.5s/card → round(30/10.5)=3 → floor to 5 cards
  • 60s reflective monologue (low density) → 10 × 1.5 = 15s/card → 4 → floor to 5 cards
  • 121s talking-head with rich data (high density) → 10 × 0.7 = 7s/card → 17 cards
  • 5 min interview, mixed density → 16 × 1.0 = 16s/card → 19 cards
  • 10 min deep-dive, high density → 16 × 0.7 = 11.2s/card → 54 cards
  • 30 min lecture, medium density → 28 × 1.0 = 28s/card → 64 cards
  • 1 hr podcast, low density → 45 × 1.5 = 67.5s/card → 53 cards
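
The Step 3 arithmetic can be sketched as a small function. The base pace is passed in because the table gives a range and the agent picks a value within it; the names `card_count` and `DENSITY_MULTIPLIER` are illustrative.

```python
DENSITY_MULTIPLIER = {"high": 0.7, "medium": 1.0, "low": 1.5}


def card_count(duration_sec: float, base_pace: float, density: str) -> int:
    """cardCount = max(5, round(duration / (basePace x densityMultiplier)))."""
    sec_per_card = base_pace * DENSITY_MULTIPLIER[density]
    # Only a floor of 5 is applied; there is deliberately no upper clamp.
    return max(5, round(duration_sec / sec_per_card))


print(card_count(30, 7, "low"))       # short reel hits the floor
print(card_count(121, 10, "high"))    # data-rich talking head
print(card_count(3600, 45, "low"))    # one-hour podcast
```

Note that Python's `round` rounds exact halves to the nearest even integer, which can differ by one card from a round-half-up convention on exact `.5` quotients.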
When a card holds longer than ~15s, plan for a richer card (data block, multi-step reveal, several sub-points unfolding with staggered animations); a static one-liner gets boring past 8s. For long pieces where many cards exceed 30s, consider chunking the timeline into sub-compositions (one .html per chapter, mounted with `data-composition-src`) so the GSAP timeline per file stays manageable; see the `timeline_track_too_dense` HyperFrames lint warning.

The content inside `contentHints` can be a plain string ("Title: annualized 5.69%\nNotes: ...") or any JSON shape that captures the data. The agent decides the shape per card.

Optional outro. This skill ships no fixed brand outro. If the user wants a closing card, design a neutral one yourself (wordmark + one-line tagline, ~1.5-2s, fade in -> short hold -> fade out), append it to `cards[]`, and extend `composition.durationSeconds` to its `endSec`. Otherwise end on the last content card.

7. Decide Render Strategy

Confirm Visual Direction with User (DO THIS FIRST)

Before you start designing cards or deciding bounds, ask the user to pick the output ratio, the layout, the style, and the card-density preset. Frames are auto-selected from the chosen layout × style combination (see "Auto-pick frame" table below). Before sending the question, precompute two things:
  1. `recommendedRatio` from the source video's aspect ratio (`metadata.json` width / height):
    • `sourceAspect = width / height`
    • `sourceAspect ≥ 1.5` (≥ ~3:2 wide) → recommend `16:9`
    • `sourceAspect ≤ 0.7` (≤ ~9:13 tall) → recommend `9:16`
    • `0.7 < sourceAspect < 1.5` (near-square) → recommend `4:5`
    Mark the recommended option's label with " (recommended · matches source video X:Y)" so the user sees why it's recommended.
  2. `autoCount` from Step 6 (`max(5, round(videoSec / (basePace × densityMultiplier)))`) so the "auto" option's label can show the concrete number.
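
The ratio recommendation reduces to two threshold checks. A minimal sketch, using the thresholds stated above and a hypothetical function name:

```python
def recommended_ratio(width: int, height: int) -> str:
    """Pick the output ratio to recommend from the source video's dimensions."""
    source_aspect = width / height
    if source_aspect >= 1.5:   # roughly 3:2 or wider
        return "16:9"
    if source_aspect <= 0.7:   # roughly 9:13 or taller
        return "9:16"
    return "4:5"               # near-square sources


print(recommended_ratio(1920, 1080))
print(recommended_ratio(1080, 1920))
print(recommended_ratio(1440, 1080))
```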
Environment compatibility: pick the best available question channel. Not every runtime exposes the same structured-question tool. Apply this order:
  1. `AskUserQuestion` (Claude Code, Anthropic Console): use the structured 4-question call below.
  2. Other native clarification tool (e.g. `ask_question`, `request_user_input`, IDE-specific prompt): use that tool with the same 4 question texts and option lists. Preserve the recommendation markers and the precomputed values.
  3. No native tool (Codex CLI, plain text-only runtimes): ask directly in normal conversation. Use the plain-text template at the end of this section. Keep it to one message, 4 numbered questions (the global cap is 2–5 questions per round; we stay inside it).

Rules that apply to every channel:
  • Ask at most 2–5 questions per round. Our 4 here fits.
  • Even if missing info doesn't block rendering, ask once to confirm the parameters that materially affect the final output (ratio, layout, style, cardCount).
  • If the user has already pre-approved defaults ("just use defaults", "no need to ask", "auto-pick everything") or asked you not to ask, skip the question entirely and use: `recommendedRatio`, `layout="stack"` (safest cross-ratio default), `style` chosen from transcript tone in the most neutral group (editorial/data), `autoCount`. Tell the user what you picked in one sentence and continue.

Channel A: native `AskUserQuestion`:
// Precompute before the call:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = integer (from Step 6)

AskUserQuestion({
  questions: [
    {
      question: "Output video aspect ratio (canvas):",
      header: "Aspect ratio",
      multiSelect: false,
      // Reorder so the recommended option appears FIRST (per AskUserQuestion convention).
      // Append " (recommended · matches source video W×H)" to the recommended option's label.
      options: [
        { label: "16:9 (1920×1080) landscape", description: "TV / YouTube / desktop playback. Most natural when the source video is already landscape; widest canvas." },
        { label: "9:16 (1080×1920) portrait", description: "TikTok / Reels / short-form mobile. Most natural for portrait source; native mobile experience." },
        { label: "4:5 (1080×1350) near-portrait", description: "Instagram feed / WeChat Moments. Best when source is near-square or you want to cover both platforms." }
      ]
    },
    {
      question: "Choose the overall layout: how should the video and cards coexist on the canvas?",
      header: "Layout",
      multiSelect: false,
      options: [
        { label: "side-by-side (split)",  description: "Video and card each take half the canvas. Most stable for interview / data side-by-side; clear visual separation." },
        { label: "top-bottom (stack)",    description: "Video on top (~52%), card below. Classic combo of speaker face + summary card; works well in portrait too." },
        { label: "picture-in-picture (pip)", description: "Card fills the canvas, video shrinks to a rounded corner window. Use when content is primary and speaker is secondary." },
        { label: "full-screen overlay (overlay)", description: "Video plays full-bleed, card floats as a glass layer on top. Strong cinematic / emotional feel." }
      ]
    },
    {
      question: "Choose the card visual style (style):",
      header: "Style group",
      multiSelect: false,
      // NOTE: these 3 groups intentionally match the frame auto-pick matrix
      // columns below, so picking a group resolves both the `style` group AND
      // the frame matrix column in one step. Memberships are mutually exclusive.
      options: [
        { label: "warm paper (warm-paper)", description: "academic notebook · editorial big-type · whiteboard hand-drawn · xhs social. Best for interview reflections, product launches, lifestyle, emotional stories." },
        { label: "clinical / cold (clinical)",   description: "audit magazine · swiss grid · terminal CLI · minimal modern. Best for financial analysis, investigative reports, technical tutorials, serious presentations." },
        { label: "experimental / avant-garde (experimental)", description: "geom color-clash geometry · spotlight dark-background. Best for short-form highlights, product launches, strong emotion, cinematic feel." }
      ]
    },
    {
      question: "Card count (takeaway pacing): how many cards to cut?",
      header: "Card count",
      multiSelect: false,
      options: [
        { label: "Auto (recommended) · approx N cards", description: "Inferred automatically from video duration and information density (see Step 6 rules). This run estimates approx N cards. Substitute the real N (your autoCount) into the label." },
        { label: "Fewer · approx round(N × 0.6) cards", description: "Sparser cuts, each card holds longer — suits reflective / slow-paced content." },
        { label: "More · approx round(N × 1.5) cards", description: "Tighter cuts, faster rhythm — suits staccato / data-dense / short-form highlight content." }
      ]
    }
  ]
})
About "Other": `AskUserQuestion` automatically adds an "Other" option to the card count question. The user can type a number directly (e.g. "8", "20") as the cardCount target. Parse the input as an integer: if parsing succeeds → use that value (minimum 5 as a floor); if parsing fails → fall back to "auto".

Channel B: plain-text fallback (Codex CLI, runtimes without a native question tool). Post this as one normal message, then wait for the reply. Numbering the questions 1/2/3/4 keeps the reply parseable:
I need to confirm four visual decisions with you before I start cutting cards:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080) — TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920) — TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350) — Instagram feed / works for both platforms
   ▸ My recommendation:  <recommendedRatio>  (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video & card coexist):
   A. split   side-by-side (50/50)
   B. stack   top-bottom (video top, card bottom)
   C. pip     picture-in-picture (card full canvas, video rounded corner window)
   D. overlay full-screen glass overlay (video full-bleed, card glass layer)

3) Card style group (maps to frame auto-pick matrix, pick 1 of 3):
   A. warm paper (warm-paper)      (academic / editorial / whiteboard / xhs)
   B. clinical / cold (clinical)   (audit / swiss / terminal / minimal)
   C. experimental (experimental)  (geom / spotlight)

4) Card count (takeaway pacing):
   A. Auto (recommended) — approx <autoCount> cards
   B. Fewer — approx round(<autoCount> × 0.6) cards
   C. More — approx round(<autoCount> × 1.5) cards
   D. Give me a specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language is fine.
If you want all recommended defaults, reply "default" / "auto" / "use all recommendations".
Parsing the plain-text reply:
  • Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data / auto"`, full sentences, or `default`.
  • If any answer is ambiguous → re-ask only the ambiguous ones (still inside the 2–5 cap).
  • If the user says "default / auto / use all recommendations" → skip without re-asking.
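
For the compact letter formats, the parsing can be sketched as below. This is an illustration under assumptions: `parse_reply` and its return shape are hypothetical, and free-form answers (keyword lists, full sentences) are simply reported as unresolved so the agent can interpret them itself or re-ask.

```python
import re

CHOICES = {
    1: {"A": "16:9", "B": "9:16", "C": "4:5"},
    2: {"A": "split", "B": "stack", "C": "pip", "D": "overlay"},
    3: {"A": "warm-paper", "B": "clinical", "C": "experimental"},
    4: {"A": "auto", "B": "fewer", "C": "more", "D": "custom"},
}
DEFAULT_WORDS = {"default", "auto", "use all recommendations"}


def parse_reply(reply: str) -> dict:
    """Parse '1A 2C 3B 4A' or 'A C B A'; list the questions still unresolved."""
    text = reply.strip().lower()
    if text in DEFAULT_WORDS:
        return {"use_defaults": True, "answers": {}, "unresolved": []}

    pairs = re.findall(r"\b([1-4])\s*([a-d])\b", text)      # "1a 2c 3b 4a"
    if not pairs:
        letters = text.split()                               # "a c b a"
        if len(letters) == 4 and all(re.fullmatch(r"[a-d]", x) for x in letters):
            pairs = [(str(i + 1), x) for i, x in enumerate(letters)]

    answers = {}
    for number, letter in pairs:
        option = CHOICES[int(number)].get(letter.upper())
        if option is not None:          # e.g. "3D" has no matching option
            answers[int(number)] = option
    unresolved = [q for q in CHOICES if q not in answers]
    return {"use_defaults": False, "answers": answers, "unresolved": unresolved}


print(parse_reply("1A 2C 3B 4A"))
print(parse_reply("1A 2C 3D 4A")["unresolved"])
```

Only the questions listed in `unresolved` would be re-asked, which keeps the follow-up inside the per-round question cap.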
After the user answers (any channel):
  1. Resolve the output canvas from the ratio answer. These are the exact `storyboard.composition.width / height` values to write:

    | user choice | composition.width × height | storyboard.composition.layout field |
    | --- | --- | --- |
    | `16:9` | 1920 × 1080 | `"landscape"` |
    | `9:16` | 1080 × 1920 | `"portrait"` |
    | `4:5` | 1080 × 1350 | `"portrait"` (schema treats 4:5 as portrait: height > width) |

    For 4:5 bounds inside `references/layouts/*.html`: those files only document landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350) derive bounds by proportional scaling from portrait: keep horizontal values, scale vertical values by `1350/1920 ≈ 0.703` (use the exact fraction when rounding, since the truncated 0.703 can land one pixel off). Example: `overlay` portrait card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.
  2. Map the style group to a specific style by looking at the transcript tone: pick the one that best fits, but stay inside the user's chosen group. If you're unsure between two specific styles inside the group, send a second `AskUserQuestion` with those 2–4 specific style options.
  3. Resolve final cardCount from the density answer:

    | user choice | final cardCount |
    | --- | --- |
    | Auto (recommended) | the `autoCount` you already computed |
    | Fewer | `max(5, round(autoCount × 0.6))` |
    | More | `round(autoCount × 1.5)` (no upper clamp) |
    | Other = "<n>" (integer) | `max(5, parseInt(n))` |
    | Other = anything else | fall back to `autoCount` |

  4. Auto-pick the video frame from this table (frames don't ask the user; they follow from layout × style):

    | layout | warm-paper styles (academic / whiteboard / editorial / xhs) | clinical styles (audit / swiss / terminal / minimal) | experimental styles (geom / spotlight) |
    | --- | --- | --- | --- |
    | `split` | `polaroid` | `hairline` | `clean` |
    | `stack` | `polaroid` | `hairline` | `clean` |
    | `pip` | `clean` (pip pill already has chrome) | `clean` | `clean` |
    | `overlay` | `clean` (full-bleed forbids deco frames) | `clean` | `clean` |

  5. Tell the user what you chose in one sentence (ratio + canvas size, layout, specific style, frame, and final cardCount), then proceed with the rest of Step 7 (per-card layouts, motion patterns).
  6. Record the five values (ratio / layout / style / frame / cardCount) in working memory (no schema field needed); you'll reference them while writing each card's HTML in Step 8 and while reading the matching `references/<dim>/<key>.html` for tokens and structure.

If the user picks an answer via "Other" with a free-text style name not in the 10-style library, treat it as a hint to design a fresh card visual yourself, but still anchor on the chosen layout's bounds.
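
The resolution rules above (canvas, card count, frame, and the 4:5 bounds scaling) can be collected into one lookup sketch. All names here are hypothetical; the tables are transcribed from this section, and `int()` is stricter than JavaScript's `parseInt` (it rejects inputs like "8 cards"), so treat the "Other" branch as an approximation.

```python
CANVAS = {  # ratio -> (width, height, layout)
    "16:9": (1920, 1080, "landscape"),
    "9:16": (1080, 1920, "portrait"),
    "4:5": (1080, 1350, "portrait"),
}
DECORATED = {"warm-paper": "polaroid", "clinical": "hairline", "experimental": "clean"}
ALL_CLEAN = {"warm-paper": "clean", "clinical": "clean", "experimental": "clean"}
FRAMES = {"split": DECORATED, "stack": DECORATED, "pip": ALL_CLEAN, "overlay": ALL_CLEAN}


def final_card_count(choice: str, auto_count: int) -> int:
    """Resolve the density answer; free-text numbers get a floor of 5."""
    if choice == "auto":
        return auto_count
    if choice == "fewer":
        return max(5, round(auto_count * 0.6))
    if choice == "more":
        return round(auto_count * 1.5)   # no upper clamp
    try:
        return max(5, int(choice))
    except ValueError:
        return auto_count                # unparseable "Other" falls back to auto


def scale_portrait_bounds_to_4x5(bounds: dict) -> dict:
    """Keep horizontal values; scale vertical ones by the exact 1350/1920."""
    scale = 1350 / 1920
    return {"x": bounds["x"], "y": round(bounds["y"] * scale),
            "w": bounds["w"], "h": round(bounds["h"] * scale)}


print(CANVAS["4:5"], FRAMES["stack"]["clinical"], final_card_count("fewer", 20))
print(scale_portrait_bounds_to_4x5({"x": 24, "y": 1280, "w": 1032, "h": 564}))
```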
在开始设计卡片或确定边界前,请用户选择输出比例、布局、风格和卡片密度预设。画面会根据所选布局×风格组合自动选择(见下文“自动选择画面”表格)。发送问题前,预先计算两件事
  1. recommendedRatio
    (根据源视频的宽高比,即
    metadata.json
    中的width/height):
    • sourceAspect = width / height
    • sourceAspect ≥ 1.5
      (≥ ~3:2宽屏)→ 推荐**
      16:9
      **
    • sourceAspect ≤ 0.7
      (≤ ~9:13竖屏)→ 推荐**
      9:16
      **
    • 0.7 < sourceAspect < 1.5
      (接近正方形)→ 推荐**
      4:5
      **
    在推荐选项的标签后添加“(推荐 · 匹配源视频X:Y)”,让用户了解推荐原因。
  2. autoCount
    (来自步骤6的计算值,即
    max(5, round(视频时长秒数 / (基础节奏 × 密度乘数)))
    ),以便“自动”选项的标签可以显示具体数值。
环境兼容性——选择最佳的提问方式。并非所有运行时都支持相同的结构化提问工具。按以下顺序选择:
  1. AskUserQuestion
    (Claude Code、Anthropic控制台)——使用以下结构化4问题调用。
  2. 其他原生澄清工具(如
    ask_question
    request_user_input
    、IDE特定提示)——使用该工具并保留相同的4个问题文本和选项列表。保留推荐标记和预先计算的值。
  3. 无原生工具(Codex CLI、纯文本运行时)——直接在对话中提问。使用本节末尾的纯文本模板。保持为一条消息,4个编号问题(全局限制为每轮2–5个问题;此处符合要求)。
适用于所有渠道的规则:
  • 每轮最多提问2–5个问题。此处的4个问题符合要求。
  • 即使缺少信息不会阻碍渲染,也要询问一次以确认对最终输出有重大影响的参数(比例、布局、风格、卡片数量)。
  • 如果用户已预先批准默认值(“使用默认值即可”“无需询问”“自动选择所有选项”)或要求不要提问——完全跳过提问,使用:
    recommendedRatio
    layout="stack"
    (最安全的跨比例默认值)、根据字幕语气选择最中性组(编辑/数据)的
    style
    autoCount
    。用一句话告知用户你的选择,然后继续。
渠道A — 原生
AskUserQuestion
// 调用前预先计算:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = 整数(来自步骤6)

AskUserQuestion({
  questions: [
    {
      question: "输出视频宽高比(画布):",
      header: "宽高比",
      multiSelect: false,
      // 重新排序,让推荐选项排在第一位(遵循AskUserQuestion惯例)。
      // 在推荐选项的标签后添加“(推荐 · 匹配源视频W×H)”。
      options: [
        { label: "16:9 (1920×1080) 横屏", description: "电视/YouTube/桌面播放。最适合已为横屏的源视频;画布最宽。" },
        { label: "9:16 (1080×1920) 竖屏", description: "TikTok/Reels/短视频移动端。最适合竖屏源视频;原生移动端体验。" },
        { label: "4:5 (1080×1350) 近竖屏", description: "Instagram动态/微信朋友圈。最适合接近正方形的源视频,或需要覆盖多平台的场景。" }
      ]
    },
    {
      question: "选择整体布局:视频和卡片如何在画布上共存?",
      header: "布局",
      multiSelect: false,
      options: [
        { label: "side-by-side (split) 分屏",  description: "视频和卡片各占画布一半。最适合访谈/数据并列展示;视觉分隔清晰。" },
        { label: "top-bottom (stack) 上下堆叠",    description: "视频在上(约52%),卡片在下。经典的主讲人画面+摘要卡片组合;也适用于竖屏。" },
        { label: "picture-in-picture (pip) 画中画", description: "卡片填满画布,视频缩小为圆角窗口。适合内容为主、主讲人为辅的场景。" },
        { label: "full-screen overlay (overlay) 全屏叠加", description: "视频全屏播放,卡片作为玻璃层悬浮在上方。具有强烈的电影感/情感氛围。" }
      ]
    },
    {
      question: "选择卡片视觉风格(style):",
      header: "风格组",
      multiSelect: false,
      // 注意:这3组与下方的画面自动选择矩阵行完全匹配
      // 因此选择一组即可同时确定`style`组和画面矩阵列。各组互斥。
      options: [
        { label: "warm paper (warm-paper) 暖纸风", description: "学术笔记本·大字体编辑风格·手绘白板·小红书社交风。最适合访谈反思、产品发布、生活方式、情感故事。" },
        { label: "clinical / cold (clinical) 冷峻风",   description: "审计杂志·瑞士网格·终端CLI·极简现代风。最适合财务分析、调查报道、技术教程、严肃演示。" },
        { label: "experimental / avant-garde (experimental) 实验风", description: "几何撞色·暗背景聚光灯。最适合短视频高光、产品发布、强烈情感、电影感内容。" }
      ]
    },
    {
      question: "卡片数量(要点节奏):需要制作多少张卡片?",
      header: "卡片数量",
      multiSelect: false,
      options: [
        { label: "Auto (推荐) · 约N张卡片", description: "根据视频时长和信息密度自动推断(见步骤6规则)。本次运行估计约N张卡片。将实际N值(你的autoCount)替换到标签中。" },
        { label: "Fewer · 约round(N × 0.6)张卡片", description: "切换更稀疏,每张卡片停留更长时间——适合反思/慢节奏内容。" },
        { label: "More · 约round(N × 1.5)张卡片", description: "切换更紧凑,节奏更快——适合急促/高密度数据/短视频高光内容。" }
      ]
    }
  ]
})
关于“其他”选项——
AskUserQuestion
会自动在卡片数量问题中添加“其他”选项。用户可以直接输入数字(如“8”“20”)作为卡片数量目标。将输入解析为整数:如果解析成功→使用该值(下限为5);如果解析失败→回退到“自动”选项。
渠道B — 纯文本回退(Codex CLI、无原生提问工具的运行时)。将以下内容作为一条普通消息发送,然后等待回复。使用项目符号1/2/3/4格式,便于解析回复:
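Before I start making cards, I need to confirm four visual decisions with you:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080): TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920): TikTok / Reels / short-video mobile
   C. 4:5 near-portrait (1080×1350): Instagram feed / multi-platform
   ▸ My recommendation: <recommendedRatio> (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video and cards share the canvas):
   A. split (50/50)
   B. stack (video on top, card below)
   C. pip (card fills the canvas, video in a rounded-corner window)
   D. overlay (full-screen video, card as a glass layer)

3) Card style group (maps to the frame auto-selection matrix; pick 1 of 3):
   A. warm paper (warm-paper): academic / editorial / whiteboard / xhs
   B. clinical / cold (clinical): audit / swiss / terminal / minimal
   C. experimental (experimental): geom / spotlight

4) Card count (takeaway pacing):
   A. Auto (recommended): ~<autoCount> cards
   B. Fewer: ~round(<autoCount> × 0.6) cards
   C. More: ~round(<autoCount> × 1.5) cards
   D. A specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language both work.
To accept all recommended defaults, reply "default" / "auto" / "use all recommended options".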
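Parsing the plain-text reply:
  • Accept loose formats: "1A 2C 3B 4A", "A C B A", "16:9 / pip / data style / auto", full sentences, or `default`.
  • If any answer is ambiguous, re-ask only the ambiguous questions (still within the 2–5 question limit).
  • If the user replies "default / auto / use all recommended options", skip further questions and apply the recommended defaults.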
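After the user replies (either channel):
  1. Resolve the output canvas from the aspect-ratio choice. These are the exact values to write into `storyboard.composition.width / height`:

     | User choice | Composition W × H | `layout` field (in `storyboard.composition`) |
     | --- | --- | --- |
     | 16:9 | 1920 × 1080 | "landscape" |
     | 9:16 | 1080 × 1920 | "portrait" |
     | 4:5 | 1080 × 1350 | "portrait" (because height > width) |

     On 4:5 bounds in `references/layouts/*.html`: those files document bounds only for landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350), derive the bounds by scaling the portrait ones: keep horizontal values unchanged and scale vertical values by 1350/1920 ≈ 0.703. Example: portrait `overlay` card = { x: 24, y: 1280, w: 1032, h: 564 } → 4:5 card = { x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) } = { x: 24, y: 900, w: 1032, h: 397 }. Use the exact ratio when rounding; the truncated 0.703 would give h = 396.
  2. Map the style group to a concrete style based on the transcript's tone: pick the best-matching style, but stay inside the group the user chose. If you cannot decide between two concrete styles in the group, send a second `AskUserQuestion` listing those 2–4 concrete style options.
  3. Resolve the final card count from the density choice:

     | User choice | Final card count |
     | --- | --- |
     | Auto (recommended) | the `autoCount` you already computed |
     | Fewer | max(5, round(autoCount × 0.6)) |
     | More | round(autoCount × 1.5) (no upper limit) |
     | Other = "<n>" (integer) | max(5, parseInt(n)) |
     | Other = anything else | fall back to `autoCount` |

  4. Auto-select the video frame from the table below (do not ask the user; layout × style decides it):

     | Layout | warm-paper (academic / whiteboard / editorial / xhs) | clinical (audit / swiss / terminal / minimal) | experimental (geom / spotlight) |
     | --- | --- | --- | --- |
     | split | polaroid | hairline | clean |
     | stack | polaroid | hairline | clean |
     | pip | clean (the pip window already has its own border) | clean | clean |
     | overlay | clean (a decorative frame does not suit full-bleed video) | clean | clean |

  5. Tell the user your choices in one sentence (aspect ratio + canvas size, layout, concrete style, frame, final card count), then continue with the rest of Step 7 (per-card layout, animation patterns).
  6. Record the five values (ratio / layout / style / frame / card count) in working memory (no schema field is needed). They are referenced when you write each card's HTML in Step 8 and when you read the matching `references/<dim>/<key>.html` in Step 9 for tokens and structure.
If the user picks a free-text style name outside the 10-style library via the "Other" option, treat it as a prompt to design a brand-new card look, still anchored to the chosen layout's bounds.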
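The card-count and frame rules above can be sketched as two small lookup helpers. This is a minimal sketch, not part of the skill: the names `resolveCardCount`, `pickFrame`, `FRAME_MATRIX`, and the lowercase choice keys are hypothetical; only the formulas and the matrix values come from the steps above.

```javascript
// Hypothetical helpers; names and argument shapes are illustrative assumptions.

// Resolve the final card count from the density choice.
// choice: "auto" | "fewer" | "more" | any free-text "Other" input.
function resolveCardCount(choice, autoCount) {
  const key = String(choice).trim().toLowerCase();
  if (key === "auto") return autoCount;
  if (key === "fewer") return Math.max(5, Math.round(autoCount * 0.6));
  if (key === "more") return Math.round(autoCount * 1.5); // no upper limit
  // "Other": parse as an integer with a floor of 5; fall back to Auto if unparsable.
  const n = Number.parseInt(key, 10);
  return Number.isNaN(n) ? autoCount : Math.max(5, n);
}

// Frame auto-selection matrix: layout x style group -> frame.
const FRAME_MATRIX = {
  split:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  stack:   { "warm-paper": "polaroid", clinical: "hairline", experimental: "clean" },
  pip:     { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
  overlay: { "warm-paper": "clean",    clinical: "clean",    experimental: "clean" },
};

function pickFrame(layout, styleGroup) {
  const row = FRAME_MATRIX[layout];
  if (!row || !(styleGroup in row)) {
    throw new Error(`Unknown layout / style group: ${layout} / ${styleGroup}`);
  }
  return row[styleGroup];
}
```

Keeping the matrix as data rather than branching logic makes it easy to compare against the table when either one changes.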

Render Strategy Inputs

With ratio / layout / style / cardCount / frame locked from Step 7.0, the remaining per-card decisions are:
  • Source-video fit inside the GSAP target: the video element has `object-fit: cover` and is clipped to `#video-wrap`'s tween bounds. If you want NO cropping (e.g. a portrait source on a landscape canvas shouldn't get its top/bottom chopped), aim the tween at a rect that matches the source's aspect ratio and let the surrounding canvas show through (or fill it with the card / a backdrop).
  • `card.zone` per card: derive it from your chosen composition layout (split → side-panel, stack → lower-third, pip → fullscreen, overlay → video-overlay), OR pick a different zone for one-off variants (fullscreen for hero / quote, whiteboard-area for dense data).
  • `accentIndex` per card: each card pulls one of the 5 theme accent colors. Vary across cards for rhythm; reuse the same index when two cards belong to the same narrative beat.
  • Motion vocabulary: pick 2–3 repeatable patterns from the `data-anim` kinds (see the table later) and stick to them so the composition feels coherent.
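For the no-cropping case in the first bullet, the tween target can be computed rather than eyeballed. The sketch below is one way to do it; `fitContain` is a hypothetical helper name, and centering the rect inside the bounds is an assumption (the text only requires a matching aspect ratio).

```javascript
// Hypothetical helper: the largest rect with the source's aspect ratio that
// fits inside `bounds`, centered. With this target, object-fit: cover has
// nothing meaningful left to crop (only sub-pixel rounding).
function fitContain(srcW, srcH, bounds) {
  const scale = Math.min(bounds.width / srcW, bounds.height / srcH);
  const width = Math.round(srcW * scale);
  const height = Math.round(srcH * scale);
  return {
    left: bounds.left + Math.round((bounds.width - width) / 2),
    top: bounds.top + Math.round((bounds.height - height) / 2),
    width,
    height,
  };
}

// Portrait 1080x1920 source on a 1920x1080 landscape canvas.
const target = fitContain(1080, 1920, { left: 0, top: 0, width: 1920, height: 1080 });
console.log(target);
```

The returned object uses the same `left / top / width / height` keys as the `#video-wrap` tween targets, so it can be spread into the tween vars alongside `duration` and `ease`.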
Pick from these `themeId` palettes (use them as `--accent-N` / `--bg` / `--text` CSS variables in your composition `<style>` block):

| themeId | accent palette (5 colors) | board bg | text |
| --- | --- | --- | --- |
| classic | #1971c2 #e03131 #2f9e44 #e8590c #9c36b5 | #FFF9E3 (paper) | #1e1e1e |
| noir | #4cc9f0 #f72585 #4ade80 #fb923c #a78bfa | #1a1a1a | #f1f1f1 |
| mint | #0077b6 #d62828 #2d6a4f #e76f51 #7209b7 | #e8faf0 | #1b4332 |
| craft | #bf5700 #d62728 #6c757d #e9b54a #3d5a80 | #f6efe1 | #2d2d2d |
| slate | #0ea5e9 #ef4444 #22c55e #f97316 #a855f7 | #1e293b | #f1f5f9 |
| mono | #000 #555 #888 #aaa #ccc | #fff | #000 |
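To show how a palette row turns into the CSS variables named above, here is a small sketch that emits a `:root` block for a given `themeId`. The `THEMES` object and `themeToCss` function are hypothetical; the color values are copied from the table, and the zero-based `--accent-0` … `--accent-4` numbering is assumed from the composition template.

```javascript
// Palette values copied from the themeId table; the helper itself is an assumption.
const THEMES = {
  classic: { accents: ["#1971c2", "#e03131", "#2f9e44", "#e8590c", "#9c36b5"], bg: "#FFF9E3", text: "#1e1e1e" },
  noir:    { accents: ["#4cc9f0", "#f72585", "#4ade80", "#fb923c", "#a78bfa"], bg: "#1a1a1a", text: "#f1f1f1" },
  mint:    { accents: ["#0077b6", "#d62828", "#2d6a4f", "#e76f51", "#7209b7"], bg: "#e8faf0", text: "#1b4332" },
  craft:   { accents: ["#bf5700", "#d62728", "#6c757d", "#e9b54a", "#3d5a80"], bg: "#f6efe1", text: "#2d2d2d" },
  slate:   { accents: ["#0ea5e9", "#ef4444", "#22c55e", "#f97316", "#a855f7"], bg: "#1e293b", text: "#f1f5f9" },
  mono:    { accents: ["#000", "#555", "#888", "#aaa", "#ccc"], bg: "#fff", text: "#000" },
};

// Build the :root block that the composition <style> would contain.
function themeToCss(themeId) {
  const theme = THEMES[themeId];
  if (!theme) throw new Error(`Unknown themeId: ${themeId}`);
  const lines = [
    `  --bg: ${theme.bg};`,
    `  --text: ${theme.text};`,
    ...theme.accents.map((color, i) => `  --accent-${i}: ${color};`),
  ];
  return `:root {\n${lines.join("\n")}\n}`;
}

console.log(themeToCss("noir"));
```

A card then reads its color as `var(--accent-N)` with N equal to its `accentIndex`, which keeps card HTML portable across themes.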
Available fonts (woff2 in `<SKILL_DIR>/assets/fonts/`, staged to the work dir in Step 9): `Caveat` (handwriting), `LXGW WenKai TC` (Chinese hand-script), `Inter` (modern sans), `Virgil` (geometric hand). Reference them via `@font-face` or `font-family` directly.
For inspiration on visual patterns, `<SKILL_DIR>/references/styles/` ships 10 self-contained reference cards (academic / editorial / minimal / spotlight / geom / whiteboard / audit / terminal / swiss / xhs) that you can copy as starting points, but do not feel constrained to match any of these. Each card is your own design.

Visual Design Library (<SKILL_DIR>/references/)

Beyond the composition-level `themeId`, the skill ships a richer reference library at `<SKILL_DIR>/references/` covering three orthogonal visual dimensions you can freely mix:
Style  ×  Layout  ×  VideoFrame
 (10)      (4)         (3)

| dimension | keys | what it decides |
| --- | --- | --- |
| style | academic, editorial, minimal, spotlight, geom, whiteboard, audit, terminal, swiss, xhs | the card's visual language: fonts, colors, ornament, layout-within-card |
| layout | split, stack, pip, overlay | how the source video and the card share the canvas |
| frame | clean, hairline, polaroid | the decorative chrome around the video element |

Read `<SKILL_DIR>/references/DESIGN_INDEX.md` for the full matrix and a loose decision guide (interview / product launch / data analysis / social clip / technical tutorial / emotional story …). When you decide to use a specific style / layout / frame, Read the corresponding file:
  • `references/styles/<key>.html`: a self-contained card fragment with that style's CSS tokens (colors, fonts, padding, ornament) and a placeholder takeaway. Copy the `.card[data-card-id="ref-<key>"]` style block, rename the data-card-id to your card's id, swap the placeholder content for the real takeaway, and you're done.
  • `references/layouts/<key>.html`: exact `videoBounds` + `cardBounds` for both landscape and portrait, with a copy-paste JSON snippet for `storyboard.json`'s per-card `layout` field. Note that the storyboard schema has no per-card `layout` field, only `card.zone`, so use the snippet for its bounds values rather than pasting the field itself.
  • `references/frames/<key>.html`: decorative HTML to add as a sibling of `#video-wrap`, plus placement instructions for the composition CSS.
Pick `style × layout × frame` per card; you can change all three between cards as long as the transitions read smoothly. A common rhythm: open with `editorial × overlay × clean`, switch to `audit × split × hairline` for the data card, close on `whiteboard × pip × polaroid`.
The 10 styles are skill-side design tokens, not composition-level themes: they don't need to be declared in `storyboard.composition`; they live inside each card's HTML. The `themeId` field can still pick a composition-level palette (table above) that controls the page-body background and video border chrome.

Layout Compositions (Card + Video)

Two coordinated decisions per card define how it shares the canvas with the source video:
  • `card.zone` (declared in `storyboard.json`): one of the 5 schema values; resolve it into pixel bounds (per the table in Step 6) when you write the card-host wrapper's inline `style` in Step 9.
  • `#video-wrap` bounds at this card's time window (declared imperatively in the composition's GSAP timeline): the agent tweens `#video-wrap` to a target rect for each layout transition.
Schema does NOT store per-card video bounds. `videoTrack.bounds` is one-time at composition level (defaults to full canvas). Video "moving" between cards is purely a GSAP animation authored in `index.html`. There is no `card.layout` field (earlier versions of this doc invented one); the real schema only has `card.zone`.
4 composition layouts (from `references/layouts/`), each a recipe pairing a `zone` with a `#video-wrap` tween target:

| composition layout | recommended `card.zone` | GSAP target for `#video-wrap` (landscape 1920×1080) | GSAP target for `#video-wrap` (portrait 1080×1920) | when to use |
| --- | --- | --- | --- | --- |
| split | side-panel | { left: 960, top: 0, width: 960, height: 1080 } | { left: 0, top: 960, width: 1080, height: 960 } (bottom half) | speaker + data side-by-side / 50:50 weight |
| stack | lower-third | { left: 14, top: 14, width: 1892, height: 548 } (top 52%) | { left: 0, top: 0, width: 1080, height: 844 } (top 44%) | speaker on top + summary card below |
| pip | fullscreen | { left: 1480, top: 760, width: 400, height: 300 } + add `.framed` class | { left: 690, top: 28, width: 360, height: 203 } + add `.framed` | content-heavy card + corner pip |
| overlay | video-overlay | { left: 0, top: 0, width: 1920, height: 1080 } (full-bleed) | { left: 0, top: 0, width: 1080, height: 1920 } | cinematic / dramatic / glass card on full video |

For 4:5 (1080×1350), scale portrait y/h values by 1350/1920 ≈ 0.703 (see Step 7.0 Channel A / Channel B `recommendedRatio` resolution table).
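The 4:5 scaling rule is simple enough to sketch as a helper. `scalePortraitToFourFive` is a hypothetical name; the sketch uses the exact ratio 1350/1920 rather than the rounded 0.703, which is an assumption made so that rounding stays stable.

```javascript
// Hypothetical helper: derive 4:5 (1080x1350) bounds from portrait (1080x1920)
// bounds by scaling only the vertical values; horizontal values are untouched.
function scalePortraitToFourFive(rect) {
  const ratio = 1350 / 1920; // exact, not the rounded 0.703
  return {
    left: rect.left,
    top: Math.round(rect.top * ratio),
    width: rect.width,
    height: Math.round(rect.height * ratio),
  };
}

// Portrait targets for the split and stack recipes, taken from the table above.
const split45 = scalePortraitToFourFive({ left: 0, top: 960, width: 1080, height: 960 });
const stack45 = scalePortraitToFourFive({ left: 0, top: 0, width: 1080, height: 844 });
console.log(split45, stack45);
```

Because only the vertical values shrink, a fixed-aspect window such as the pip target ends up flatter than its source, and `object-fit: cover` will crop it; adjusting the pip rect by hand may be preferable there.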
Other zone values for one-off variants (still uses `card.zone`; no fake "layout" field):

| zone | resolved bounds | common use |
| --- | --- | --- |
| fullscreen | covers whole canvas | hero card, video tweens to hidden/pip |
| whiteboard-area | inset 40px margin (landscape) or bottom 45% (portrait) | dense data card, free margins |
| lower-third | bottom 30% band | talking-head annotation |
| side-panel | right 42% (landscape) or bottom 40% (portrait) | sidebar / "split" recipe |
| video-overlay | full canvas; expect transparent card root | glass overlay on full-bleed video |

You can mix recipes per card: choose `card.zone` based on what suits the moment, then write the GSAP tween for `#video-wrap` between cards.
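The zone table can be turned into pixel bounds with a small resolver. This is only a sketch under stated assumptions: the authoritative numbers are the ones in the Step 6 table, which may differ, and recipes may place the card elsewhere (the split recipe can put the card on the half opposite the video). Both function names are hypothetical.

```javascript
// Hypothetical sketch of zone -> pixel bounds, following the zone table above.
// Treat the fractions as approximations; Step 6 holds the authoritative values.
function resolveZoneBounds(zone, width, height) {
  const portrait = height > width;
  const bottomBand = (fraction) => {
    const h = Math.round(height * fraction);
    return { left: 0, top: height - h, width, height: h };
  };
  switch (zone) {
    case "fullscreen":
    case "video-overlay": // same box; the card root is expected to be transparent
      return { left: 0, top: 0, width, height };
    case "whiteboard-area":
      return portrait
        ? bottomBand(0.45)
        : { left: 40, top: 40, width: width - 80, height: height - 80 };
    case "lower-third":
      return bottomBand(0.3);
    case "side-panel": {
      if (portrait) return bottomBand(0.4);
      const w = Math.round(width * 0.42);
      return { left: width - w, top: 0, width: w, height };
    }
    default:
      throw new Error(`Unknown zone: ${zone}`);
  }
}

// Inline style string for a card-host wrapper.
function boundsToStyle(b) {
  return `left:${b.left}px;top:${b.top}px;width:${b.width}px;height:${b.height}px;`;
}

console.log(boundsToStyle(resolveZoneBounds("lower-third", 1920, 1080)));
```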

Storyboard Render Contract

`storyboard.json` is an agent-internal planning artifact; no CLI command parses it. It exists to keep your timing and content decisions explicit before you write each card's HTML. Stick to the v3-style shape below so the same outline drives the composition you assemble in Step 9.
Required structure (see Step 6 for the full example):
  • `schemaVersion: 3`
  • `composition: { fps, width, height, durationSeconds, layout, themeId, seed }`: note that `durationSeconds` / `fps` / `themeId` / `layout` live inside `composition`, NOT at top level
  • `videoTrack: { sourcePath, startSec, endSec, bounds? }`: video bounds default to full canvas
  • `subtitles: { enabled, ... }`
  • `cards[]`: each card has these 7 required fields: `id`, `intent`, `startSec`, `endSec`, `accentIndex`, `zone`, `contentHints`
Rules:
  • Card times stay inside `composition.durationSeconds` and should not overlap unless intentional (use `data-track-index` to control z-order when they do).
  • Visual details live in card HTML fragments (Step 8), NOT in `contentHints`. `contentHints` is your own structured prompt for designing the card; the rendered look is the HTML.
  • Keep the storyboard shape stable: even though nothing parses it, you read it back while authoring Step 8/9, and consistency keeps card IDs and timing in sync.
  • Agent-side decisions like "I picked overlay × geom × clean" do NOT belong in `storyboard.json`; keep them in working memory and use them when authoring card HTML + GSAP tweens.
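Since nothing parses `storyboard.json`, a quick self-check before authoring cards can catch timing slips. The sketch below is an illustrative checker for the rules above, not a tool shipped with the skill; `validateStoryboard` and the message wording are hypothetical, and overlaps are reported as warnings-style entries because they may be intentional.

```javascript
// Hypothetical checker; returns a list of problems so every issue shows up in one pass.
const REQUIRED_CARD_FIELDS = ["id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints"];

function validateStoryboard(sb) {
  const problems = [];
  const duration = sb.composition && sb.composition.durationSeconds;
  if (typeof duration !== "number") {
    problems.push("composition.durationSeconds is missing");
  }
  // Sort a copy by start time so overlap detection is a single sweep.
  const cards = [...(sb.cards || [])].sort((a, b) => a.startSec - b.startSec);
  let latest = null; // card with the furthest endSec seen so far
  for (const card of cards) {
    for (const field of REQUIRED_CARD_FIELDS) {
      if (!(field in card)) problems.push(`${card.id}: missing field "${field}"`);
    }
    if (!(card.startSec < card.endSec)) {
      problems.push(`${card.id}: startSec must be less than endSec`);
    }
    if (card.startSec < 0 || card.endSec > duration) {
      problems.push(`${card.id}: outside 0..durationSeconds`);
    }
    if (latest && card.startSec < latest.endSec) {
      problems.push(`${latest.id} overlaps ${card.id} (fine only if intentional)`);
    }
    if (!latest || card.endSec > latest.endSec) latest = card;
  }
  return problems;
}
```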
Transparent card backgrounds for cards that share canvas with video. When the GSAP tween leaves video visible behind/beside the card (overlay recipe, pip recipe, or any `card.zone = 'lower-third' | 'video-overlay'` moment), the card's `.root` MUST NOT paint a full opaque background, otherwise it occludes the video. Two patterns:
css
/* Pattern A: transparent root, page body provides the cream backdrop */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* Pattern B: explicit per-card background ONLY for fullscreen cards */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}
For `side-panel`-zone cards (split recipe), the card-host is already only half the canvas, so an opaque card bg is fine; it only covers its half.

8. Write Each Card's HTML

Create `$WORK_DIR/public/cards/{card-id}.html` for each card. Each file contains a single rooted HTML fragment that follows this contract:

Card HTML Contract

html
<div class="card" data-card-id="{cardId}">
  <style>
    /* MUST: every rule starts with .card[data-card-id="{cardId}"] */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; ...;
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; ... }
  </style>

  <div class="root">
    <h1
      id="card-01-title"
      class="title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>
Hard rules (`hyperframes` lint will reject violations):
  • Single root `<div class="card" data-card-id="{cardId}">`
  • Inline `<style>` rules MUST be prefixed with the scope selector above
  • No `<script>` tags
  • No external URLs in `src=` / `href=` (no CDN, no remote fonts)
  • No inline event handlers (`onclick=` etc.)
  • All assets via relative paths into the same `public/` directory
  • Colors via `var(--accent-N)` etc. for portability across themes
Animations are declared, not coded. Use `data-anim-*` attributes only; never write `<script>` to animate. You compile every `data-anim-*` declaration into the single master GSAP timeline in Step 9.
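Before running the real lint, the hard rules can be pre-checked with a rough script. The sketch below is a regex-based approximation written for illustration only: it is not the `hyperframes` lint, `lintCardFragment` is a hypothetical name, and it will miss edge cases such as nested at-rules or attributes written in unusual ways.

```javascript
// Hypothetical, regex-based approximation of the hard rules above.
function lintCardFragment(html) {
  const problems = [];
  const source = html.trim();

  // Rule: single root <div class="card" data-card-id="...">.
  const root = source.match(/^<div class="card" data-card-id="([^"]+)">/);
  if (!root || !source.endsWith("</div>")) {
    problems.push("root must be a single div.card with a data-card-id");
  }
  if (/<script\b/i.test(source)) problems.push("no <script> tags");
  // Rule: no external URLs, in attributes or in CSS url().
  if (/\b(?:src|href)\s*=\s*["']?(?:https?:)?\/\//i.test(source) ||
      /url\(\s*["']?(?:https?:)?\/\//i.test(source)) {
    problems.push("no external URLs");
  }
  if (/\son[a-z]+\s*=/i.test(source)) problems.push("no inline event handlers");

  // Rule: every selector in an inline <style> starts with the scope selector.
  const scope = root ? `.card[data-card-id="${root[1]}"]` : null;
  const styleBlocks = source.match(/<style>[\s\S]*?<\/style>/gi) || [];
  for (const block of styleBlocks) {
    const css = block
      .replace(/<\/?style>/gi, "")
      .replace(/\/\*[\s\S]*?\*\//g, ""); // drop comments
    for (const rule of css.split("}")) {
      const selectorList = rule.split("{")[0].trim();
      if (!selectorList) continue;
      for (const selector of selectorList.split(",")) {
        if (!scope || !selector.trim().startsWith(scope)) {
          problems.push(`unscoped selector: ${selector.trim()}`);
        }
      }
    }
  }
  return problems;
}
```

An empty result only means the fragment passed these rough checks; the renderer's own lint remains the authority.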

Card Sizing — Mobile-First in Portrait

The 10 `references/styles/*.html` are sized for a 1920×1080 landscape preview. When `storyboard.composition.layout = "portrait"` (1080×1920, the dominant case for social / mobile), scale every visual size up: phones hold the screen close, and the same pixel count reads smaller than on a landscape TV-style canvas.

| token | landscape baseline | portrait target | scale |
| --- | --- | --- | --- |
| title (h1/h2 hero) | 64–96px | 88–132px | ×1.35 |
| detail / body | 24–30px | 30–40px | ×1.30 |
| kicker / chip label | 14–16px | 18–22px | ×1.30 |
| timecode / meta | 12–14px | 16–18px | ×1.30 |
| data block primary number | 48–60px | 64–88px | ×1.40 |
| line-height multiplier | 1.05–1.5 | same | (don't scale) |

Rule of thumb: `portraitPx = round(landscapePx × 1.3)`, then floor to a nearby 4px multiple for visual rhythm. Hero headlines may go up to ×1.4; small meta text stays at ×1.2 to avoid crowding.
Padding shrinks slightly in portrait: the card is narrower, so big landscape padding (40–64px) eats too much width. Use 24–36px horizontal padding in portrait.
If you're producing a single card that must work in both layouts, prefer container-query units (`cqi`, with `container-type` set on the card root) over hard-coding sizes:
css
.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}
But for most cards, a single layout choice is fine: just pick the size-table column that matches the storyboard's `composition.layout` field.

Available data-anim Kinds

| kind | use for | key params |
| --- | --- | --- |
| `fade-in` | enter | `at`, `duration`, `ease?` |
| `fade-out` | exit | `at`, `duration`, `ease?` |
| `slide-in` | slide enter | `at`, `duration`, `from=left\|right\|top\|bottom`, `distance` |
| `kinetic-chars` | per-char pop | `at`, `duration`, `stagger`, `pattern=pop\|fade`; element needs `<span class="char">` children |
| `typewriter` | per-char fade | same as kinetic-chars but slower default stagger |
| `count-up` | animate number | `at`, `duration`, `from`, `to`, `format=.0f\|.1f\|.2f\|,d` |
| `draw-path` | SVG path reveal | `at`, `duration`; element should be a `<path>` |
| `grow-y` | bar height | `at`, `duration`, `target-h` (px); element starts `height:0` |
| `grow-x` | bar width | `at`, `duration`, `target-w` (px); element starts `width:0` |
| `scale-pop` | pop entrance | `at`, `duration` |
| `blur-in` | unfocused → focused | `at`, `duration` |
| `mask-reveal` | clip reveal | `at`, `duration`, `direction=left\|right\|top\|bottom` |
| `morph-to` | tween any CSS | `at`, `duration`, `props='{...JSON...}'` |

`data-anim-at` is seconds relative to the card's startSec: when you compile each declaration into the GSAP timeline in Step 9, add the card's `startSec` to get the absolute time and quantize to 1/fps.
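The relative-to-absolute conversion and the 1/fps quantization can be sketched in a few lines. `toTimelineTime` is a hypothetical helper; rounding to the nearest frame (rather than flooring) is an assumption, since the text only says to quantize.

```javascript
// Hypothetical helper: convert a card-relative data-anim-at into an absolute
// timeline position snapped to the nearest frame boundary.
function toTimelineTime(cardStartSec, animAt, fps) {
  const frame = Math.round((cardStartSec + animAt) * fps);
  return { frame, seconds: frame / fps };
}

// card-01 starts at 1.0s; its title animates at +0.3s in a 30 fps composition.
const titleTime = toTimelineTime(1.0, 0.3, 30);
console.log(titleTime.frame, titleTime.seconds);
```

The `seconds` value is what goes in the position argument of the corresponding timeline call; working from the integer `frame` avoids accumulating floating-point drift across many cards.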

9. Assemble the Composition HTML

Stage the assets and write `$WORK_DIR/public/index.html`:
bash
# SKILL_DIR is injected by the host ("Base directory for this skill: …")
SKILL_DIR="<SKILL_DIR>"
mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# Stage the input video: RE-ENCODE with dense keyframes. Sources with a sparse GOP
# (keyframe interval > ~1s) freeze on seek in the renderer (a frozen frame under the
# overlays). Setting -g / -keyint_min to your composition fps forces a keyframe every
# second, so the renderer can seek without freezing.
# (Set both to your fps; 30 is shown here, use 24/25/60 to match.)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"

Composition Template

html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <style>
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Caveat";
        src: url("fonts/Caveat-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "LXGW WenKai TC";
        src: url("fonts/LXGWWenKaiTC-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-400-latin.woff2") format("woff2");
        font-weight: 400;
        font-display: block;
      }
      @font-face {
        font-family: "Inter";
        src: url("fonts/Inter-700-latin.woff2") format("woff2");
        font-weight: 700;
        font-display: block;
      }
      @font-face {
        font-family: "Virgil";
        src: url("fonts/Virgil.woff2") format("woff2");
        font-display: block;
      }

      :root {
        /* Pick from the themeId palette table in Step 7 — example: classic */
        --bg: #fff9e3;
        --text: #1e1e1e;
        --accent-0: #1971c2;
        --accent-1: #e03131;
        --accent-2: #2f9e44;
        --accent-3: #e8590c;
        --accent-4: #9c36b5;
        --font-family: "Caveat", "LXGW WenKai TC", serif;
      }
      * {
        box-sizing: border-box;
      }
      /* Body font-family MUST list concrete font names (not just var(--font-family)) —
   the HyperFrames renderer's static analyzer doesn't expand CSS variables when
   resolving fonts, so a var-only chain triggers `font_family_without_font_face`
   lint and falls back to a generic. Use the concrete chain here; cards that
   want the theme font can still reference var(--font-family) internally. */
      html,
      body {
        margin: 0;
        padding: 0;
        width: 100%;
        height: 100%;
        overflow: hidden;
        background: #000;
        font-family: "Inter", "Caveat", "LXGW WenKai TC", ui-sans-serif, system-ui, sans-serif;
      }
      #stage {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }

      /* video-wrapper holds the source video. Its position / size are animated
   over time by the master timeline (one tween per layout transition). */
      .video-wrapper {
        position: absolute;
        left: 0;
        top: 0;
        width: 1920px;
        height: 1080px;
        overflow: hidden;
        border-radius: 0;
        box-shadow: none;
      }
      .video-wrapper video {
        width: 100%;
        height: 100%;
        object-fit: cover;
      }

      .card-host {
        position: absolute;
        pointer-events: none;
        overflow: hidden;
      }
      .card-host .card {
        position: relative;
        width: 100%;
        height: 100%;
        overflow: hidden;
      }
      .card-host .char {
        display: inline-block;
        visibility: visible;
      }

      /* Subtle drop shadow + rounded corners for non-fullscreen video framings */
      .video-wrapper.framed {
        border-radius: 16px;
        box-shadow: 0 12px 40px rgba(0, 0, 0, 0.35);
      }
    </style>
  </head>
  <body>
    <div
      id="stage"
      data-composition-id="graphic-overlays"
      data-start="0"
      data-duration="121.2"
      data-fps="30"
      data-width="1920"
      data-height="1080"
    >
      <!-- Layer 1: source video — initial position matches card-01's layout -->
      <div class="video-wrapper" id="video-wrap">
        <video
          id="bg-video"
          src="input-video.mp4"
          muted
          playsinline
          data-start="0"
          data-duration="121.2"
          data-track-index="1"
        ></video>
      </div>

      <!-- Layer 2: each card-host sits at the bounds dictated by its layout. -->
      <!-- IMPORTANT: every card-host MUST carry BOTH "card-host" and "clip" classes. -->
      <!--   - "card-host"  → our positioning + pointer-events styles                 -->
      <!--   - "clip"       → HyperFrames runtime uses this to enforce visibility     -->
      <!--                    only during data-start … data-start+data-duration.      -->
      <!--                    Without "clip" the host stays visible the whole video   -->
      <!--                    (lint: timed_element_missing_clip_class).               -->
      <!-- Example: card-01 with zone="fullscreen" → card-host covers (0,0,1920,1080) -->
      <div
        class="card-host clip"
        data-card-id="card-01"
        data-start="1.0000"
        data-duration="6.5000"
        data-track-index="2"
        style="left:0;top:0;width:1920px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- paste the contents of public/cards/card-01.html here -->
      </div>

      <!-- Example: card-02 with zone="side-panel" (split composition layout) → card on left half -->
      <div
        class="card-host clip"
        data-card-id="card-02"
        data-start="8.0000"
        data-duration="12.0000"
        data-track-index="2"
        style="left:0;top:0;width:960px;height:1080px;visibility:hidden;opacity:0;"
      >
        <!-- card-02 HTML -->
      </div>

      <!-- ...one "card-host clip" per card with inline bounds matching resolveZoneBounds(card.zone)... -->

      <script src="vendor/gsap.min.js"></script>
      <script>
        (function () {
          // count-up formatter helper
          window.__fmt = function (v, fmt) {
            if (typeof fmt === "string" && /^\.[0-9]+f$/.test(fmt)) {
              return Number(v).toFixed(Number(fmt.slice(1, -1)));
            }
            if (fmt === ",d") return Math.round(v).toLocaleString();
            return String(Math.round(v));
          };

          const tl = window.gsap.timeline({ paused: true });

          // ── Card lifecycle (one block per card) ──
          // Example for card-01 [1.0, 7.5] with kinetic-chars at +0.3, grow-x at +0.65:

          // Enter (fade in over 0.4s)
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "visible" }, 1.0);
          tl.fromTo(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            1.0,
          );

          // Card-internal anims (compile each data-anim-* declaration here)
          tl.from(
            '.card[data-card-id="card-01"] #card-01-title .char',
            { opacity: 0, y: 8, scale: 0.8, duration: 0.5, ease: "power2.out", stagger: 0.04 },
            1.3,
          );
          tl.fromTo(
            '.card[data-card-id="card-01"] #card-01-line',
            { width: 0 },
            { width: 420, duration: 0.5, ease: "power2.out" },
            1.65,
          );

          // Exit (fade out over 0.35s, ending at endSec)
          tl.to(
            '.card-host[data-card-id="card-01"]',
            { opacity: 0, duration: 0.35, ease: "power2.in" },
            7.15,
          );
          tl.set('.card-host[data-card-id="card-01"]', { visibility: "hidden" }, 7.5);

          // ── Video framing transitions ──
          // When the next card uses a different composition layout, animate the
          // video-wrapper to its new bounds. Example: card-01 = fullscreen
          // (video hidden behind), card-02 = split composition (zone="side-panel"
          // → video on right, card on left).

          // Card-02 enters at 8.0s with the split composition. Animate video to
          // the right half during the card-01 → card-02 gap (between 7.5 and 8.0s).
          tl.set("#video-wrap", { className: "video-wrapper framed" }, 7.5);
          tl.to(
            "#video-wrap",
            { left: 960, top: 0, width: 960, height: 1080, duration: 0.5, ease: "power2.inOut" },
            7.5,
          );

          // Card-02 enter — same pattern as card-01
          tl.set('.card-host[data-card-id="card-02"]', { visibility: "visible" }, 8.0);
          tl.fromTo(
            '.card-host[data-card-id="card-02"]',
            { opacity: 0 },
            { opacity: 1, duration: 0.4, ease: "power2.out" },
            8.0,
          );
          // ...card-02 internal anims...

          // ── repeat for each card; if the NEXT card's layout differs,
          //    insert another tl.to('#video-wrap', ...) tween before its enter ──

          window.__timelines = window.__timelines || {};
          window.__timelines["graphic-overlays"] = tl;
        })();
      </script>
    </div>
  </body>
</html>

GSAP Statement Cheat Sheet

Compile each `data-anim` attribute into a GSAP statement. Times are absolute seconds = card.startSec + data-anim-at, quantized to 1/fps. Selector is `.card[data-card-id="X"] #elementId`.

| data-anim | GSAP statement template |
| --- | --- |
| `fade-in` | `tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);` |
| `fade-out` | `tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);` |
| `slide-in` (from=left, dist=80) | `tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);` |
| `kinetic-chars` (pop) | `tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);` |
| `count-up` | `(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();` |
| `draw-path` | `(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();` |
| `grow-x` (target-w=W) | `tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);` |
| `grow-y` (target-h=H) | `tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);` |
| `scale-pop` | `tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);` |
| `mask-reveal` (direction=left) | `tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);` |

Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is 1/30 ≈ 0.0333s; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal.
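The compile step described above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the skill's actual compiler: the function names `quantizeTime` and `compileAnim` and the input fields (`cardId`, `elementId`, `anim`, `at`, `duration`) are hypothetical, and only two of the templates are covered.

```javascript
// Hypothetical helper names; only the formulas and templates come from the cheat sheet.

// Snap an absolute time to the nearest frame boundary: T = round(absSec * fps) / fps.
function quantizeTime(absSec, fps) {
  return Math.round(absSec * fps) / fps;
}

// Turn one declared animation into a GSAP statement string.
// `decl` is an assumed plain-object form of the card's data-anim-* attributes.
function compileAnim(decl, cardStartSec, fps) {
  // JSON.stringify gives a correctly escaped JS string literal for the selector.
  const sel = JSON.stringify(`.card[data-card-id="${decl.cardId}"] #${decl.elementId}`);
  // Absolute time = card start + per-element offset, quantized, 4 decimals.
  const t = quantizeTime(cardStartSec + decl.at, fps).toFixed(4);
  const d = decl.duration;
  switch (decl.anim) {
    case "fade-in":
      return `tl.fromTo(${sel}, { opacity: 0 }, { opacity: 1, duration: ${d}, ease: 'power2.out' }, ${t});`;
    case "fade-out":
      return `tl.to(${sel}, { opacity: 0, duration: ${d}, ease: 'power2.in' }, ${t});`;
    default:
      throw new Error(`unsupported data-anim in this sketch: ${decl.anim}`);
  }
}

const statement = compileAnim(
  { cardId: "card-01", elementId: "card-01-title", anim: "fade-in", at: 0.5, duration: 0.4 },
  1.0,
  30,
);
console.log(statement);
```

Generating statement strings keeps the emitted script easy to inspect in `public/index.html`; the remaining rows of the table would follow the same pattern.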
将每个
data-anim
属性编译为GSAP语句。时间为绝对秒数= card.startSec + data-anim-at,量化为1/fps。选择器为
.card[data-card-id="X"] #elementId
data-animGSAP语句模板
fade-in
tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);
fade-out
tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);
slide-in
(from=left, dist=80)
tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);
kinetic-chars
(pop)
tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);
count-up
(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();
draw-path
(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();
grow-x
(target-w=W)
tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);
grow-y
(target-h=H)
tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);
scale-pop
tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);
mask-reveal
(direction=left)
tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);
量化:
T = Math.round(absSec * fps) / fps
。在30fps下最小步长为
1/30 ≈ 0.0333s
;在JS字面量中四舍五入到4位小数(
.toFixed(4)
)即可。

Video Framing Reference (per `layout` value)

The selector for the video container is `#video-wrap`. Animate its bounds between cards using `tl.to('#video-wrap', { ...bounds }, T)`. Initial bounds should be set inline on the element to match card-01's layout. Pick a transition duration of 0.5–0.7s with `ease: 'power2.inOut'`.
Decorative frames (`clean` / `hairline` / `polaroid`) sit as a sibling of `#video-wrap` and follow it through layout transitions. See `references/frames/` for each frame's placement HTML, suggested CSS, and which layouts it pairs with. Quick rule: the `overlay` layout suppresses decorative frames (the full-bleed video clashes with chrome); PiP layouts already have their own pill treatment (border-radius + white ring + shadow), so add a decorative frame only on top of `split` / `stack`.
GSAP target lookup table for `#video-wrap` per composition layout (landscape 1920×1080; for portrait and 4:5 see `references/layouts/*.html`, which list all three ratios):

| composition layout | typical card.zone | `#video-wrap` GSAP target | extra css class |
| --- | --- | --- | --- |
| `split` | `side-panel` | `{ left: 960, top: 0, width: 960, height: 1080 }` | |
| `stack` | `lower-third` | `{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%) | |
| `pip` (bottom-right) | `fullscreen` | `{ left: 1480, top: 760, width: 400, height: 300 }` | `pip-pill` (border-radius + ring + shadow) |
| `pip` (top-left) | `fullscreen` | `{ left: 40, top: 40, width: 400, height: 300 }` | `pip-pill` |
| `overlay` (video full-bleed) | `video-overlay` | `{ left: 0, top: 0, width: 1920, height: 1080 }` (no change from default) | |
| hide video (pure-graphic moment) | `fullscreen` | `{ opacity: 0 }` (or move off-canvas) | |

To toggle the pip-pill chrome (border-radius + white ring + drop shadow) when entering or leaving a pip moment:
js
// Enter pip — add chrome
tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T);
tl.to(
  "#video-wrap",
  { left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" },
  T,
);

// Leave pip — back to clean full-bleed
tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT);
tl.to(
  "#video-wrap",
  { left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" },
  T_NEXT,
);
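One way to decide where these `#video-wrap` tweens go is to walk the cards in order and plan a transition whenever the layout changes. The sketch below is illustrative only: the names `VIDEO_FRAMING`, `planVideoTransitions`, and `applyVideoTransitions`, the layout keys, and the card fields (`layout`, `startSec`, `endSec`) are assumptions, and starting each transition at the previous card's end mirrors the card-01 → card-02 example rather than a documented rule. Only the pixel bounds are taken from the landscape lookup table.

```javascript
// Bounds copied from the landscape 1920x1080 lookup table; key names are hypothetical.
const VIDEO_FRAMING = {
  split: { left: 960, top: 0, width: 960, height: 1080 },
  stack: { left: 14, top: 14, width: 1892, height: 548 },
  "pip-bottom-right": { left: 1480, top: 760, width: 400, height: 300 },
  "pip-top-left": { left: 40, top: 40, width: 400, height: 300 },
  overlay: { left: 0, top: 0, width: 1920, height: 1080 },
};

// Plan one #video-wrap transition per layout change between consecutive cards.
function planVideoTransitions(cards) {
  const plan = [];
  for (let i = 1; i < cards.length; i++) {
    const prev = cards[i - 1];
    const next = cards[i];
    if (next.layout === prev.layout) continue; // same framing, nothing to animate
    const bounds = VIDEO_FRAMING[next.layout];
    if (!bounds) throw new Error(`no framing entry for layout: ${next.layout}`);
    plan.push({
      at: prev.endSec, // assumption: animate in the gap after the previous card
      className: next.layout.startsWith("pip") ? "video-wrapper pip-pill" : "video-wrapper",
      bounds,
    });
  }
  return plan;
}

// Emit the plan onto a GSAP-style timeline (any object with set() and to()).
function applyVideoTransitions(tl, plan, duration = 0.6) {
  for (const step of plan) {
    tl.set("#video-wrap", { className: step.className }, step.at);
    tl.to("#video-wrap", { ...step.bounds, duration, ease: "power2.inOut" }, step.at);
  }
}

const plan = planVideoTransitions([
  { layout: "overlay", startSec: 1.0, endSec: 7.5 },
  { layout: "pip-bottom-right", startSec: 8.0, endSec: 20.0 },
  { layout: "pip-bottom-right", startSec: 21.0, endSec: 30.0 },
  { layout: "overlay", startSec: 31.0, endSec: 40.0 },
]);
console.log(plan.map((s) => `${s.at}s -> ${s.className}`));
```

In the composition script, `applyVideoTransitions(tl, plan)` would be called with the paused master timeline before it is registered.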
Card-host bounds match the zone. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx;height:Hpx;..."`. For the `video-overlay` zone (overlay recipe), the card-host fills the full canvas; your CSS inside `.card .root` decides where the actual visible card sits.
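As a rough illustration of writing zone bounds into the card-host, the sketch below builds the opening tag for one card. The names `EXAMPLE_ZONE_BOUNDS` and `cardHostOpenTag` and the card fields (`id`, `zone`, `startSec`, `endSec`) are hypothetical, and only the two zones whose bounds appear in the template examples are listed; the full zone table in Step 6 remains the source of truth.

```javascript
// Only the two zones shown in the template examples; not the full Step 6 table.
const EXAMPLE_ZONE_BOUNDS = {
  fullscreen: { left: 0, top: 0, width: 1920, height: 1080 },
  "side-panel": { left: 0, top: 0, width: 960, height: 1080 },
};

// Build the opening <div> of a card-host: both required classes, timing
// attributes, and inline bounds, hidden until the timeline reveals it.
function cardHostOpenTag(card, zoneBounds = EXAMPLE_ZONE_BOUNDS) {
  const b = zoneBounds[card.zone];
  if (!b) throw new Error(`unknown zone: ${card.zone}`);
  const style =
    `left:${b.left}px;top:${b.top}px;width:${b.width}px;height:${b.height}px;` +
    "visibility:hidden;opacity:0;";
  return (
    `<div class="card-host clip" data-card-id="${card.id}" ` +
    `data-start="${card.startSec.toFixed(4)}" ` +
    `data-duration="${(card.endSec - card.startSec).toFixed(4)}" ` +
    `data-track-index="2" style="${style}">`
  );
}

const tag = cardHostOpenTag({ id: "card-02", zone: "side-panel", startSec: 8, endSec: 20 });
console.log(tag);
```

The card's own HTML fragment and the closing `</div>` would follow the returned tag.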

HyperFrames Layout / Animation QA Rules

  • Build each card's static hero frame first: the moment where the card is fully visible and readable.
  • Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap.
  • Confirm hidden video areas are clipped by the frame and not visible outside intended bounds.
  • Register one paused master timeline as `window.__timelines["graphic-overlays"]`.
  • Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls.
  • Do not use `Math.random()` or `Date.now()` in render paths.
  • Do not use `repeat: -1`; calculate finite repeats from the video duration.
  • Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion.
  • Animate wrappers such as `#video-wrap`, not the video element dimensions directly.
  • Avoid animating the same property on the same element from multiple timelines at the same time.
  • Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`.
  • Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes, e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`).
  • For body / global `font-family`, list concrete font names (`'Inter', 'Caveat', …`), not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded.
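For the rule about avoiding `repeat: -1`, a finite repeat count can be derived from the time left in the video. This is a minimal sketch; the function name `finiteRepeats` and the example values for the loop start and cycle length are hypothetical, while the 121.2s duration is the `data-duration` from the template's `#stage`.

```javascript
// Number of repeats so a loop of `cycleSec` covers `remainingSec`.
// GSAP plays a tween (repeat + 1) times in total, hence the "- 1".
function finiteRepeats(remainingSec, cycleSec) {
  if (cycleSec <= 0) throw new Error("cycleSec must be positive");
  return Math.max(0, Math.ceil(remainingSec / cycleSec) - 1);
}

const videoDuration = 121.2; // data-duration on #stage in the template
const loopStart = 8.0; // hypothetical: when a looping accent begins
const cycleSec = 2.0; // hypothetical: length of one loop iteration
const repeats = finiteRepeats(videoDuration - loopStart, cycleSec);
console.log(repeats);
```

The result would be passed as `repeat: repeats` on the looping tween, so the timeline has a fixed, deterministic length for the renderer.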

10. Render to MP4

bash
cd "$WORK_DIR"
PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \
  -o output.mp4 \
  --fps 30
`hyperframes render <dir>` reads `<dir>/index.html` and produces the MP4. Setting the environment variable `PRODUCER_BROWSER_GPU_MODE=hardware` (or passing `--browser-gpu`) is strongly recommended on macOS: software-only Chrome rendering times out on most laptops.
For a sanity check before the full render, capture a single frame at a specific timestamp:
bash
npx hyperframes snapshot public --at 5    # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out)

11. Report Results

Tell the user:
  • Work directory path
  • `storyboard.json` (the card outline you designed)
  • `public/cards/*.html` (one HTML per card)
  • `public/index.html` (the assembled composition)
  • `output.mp4` (the final video)
  • ASR provider used
  • Card count + how you chose them (in 1 sentence)
  • Any missing keys or quality caveats
Optional live preview (on request only). The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. Don't open it during the run. When the user asks, start a long-lived server after render and report the URL:
bash
(cd "$WORK_DIR/public" && npx hyperframes preview)   # or `npx hyperframes play` for a shareable link
Do not delete the work directory unless the user asks.