Graphic Overlays
Graphic Overlays takes a local video that plays in full and layers a sequence of timed, designed graphic cards onto it (titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture), synced to what's being said. The agent designs the cards (timing + content) and writes each card's HTML directly in the conversation, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure; the overlays emerge from what the transcript actually says.

Confirm the route before you build. This skill packages an existing talking-head clip with designed graphic cards (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants plain captions / subtitles (the spoken words as text) → `/embedded-captions`; a single short unnarrated element (one logo sting / lower-third) → `/motion-graphics`. The clip plays untouched: re-timing, recoloring, reframing, reordering, or audio work is NLE editing and out of scope. Building from a URL / topic / PR → the creation workflows. Unsure overlays-vs-captions? Read `/hyperframes-read-first` first.

Graphic-packaging sibling of `embedded-captions`. Captions add the spoken words as a readable subtitle; this adds designed graphics on top of the playing video. Plain subtitles → `embedded-captions`. Build a video from scratch → the creation workflows (`product-launch-video` / `faceless-explainer` / …).
Inspectable intermediate files in the work directory:
- `metadata.json`: duration / width / height / fps
- `audio.mp3`: extracted audio
- `transcript.json`: a flat word array `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper)
- `storyboard.json`: lightweight card outline (the agent's plan)
- `public/cards/card-XX.html`: one HTML fragment per card
- `public/index.html`: final assembled composition
- `output.mp4`: rendered video
CLI Resolution
```bash
# hyperframes: transcription (local Whisper) + rendering the assembled HTML to MP4
npx hyperframes --help
```
This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`. Transcription is local **Whisper** via `hyperframes transcribe`: no third-party service, API key, or rate-limited proxy.
Workflow
1. Check Environment
```bash
npx hyperframes doctor   # ffmpeg, headless browser, render deps
# confirm bundled assets:
ls "<SKILL_DIR>/assets/fonts" "<SKILL_DIR>/assets/vendor/gsap.min.js"
```
Required:
- `ffmpeg` / `ffprobe` (system)
- `<SKILL_DIR>/assets/fonts/*.woff2`, `<SKILL_DIR>/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9)

Transcription needs no key: `hyperframes transcribe` runs Whisper locally (Step 4).

Strongly recommended on macOS for `hyperframes render`:
```bash
export PRODUCER_BROWSER_GPU_MODE=hardware
```
2. Create a Work Directory
All artifacts live under `videos/<project-name>/`, the same convention as the other video workflows (`product-launch-video` / `faceless-explainer` / `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory.
```bash
VIDEO_PATH="/absolute/path/input.mp4"
WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')"
mkdir -p "$WORK_DIR"
```
3. Extract Audio and Metadata
```bash
# metadata: duration / width / height / fps
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate \
  -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json"

# audio
ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3"
```
Outputs: `metadata.json` (read `width` / `height` / `duration`; fps is the `r_frame_rate` fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`.
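As a minimal sketch of reading those values back, the snippet below flattens the ffprobe JSON into the four numbers later steps use. It assumes the usual `-of json` shape (a `streams` array plus a `format` object); the function name `read_video_metadata` and the sample string are illustrative, not part of the skill.

```python
import json
from fractions import Fraction


def read_video_metadata(ffprobe_json: str) -> dict:
    """Flatten ffprobe's JSON output into width / height / fps / duration."""
    data = json.loads(ffprobe_json)
    stream = data["streams"][0]  # the single video stream selected by v:0
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        # r_frame_rate is a fraction string such as "30000/1001"
        "fps": round(float(Fraction(stream["r_frame_rate"])), 2),
        # ffprobe reports the duration as a string
        "duration": float(data["format"]["duration"]),
    }


# Hypothetical sample mimicking the contents of metadata.json
sample = (
    '{"streams": [{"width": 1080, "height": 1920, "r_frame_rate": "30000/1001"}],'
    ' "format": {"duration": "121.200000"}}'
)
meta = read_video_metadata(sample)
print(meta)
```

In practice you would pass the contents of `$WORK_DIR/metadata.json` instead of the inline sample.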
4. Transcribe
```bash
npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en
```
Local Whisper: no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` + `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks.

Clamp to media duration. Whisper can return the final word's `end` a hair past the actual clip length. Clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video.
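The clamp can be sketched as below. This is an illustration under stated assumptions, not the skill's own code: `clamp_to_duration` is a hypothetical helper, and the dict follows the storyboard shape used in Step 6.

```python
def clamp_to_duration(storyboard: dict, media_duration: float) -> dict:
    """Clamp the composition length and every card's endSec to the real clip length."""
    comp = storyboard["composition"]
    comp["durationSeconds"] = min(comp["durationSeconds"], media_duration)
    for card in storyboard["cards"]:
        # A card planned from Whisper timings may end slightly past the clip.
        card["endSec"] = min(card["endSec"], media_duration)
    return storyboard


# Hypothetical outline whose last card overshoots a 121.2 s clip by 0.2 s
storyboard = {
    "composition": {"durationSeconds": 121.4},
    "cards": [
        {"id": "card-01", "startSec": 0.5, "endSec": 13.0},
        {"id": "card-02", "startSec": 110.0, "endSec": 121.4},
    ],
}
clamp_to_duration(storyboard, media_duration=121.2)
```

The helper mutates the outline in place and also returns it, so it can be chained before the outline is written to `storyboard.json`.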
5. Correct Transcript
`transcript.json` is a flat array, `[{ "text": "...", "start": s, "end": s }, …]`, with no `segments` / `words` wrapper. Review each word's `text`:
- Homophones, product names, technical terms, punctuation
- Edit a word's `text` in place; preserve its `start` / `end` timestamps
- There is no pre-grouped `segments` array; group words into sentences yourself (split at terminal punctuation / pauses) when you need segment-level chunks for card timing
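One way to do that grouping is sketched here. The pause threshold of 0.8 s and the name `group_sentences` are assumptions chosen for illustration; tune the threshold to the speaker.

```python
TERMINALS = (".", "?", "!", "。", "?", "!")


def group_sentences(words: list, pause_sec: float = 0.8) -> list:
    """Group flat word entries into sentence chunks at punctuation or long pauses."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        ends_sentence = word["text"].rstrip().endswith(TERMINALS)
        long_pause = nxt is not None and nxt["start"] - word["end"] >= pause_sec
        if ends_sentence or long_pause or nxt is None:
            sentences.append({
                "text": " ".join(w["text"].strip() for w in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current = []
    return sentences


# Hypothetical transcript excerpt in the flat word format
words = [
    {"text": "If", "start": 0.5, "end": 0.7},
    {"text": "so,", "start": 0.7, "end": 1.0},
    {"text": "why?", "start": 1.0, "end": 1.4},
    {"text": "Because", "start": 3.0, "end": 3.5},
    {"text": "rates", "start": 3.5, "end": 3.9},
    {"text": "move", "start": 5.0, "end": 5.3},
]
sentences = group_sentences(words)
```

Each chunk keeps the first word's `start` and the last word's `end`, which is what card timing needs.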
6. Draft a Lightweight Storyboard (in chat)
No CLI involved. Read `transcript.json` + `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact; no CLI command consumes it. It exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9:
```json
{
  "schemaVersion": 3,
  "composition": {
    "fps": 30,
    "width": 1080,
    "height": 1920,
    "durationSeconds": 121.2,
    "layout": "portrait",
    "themeId": "noir",
    "seed": 42
  },
  "videoTrack": {
    "sourcePath": "input-video.mp4",
    "startSec": 0,
    "endSec": 121.2,
    "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 }
  },
  "subtitles": { "enabled": false },
  "cards": [
    {
      "id": "card-01",
      "intent": "Hook with the speaker's anxious midnight question",
      "startSec": 0.5,
      "endSec": 13.0,
      "accentIndex": 0,
      "zone": "fullscreen",
      "contentHints": {
        "kicker": "AN HONEST QUESTION",
        "title": "The soul-searching question at 11 PM",
        "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'"
      }
    }
  ]
}
```
Required Card fields:
| field | type | purpose |
|---|---|---|
| `id` | string | stable id used in card HTML & GSAP selectors |
| `intent` | string | natural-language description; fed to card synthesis |
| `startSec` / `endSec` | number | times in seconds (`endSec > startSec`) |
| `accentIndex` | `0 \| 1 \| 2 \| 3 \| 4` | which of the 5 theme accent colors this card pulls |
| `zone` | enum (see below) | where on the canvas the card lives |
| `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here |
| | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default |
| | enum | declarative card-to-card transition |
Five `zone` values:

| zone | resolved bounds | when to use |
|---|---|---|
| `fullscreen` | covers whole canvas | hero moments, big numbers, mantras |
| `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content |
| `lower-third` | bottom 30% band | annotation over visible video |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side |
| `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video |
When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set once at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's `<script>` (see Step 9).

No prescribed card roles, no prescribed narrative arc. Cards emerge from what the video actually says: could be all quotes or all data, could open with a number or with a story. Let the transcript drive the rhythm.
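The zone-to-bounds resolution described above could look roughly like this. It is a sketch under assumptions: the zone names are the ones used in Step 7 (`fullscreen`, `whiteboard-area`, `lower-third`, `side-panel`, `video-overlay`), the percentages come from the zone table, `resolve_zone` is a hypothetical helper, and the portrait variant of `whiteboard-area` (45% of height) is left out because the table does not say where that band sits. The bounds in `references/layouts/*.html` take precedence where they differ.

```python
def resolve_zone(zone: str, width: int, height: int) -> dict:
    """Turn a zone name into pixel bounds on a width x height canvas."""
    portrait = height > width
    if zone in ("fullscreen", "video-overlay"):
        return {"x": 0, "y": 0, "w": width, "h": height}
    if zone == "whiteboard-area":
        margin = 40  # inset margin from the zone table
        return {"x": margin, "y": margin, "w": width - 2 * margin, "h": height - 2 * margin}
    if zone == "lower-third":
        band = round(height * 0.30)  # bottom 30% band
        return {"x": 0, "y": height - band, "w": width, "h": band}
    if zone == "side-panel":
        if portrait:
            band = round(height * 0.40)  # bottom 40% in portrait
            return {"x": 0, "y": height - band, "w": width, "h": band}
        panel = round(width * 0.42)  # right 42% in landscape
        return {"x": width - panel, "y": 0, "w": panel, "h": height}
    raise ValueError(f"unknown zone: {zone}")


lower = resolve_zone("lower-third", 1080, 1920)
side = resolve_zone("side-panel", 1920, 1080)
```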
How many takeaways? Auto-infer from duration + density. There is no fixed upper limit. Pick a base pace from the video duration, then adjust by information density. Only the floor is fixed: a minimum of 5 cards, so even short videos have rhythm.
Step 1 — base pace by duration (the natural sec/card for medium density):
| video duration | base pace (sec per card) | rationale |
|---|---|---|
| < 60s (short reel) | 6–8s | viewers expect fast cuts in short-form |
| 60s – 3 min | 8–12s | normal social pace |
| 3 – 10 min | 12–20s | give breathing room; each card carries more |
| 10 – 30 min | 20–35s | long-form lecture / interview rhythm |
| > 30 min | 30–60s | episodic, near-chapter feel |
Step 2 — density multiplier (multiplies the base pace):
| signal in the transcript | multiplier | effect |
|---|---|---|
| High density — many numbers, distinct claims, staccato pacing, list-like enumeration, every 1–2 sentences is a new idea | × 0.7 | cuts faster, more cards |
| Medium density — mixed flow with both data and narrative | × 1.0 | base pace |
| Low density — one extended story, repeated reframing, slow reflective pacing, single argument unfolding | × 1.5 | cuts slower, fewer cards |
Step 3: compute.

`secPerCard = basePace × densityMultiplier`

`cardCount = max(5, round(videoDurationSec / secPerCard))`

Examples (notice: no upper clamp; long videos naturally produce more cards):
- 30s reel, single punchline (low density) → 7 × 1.5 = 10.5s/card → round(30/10.5)=3 → floor to 5 cards
- 60s reflective monologue (low density) → 10 × 1.5 = 15s/card → 4 → floor to 5 cards
- 121s talking-head with rich data (high density) → 10 × 0.7 = 7s/card → 17 cards
- 5 min interview, mixed density → 16 × 1.0 = 16s/card → 19 cards
- 10 min deep-dive, high density → 16 × 0.7 = 11.2s/card → 54 cards
- 30 min lecture, medium density → 28 × 1.0 = 28s/card → 64 cards
- 1 hr podcast, low density → 45 × 1.5 = 67.5s/card → 53 cards
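The three steps above can be sketched as a small function. The caller picks `base_pace` from the duration table; `card_count` and `round_half_up` are illustrative names, and rounding halves upward is an assumption, since the text only says `round`.

```python
import math

DENSITY_MULTIPLIER = {"high": 0.7, "medium": 1.0, "low": 1.5}


def round_half_up(value: float) -> int:
    """Round to the nearest integer, with .5 going up (an assumption; see lead-in)."""
    return math.floor(value + 0.5)


def card_count(duration_sec: float, base_pace: float, density: str = "medium") -> int:
    """cardCount = max(5, round(videoDurationSec / (basePace * densityMultiplier)))."""
    sec_per_card = base_pace * DENSITY_MULTIPLIER[density]
    return max(5, round_half_up(duration_sec / sec_per_card))


# The 121 s talking-head example: base pace 10 s, high density
print(card_count(121, 10, "high"))
```

Keeping `base_pace` as an argument leaves the choice inside each duration band to the agent, as the table intends.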
When a card holds longer than ~15s, plan for a richer card (data block, multi-step reveal, several sub-points unfolding with staggered animations); a static one-liner gets boring past 8s. For long pieces where many cards exceed 30s, consider chunking the timeline into sub-compositions (one .html per chapter, mounted with `data-composition-src`) so the GSAP timeline per file stays manageable; see the HyperFrames `timeline_track_too_dense` lint warning.

Optional outro. This skill ships no fixed brand outro. If the user wants a closing card, design a neutral one yourself (wordmark + one-line tagline, ~1.5-2s, fade in -> short hold -> fade out), append it to `cards[]`, and extend `composition.durationSeconds` to its `endSec`. Otherwise end on the last content card.
7. Decide Render Strategy
Confirm Visual Direction with User (DO THIS FIRST)
Before you start designing cards or deciding bounds, ask the user to pick the output ratio, the layout, the style, and the card-density preset. Frames are auto-selected from the chosen layout × style combination (see "Auto-pick frame" table below). Before sending the question, precompute two things:
- `recommendedRatio` from the source video's aspect ratio (`metadata.json` width / height): `sourceAspect = width / height`
  - `sourceAspect ≥ 1.5` (≥ ~3:2 wide) → recommend **16:9**
  - `sourceAspect ≤ 0.7` (≤ ~9:13 tall) → recommend **9:16**
  - `0.7 < sourceAspect < 1.5` (near-square) → recommend **4:5**

  Mark the recommended option's label with " (recommended · matches source video X:Y)" so the user sees why it's recommended.
- `autoCount` from Step 6 (`max(5, round(videoSec / (basePace × densityMultiplier)))`) so the "auto" option's label can show the concrete number.
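The ratio recommendation above reduces to two threshold checks. A minimal sketch, with `recommended_ratio` as an illustrative name:

```python
def recommended_ratio(width: int, height: int) -> str:
    """Map the source video's aspect ratio to the recommended output ratio."""
    source_aspect = width / height
    if source_aspect >= 1.5:   # roughly 3:2 or wider
        return "16:9"
    if source_aspect <= 0.7:   # roughly 9:13 or taller
        return "9:16"
    return "4:5"               # near-square sources


print(recommended_ratio(1920, 1080))
```

A square 1080×1080 source falls in the middle band and gets `4:5`.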
Environment compatibility: pick the best available question channel. Not every runtime exposes the same structured-question tool. Apply this order:
- `AskUserQuestion` (Claude Code, Anthropic Console): use the structured 4-question call below.
- Other native clarification tool (e.g. `ask_question`, `request_user_input`, IDE-specific prompt): use that tool with the same 4 question texts and option lists. Preserve the recommendation markers and the precomputed values.
- No native tool (Codex CLI, plain text-only runtimes): ask directly in normal conversation. Use the plain-text template at the end of this section. Keep it to one message, 4 numbered questions (the global cap is 2–5 questions per round; we stay inside it).

Rules that apply to every channel:
- Ask at most 2–5 questions per round. Our 4 here fits.
- Even if missing info doesn't block rendering, ask once to confirm the parameters that materially affect the final output (ratio, layout, style, cardCount).
- If the user has already pre-approved defaults ("just use defaults", "no need to ask", "auto-pick everything") or asked you not to ask, skip the question entirely and use: `recommendedRatio`, `layout="stack"` (safest cross-ratio default), `style` chosen from transcript tone in the most neutral group (editorial/data), `autoCount`. Tell the user what you picked in one sentence and continue.
Channel A: native `AskUserQuestion`.
```js
// Precompute before the call:
//   recommendedRatio = "16:9" | "9:16" | "4:5"
//   autoCount        = integer (from Step 6)
AskUserQuestion({
  questions: [
    {
      question: "Output video aspect ratio (canvas):",
      header: "Aspect ratio",
      multiSelect: false,
      // Reorder so the recommended option appears FIRST (per AskUserQuestion convention).
      // Append " (recommended · matches source video W×H)" to the recommended option's label.
      options: [
        { label: "16:9 (1920×1080) landscape", description: "TV / YouTube / desktop playback. Most natural when the source video is already landscape; widest canvas." },
        { label: "9:16 (1080×1920) portrait", description: "TikTok / Reels / short-form mobile. Most natural for portrait source; native mobile experience." },
        { label: "4:5 (1080×1350) near-portrait", description: "Instagram feed / WeChat Moments. Best when source is near-square or you want to cover both platforms." }
      ]
    },
    {
      question: "Choose the overall layout: how should the video and cards coexist on the canvas?",
      header: "Layout",
      multiSelect: false,
      options: [
        { label: "side-by-side (split)", description: "Video and card each take half the canvas. Most stable for interview / data side-by-side; clear visual separation." },
        { label: "top-bottom (stack)", description: "Video on top (~52%), card below. Classic combo of speaker face + summary card; works well in portrait too." },
        { label: "picture-in-picture (pip)", description: "Card fills the canvas, video shrinks to a rounded corner window. Use when content is primary and speaker is secondary." },
        { label: "full-screen overlay (overlay)", description: "Video plays full-bleed, card floats as a glass layer on top. Strong cinematic / emotional feel." }
      ]
    },
    {
      question: "Choose the card visual style (style):",
      header: "Style group",
      multiSelect: false,
      // NOTE: these 3 groups intentionally match the frame auto-pick matrix
      // columns below, so picking a group resolves both the `style` group AND
      // the frame matrix column in one step. Memberships are mutually exclusive.
      options: [
        { label: "warm paper (warm-paper)", description: "academic notebook · editorial big-type · whiteboard hand-drawn · xhs social. Best for interview reflections, product launches, lifestyle, emotional stories." },
        { label: "clinical / cold (clinical)", description: "audit magazine · swiss grid · terminal CLI · minimal modern. Best for financial analysis, investigative reports, technical tutorials, serious presentations." },
        { label: "experimental / avant-garde (experimental)", description: "geom color-clash geometry · spotlight dark-background. Best for short-form highlights, product launches, strong emotion, cinematic feel." }
      ]
    },
    {
      question: "Card count (takeaway pacing): how many cards to cut?",
      header: "Card count",
      multiSelect: false,
      options: [
        { label: "Auto (recommended) · approx N cards", description: "Inferred automatically from video duration and information density (see Step 6 rules). This run estimates approx N cards. Substitute the real N (your autoCount) into the label." },
        { label: "Fewer · approx round(N × 0.6) cards", description: "Sparser cuts, each card holds longer; suits reflective / slow-paced content." },
        { label: "More · approx round(N × 1.5) cards", description: "Tighter cuts, faster rhythm; suits staccato / data-dense / short-form highlight content." }
      ]
    }
  ]
})
```
About "Other": `AskUserQuestion` automatically adds an "Other" option to the card count question. The user can type a number directly (e.g. "8", "20") as the cardCount target. Parse the input as an integer: if parsing succeeds → use that value (minimum 5 as a floor); if parsing fails → fall back to "auto".
Channel B: plain-text fallback (Codex CLI, runtimes without a native question tool). Post this as one normal message, then wait for the reply. Bullet-style 1/2/3/4 keeps the reply parseable:
```text
I need to confirm four visual decisions with you before I start cutting cards:

1) Output aspect ratio (canvas):
   A. 16:9 landscape (1920×1080): TV / YouTube / desktop playback
   B. 9:16 portrait (1080×1920): TikTok / Reels / short-form mobile
   C. 4:5 near-portrait (1080×1350): Instagram feed / works for both platforms
   ▸ My recommendation: <recommendedRatio> (matches source video W×H = <sourceW>×<sourceH>)

2) Overall layout (how video & card coexist):
   A. split    side-by-side (50/50)
   B. stack    top-bottom (video top, card bottom)
   C. pip      picture-in-picture (card full canvas, video rounded corner window)
   D. overlay  full-screen glass overlay (video full-bleed, card glass layer)

3) Card style group (maps to frame auto-pick matrix, pick 1 of 3):
   A. warm paper (warm-paper)      (academic / editorial / whiteboard / xhs)
   B. clinical / cold (clinical)   (audit / swiss / terminal / minimal)
   C. experimental (experimental)  (geom / spotlight)

4) Card count (takeaway pacing):
   A. Auto (recommended): approx <autoCount> cards
   B. Fewer: approx round(<autoCount> × 0.6) cards
   C. More: approx round(<autoCount> × 1.5) cards
   D. Give me a specific number (e.g. "8", "20")

Reply format: "1A 2C 3B 4A" or natural language is fine.
If you want all recommended defaults, reply "default" / "auto" / "use all recommendations".
```
Parsing the plain-text reply:
- Accept loose formats: `"1A 2C 3B 4A"`, `"A C B A"`, `"16:9 / pip / data / auto"`, full sentences, or `default`.
- If any answer is ambiguous → re-ask only the ambiguous ones (still inside the 2–5 cap).
- If the user says "default / auto / use all recommendations" → skip without re-asking.
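For the two compact reply formats, a parser could look like the sketch below. It is illustrative only: `parse_reply` and its return shape are assumptions, and free-form sentences or value-style replies such as "16:9 / pip / data / auto" are left to the agent's own reading. Questions missing from the result are the ones to re-ask.

```python
import re

# Valid option letters per question, following the plain-text template
VALID = {1: "ABC", 2: "ABCD", 3: "ABC", 4: "ABCD"}
DEFAULT_WORDS = {"default", "auto", "use all recommendations"}


def parse_reply(reply: str):
    """Return "default", or a {question_number: letter} dict of the answers understood."""
    text = reply.strip()
    if text.lower().strip(" .\"'") in DEFAULT_WORDS:
        return "default"
    # Numbered form, e.g. "1A 2C 3B 4A"
    pairs = re.findall(r"\b([1-4])([A-Da-d])\b", text)
    if not pairs:
        # Positional form, e.g. "A C B A": exactly four single letters
        tokens = text.split()
        if len(tokens) == 4 and all(len(t) == 1 for t in tokens):
            pairs = [(str(i + 1), t) for i, t in enumerate(tokens)]
    answers = {}
    for number, letter in pairs:
        question, choice = int(number), letter.upper()
        if choice in VALID[question]:
            answers[question] = choice
    return answers


answers = parse_reply("1A 2C 3B 4A")
missing = [q for q in VALID if q not in parse_reply("1A 3D")]
```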
After the user answers (any channel):
- Resolve the output canvas from the ratio answer. These are the exact `storyboard.composition.width / height` values to write:

  | user choice | composition.width × height | storyboard.layout field |
  |---|---|---|
  | `16:9` | 1920 × 1080 | `"landscape"` |
  | `9:16` | 1080 × 1920 | `"portrait"` |
  | `4:5` | 1080 × 1350 | `"portrait"` (schema treats 4:5 as portrait, since height > width) |

  For 4:5 bounds inside `references/layouts/*.html`: those files only document landscape (1920×1080) and portrait (1080×1920). For 4:5 (1080×1350) derive bounds by proportional scaling from portrait: keep horizontal values, scale vertical values by `1350/1920 ≈ 0.703`. Example: portrait `overlay` card = `{ x: 24, y: 1280, w: 1032, h: 564 }` → 4:5 card = `{ x: 24, y: round(1280 × 1350/1920), w: 1032, h: round(564 × 1350/1920) }` = `{ x: 24, y: 900, w: 1032, h: 397 }`.
- Map the style group to a specific style by looking at the transcript tone. Pick the one that best fits, but stay inside the user's chosen group. If you're unsure between two specific styles inside the group, send a second `AskUserQuestion` with those 2–4 specific style options.
- Resolve final cardCount from the density answer:

  | user choice | final cardCount |
  |---|---|
  | Auto (recommended) | the `autoCount` you already computed |
  | Fewer | `max(5, round(autoCount × 0.6))` |
  | More | `round(autoCount × 1.5)` (no upper clamp) |
  | Other = "<n>" (integer) | `max(5, parseInt(n))` |
  | Other = anything else | fall back to `autoCount` |

- Auto-pick the video frame from this table (frames don't ask the user; they follow from layout × style):

  | layout | warm-paper styles (academic / whiteboard / editorial / xhs) | clinical styles (audit / swiss / terminal / minimal) | experimental styles (geom / spotlight) |
  |---|---|---|---|
  | `split` | `polaroid` | `hairline` | `clean` |
  | `stack` | `polaroid` | `hairline` | `clean` |
  | `pip` (pip pill already has chrome) | `clean` | `clean` | `clean` |
  | `overlay` (full-bleed forbids deco frames) | `clean` | `clean` | `clean` |

- Tell the user what you chose in one sentence (ratio + canvas size, layout, specific style, frame, and final cardCount), then proceed with the rest of Step 7 (per-card layouts, motion patterns).
- Record the five values (ratio / layout / style / frame / cardCount) in working memory (no schema field needed); you'll reference them while writing each card's HTML in Step 8 and while reading the matching `references/<dim>/<key>.html` for tokens and structure.
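The canvas lookup and the 4:5 bounds derivation can be sketched as follows. The names `CANVAS` and `scale_portrait_bounds_to_4x5` are illustrative; the scale uses the exact ratio 1350/1920 instead of the rounded 0.703, which is what produces `h: 397` in the example.

```python
# ratio answer -> (width, height, layout)
CANVAS = {
    "16:9": (1920, 1080, "landscape"),
    "9:16": (1080, 1920, "portrait"),
    "4:5": (1080, 1350, "portrait"),
}


def scale_portrait_bounds_to_4x5(bounds: dict) -> dict:
    """Derive 4:5 bounds from documented portrait bounds: keep x/w, scale y/h."""
    factor = 1350 / 1920  # exact vertical scale, about 0.703
    return {
        "x": bounds["x"],
        "y": round(bounds["y"] * factor),
        "w": bounds["w"],
        "h": round(bounds["h"] * factor),
    }


width, height, layout = CANVAS["4:5"]
overlay_4x5 = scale_portrait_bounds_to_4x5({"x": 24, "y": 1280, "w": 1032, "h": 564})
```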
If the user picks an answer via "Other" with a free-text style name not
in the 10-style library, treat it as a hint to design a fresh card
visual yourself, but still anchor on the chosen layout's bounds.
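The card-count and frame rules from the post-answer steps can be sketched like this. Everything here is an illustration: the function names and the lower-case choice keys are assumptions, halves are rounded upward, and the "Other" branch reads a leading integer to approximate `parseInt`.

```python
import math
import re


def round_half_up(value: float) -> int:
    """Round to the nearest integer with .5 going up (assumed rounding rule)."""
    return math.floor(value + 0.5)


def final_card_count(choice: str, auto_count: int, other_text: str = "") -> int:
    """Resolve the final cardCount from the user's density answer."""
    if choice == "fewer":
        return max(5, round_half_up(auto_count * 0.6))
    if choice == "more":
        return round_half_up(auto_count * 1.5)  # no upper clamp
    if choice == "other":
        match = re.match(r"\s*(\d+)", other_text)  # leading digits only
        return max(5, int(match.group(1))) if match else auto_count
    return auto_count  # "auto" and anything unrecognised


def pick_frame(layout: str, style_group: str) -> str:
    """Frame follows from layout x style group; pip and overlay always use clean."""
    if layout in ("pip", "overlay"):
        return "clean"
    return {"warm-paper": "polaroid", "clinical": "hairline", "experimental": "clean"}[style_group]


count = final_card_count("fewer", 17)
frame = pick_frame("stack", "clinical")
```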
Render Strategy Inputs
With ratio / layout / style / cardCount / frame locked from Step 7.0, the remaining per-card decisions are:
- Source-video fit inside the GSAP target: video element has `object-fit: cover` and is clipped to `#video-wrap`'s tween bounds. If you want NO cropping (e.g. portrait source on landscape canvas shouldn't get its top/bottom chopped), aim the tween at a rect that matches the source's aspect ratio and let surrounding canvas show through (or fill with the card / a backdrop).
- `card.zone` per card: derive from your chosen composition layout (split → side-panel, stack → lower-third, pip → fullscreen, overlay → video-overlay), OR pick a different zone for one-off variants (fullscreen for hero / quote, whiteboard-area for dense data).
- `accentIndex` per card: each card pulls one of the 5 theme accent colors. Vary across cards for rhythm; reuse the same index when two cards belong to the same narrative beat.
- Motion vocabulary: pick 2–3 repeatable patterns from `data-anim` kinds (see the table later) and stick to them so the composition feels coherent.
Pick from these palettes (use them as /
/ CSS variables in your composition block):
themeId--accent-N--bg--text<style>| themeId | accent palette (5 colors) | board bg | text |
|---|---|---|---|
| classic | | | |
| noir | | | |
| mint | | | |
| craft | | | |
| slate | | | |
| mono | | | |
Available fonts (woff2 in , staged to work dir in Step 9): (handwriting),
(Chinese hand-script), (modern sans),
(geometric hand). Reference via or directly.
<SKILL_DIR>/assets/fonts/CaveatLXGW WenKai TCInterVirgil@font-facefont-familyFor inspiration on visual patterns,
ships 10 self-contained reference cards (academic / editorial / minimal
/ spotlight / geom / whiteboard / audit / terminal / swiss / xhs) that
you can copy as starting points — but do not feel constrained to
match any of these. Each card is your own design.
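For the no-cropping case in the source-video fit bullet above, the tween target is the largest rect with the source's aspect ratio that fits inside the area you want the video to occupy. A minimal sketch follows; `containRect` is a hypothetical helper name, and the property names (`left`, `top`, `width`, `height`) are chosen to match what the composition's `#video-wrap` tweens animate.

```javascript
// Hypothetical helper: fit a source of srcW x srcH inside `target` without cropping,
// centered, preserving the source aspect ratio ("contain" fit).
function containRect(srcW, srcH, target) {
  const scale = Math.min(target.width / srcW, target.height / srcH);
  const width = Math.round(srcW * scale);
  const height = Math.round(srcH * scale);
  return {
    left: target.left + Math.round((target.width - width) / 2),
    top: target.top + Math.round((target.height - height) / 2),
    width,
    height,
  };
}

// Example: a 1080x1920 portrait source on the full 1920x1080 landscape canvas.
const rect = containRect(1080, 1920, { left: 0, top: 0, width: 1920, height: 1080 });
// The result can then be spread into the tween, e.g.
// tl.to("#video-wrap", { ...rect, duration: 0.6, ease: "power2.inOut" }, T);
```

Because the wrapper now has the same aspect ratio as the source, `object-fit: cover` no longer crops anything; the canvas on either side is left for the card or a backdrop.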
Visual Design Library (<SKILL_DIR>/references/)
Beyond the composition-level `themeId`, the skill ships a richer reference library at `<SKILL_DIR>/references/` covering three orthogonal visual dimensions you can freely mix:

Style × Layout × VideoFrame
(10) (4) (3)

| dimension | keys | what it decides |
|---|---|---|
| style | `academic` / `editorial` / `minimal` / `spotlight` / `geom` / `whiteboard` / `audit` / `terminal` / `swiss` / `xhs` | the card's visual language: fonts, colors, ornament, layout-within-card |
| layout | `split` / `stack` / `pip` / `overlay` | how the source video and the card share the canvas |
| frame | `clean` / `hairline` / `polaroid` | the decorative chrome around the video element |

Read `<SKILL_DIR>/references/DESIGN_INDEX.md` for the full matrix and a loose decision guide (interview / product launch / data analysis / social clip / technical tutorial / emotional story …). When you decide to use a specific style / layout / frame, Read the corresponding file:
- `references/styles/<key>.html`: self-contained card fragment with that style's CSS tokens (colors, fonts, padding, ornament) and a placeholder takeaway. Copy the `.card[data-card-id="ref-<key>"]` style block, rename the data-card-id to your card's id, swap the placeholder content for the real takeaway, and you're done.
- `references/layouts/<key>.html`: exact `videoBounds` + `cardBounds` for both landscape and portrait, with a copy-paste JSON snippet for `storyboard.json`'s per-card `layout` field.
- `references/frames/<key>.html`: decorative HTML to add as a sibling of `#video-wrap`, plus placement instructions for the composition CSS.

Pick `style × layout × frame` per card. You can change all three between cards as long as the transitions read smoothly. A common rhythm: open with `editorial × overlay × clean`, switch to `audit × split × hairline` for the data card, close on `whiteboard × pip × polaroid`.

The 10 styles are skill-side design tokens, not composition-level themes. They don't need to be declared in `storyboard.composition`; they live inside each card's HTML. The `themeId` field can still pick a composition-level palette (table above) that controls page-body background and video border chrome.
Layout Compositions (Card + Video)
Two coordinated decisions per card define how it shares the canvas with the source video:
- `card.zone` (declared in `storyboard.json`): one of the 5 schema values; resolve it into pixel bounds (per the table in Step 6) when you write the card-host wrapper's inline `style` in Step 9.
- `#video-wrap` bounds at this card's time window (declared imperatively in the composition's GSAP timeline): the agent tweens `#video-wrap` to a target rect for each layout transition.

The schema does NOT store per-card video bounds. `videoTrack.bounds` is set once at composition level (defaults to full canvas). Video "moving" between cards is purely a GSAP animation authored in `index.html`. There is no `card.layout` field (earlier versions of this doc invented one); the real schema only has `card.zone`.

4 composition layouts (from `references/layouts/`), each a recipe pairing a `zone` with a `#video-wrap` tween target:

| composition layout | recommended `card.zone` | GSAP target for landscape | GSAP target for portrait | when to use |
|---|---|---|---|---|
| `split` | `side-panel` | | | speaker + data side-by-side / 50:50 weight |
| `stack` | `lower-third` | | | speaker on top + summary card below |
| `pip` | `fullscreen` | | | content-heavy card + corner pip |
| `overlay` | `video-overlay` | | | cinematic / dramatic / glass card on full video |

For 4:5 (1080×1350), scale portrait y/h values by `1350/1920 ≈ 0.703` (see the Step 7.0 Channel A / Channel B `recommendedRatio` resolution table).

Other zone values for one-off variants (still uses `card.zone`; no fake "layout" field):

| `card.zone` | resolved bounds | common use |
|---|---|---|
| `fullscreen` | covers whole canvas | hero card, video tweens to hidden/pip |
| `whiteboard-area` | inset 40px margin (landscape) or bottom 45% (portrait) | dense data card, free margins |
| `lower-third` | bottom 30% band | talking-head annotation |
| `side-panel` | right 42% (landscape) or bottom 40% (portrait) | sidebar / "split" recipe |
| `video-overlay` | full canvas; expect transparent card root | glass overlay on full-bleed video |

You can mix recipes per card: choose `card.zone` based on what suits the moment, then write the GSAP tween for `#video-wrap` between cards.
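The zone table above can be read as a function from zone name and canvas size to pixel bounds. The sketch below follows only the loose percentages listed there and assumes "inset 40px margin" means 40px on every side; the exact values remain whatever the Step 6 table says. The name `resolveZoneBounds` echoes the one mentioned in the composition template comment, but this body is an illustration, not the skill's implementation, and `boundsToStyle` is a hypothetical helper.

```javascript
// Illustrative resolver: zone name + canvas size -> { x, y, w, h } in pixels.
function resolveZoneBounds(zone, W, H) {
  const landscape = W >= H;
  switch (zone) {
    case "fullscreen":
    case "video-overlay":
      return { x: 0, y: 0, w: W, h: H };
    case "whiteboard-area": {
      if (landscape) return { x: 40, y: 40, w: W - 80, h: H - 80 };
      const h = Math.round(H * 0.45); // bottom 45% in portrait
      return { x: 0, y: H - h, w: W, h };
    }
    case "lower-third": {
      const h = Math.round(H * 0.3); // bottom 30% band
      return { x: 0, y: H - h, w: W, h };
    }
    case "side-panel": {
      if (landscape) {
        const w = Math.round(W * 0.42); // right 42% in landscape
        return { x: W - w, y: 0, w, h: H };
      }
      const h = Math.round(H * 0.4); // bottom 40% in portrait
      return { x: 0, y: H - h, w: W, h };
    }
    default:
      throw new Error(`unknown zone: ${zone}`);
  }
}

// Hypothetical helper: format bounds for the card-host's inline style attribute.
function boundsToStyle(b) {
  return `left:${b.x}px;top:${b.y}px;width:${b.w}px;height:${b.h}px;`;
}
```

Keeping the zone-to-bounds step in one place makes it easier to keep card-host bounds and `#video-wrap` tween targets from drifting apart when a card changes zone.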
Storyboard Render Contract
`storyboard.json` required structure (see Step 6 for the full example):
- `schemaVersion: 3`
- `composition: { fps, width, height, durationSeconds, layout, themeId, seed }`: note that `durationSeconds` / `fps` / `themeId` / `layout` live inside `composition`, NOT at top level
- `videoTrack: { sourcePath, startSec, endSec, bounds? }`: video bounds default to full canvas
- `subtitles: { enabled, ... }`
- `cards[]`: each card has these required fields: `id`, `intent`, `startSec`, `endSec`, `accentIndex`, `zone`, `contentHints`

Rules:
- Card times stay inside `composition.durationSeconds` and should not overlap unless intentional (use `data-track-index` to control z-order when they do).
- Visual details live in card HTML fragments (Step 8), NOT in `contentHints`. `contentHints` is your own structured prompt for designing the card; the rendered look is the HTML.
- Keep the storyboard shape stable. Even though nothing parses it, you read it back while authoring Step 8/9, and consistency keeps card IDs and timing in sync.
- Agent-side decisions like "I picked overlay × geom × clean" do NOT belong in `storyboard.json`; keep them in working memory and use them when authoring card HTML + GSAP tweens.
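Since nothing parses `storyboard.json`, a quick self-check can catch drift before authoring Step 8/9. The sketch below is a hypothetical helper (`checkStoryboard` is not part of the skill or of `hyperframes`); it only encodes the structure and rules listed above and reports overlaps as notes, since overlap may be intentional.

```javascript
// Field names taken from the required-structure list above.
const REQUIRED_CARD_FIELDS = [
  "id", "intent", "startSec", "endSec", "accentIndex", "zone", "contentHints",
];

// Hypothetical self-check: returns a list of human-readable problems (empty when clean).
function checkStoryboard(storyboard) {
  const problems = [];
  if (storyboard.schemaVersion !== 3) problems.push("schemaVersion must be 3");

  const duration = storyboard.composition?.durationSeconds;
  if (typeof duration !== "number") {
    problems.push("composition.durationSeconds is missing");
  }

  // Sort a copy by start time so overlap checks compare neighbours.
  const cards = [...(storyboard.cards ?? [])].sort((a, b) => a.startSec - b.startSec);
  cards.forEach((card, i) => {
    const label = card.id ?? `cards[${i}]`;
    for (const field of REQUIRED_CARD_FIELDS) {
      if (!(field in card)) problems.push(`${label}: missing ${field}`);
    }
    if (card.startSec < 0 || card.startSec >= card.endSec || card.endSec > duration) {
      problems.push(`${label}: time window falls outside the composition`);
    }
    const next = cards[i + 1];
    if (next && next.startSec < card.endSec) {
      problems.push(`${label} overlaps ${next.id} (fine if intentional)`);
    }
  });
  return problems;
}
```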
Transparent card backgrounds for cards that share canvas with video. When the GSAP tween leaves video visible behind/beside the card (overlay recipe, pip recipe, or any `card.zone = 'lower-third' | 'video-overlay'` moment), the card's `.root` MUST NOT paint a full opaque background; otherwise it occludes the video. Two patterns:

```css
/* Pattern A: transparent root, page body provides the cream backdrop */
html,
body {
  background: var(--bg);
}
.card[data-card-id="card-X"] .root {
  background: transparent;
}

/* Pattern B: explicit per-card background ONLY for fullscreen cards */
.card[data-card-id="card-hero"] .root {
  background: var(--bg);
}
.card[data-card-id="card-overlay"] .root {
  background: transparent;
}
```

For `side-panel`-zone cards (split recipe), the card-host is already only half the canvas, so an opaque card bg is fine: it only covers its half.
8. Write Each Card's HTML
Create `$WORK_DIR/public/cards/{card-id}.html` for each card. Each file contains a single rooted HTML fragment that follows this contract:

Card HTML Contract

```html
<div class="card" data-card-id="card-01">
  <style>
    /* MUST: every rule starts with .card[data-card-id="{cardId}"]
       (card-01 in this example; substitute your own card id throughout) */
    .card[data-card-id="card-01"] .root {
      width: 100%; height: 100%;
      display: flex; /* ...remaining layout rules... */
      font-family: 'Caveat', 'LXGW WenKai TC', serif;
      color: var(--text);
      background: var(--bg);
    }
    .card[data-card-id="card-01"] .title { font-size: 84px; /* ... */ }
  </style>
  <div class="root">
    <h1
      id="card-01-title"
      class="title"
      data-anim="kinetic-chars"
      data-anim-at="0.3"
      data-anim-duration="0.5"
      data-anim-stagger="0.04"
      data-anim-pattern="pop"
    >
      <span class="char">S</span>
      <span class="char">u</span>
    </h1>
    <div
      id="card-01-line"
      data-anim="grow-x"
      data-anim-at="0.65"
      data-anim-duration="0.5"
      data-anim-target-w="420"
      style="width:0;height:8px;background:var(--accent-0);border-radius:4px;"
    ></div>
  </div>
</div>
```

Hard rules (`hyperframes` lint will reject violations):
- Single root `<div class="card" data-card-id="{cardId}">`
- Inline `<style>` rules MUST be prefixed with the scope selector above
- No `<script>` tags
- No external URLs in `src=` / `href=` (no CDN, no remote fonts)
- No inline event handlers (`onclick=` etc.)
- All assets via relative paths into the same `public/` directory
- Colors via `var(--accent-N)` etc. for portability across themes

Animations are declared, not coded. Use `data-anim-*` attributes only; never write `<script>` to animate. You compile every `data-anim-*` declaration into the single master GSAP timeline in Step 9.
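Several of the hard rules above can be pre-checked with plain string matching before handing the fragment to the real linter. The following is a rough sketch only: `lintCardHtml` is a hypothetical helper, it does not reproduce what `hyperframes` lint actually checks, and its regex-based CSS scan ignores nested at-rules and selector lists inside `:is()`.

```javascript
// Hypothetical pre-check for a card fragment; returns a list of rule violations.
function lintCardHtml(html, cardId) {
  const problems = [];

  if (/<script\b/i.test(html)) problems.push("no <script> tags");

  // src=/href= pointing at http://, https://, or protocol-relative // URLs.
  if (/\s(?:src|href)\s*=\s*["']?(?:https?:)?\/\//i.test(html)) {
    problems.push("no external URLs in src= / href=");
  }

  // Attributes such as onclick=, onload=, ...
  if (/\son[a-z]+\s*=/i.test(html)) problems.push("no inline event handlers");

  // Every selector in every inline <style> rule must start with the scope selector.
  const scope = `.card[data-card-id="${cardId}"]`;
  for (const block of html.matchAll(/<style[^>]*>([\s\S]*?)<\/style>/gi)) {
    const css = block[1].replace(/\/\*[\s\S]*?\*\//g, ""); // strip CSS comments
    for (const rule of css.matchAll(/([^{}]+)\{[^{}]*\}/g)) {
      const selectors = rule[1].split(",").map((s) => s.trim());
      if (selectors.some((s) => !s.startsWith(scope))) {
        problems.push(`unscoped rule: ${rule[1].trim()}`);
      }
    }
  }
  return problems;
}
```

An empty result here does not guarantee the fragment passes the real lint; it only catches the most common slips early.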
Card Sizing: Mobile-First in Portrait
The 10 `references/styles/*.html` are sized for a 1920×1080 landscape preview. When `storyboard.layout = "portrait"` (1080×1920, the dominant case for social / mobile), scale every visual size up: phones are held close, and the same pixel count reads smaller than on a landscape TV-style canvas.

| token | landscape baseline | portrait target | scale |
|---|---|---|---|
| title (h1/h2 hero) | 64–96px | 88–132px | ×1.35 |
| detail / body | 24–30px | 30–40px | ×1.30 |
| kicker / chip label | 14–16px | 18–22px | ×1.30 |
| timecode / meta | 12–14px | 16–18px | ×1.30 |
| data block primary number | 48–60px | 64–88px | ×1.40 |
| line-height multiplier | 1.05–1.5 | same | (don't scale) |

Rule of thumb: `portraitPx = round(landscapePx × 1.3)`, then floor to a nearby 4px multiple for visual rhythm. Hero headlines may go up to ×1.4; small meta text stays at ×1.2 to avoid crowding.

Padding shrinks slightly in portrait: the card is narrower, so big landscape padding (40–64px) eats too much width. Use 24–36px horizontal padding in portrait.

If you're producing a single card that must work in both layouts, prefer a `@container` query on the card root over hard-coding sizes:

```css
.card[data-card-id="X"] .root {
  container-type: inline-size;
}
.card[data-card-id="X"] .title {
  font-size: clamp(64px, 8.5cqi, 132px);
}
.card[data-card-id="X"] .detail {
  font-size: clamp(24px, 3.2cqi, 40px);
}
```

But for most cards, a single layout choice is fine: just pick the size table column that matches the storyboard's `layout` field.
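The rule of thumb above is easy to apply mechanically. The sketch below implements it literally (round, then floor to a 4px multiple); `toPortraitPx` is a hypothetical helper name. Because of the floor step, results can land a little under the ranges in the size table, so treat the table as the target when the two disagree.

```javascript
// Hypothetical helper: landscape px -> portrait px per the rule of thumb.
// scale defaults to 1.3; pass 1.4 for hero headlines or 1.2 for small meta text.
function toPortraitPx(landscapePx, scale = 1.3) {
  const scaled = Math.round(landscapePx * scale);
  return Math.floor(scaled / 4) * 4; // snap down to a 4px multiple
}
```

For example, `toPortraitPx(64, 1.4)` rounds 89.6 to 90 and then floors to 88, the low end of the portrait title range.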
Available data-anim Kinds
| kind | use for | key params |
|---|---|---|
| | enter | |
| | exit | |
| | slide enter | |
| `kinetic-chars` | per-char pop | |
| | per-char fade | same as kinetic-chars but slower default stagger |
| | animate number | |
| | SVG path reveal | |
| | bar height | |
| `grow-x` | bar width | |
| | pop entrance | |
| | unfocused → focused | |
| | clip reveal | |
| | tween any CSS | |

`data-anim-at` is in seconds relative to the card's `startSec`.
9. Assemble the Composition HTML
Stage the assets and write `$WORK_DIR/public/index.html`:

```bash
# SKILL_DIR is injected by the host ("Base directory for this skill: …")
SKILL_DIR="<SKILL_DIR>"
mkdir -p "$WORK_DIR/public/fonts" "$WORK_DIR/public/vendor" "$WORK_DIR/public/cards"
cp -n "$SKILL_DIR/assets/fonts/"* "$WORK_DIR/public/fonts/"
cp -n "$SKILL_DIR/assets/vendor/gsap.min.js" "$WORK_DIR/public/vendor/"

# Stage the input video: RE-ENCODE with dense keyframes. Sources with a sparse GOP
# (keyframe interval > ~1s) freeze on seek in the renderer (a frozen frame under the
# overlays); -g / -keyint_min set to your composition fps make every frame seekable.
# (Set both to your fps; 30 shown, use 24/25/60 to match.)
ffmpeg -y -i "$VIDEO_PATH" -c:v libx264 -crf 18 -g 30 -keyint_min 30 \
  -pix_fmt yuv420p -movflags +faststart -c:a aac "$WORK_DIR/public/input-video.mp4"
```

Composition Template
html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<style>
@font-face {
font-family: "Caveat";
src: url("fonts/Caveat-400-latin.woff2") format("woff2");
font-weight: 400;
font-display: block;
}
@font-face {
font-family: "Caveat";
src: url("fonts/Caveat-700-latin.woff2") format("woff2");
font-weight: 700;
font-display: block;
}
@font-face {
font-family: "LXGW WenKai TC";
src: url("fonts/LXGWWenKaiTC-400-latin.woff2") format("woff2");
font-weight: 400;
font-display: block;
}
@font-face {
font-family: "Inter";
src: url("fonts/Inter-400-latin.woff2") format("woff2");
font-weight: 400;
font-display: block;
}
@font-face {
font-family: "Inter";
src: url("fonts/Inter-700-latin.woff2") format("woff2");
font-weight: 700;
font-display: block;
}
@font-face {
font-family: "Virgil";
src: url("fonts/Virgil.woff2") format("woff2");
font-display: block;
}
:root {
/* Pick from the themeId palette table in Step 7 — example: classic */
--bg: #fff9e3;
--text: #1e1e1e;
--accent-0: #1971c2;
--accent-1: #e03131;
--accent-2: #2f9e44;
--accent-3: #e8590c;
--accent-4: #9c36b5;
--font-family: "Caveat", "LXGW WenKai TC", serif;
}
* {
box-sizing: border-box;
}
/* Body font-family MUST list concrete font names (not just var(--font-family)) —
the HyperFrames renderer's static analyzer doesn't expand CSS variables when
resolving fonts, so a var-only chain triggers `font_family_without_font_face`
lint and falls back to a generic. Use the concrete chain here; cards that
want the theme font can still reference var(--font-family) internally. */
html,
body {
margin: 0;
padding: 0;
width: 100%;
height: 100%;
overflow: hidden;
background: #000;
font-family: "Inter", "Caveat", "LXGW WenKai TC", ui-sans-serif, system-ui, sans-serif;
}
#stage {
position: relative;
width: 100%;
height: 100%;
overflow: hidden;
}
/* video-wrapper holds the source video. Its position / size are animated
over time by the master timeline (one tween per layout transition). */
.video-wrapper {
position: absolute;
left: 0;
top: 0;
width: 1920px;
height: 1080px;
overflow: hidden;
border-radius: 0;
box-shadow: none;
}
.video-wrapper video {
width: 100%;
height: 100%;
object-fit: cover;
}
.card-host {
position: absolute;
pointer-events: none;
overflow: hidden;
}
.card-host .card {
position: relative;
width: 100%;
height: 100%;
overflow: hidden;
}
.card-host .char {
display: inline-block;
visibility: visible;
}
/* Subtle drop shadow + rounded corners for non-fullscreen video framings */
.video-wrapper.framed {
border-radius: 16px;
box-shadow: 0 12px 40px rgba(0, 0, 0, 0.35);
}
</style>
</head>
<body>
<div
id="stage"
data-composition-id="graphic-overlays"
data-start="0"
data-duration="121.2"
data-fps="30"
data-width="1920"
data-height="1080"
>
<!-- Layer 1: source video — initial position matches card-01's layout -->
<div class="video-wrapper" id="video-wrap">
<video
id="bg-video"
src="input-video.mp4"
muted
playsinline
data-start="0"
data-duration="121.2"
data-track-index="1"
></video>
</div>
<!-- Layer 2: each card-host sits at the bounds dictated by its layout. -->
<!-- IMPORTANT: every card-host MUST carry BOTH "card-host" and "clip" classes. -->
<!-- - "card-host" → our positioning + pointer-events styles -->
<!-- - "clip" → HyperFrames runtime uses this to enforce visibility -->
<!-- only during data-start … data-start+data-duration. -->
<!-- Without "clip" the host stays visible the whole video -->
<!-- (lint: timed_element_missing_clip_class). -->
<!-- Example: card-01 with zone="fullscreen" → card-host covers (0,0,1920,1080) -->
<div
class="card-host clip"
data-card-id="card-01"
data-start="1.0000"
data-duration="6.5000"
data-track-index="2"
style="left:0;top:0;width:1920px;height:1080px;visibility:hidden;opacity:0;"
>
<!-- paste the contents of public/cards/card-01.html here -->
</div>
<!-- Example: card-02 with zone="side-panel" (split composition layout) → card on left half -->
<div
class="card-host clip"
data-card-id="card-02"
data-start="8.0000"
data-duration="12.0000"
data-track-index="2"
style="left:0;top:0;width:960px;height:1080px;visibility:hidden;opacity:0;"
>
<!-- card-02 HTML -->
</div>
<!-- ...one "card-host clip" per card with inline bounds matching resolveZoneBounds(card.zone)... -->
<script src="vendor/gsap.min.js"></script>
<script>
(function () {
// count-up formatter helper
window.__fmt = function (v, fmt) {
if (typeof fmt === "string" && /^\.[0-9]+f$/.test(fmt)) {
return Number(v).toFixed(Number(fmt.slice(1, -1)));
}
if (fmt === ",d") return Math.round(v).toLocaleString();
return String(Math.round(v));
};
const tl = window.gsap.timeline({ paused: true });
// ── Card lifecycle (one block per card) ──
// Example for card-01 [1.0, 7.5] with kinetic-chars at +0.3, grow-x at +0.65:
// Enter (fade in over 0.4s)
tl.set('.card-host[data-card-id="card-01"]', { visibility: "visible" }, 1.0);
tl.fromTo(
'.card-host[data-card-id="card-01"]',
{ opacity: 0 },
{ opacity: 1, duration: 0.4, ease: "power2.out" },
1.0,
);
// Card-internal anims (compile each data-anim-* declaration here)
tl.from(
'.card[data-card-id="card-01"] #card-01-title .char',
{ opacity: 0, y: 8, scale: 0.8, duration: 0.5, ease: "power2.out", stagger: 0.04 },
1.3,
);
tl.fromTo(
'.card[data-card-id="card-01"] #card-01-line',
{ width: 0 },
{ width: 420, duration: 0.5, ease: "power2.out" },
1.65,
);
// Exit (fade out over 0.35s, ending at endSec)
tl.to(
'.card-host[data-card-id="card-01"]',
{ opacity: 0, duration: 0.35, ease: "power2.in" },
7.15,
);
tl.set('.card-host[data-card-id="card-01"]', { visibility: "hidden" }, 7.5);
// ── Video framing transitions ──
// When the next card uses a different composition layout, animate the
// video-wrapper to its new bounds. Example: card-01 = fullscreen
// (video hidden behind), card-02 = split composition (zone="side-panel"
// → video on right, card on left).
// Card-02 enters at 8.0s with the split composition. Animate video to
// the right half during the card-01 → card-02 gap (between 7.5 and 8.0s).
tl.set("#video-wrap", { className: "video-wrapper framed" }, 7.5);
tl.to(
"#video-wrap",
{ left: 960, top: 0, width: 960, height: 1080, duration: 0.6, ease: "power2.inOut" },
7.5,
);
// Card-02 enter — same pattern as card-01
tl.set('.card-host[data-card-id="card-02"]', { visibility: "visible" }, 8.0);
tl.fromTo(
'.card-host[data-card-id="card-02"]',
{ opacity: 0 },
{ opacity: 1, duration: 0.4, ease: "power2.out" },
8.0,
);
// ...card-02 internal anims...
// ── repeat for each card; if the NEXT card's layout differs,
// insert another tl.to('#video-wrap', ...) tween before its enter ──
window.__timelines = window.__timelines || {};
window.__timelines["graphic-overlays"] = tl;
})();
</script>
</div>
</body>
</html>
GSAP Statement Cheat Sheet
Compile each `data-anim` attribute into a GSAP statement. Times are absolute seconds = card.startSec + data-anim-at, quantized to 1/fps. The selector is `.card[data-card-id="X"] #elementId`.

| data-anim | GSAP statement template |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |

Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is `1/30 ≈ 0.0333s`; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal.
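To make the compile step concrete, here is a sketch of the timing arithmetic plus one statement generator. The `grow-x` statement shape is copied from the card-01 example in the composition template; `quantize` and `compileGrowX` are hypothetical helper names, and the `el` object is an assumed in-memory form of the element's `data-anim-*` attributes.

```javascript
// Snap an absolute time in seconds to the nearest frame boundary.
function quantize(absSec, fps) {
  return Math.round(absSec * fps) / fps;
}

// Hypothetical generator for a grow-x declaration.
// el = { id, at, duration, targetW }, mirroring data-anim-at / -duration / -target-w.
function compileGrowX(cardId, cardStartSec, el, fps) {
  const T = quantize(cardStartSec + el.at, fps).toFixed(4);
  const selector = `.card[data-card-id="${cardId}"] #${el.id}`;
  return (
    `tl.fromTo('${selector}', { width: 0 }, ` +
    `{ width: ${el.targetW}, duration: ${el.duration}, ease: "power2.out" }, ${T});`
  );
}
```

Emitting statements as text keeps the final `index.html` free of any runtime attribute parsing: the timeline contains only literal selectors and times.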
Video Framing Reference (per layout value)
layoutThe selector for the video container is . Animate its
bounds between cards using .
Initial bounds should be set inline on the element to match card-01's
layout. Pick a transition duration of 0.5–0.7s with .
#video-wraptl.to('#video-wrap', { ...bounds }, T)ease: 'power2.inOut'Decorative frames ( / / ) sit as a
sibling of and follow it through layout transitions.
See
for each frame's placement
HTML, suggested CSS, and which layouts it pairs with. Quick rule:
layout suppresses decorative frames (the full-bleed video
clashes with chrome); PiP layouts already have their own pill treatment
(border-radius + white ring + shadow), so add a decorative frame only on
top of / .
cleanhairlinepolaroid#video-wrapreferences/frames/overlaysplitstackGSAP target lookup table for per composition layout
(landscape 1920×1080 — for portrait & 4:5 see
which list all three ratios):
#video-wrapreferences/layouts/*.html| composition layout | typical card.zone | | extra css class |
|---|---|---|---|
| | | — |
| | | — |
| | | |
| | | |
| | | — |
| hide video (pure-graphic moment) | | | — |
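
One way to keep these bounds transitions consistent is a thin wrapper around `tl.to`. This is a sketch under assumptions: `moveVideoWrap` is a hypothetical helper, the bounds are the landscape values used in this document's examples, and `tl` is any object with a GSAP-style `to(target, vars, position)` method (a recording stub stands in here so the snippet runs without GSAP loaded).

```javascript
// Hypothetical helper: queue a #video-wrap bounds transition on a timeline.
// `tl` only needs a GSAP-style .to(target, vars, position) method.
function moveVideoWrap(tl, bounds, atSec, duration = 0.6) {
  // The guidance above suggests 0.5-0.7s; rejecting other values is this sketch's choice.
  if (duration < 0.5 || duration > 0.7) {
    throw new RangeError("transition duration should stay within 0.5-0.7s");
  }
  const { left, top, width, height } = bounds;
  tl.to(
    "#video-wrap",
    { left, top, width, height, duration, ease: "power2.inOut" },
    atSec,
  );
  return tl;
}

// Recording stub so the sketch runs without GSAP.
const calls = [];
const stubTimeline = {
  to: (target, vars, position) => calls.push({ target, vars, position }),
};

// Bounds taken from the examples in this document (landscape 1920x1080).
const RIGHT_HALF = { left: 960, top: 0, width: 960, height: 1080 };
moveVideoWrap(stubTimeline, RIGHT_HALF, 7.5);
console.log(calls[0]);
```

With a real timeline, the same call would replace the hand-written `tl.to("#video-wrap", …)` statements, so the ease and duration stay identical across every layout change.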
To toggle the pip-pill chrome (border-radius + white ring + drop shadow)
when entering or leaving a pip moment:
js
// Enter pip — add chrome
tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T);
tl.to(
"#video-wrap",
{ left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" },
T,
);
// Leave pip — back to clean full-bleed
tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT);
tl.to(
"#video-wrap",
{ left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" },
T_NEXT,
);

Card-host bounds match the zone. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx;height:Hpx;..."`. For zone `video-overlay` (overlay recipe), the card-host fills the full canvas; your CSS inside `.card .root` decides where the actual visible card sits.
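
Writing resolved bounds into the inline style can be sketched as follows. The helper name `boundsToStyle` and the sample values are hypothetical; the real pixel bounds come from the zone table in Step 6.

```javascript
// Hypothetical helper: turn resolved zone bounds into the card-host inline style.
function boundsToStyle({ left, top, width, height }) {
  for (const [name, value] of Object.entries({ left, top, width, height })) {
    if (!Number.isFinite(value)) {
      throw new TypeError(`bounds.${name} must be a finite number`);
    }
  }
  return `left:${left}px;top:${top}px;width:${width}px;height:${height}px;`;
}

// Illustrative values only: a left-half panel on a 1920x1080 canvas.
const sidePanelStyle = boundsToStyle({ left: 0, top: 0, width: 960, height: 1080 });
// Full canvas, as described for the video-overlay zone.
const overlayStyle = boundsToStyle({ left: 0, top: 0, width: 1920, height: 1080 });
console.log(sidePanelStyle);
console.log(overlayStyle);
```

Validating the numbers up front catches a missing or misspelled zone before it turns into `left:undefinedpx` in the assembled composition.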

HyperFrames Layout / Animation QA Rules

- Build each card's static hero frame first: the moment where the card is fully visible and readable.
- Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap.
- Confirm hidden video areas are clipped by the frame and not visible outside intended bounds.
- Register one paused master timeline as `window.__timelines["graphic-overlays"]`.
- Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls.
- Do not use `Math.random()` or `Date.now()` in render paths.
- Do not use `repeat: -1`; calculate finite repeats from the video duration.
- Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion.
- Animate wrappers such as `#video-wrap`, not the video element dimensions directly.
- Avoid animating the same property on the same element from multiple timelines at the same time.
- Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`.
- Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes, e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`).
- For body / global `font-family`, list concrete font names (`'Inter', 'Caveat', …`), not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded.
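
The "finite repeats" rule can be sketched like this. `finiteRepeat` is a hypothetical helper, and the numbers are illustrative; the only GSAP behavior relied on is that `repeat` counts the extra iterations after the first play.

```javascript
// Hypothetical helper: compute a finite value to use instead of `repeat: -1`.
// GSAP's `repeat` is the number of EXTRA iterations after the first play,
// so a tween that plays N times in total uses repeat: N - 1.
function finiteRepeat(startSec, cycleSec, videoDurationSec) {
  if (cycleSec <= 0) throw new RangeError("cycleSec must be positive");
  const available = videoDurationSec - startSec;
  const totalPlays = Math.floor(available / cycleSec);
  // If not even one full cycle fits, fall back to a single play (repeat: 0).
  return Math.max(0, totalPlays - 1);
}

// A 1.5s loop starting at 2s in a 20s video: 18s available, 12 full cycles.
const repeat = finiteRepeat(2, 1.5, 20);
console.log(repeat);
```

The result would be passed as `repeat` in the tween vars, which keeps the timeline's total duration finite and deterministic for the renderer.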
10. Render to MP4
bash
cd "$WORK_DIR"
PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \
-o output.mp4 \
--fps 30

`hyperframes render <dir>` renders `<dir>/index.html`. The command above sets `PRODUCER_BROWSER_GPU_MODE=hardware`; see also the `--browser-gpu` option.

For a sanity check before the full render, capture a single frame at a specific timestamp:
bash
npx hyperframes snapshot public --at 5   # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out)

11. Report Results

Tell the user:
- Work directory path
- `storyboard.json` (the card outline you designed)
- `public/cards/*.html` (one HTML per card)
- `public/index.html` (the assembled composition)
- `output.mp4` (the final video)
- ASR provider used
- Card count + how you chose them (in 1 sentence)
- Any missing keys or quality caveats

Optional live preview (on request only). The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. Don't open it during the run. When the user asks, start a long-lived server after render and report the URL:
bash
(cd "$WORK_DIR/public" && npx hyperframes preview)   # or `npx hyperframes play` for a shareable link

Do not delete the work directory unless the user asks.