wjs-overlaying-video

Post-production for a video clip: cover, captions, illustrations, CTA, custom motion graphics — all composed in ONE HyperFrames project and rendered in a SINGLE final encode. No cascade of decodes/re-encodes (each cascade pass degrades quality and burns time).

When to use

  • Downstream of `/wjs-segmenting-video`: the segmentation skill hands you cropped clips + per-clip SRTs; this skill turns them into upload-ready MP4s with cover/captions/illustrations/CTA.
  • User has a finished video and wants to dress it up with motion graphics: opening hook, key-quote callout, closing slogan, chapter cards, AI-generated cover as first frame.
  • User wants HTML/CSS-quality captions on a video (kinetic word-by-word highlighting, custom fonts, large outlined text, seekable per cue).
  • User wants illustration overlays at specific hook moments: diagrams, big text emphasis, flow charts.
Don't use for:
  • Splitting one long video into clips → use `/wjs-segmenting-video`.
  • Creating the source SRT → use `/wjs-transcribing-audio` (then `/wjs-translating-subtitles` if you need a different language).
  • Full HyperFrames productions where the source isn't a fixed video → use `hyperframes` directly.
  • 微信视频号 / 抖音 upload (no public API for those) → this skill produces the MP4; upload is manual.

What this skill IS — and IS NOT

| Is | Is not |
| --- | --- |
| Everything that goes ON TOP of a video clip: cover, caption, chapter, illustration, CTA | Cutting / cropping a video (that's `/wjs-segmenting-video` + `/wjs-reframing-video`) |
| One HyperFrames composition per clip = ONE final encode | A multi-step decode/encode cascade |
| `cover` is the literal first frame of the output (platforms auto-pick it as thumbnail) | A separate thumbnail file the user uploads alongside |
| Captions are HTML/CSS, with `-webkit-text-stroke` for white-on-anything readability | libass burn-in (deprecated) |
| Illustrations: re-usable `stack` / `hammer` patterns + custom escape hatch | One bespoke HTML/CSS per illustration without re-use |
| AI covers regenerated at native target aspect (1024×1792 for vertical, 1536×1024 for horizontal) | Single 1024×1536 default that letterboxes or crops on the platform |

The pipeline

clip.mp4 + clip.zh-CN.burn.srt   (from /wjs-segmenting-video hand-off)
1. (Optional) Generate AI cover via gpt-image-2
   make_cover.py --segments S.json --out output/ --size 1024x1792
   cover_NN_slug.png

2. Scaffold a HyperFrames project per clip
   hf_clip_NN/1080/{index.html, clip.mp4, cover.png, captions.json}

3. Compose: cover scene + body video + caption track + chapter chip
            + 1-2 illustrations at hook moments + CTA scene

4. npm run check (lint + validate + visual inspect)
   npm run render → upload-ready MP4
A 2-minute vertical 1080×1920 composition renders in ~2-3 min on an M-series Mac.

Standard overlay types (the 6 building blocks)

Every clip's final composition is built from some combination of these. The agent picks the right ones per clip — typically all 6 for a podcast highlight, or just 1-2 for a single annotation overlay.

1. cover: full-frame AI image as first frame

The cover IS the first frame (no animation, no zoom) so platforms that auto-pick the first frame as the thumbnail get your designed cover by default. Always verify with `ffmpeg -ss 0 -vframes 1`: frame 0 must NOT be black or platform thumbnails will be black.
HTML:
html
<div id="cover" class="clip" data-start="0" data-duration="1.6"
     data-track-index="1" data-layout-allow-overflow>
  <img src="cover.png" alt="" data-layout-allow-overflow />
</div>
CSS:
css
#cover { position: absolute; inset: 0; background: #0c0d10; overflow: hidden; }
#cover img { position: absolute; inset: 0; width: 100%; height: 100%; object-fit: cover; }
Generation: use `/wjs-segmenting-video/scripts/make_cover.py` (wraps `gpt-image-2 images edit` with the midpoint frame as ref):
bash
# For 1080×1920 vertical output (视频号 / 抖音):
make_cover.py --segments S.json --out output/ --size 1024x1792 [--single N]

# For 1920×1080 horizontal output (YouTube / B站):
make_cover.py --segments S.json --out output/ --size 1536x1024

Aspect must match output frame. `--size 1024x1536` (2:3, the script default) gets letterboxed or cropped on 9:16 output, so always pass `1024x1792` for vertical. The cover image's aspect is what the viewer sees full-frame, so a mismatch is visible. Re-roll a single cover with `--single N`; the codex provider can fail transiently mid-batch.

Codex auth required: the script calls the codex CLI via `gpt-image-2-skill`. If `~/.codex/auth.json` is missing, the script errors. See `gpt-image-2-skill` for setup.
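How much of a cover is hidden by `object-fit: cover` at a given size can be estimated with a little arithmetic. The sketch below is illustrative only and is not part of `make_cover.py`; the function name `cover_crop_fraction` is hypothetical.

```python
def cover_crop_fraction(img_w, img_h, frame_w, frame_h):
    """Return (crop_w, crop_h): the fraction of the image's width and height
    hidden when it is scaled to fill the frame, as CSS object-fit: cover does."""
    # object-fit: cover scales uniformly until both frame dimensions are filled.
    scale = max(frame_w / img_w, frame_h / img_h)
    scaled_w, scaled_h = img_w * scale, img_h * scale
    return 1 - frame_w / scaled_w, 1 - frame_h / scaled_h


if __name__ == "__main__":
    # Compare the vertical size recommended above with the script default.
    for size in [(1024, 1792), (1024, 1536)]:
        crop_w, crop_h = cover_crop_fraction(*size, 1080, 1920)
        print(size, f"width cropped: {crop_w:.1%}, height cropped: {crop_h:.1%}")
```

Under these assumptions a 1024x1792 cover loses about 1.6% of its width on a 1080×1920 frame, while the 1024x1536 default loses about 15.6%.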

2. caption: outlined HTML/CSS captions synced to SRT

White text with thick black stroke, no bubble background, vertically centered in a fixed zone (so 1-line vs 2-line captions don't make the visual center jump up and down).
HTML:
html
<div id="caption" class="clip" data-start="{body_start}"
     data-duration="{body_dur}" data-track-index="4"></div>
CSS (vertical 1080×1920):
css
#caption {
  position: absolute; left: 0; right: 0; bottom: 240px;
  height: 240px; z-index: 10; overflow: visible;
}
#caption .bubble {
  position: absolute; top: 50%; left: 50%;
  display: inline-block;
  padding: 0 24px;
  font-size: 56px; line-height: 1.18; font-weight: 900;
  color: #ffffff; max-width: 1020px; text-align: center;
  -webkit-text-stroke: 5px #000;
  paint-order: stroke fill;
  text-shadow: 0 6px 12px rgba(0,0,0,0.55), 0 0 4px rgba(0,0,0,0.6);
  letter-spacing: 0.01em;
}
JS (one bubble per cue + GSAP fade in/out, all centered at container midpoint):
js
// SRT cues are loaded as inline JSON. Each cue's start/end is offset
// by the cover-scene duration (e.g., 1.5s) so the timing aligns with
// the composition timeline (not the body's own t=0).
const captionEl = document.getElementById("caption");
const groups = JSON.parse(document.getElementById("captions-data").textContent);
const bubbles = groups.map((g, i) => {
  const b = document.createElement("span");
  b.className = "bubble"; b.id = "cap-" + i;
  b.textContent = g.text; b.style.opacity = "0";
  captionEl.appendChild(b);
  return b;
});
// GSAP xPercent/yPercent for centering (CSS transform would get
// overwritten the moment we tween y).
gsap.set(bubbles, { xPercent: -50, yPercent: -50 });
groups.forEach((g, i) => {
  const el = bubbles[i];
  tl.fromTo(el, { opacity: 0, y: 12 }, { opacity: 1, y: 0, duration: 0.18, ease: "power2.out" }, g.start);
  const exitStart = Math.max(g.start + 0.18, g.end - 0.12);
  tl.to(el, { opacity: 0, duration: 0.12, ease: "power2.in" }, exitStart);
  tl.set(el, { opacity: 0 }, g.end);
});
Source SRT: slice + shift before inlining. Take `clip_NN.zh-CN.burn.srt` from segmentation, parse each cue, add the cover duration to every `start` / `end`, and inline as JSON in a `<script id="captions-data" type="application/json">` block.
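The parse, shift, and inline steps can be sketched in Python as follows. This is a minimal sketch and not the skill's actual build script: the function names `srt_to_cues` and `captions_script_tag` are hypothetical, and joining multi-line cue text with a space is an assumption.

```python
import json
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (some tools emit '.' instead of ',').
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def srt_to_cues(srt_text, cover_duration):
    """Parse SRT text into [{start, end, text}], shifted by cover_duration seconds."""
    cues = []
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Assumption: multi-line cue text is joined with a single space.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append({
                    "start": round(_seconds(*g[:4]) + cover_duration, 3),
                    "end": round(_seconds(*g[4:]) + cover_duration, 3),
                    "text": text,
                })
                break
    return cues


def captions_script_tag(cues):
    """Wrap cues in the inline JSON block that the caption JS reads."""
    # Escape "</" so cue text can never close the script element early.
    payload = json.dumps(cues, ensure_ascii=False).replace("</", "<\\/")
    return f'<script id="captions-data" type="application/json">{payload}</script>'
```

The resulting cue objects carry the `start`, `end`, and `text` keys that the caption JS above reads from each group.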
MarginV / position notes:
  • Vertical (1080×1920): `bottom: 240px` keeps captions clear of the 视频号/抖音 bottom UI overlay (likes/comments/share buttons).
  • Horizontal (1920×1080): `bottom: 100px`, `font-size: 48px`, `-webkit-text-stroke: 4px` is a reasonable default.
Caption length cap. If a single cue exceeds ~18 Chinese chars on 1080-wide at 56px, it wraps to 2 lines awkwardly. This is upstream discipline: `/wjs-translating-subtitles` should cap cues at ~18 chars using word-gap split + punctuation split. If you receive longer cues, either reduce `font-size` to 48px or accept the wrap.
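A quick pre-flight check for overlong cues might look like the sketch below. The helper names are hypothetical, and counting with `len()` treats every character as full-width, so it overestimates the width of cues that mix in Latin text.

```python
MAX_CHARS = 18  # assumption: the ~18-character cap for 56px text on a 1080-wide frame


def caption_font_size(text, max_chars=MAX_CHARS, default_px=56, fallback_px=48):
    """Return the default size for cues within the cap, the fallback otherwise."""
    return default_px if len(text) <= max_chars else fallback_px


def overlong_cues(cues, max_chars=MAX_CHARS):
    """Return (index, length) for every cue whose text exceeds max_chars."""
    return [(i, len(c["text"])) for i, c in enumerate(cues) if len(c["text"]) > max_chars]
```

Running `overlong_cues` over the parsed cues before building gives a list of the cues worth sending back upstream for re-splitting.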

3. chapter: top-left chapter chip (4s reveal then fade)

A subtle badge identifying the segment. Enters at body start, fades after a few seconds so it doesn't compete with the rest of the composition.
HTML:
html
<div id="chapter" class="clip" data-start="{body_start}"
     data-duration="{body_dur}" data-track-index="3">
  <span class="dot"></span>
  <span class="text">第一段 · 自然语言才是新代码</span>
</div>
CSS:
css
#chapter {
  position: absolute; top: 80px; left: 60px; z-index: 9;
  display: inline-flex; align-items: center; gap: 12px;
  padding: 12px 20px;
  background: rgba(12,13,16,0.78);
  border: 1px solid rgba(199,150,85,0.4);
  border-radius: 999px;
}
#chapter .dot { width: 10px; height: 10px; border-radius: 999px; background: #e8b063; }
#chapter .text {
  font-size: 24px; color: #f4f4f5; letter-spacing: 0.04em; font-weight: 600;
}
GSAP:
js
tl.from("#chapter", { x: -40, opacity: 0, duration: 0.5, ease: "expo.out" }, body_start + 0.4);
tl.to("#chapter", { opacity: 0, duration: 0.4, ease: "power2.in" }, body_start + 4.0);

4. stack illustration: top-right vertical list card

A list of items (e.g., language hierarchy, workflow steps, levels) in a dark card at the top-right. One item can be accented in amber to highlight the relevant level/step.
Use for: showing a hierarchy or list while the speaker explains it. Card stays visible 8-50s.
HTML:
html
<div id="ill-stack" class="clip" data-start="{start}" data-duration="{dur}" data-track-index="5">
  <div class="ill-card">
    <div class="ill-card-label">我们写的层级</div>
    <div class="ill-row"><span class="ill-tag accent">自然语言</span></div>
    <div class="ill-row"><span class="ill-tag">Python</span></div>
    <div class="ill-row"><span class="ill-tag">C</span></div>
    <div class="ill-row"><span class="ill-tag">Assembly</span></div>
  </div>
</div>
CSS: see `references/illustration_patterns.md` for the full canonical CSS (copy verbatim).
GSAP — slide in from right + stagger rows:
js
tl.fromTo("#ill-stack", { x: 360, opacity: 0 }, { x: 0, opacity: 1, duration: 0.6, ease: "expo.out" }, start + 0.2);
tl.from("#ill-stack .ill-row", { y: 20, opacity: 0, duration: 0.4, stagger: 0.12, ease: "power2.out" }, start + 0.4);
tl.to("#ill-stack", { x: 360, opacity: 0, duration: 0.5, ease: "power2.in" }, end - 0.5);

5. hammer illustration: center-frame big equation/text overlay

A BIG center-frame text/equation that visually "hammers" a key claim. Best for the single most quotable moment in a clip (e.g., "LLM = 编译器", "Token = 新 GDP", "AI ≠ 更快的轿子"). Visible 4–8s.
HTML:
html
<div id="ill-hammer" class="clip" data-start="{start}" data-duration="{dur}" data-track-index="6">
  <div class="ill-h-content">
    <div class="ill-h-eq">
      <span class="ill-h-left">LLM</span>
      <span class="ill-h-equals">=</span>
      <span class="ill-h-right">新编译器</span>
    </div>
    <div class="ill-h-foot">自然语言 → Python → 汇编</div>
  </div>
</div>
GSAP — scale-pop entrance + stagger each piece + scale-fade exit:
js
tl.fromTo("#ill-hammer", { scale: 0.85, opacity: 0 },
  { scale: 1.0, opacity: 1, duration: 0.45, ease: "back.out(1.6)" }, start);
tl.from("#ill-hammer .ill-h-left", { x: -40, opacity: 0, duration: 0.4, ease: "expo.out" }, start + 0.2);
tl.from("#ill-hammer .ill-h-equals", { scale: 0, opacity: 0, duration: 0.4, ease: "back.out(2)" }, start + 0.4);
tl.from("#ill-hammer .ill-h-right", { x: 40, opacity: 0, duration: 0.4, ease: "expo.out" }, start + 0.6);
tl.from("#ill-hammer .ill-h-foot", { y: 20, opacity: 0, duration: 0.4, ease: "power2.out" }, start + 0.8);
tl.to("#ill-hammer", { scale: 1.05, opacity: 0, duration: 0.45, ease: "power2.in" }, end - 0.45);
(see `references/illustration_patterns.md` for full canonical CSS)

6. cta: end-card with channel CTA

A branded outro for the final 3 seconds. Use 王建硕 as the channel name (per global instructions) — never put a guest's name in the CTA slot.
HTML:
html
<div id="cta" class="clip" data-start="{cta_start}" data-duration="3.24" data-track-index="1">
  <div class="cta-line-1">关注王建硕</div>
  <div class="arrow"></div>
  <div class="cta-line-2">微信公众号 · 视频号</div>
  <div class="cta-foot">AI 炼金术 · 持续更新</div>
</div>
CSS / GSAP: see `references/illustration_patterns.md`.

Legacy types (for one-off overlays on a single video)

The `spec.json + scaffold.py` workflow also supports these older overlay types, useful when you want to dress up ONE existing video without going through the full post-production workflow above:
  • `quote`: full-width kinetic typography, top or bottom gradient. Best for opening hooks and key-quote callouts.
  • `slogan`: alias for `quote` with `position: bottom` and larger type. Best for closing slogans.
  • `callout`: small annotation panel in a corner. Best for chapter labels, lower-thirds, "as seen in" notes.
  • `custom`: escape hatch. Claude writes the overlay's HTML/CSS/GSAP inside an `overlays/<name>.html` fragment file. See `references/custom_overlay_recipes.md`.

Workflow A — Post-segmentation preset (most common)

Use this when you're coming directly from `/wjs-segmenting-video` and want the standard cover + caption + chapter + illustrations + CTA treatment for each clip.

Step 1 — Generate AI covers at the right aspect

bash
# For vertical 9:16 output (视频号 / 抖音):
python3 ~/.claude/skills/wjs-segmenting-video/scripts/make_cover.py \
  --segments segments.json --out output/ --size 1024x1792 --single 1

# Verify segment 1's cover; then batch:
python3 ~/.claude/skills/wjs-segmenting-video/scripts/make_cover.py \
  --segments segments.json --out output/ --size 1024x1792

Step 2 — For each clip, scaffold a HyperFrames project

Each clip gets `hf_clip_NN/1080/` with:
  • `index.html`: the composition (from template; see `references/post_segmentation_template.html`)
  • `clip.mp4`: copied from `output/clip_NN_slug.mp4`
  • `cover.png`: copied from `output/cover_NN_slug.png`
  • `captions.json`: generated from `output/clip_NN_slug.zh-CN.burn.srt` with every cue's start/end shifted by +cover_duration (so cues align with the composition timeline, not the body's own clock)
The build script at `references/build_hf_clips.py` does this for all segments in one pass. It reads `segments.json` + an `ILLUSTRATIONS` dict (illustrations per clip, see Step 3) + the template, and emits one ready-to-render project per clip (5 in this example).
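The directory layout above can be reproduced by hand with a few lines of Python. This is only a sketch of the copy step under the file-naming scheme listed above; `scaffold_clip` is a hypothetical helper, not a function from `build_hf_clips.py`, and it does not generate `index.html` or `captions.json`.

```python
import shutil
from pathlib import Path


def scaffold_clip(output_dir, root, nn, slug):
    """Create root/hf_clip_NN/1080 and copy the clip and cover into it."""
    output_dir, root = Path(output_dir), Path(root)
    project = root / f"hf_clip_{nn}" / "1080"
    project.mkdir(parents=True, exist_ok=True)
    # Source names follow the clip_NN_slug / cover_NN_slug pattern described above.
    shutil.copyfile(output_dir / f"clip_{nn}_{slug}.mp4", project / "clip.mp4")
    shutil.copyfile(output_dir / f"cover_{nn}_{slug}.png", project / "cover.png")
    return project
```

A call such as `scaffold_clip("output", ".", "01", "demo")` would create `hf_clip_01/1080/` in the current directory, assuming both source files exist.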

Step 3 — Define illustrations per clip

For each clip, identify 1-2 hook moments and pick `stack` or `hammer`:
python
ILLUSTRATIONS = {
    1: [
        # The language hierarchy as a stack card during the opening
        {"key": "stack", "pattern": "stack", "body_start": 0.3, "body_end": 9.0,
         "label": "我们写的层级",
         "rows": [
             {"text": "自然语言", "accent": True},
             {"text": "Python",   "accent": False},
             {"text": "C",        "accent": False},
             {"text": "Assembly", "accent": False},
         ]},
        # The hammer at the most quotable moment
        {"key": "hammer", "pattern": "hammer", "body_start": 10.8, "body_end": 14.6,
         "left": "LLM", "equals": "=", "right": "新编译器",
         "foot": "自然语言 → Python → 汇编"},
    ],
    # ... clips 2-5
}
Timestamps are body-relative (after the cover-scene duration); the build script adds the cover offset when emitting GSAP positions.
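The offset step can be sketched as below, assuming the `body_start` / `body_end` keys shown in the dict above. The helper name `illustration_window` is hypothetical and this is not the build script's actual code.

```python
def illustration_window(ill, cover_duration):
    """Convert body-relative body_start/body_end into composition-timeline values."""
    start = ill["body_start"] + cover_duration
    end = ill["body_end"] + cover_duration
    if end <= start:
        raise ValueError("body_end must be later than body_start")
    # start/duration map onto data-start/data-duration; end anchors the exit tween.
    return {
        "start": round(start, 3),
        "duration": round(end - start, 3),
        "end": round(end, 3),
    }
```

With a 1.6s cover, the hammer entry above (10.8 to 14.6 body-relative) would land at 12.4 to 16.2 on the composition timeline.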

Step 4 — Build + render

bash
python3 references/build_hf_clips.py    # scaffolds all projects
for n in 01 02 03 04 05; do
  cd "hf_clip_$n/1080"
  npx hyperframes lint
  npx hyperframes validate
  npx hyperframes render
  cd ../..
done
A 2:30 clip renders in ~3 min. Output: `hf_clip_NN/1080/renders/*.mp4`.

Workflow B — Custom overlays on a single video (legacy spec.json)

Use this when you have ONE existing video and want to add a few ad-hoc overlays (title cards, annotations, lower-thirds).

spec.json schema

json
{
  "source_video": "../path/to/source.mp4",
  "duration": 135.4,
  "size": "1920x1080",
  "name": "clip_01_animated",
  "overlays": [
    {"id": "o1", "type": "quote", "start": 8.0, "duration": 6.0,
     "position": "top", "lines": ["代码不存在错误", "只存在意图错配"],
     "accent": [false, true]},
    {"id": "o2", "type": "callout", "start": 30.0, "duration": 5.0,
     "anchor": "top-right", "text": "FRP 概念"},
    {"id": "o3", "type": "slogan", "start": 122.0, "duration": 13.4,
     "lines": ["改 prompt", "不改 AI 生成的代码"], "accent": [false, true]}
  ]
}
| Field | Required | Notes |
| --- | --- | --- |
| `source_video` | Yes | Path to source MP4. Symlinked into the project as `source.mp4`. |
| `duration` | Yes | Total composition length in seconds; match the source video. |
| `size` | No | `WIDTHxHEIGHT` (default `1920x1080`). |
| `overlays[].type` | Yes | `quote`, `slogan`, `callout`, or `custom`. |
| `overlays[].start` | Yes | Start time in seconds. |
| `overlays[].duration` | Yes | How long the overlay is on screen. |
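Before scaffolding, a spec can be sanity-checked against the required fields in the table. The sketch below is a hypothetical validator, not something `scaffold.py` is known to do; in particular, the check that an overlay must end within `duration` is an added assumption.

```python
REQUIRED_TOP = ("source_video", "duration")
REQUIRED_OVERLAY = ("type", "start", "duration")
OVERLAY_TYPES = {"quote", "slogan", "callout", "custom"}


def validate_spec(spec):
    """Return a list of problems found in a spec dict (an empty list means OK)."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP if k not in spec]
    total = spec.get("duration")
    for i, ov in enumerate(spec.get("overlays", [])):
        label = ov.get("id", f"overlays[{i}]")
        missing = [k for k in REQUIRED_OVERLAY if k not in ov]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
            continue
        if ov["type"] not in OVERLAY_TYPES:
            problems.append(f"{label}: unknown type {ov['type']!r}")
        # Assumption: an overlay should not outlast the composition.
        if total is not None and ov["start"] + ov["duration"] > total + 1e-6:
            problems.append(f"{label}: runs past the end of the composition")
    return problems
```

Loading the file with `json.load` and passing the result to `validate_spec` yields an empty list for the example spec above.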

Scaffold + render

bash
python3 ~/.claude/skills/wjs-overlaying-video/scripts/scaffold.py spec.json
cd <name> && npm run check && npm run render

Output checklist

Before considering a clip done:
  • Frame 0 is the cover (not black): `ffmpeg -ss 0 -i out.mp4 -vframes 1 frame0.png`, then open the extracted frame
  • Captions are synced with audio (spot-check a few seconds with audio playback)
  • All illustrations enter and exit at the speech moments they support
  • CTA renders correctly (关注王建硕, not a guest's name)
  • `npx hyperframes lint && npx hyperframes validate` both pass
  • `npx hyperframes inspect` shows no layout overflow
  • Total duration matches the source clip + cover + CTA durations
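The last item is simple arithmetic and can be checked with a sketch like the one below. The helper names are hypothetical, and the one-frame tolerance at 30 fps is an assumption rather than a documented project setting.

```python
def expected_duration(cover_seconds, clip_seconds, cta_seconds):
    """Composition length when cover, body clip, and CTA play back to back."""
    return cover_seconds + clip_seconds + cta_seconds


def duration_matches(rendered_seconds, expected_seconds, fps=30):
    """True when the two durations differ by at most one frame at the given fps."""
    return abs(rendered_seconds - expected_seconds) <= 1.0 / fps
```

The rendered duration itself would come from a media probe of the output file; it is passed in as a plain number here so the check stays independent of any particular tool.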

Common mistakes

  • Cover aspect ≠ output aspect. `1024x1536` (the default make_cover.py size) is 2:3 and gets letterboxed or cropped on 9:16 output. Always pass `--size 1024x1792` for vertical.
  • Caption alignment jumps with line count. Anchor by CENTER (a -50% / -50% offset from `top: 50%; left: 50%`) inside a fixed-height container so 1-line vs 2-line cues share the same visual midline. Do NOT anchor from the bottom, which makes the block grow upward as lines are added.
  • GSAP overwrites CSS transform centering. If you set `transform: translate(-50%, -50%)` in CSS and then tween `y`, GSAP replaces the transform and centering breaks. Use `gsap.set(el, { xPercent: -50, yPercent: -50 })` instead so xPercent/yPercent compose with subsequent y/x tweens.
  • Burning libass subs on top of HTML/CSS captions. Pick ONE caption system per output video. If you're using this skill's HTML/CSS captions, do NOT also burn subs in `/wjs-segmenting-video`; request the raw clip via the hand-off package.
  • Frame 0 is black. If your cover scene has an opacity fade-in starting from 0, the literal first frame is black and the platform thumbnail will be black. Place the cover statically (no opacity tween) and verify with `ffmpeg -ss 0 -vframes 1`.
  • Channel name in CTA = guest's name. Always use 王建硕. Guests belong in description text inside the metadata, not in the on-screen CTA.
  • Cover image cropped because of `object-fit: cover` on mismatched aspect. Either regenerate the cover at the right aspect (see Step 1) or letterbox with `object-fit: contain` + dark background.

Integration with other skills

  • `/wjs-segmenting-video`: the typical upstream. After it cuts + crops + slices SRTs, this skill picks up. The hand-off package is `clip_NN.mp4` + `clip_NN.zh-CN.burn.srt` + `segments.json`.
  • `/wjs-transcribing-audio` + `/wjs-translating-subtitles`: if no SRT exists, run them first. Word-level Whisper or Volcano/豆包 ASR output is preferred for accurate cue timing.
  • `hyperframes`: the underlying composition framework. This skill is a thin wrapper that encodes the proven post-production patterns; everything in the `hyperframes` skill applies (preview, render, transitions, audio-reactive, etc.). Read it whenever you write `custom` overlays.
  • `hyperframes-cli`: the CLI commands the project uses (`init`, `lint`, `validate`, `inspect`, `render`).
  • `gpt-image-2-skill`: the cover generator. `make_cover.py` invokes it via the codex CLI; the codex auth in `~/.codex/auth.json` is required.
  • `/wjs-uploading-video`: the next downstream after this skill produces an MP4. Uploads the renders to YouTube with title / description / tags from a metadata file.

Files & references

  • `scripts/scaffold.py`: Workflow B scaffolder (legacy spec.json for ad-hoc overlays)
  • `references/post_segmentation_template.html`: Workflow A template, the canonical cover + caption + chapter + illustration + CTA composition shape, with placeholder substitutions
  • `references/build_hf_clips.py`: Workflow A multi-clip builder. Reads `segments.json` + per-clip illustrations dict, scaffolds and populates one project per clip
  • `references/illustration_patterns.md`: canonical CSS / GSAP for the `stack` and `hammer` illustration patterns
  • `references/custom_overlay_recipes.md`: reusable `custom` overlay recipes (terminal demo, layer-stack diagram, callout with arrow)
  • `references/example_spec.json`: Workflow B example