wjs-converting-text-to-video

Convert a Wang Jianshuo-style WeChat Official Account `article.md` into a 1080×1920 vertical, 30-90 second Chinese narrated short video: TTS voiceover + HyperFrames CSS/GSAP animations + abstract watercolor backgrounds + transition SFX. Output MP4 for WeChat Channels / Douyin / Xiaohongshu / Reels.

What this skill produces

| Dimension | Default |
| --- | --- |
| Resolution | 1080×1920 vertical (9:16) |
| Duration | 30-90 seconds |
| Number of scenes | 5-10 |
| Voiceover | Volcano Engine TTS, default "Ahu Conversation" male voice |
| Background | Abstract watercolor generated by GPT Image 2 (`bg.png`) + blur 30 + warm-black semi-transparent overlay |
| Font | Noto Sans SC, hero weight 900, main text warm cream white |
| Output | `<article-folder>/<slug>.mp4` (a sibling of `video/`, not inside `video/`) |
| Publishing | Auto-upload to YouTube: Portrait → Shorts, Landscape → regular video; re-rendering replaces the old video (no accumulation) |

When this skill fires

  • The user already has `article.md` and says "做成视频" (make it a video), "做一个解说" (make a narrated explainer), or "讲一遍" (talk it through)
  • The user runs `/wjs-converting-text-to-video <article-folder>`
  • The user makes a batch request such as "turn all X articles I posted yesterday into videos"

When NOT to use

  • No article draft, only an idea → first use `/wjs-publishing-wechat` to write article.md, then come back
  • The user wants subtitle burning / translation / voiceover replacement → use `/wjs-burning-subtitles` / `/wjs-dubbing-video` / `/wjs-localizing-video`
  • The video needs a non-Chinese language such as English or Spanish → this skill focuses on Chinese TTS (Volcano Engine); for other languages use hyperframes' built-in tts command (Kokoro is passable for English)
  • Landscape 16:9 → this skill defaults to vertical; switch to landscape only when the user explicitly asks

Core Principle

A video is not a visual read-aloud of the article; it is a visual reconstruction of it.
Each scene is one self-contained visual moment: a contrast, a parallel series, a number, a metaphor. Text fills the screen in bold sans-serif, with key words highlighted in orange. The background is abstract watercolor (softened by blur), and the overall tone is steady, restrained, and punchy.
Rhythm > templates. A 5-10 scene video that uses the same "two-line comparison" layout from start to finish is not a video, it is a slideshow. The modern feel comes from contrast: extreme font-size differences, asymmetric layouts, short scenes alternating with long ones, text-only scenes alternating with geometric-element scenes, watercolor-background scenes alternating with bright punch scenes.
The default is mediocre. Picking only the easiest templates from the top of the table always produces a flat two-line format. Always apply the Step 1b Scene Mix Rule ratios.

Workflow

Step 1: Design 5-10 visual moments

Read `<article-folder>/article.md` and split it into 5-10 scenes following the argument structure (keep the total within 30-90 seconds). Short articles (1-2 core points) get 5-6 scenes / 30-50s; long articles get 8-10 scenes / 60-90s. Each scene has one narration segment plus one clear visual skeleton.
Template library: 6 categories, 18 templates in total, mix as needed.

A. Hero / Punch (high-contrast climax, ≥1 per video, duration ≤4s)

| Template | Suitable for |
| --- | --- |
| A1. Full-screen single-word hero | A 1-3 character climax word filling the screen, font size 280-400px |
| A2. Outline hero | Hollow text: `-webkit-text-stroke: 4px #f5efe5; color: transparent;` |
| A3. Color-flip punch | Whole-screen background switches to a bright color (orange / red / gold / emerald green, etc.) with inverted text |
| A4. Gradient text hero | Large text with `background: linear-gradient(...); -webkit-background-clip: text;` |

B. Contrast / Comparison (contrast structure, 1-2 per video, duration 5-8s)

| Template | Suitable for |
| --- | --- |
| B1. Two-line comparison + strikethrough | "Previously X, now Y" / "Not A, but B"; at most 2 per video |
| B2. Left-right split screen | Screen divided in two (optionally with a vertical divider) |
| B3. Diagonal comparison | Top-left ↔ bottom-right, with generous whitespace in between |

C. List / Structure (parallel items, 1-2 per video, duration 6-10s)

| Template | Suitable for |
| --- | --- |
| C1. Row of N cards | 3-5 parallel items, dark warm black + single-color border |
| C2. Vertically stacked keywords | 6-8 parallel items, optionally with large numbering 01-08 |
| C3. True grid | 2×2 / 3×2 grid, each cell an icon + label (vertical width is limited; 4 columns in a row gets cramped) |
| C4. Stepped / staggered list | Each item has an increasing `margin-left` |

D. Stat / Data (number climax, ≥1 per video, duration 4-6s)

| Template | Suitable for |
| --- | --- |
| D1. Number ticker | 0 → N rolling animation (`gsap.to({textContent})`) |
| D2. Number + label | Main number 200-400px + 60-80px explanation |
| D3. Progress bar / timeline | Horizontal progress bar + nodes |

E. Quote / Climax (where the key line lands, 1-2 per video, duration 6-10s)

| Template | Suitable for |
| --- | --- |
| E1. Paragraph-level hero text | One key line at 60-100px, left-aligned + emphasis bar on the left |
| E2. Large quotation mark + body text | A huge semi-transparent opening quotation mark as background decoration |

F. Decoration / Geometry (rhythm seasoning, optional)

| Template | Suitable for |
| --- | --- |
| F1. Grid + spinner / progress bar | Showing many things happening concurrently |
| F2. Dialogue bubble ↔ response | Character A speaks → character B acts |

Keep each scene's narration to 3-12 seconds (short punch 3-4s, long breath 10-12s; do not make every scene 5-7s). All scenes together should total 30-90 seconds and never exceed 90. A short article gets a short video: 5 scenes × 6s = 30s is perfectly acceptable.

Step 1b: Scene Mix Rule (mandatory)

After designing the 5-10 scenes, check them against the checklist below. If any item fails → go back and adjust.

Ratio hard rules

  • ≥1 scene each from Categories A, C, D, and E
  • ≤2 scenes using the B1 template (two-line strikethrough, historically the most overused)
  • ≥1 A3 color-flip scene (bright background with inverted text)
  • ≥4 different template categories (at least 4 of A/B/C/D/E/F)
  • ≤2 consecutive scenes from the same category
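The ratio rules above are mechanical enough to check in code. The following is a minimal sketch, assuming a hypothetical scene plan of the form `[{ id, template }]` where `template` is a code from the template tables (`'A1'`, `'B1'`, ...); neither the function name nor that data shape comes from the skill itself.

```javascript
// Hypothetical scene-plan shape: [{ id: 's01', template: 'A1' }, ...].
// The first letter of a template code is its category (A-F).
function checkRatioRules(scenes) {
  const problems = [];
  const categories = scenes.map((s) => s.template[0]);
  const count = (pred) => scenes.filter(pred).length;

  // At least one scene from each of A, C, D and E.
  for (const required of ['A', 'C', 'D', 'E']) {
    if (!categories.includes(required)) problems.push(`missing category ${required}`);
  }
  if (count((s) => s.template === 'B1') > 2) problems.push('more than 2 B1 scenes');
  if (count((s) => s.template === 'A3') < 1) problems.push('no A3 color-flip scene');
  if (new Set(categories).size < 4) problems.push('fewer than 4 categories');

  // At most 2 consecutive scenes may share a category.
  for (let i = 2; i < categories.length; i++) {
    if (categories[i] === categories[i - 1] && categories[i] === categories[i - 2]) {
      problems.push(`3 consecutive ${categories[i]} scenes ending at ${scenes[i].id}`);
      break;
    }
  }
  return problems;
}
```

An empty array means the plan passes; otherwise each string names the rule that failed, so the scene list can be adjusted before any HTML is written.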

Rhythm hard rules

  • Scene duration span ≥ 6s (shortest ≤ 4s, longest ≥ 9s)
  • ≥2 rhythm switches such as "short → long → short" or "long → short"
  • Font size span ≥ 240px (largest hero ≥ 320px, smallest ≤ 80px)
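A rough sketch of how the rhythm rules could be verified from numbers alone. The skill does not define what counts as a "rhythm switch", so this example assumes one interpretation: a scene is "short" if it is below the midpoint between the shortest and longest scene, and a switch is any short/long change between neighbours. The function name and inputs are illustrative.

```javascript
// durations: scene lengths in seconds, in order. fontSizes: all font sizes (px) used in the video.
function checkRhythmRules(durations, fontSizes) {
  const problems = [];
  const shortest = Math.min(...durations);
  const longest = Math.max(...durations);
  if (longest - shortest < 6) problems.push('duration span < 6s');
  if (shortest > 4) problems.push('shortest scene > 4s');
  if (longest < 9) problems.push('longest scene < 9s');

  // Assumed definition of a rhythm switch: neighbouring scenes fall on
  // different sides of the midpoint between shortest and longest.
  const mid = (shortest + longest) / 2;
  const labels = durations.map((d) => (d < mid ? 'short' : 'long'));
  let switches = 0;
  for (let i = 1; i < labels.length; i++) {
    if (labels[i] !== labels[i - 1]) switches++;
  }
  if (switches < 2) problems.push('fewer than 2 rhythm switches');

  const largest = Math.max(...fontSizes);
  const smallest = Math.min(...fontSizes);
  if (largest - smallest < 240) problems.push('font size span < 240px');
  if (largest < 320) problems.push('largest hero < 320px');
  if (smallest > 80) problems.push('smallest text > 80px');
  return problems;
}
```

For instance, durations of `[3, 10, 4, 11, 6]` with font sizes `[360, 120, 60]` satisfy every check under this interpretation.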

Layout hard rules

  • ≥2 scenes with a non-centered layout (corner-anchored, diagonal, left-aligned, stepped, etc.)
  • ≥1 scene where whitespace takes up ≥ 60% of the screen (breathing room)
  • ≥1 scene with geometric decoration (thick lines, color blocks, arrows, dots, large numbering)

Color hard rules

  • Most scenes set no `background:` so the watercolor bg-image shows through; only A3 color-flip scenes use a solid background
  • Color-flip colors are not limited to orange / blue / white (deep red, deep gold, emerald green, pine green, dark purple, etc. all work)
  • Use at least 2-3 emphasis colors (blue for technical terms, gold for value terms, green for growth terms, red for warning terms)

Anti-monotony self-check

  1. Screenshot every scene, shrink the shots to thumbnails, and line them up: can you tell them apart at a glance? If all 8 look alike → redo
  2. Do scenes 1, 4, and 7 differ in visual density? Some should be dense, some extremely minimal
  3. Is there a "meta-rhythm"? For example: A opening → 3 B/C scenes to develop → D climax → E close. That gives more of a dramatic arc than laying scenes out linearly

Step 2: Write `narration_chunks.json`

```json
[
  {"id": "s01", "text": "我们以前,是 AI 的领导。现在,我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]
```
Narration writing details:
  • More colloquial and clipped than article.md; use plenty of commas and periods so the TTS pauses naturally
  • Mixing numbers and English is fine ("Claude Code", "100 倍"); Volcano reads both
  • No parenthetical notes, no `...`, no em dashes `——` (the TTS reads the word "破折号", i.e. "em dash", aloud)
  • Strip `**bold markdown**` from article.md and keep plain text only
  • Remove Baixing-related facts: if article.md mentions "百姓网", "百姓网现在 X 人", "百姓网员工" and the like, strip or generalize them ("百姓网现在 158 个人" → "现实里没几个真人"). This information is outdated and must not enter the video. Likewise, visuals must not show a "百姓网" label or a "158 人" stat. See [[no-baixing-facts]]
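The narration rules above lend themselves to a quick lint pass before any TTS call. This is a minimal sketch, not part of the skill: the rule names and the `lintNarration` function are made up for illustration, and the patterns only cover the cases listed above.

```javascript
// Patterns derived from the narration rules; names are illustrative.
const NARRATION_RULES = [
  { name: 'parenthetical', pattern: /[()()]/ },
  { name: 'ellipsis', pattern: /\.{3}|…/ },
  { name: 'em dash', pattern: /——|—/ },
  { name: 'markdown bold', pattern: /\*\*/ },
  { name: 'baixing fact', pattern: /百姓网/ },
];

// chunks: the parsed narration_chunks.json array of { id, text }.
function lintNarration(chunks) {
  const issues = [];
  for (const { id, text } of chunks) {
    for (const { name, pattern } of NARRATION_RULES) {
      if (pattern.test(text)) issues.push({ id, rule: name });
    }
  }
  return issues;
}
```

Note that a placeholder chunk such as `"..."` is flagged by the ellipsis rule, which is a useful reminder that it still needs real text.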

Step 3: Generate TTS narration

```bash
cd <article-folder>/video
python3 tts_narration.py
```
The script defaults to `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation), inserts 0.35s of silence between segments, and outputs `narration.mp3` + `timing.json`.
Volcano TTS notes (lessons learned the hard way):
  • Use resource `volc.service_type.10029` and pick a speaker matching `zh_*_*_bigtts`
  • Never pass `emotion` / `emotion_scale`: most `_bigtts` voices return `data: null` and fail silently
  • Never use Kokoro (hyperframes' built-in tts) for Chinese: the quality is poor and the user has explicitly rejected it
  • Avoid `zh_male_jieshuonansheng_mars_bigtts`: text containing English proper nouns (e.g. "Claude Code") makes it hallucinate in a loop
Alternative voices (in recommended order):
  • `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation): the default, natural and colloquial
  • `zh_male_M392_conversation_wvae_bigtts`: same wvae series
  • `zh_male_wennuanahu_moon_bigtts` (Warm Ahu): warmer, more of a broadcast feel
  • `zh_male_silang_mars_bigtts` (Silang): calm and thoughtful, strongly dramatic
  • `zh_male_baqiqingshu_mars_bigtts` (Domineering): more forceful
Switch voices:
```bash
python3 tts_narration.py --voice zh_male_silang_mars_bigtts
```

Step 4: Generate the watercolor background image

The bg-image sets the main visual tone (softened abstract watercolor). Do not use the article's `illustration.png`: hand-drawn diagrams carry too much detail and blur into a uniform dark mud (visually still pure black). Use a purpose-generated abstract watercolor.
```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>
```
Choose `<theme>` by article topic:

| Theme | Palette | Suitable for |
| --- | --- | --- |
| `personal` | bright warm yellow, soft coral pink, terracotta, sage green, cream | Personal, handcrafted, warm |
| `tech` | cool teal, electric blue, deep purple, mint, white | AI, technology, data |
| `reflection` | sage green, dusty blue, lavender, pearl, cream | Reflection, calm |
| `warning` | burnt orange, deep red, mustard, charcoal | Warning, tension |
| `growth` | fresh green, gold, soft yellow, sky blue | Growth, compounding |
| `abstract` | lavender, dusty rose, sage, soft amber | Abstract, philosophical |

Output: `<article-folder>/video/bg.png` (1088×1920, ~3MB).
⚠️ The image must live inside the `video/` directory. Do not reference `../illustration.png`: hyperframes render does not resolve cross-directory relative paths, and the result renders pure black.

Step 5: Write the HyperFrames composition (`index.html`)

Read `timing.json` and design each scene around its chunk's start/end times. Structure for the 1080×1920 vertical frame:
```html
<html><head><script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
  html, body {
    width: 1080px; height: 1920px; margin: 0; overflow: hidden;
    background: #0e0b08;
    font-family: 'Noto Sans SC', 'PingFang SC', 'Heiti SC', sans-serif;
    font-weight: 900;
    color: #f5efe5;
    letter-spacing: -0.02em;
    -webkit-font-smoothing: antialiased;
  }
  #bg-image {
    position: absolute; inset: 0;
    background-image: url('bg.png');
    background-size: cover;
    background-position: center;
    filter: blur(30px) brightness(0.65) saturate(0.85);
    z-index: 0;
    transform: scale(1.1);
  }
  #bg-overlay {
    position: absolute; inset: 0;
    background: rgba(14, 11, 8, 0.28);
    z-index: 1;
  }
  .scene { position: absolute; inset: 0; overflow: hidden; opacity: 0; z-index: 2; }
  #s1 { opacity: 1; }
  /* ... scene-specific styles ... */
</style></head>
<body>
  <div id="root" data-composition-id="main" data-start="0" data-duration="<total+2>" data-width="1080" data-height="1920">
    <div id="bg-image"></div>
    <div id="bg-overlay"></div>
    <!-- scene divs s1..sN -->
    <!-- audio: narration + ticks + chimes + bell -->
  </div>
  <script>
    /* GSAP timeline: paused + register to window.__timelines['main'] */
  </script>
</body></html>
```
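The skeleton above leaves `data-duration="<total+2>"` and the scene timings to be filled in from `timing.json`. The sketch below shows one way to derive them. The file's schema is not shown in this document, so the shape `[{ id, start, end }]` (seconds) is an assumption, as are the function name and the returned fields.

```javascript
// Assumed timing.json shape (not confirmed by the skill): [{ id, start, end }, ...] in seconds.
function planScenes(timing, tailSeconds = 2) {
  const scenes = timing.map((chunk, i) => ({
    sceneId: `s${i + 1}`, // matches the #s1..#sN ids used in the skeleton
    start: chunk.start,
    duration: Number((chunk.end - chunk.start).toFixed(2)),
  }));
  // Narration ends at the latest chunk end; the composition runs tailSeconds longer.
  const total = Math.max(...timing.map((chunk) => chunk.end));
  return { scenes, compositionDuration: Number((total + tailSeconds).toFixed(2)) };
}
```

`compositionDuration` is the value that would go into `data-duration`, and each scene's `start` is the timeline position for its entrance and the preceding transition.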

🎬 First Frame Rule (mandatory)

The frame at t=0 must contain:
  1. A fully visible bg-image: always opacity 1, never faded in (it is visible by default in CSS; do not touch its opacity in GSAP)
  2. A visible title element: s1's main title may use `tl.from({y:30, scale:0.95})` but not `tl.from({opacity:0})`, otherwise t=0 is a black screen
  3. s1 must not be an A3 color-flip: it would cover the bg-image and hide the watercolor in the first frame. Save color-flip for s2 onward

Color System

Main text / anchor colors (design system, consistent across the whole video):

| Role | Value | Usage |
| --- | --- | --- |
| Main text | `#f5efe5` warm cream white | Hero / main content |
| Secondary text (subtitle, caption) | `#f5efe5` + `opacity: 0.7` + smaller font size | Do not use gray (`#8a7e72` is unreadable on the watercolor background). Build hierarchy with opacity + smaller size |
| Struck-through text | `#f5efe5` + `opacity: 0.5` + strikethrough line | Do not use `#6d635a` dark gray (unreadable on watercolor). Weaken with opacity and use an orange strike line |
| Decorative large numbering (01-08) | `#f5efe5` + `opacity: 0.18`, or `#e87a3e` + `opacity: 0.35` | Do not use `#2b2620` or other dark grays (they vanish completely on watercolor) |
| Outline stroke | `#f5efe5` 4-8px stroke + `color: transparent` | A2 hollow text |
| Default fallback bg | `#0e0b08` dark warm black | Covered by bg-image + overlay; not used for color-flip |

Core principle: all text uses `#f5efe5` cream or the `#e87a3e` orange family (the accent palette). Build hierarchy with opacity + size, not with hue changes. Gray is a relic of the black-background era; never use it on a watercolor background.
See [[no-low-contrast-text]]
Color-flip background palette (A3, not limited to orange / blue / white):

| Hex | Color | Suitable for |
| --- | --- | --- |
| `#e87a3e` | classic orange | Warning, emphasis, climax punch |
| `#6b9bc4` | bright blue | Data, technology climax |
| `#f5efe5` | warm cream white | Closing, quiet contrast |
| `#c45c3e` | deep red | Warning, error climax |
| `#d4a040` | deep gold | Achievement, value climax |
| `#5a8c6a` | emerald green | Growth, compounding, vitality |
| `#4a8a8a` | pine green | Calm, long-termism |
| `#7a5a8a` | dark purple | Wisdom, mystery climax |
| `#c48a8a` | dark pink | Softness, humanity |

Text on a color-flip background uses `#0e0b08` or `#f5efe5`, whichever inverts against it.
Emphasis / accent palette (not limited to orange):

| Hex | Color | Suitable for |
| --- | --- | --- |
| `#e87a3e` | orange | Default emphasis |
| `#6b9bc4` | blue | Data, technical terms, AI |
| `#d4a040` | gold | Value, achievement |
| `#5a8c6a` | emerald green | Growth, good outcomes |
| `#4a8a8a` | pine green | Long-term, stable |
| `#c45c3e` | deep red | Warning, contrast |
| `#8a7aaa` | dark purple | Abstract, wisdom |
| `#c48a8a` | dark pink | Soft, human |

Use at least 2-3 emphasis colors across the video, choosing the accent by scene theme.
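The color-flip rule says text should be `#0e0b08` or `#f5efe5`, but not which one to use for a given background. One way to decide, offered here as a suggestion rather than the skill's own method, is to compare WCAG contrast ratios and take the higher one. The function names are illustrative.

```javascript
// WCAG 2.x relative luminance of an sRGB hex color such as '#e87a3e'.
function relativeLuminance(hex) {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  const [r, g, b] = channels.map((c) =>
    c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(a, b) {
  const [hi, lo] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Pick whichever anchor color has the higher contrast against the flip background.
function pickFlipTextColor(background, candidates = ['#0e0b08', '#f5efe5']) {
  return candidates.reduce((best, c) =>
    contrastRatio(c, background) > contrastRatio(best, background) ? c : best
  );
}
```

With this heuristic a cream `#f5efe5` background gets dark text and a dark purple `#7a5a8a` background gets cream text. It is only a starting point; the final choice should still be judged by eye on the rendered frame.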

Font System (vertical screen, 1080px wide)

| Item | Value |
| --- | --- |
| Font weight | hero 900 / main text 800 / secondary 600-700 / caption 500 |
| Letter spacing | hero `-0.04em` to `-0.06em` / main text `-0.02em` / caption `0` |
| Punch hero (A1/A2, 1-3 characters) | 280-400px |
| Short-phrase hero (4-6 characters) | 160-240px |
| Long-phrase hero (7-10 characters) | 100-150px |
| Card content | 56-130px |
| Subtitle | 40-72px |
| Caption / numbering / label | 20-40px |
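The hero rows of the font table map character count to a size range, which can be expressed as a small lookup. This is a sketch only: the function name and return shape are invented for illustration, while the numbers are copied from the table.

```javascript
// Returns the hero size range (px) for a piece of hero text, or null if it is too long for a hero.
function heroFontRange(text) {
  const chars = [...text.replace(/\s/g, '')].length; // count characters, ignoring spaces
  if (chars === 0) return null;
  if (chars <= 3) return { min: 280, max: 400, kind: 'punch hero' };
  if (chars <= 6) return { min: 160, max: 240, kind: 'short-phrase hero' };
  if (chars <= 10) return { min: 100, max: 150, kind: 'long-phrase hero' };
  return null; // longer text belongs in card content or subtitle sizes
}
```

A `null` result is a hint that the line is too long to be a hero and should either be shortened or set at card or subtitle size.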

Layout System (breaking the centering habit)

| Layout | Key CSS | Suitable for |
| --- | --- | --- |
| Centered | `display: flex; justify-content: center; align-items: center;` | Category A hero, but ≤50% of scenes |
| Left-aligned, top-anchored | `padding: 80px 80px 0 80px;` | Category E key lines, long quotes |
| Bottom-right anchored | `position: absolute; right: 80px; bottom: 80px;` | Sign-off, climax words |
| Diagonal | top-left / bottom-right | B3 diagonal comparison |
| Grid | `display: grid; grid-template-columns: repeat(2, 1fr);` | C3 (on a vertical screen use 2×N rather than 3×N) |
| Stepped | each item `margin-left: calc(60px * var(--i));` | C4 staggered list |
| Bottom-anchored with space above | `position: absolute; bottom: 60px;` with empty space above | Breathing scenes |
| Small corner element | Small text tucked into one corner, everything else empty | Minimalist / whitespace punch |

Padding: 40-80px for screen-filling scenes, 120-200px for breathing scenes. Do not give every scene the same padding.

Geometric Decorative Elements

Use one every few scenes:
  • Thick short line, 8-16px × 40-200px, as an orange emphasis bar
  • Left emphasis bar, 6px × 100%, paired with a long quote
  • Large numbering 01-08 as list item indices (light gray, huge, decorative)
  • Large quotation mark character `"`, semi-transparent and oversized, placed top-left
  • Horizontal divider, 2-4px, cream white with 30% transparency
  • Dot / square, 12-20px, orange, as list bullets
  • Arrow ➜ or a hand-drawn SVG

Scene Transitions (4 types + mixing rule)

Do not blur-crossfade the whole video. Every 4 transitions must include at least 2 types.
T1. Blur crossfade (default, soft)
  • 0.6s, `sine.inOut`
  • Incoming scene: `opacity: 0, filter: blur(24px)` → `opacity: 1, filter: blur(0)`
  • Outgoing scene fades out and blurs at the same time
T2. White flash cut (punch cut, the most modern)
  • 0.18s total: 60ms white flash → cut → 40ms new scene scale 1.05 → 1
  • Suits: entering a Category A hero, a Category D stat, or a climax switch
```js
// T is the cut time in seconds; prevScene / nextScene are the two scene elements (or selectors).
tl.to('.flash', { opacity: 1, duration: 0.06, ease: 'none' }, T - 0.06)   // flash up just before the cut
  .set(prevScene, { opacity: 0 }, T)                                      // hard cut at T
  .set(nextScene, { opacity: 1 }, T)
  .to('.flash', { opacity: 0, duration: 0.12, ease: 'power2.out' }, T)    // flash fades away
  .from(nextScene, { scale: 1.05, duration: 0.25, ease: 'expo.out' }, T); // new scene settles in
```
T3. Scale push (a sense of pushing in)
  • 0.55s, outgoing scene `scale: 1 → 0.85`, incoming scene `scale: 1.15 → 1`
  • Suits: moving from an overview into a detail
T4. Color flash cut (a brief orange / blue flash, strong rhythm)
  • 0.22s total: 80ms full-screen orange → cut → 40ms release
  • Suits: entering an A3 color-flip or a key turning point
  • At most 2 per video
The flash overlay is a `<div class="flash">` added to the HTML: positioned full-screen, default opacity 0, z-index 100.
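The mixing rules for transitions can be checked on a plain list of transition codes before writing the timeline. The sketch below reads "every 4 transitions" as every window of 4 consecutive transitions, which is one possible interpretation; the function name is illustrative.

```javascript
// transitions: ordered list such as ['T1', 'T2', 'T1', 'T3'], one entry per scene change.
function checkTransitions(transitions) {
  const problems = [];
  // Sliding window of 4: each window needs at least 2 distinct types.
  for (let i = 0; i + 4 <= transitions.length; i++) {
    const group = transitions.slice(i, i + 4);
    if (new Set(group).size < 2) {
      problems.push(`transitions ${i + 1}-${i + 4} all use ${group[0]}`);
    }
  }
  // T4 color flash cut is capped at 2 per video.
  const colorFlashes = transitions.filter((t) => t === 'T4').length;
  if (colorFlashes > 2) problems.push(`T4 used ${colorFlashes} times (max 2)`);
  return problems;
}
```

Videos with fewer than 4 transitions have no full window, so only the T4 cap applies to them under this reading.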

Entrance Animation Rules

  • Every element in every scene enters with `tl.from(...)` (y / opacity / scale)
  • Entrance stagger 0.1-0.3s; the first element starts at scene.start + 0.3
  • Use ≥3 different eases (`power3.out` / `back.out(1.3)` / `expo.out` / `elastic.out(1, 0.5)`)
  • Do not exit with `gsap.to({opacity: 0})`: transitions already handle exits. Only the last scene may fade to black
  • The video as a whole must use ≥3 of the Modern Motion Techniques
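The timing and ease rules above can be turned into a small scheduling helper. This is a sketch under assumptions: the helper name and its return shape are hypothetical, and rotating through the four eases is just one simple way to guarantee at least three are used.

```javascript
// Eases listed in the entrance rules above.
const EASES = ['power3.out', 'back.out(1.3)', 'expo.out', 'elastic.out(1, 0.5)'];

// Returns one { at, ease } entry per element: first at sceneStart + 0.3s, then one stagger apart.
function entranceSchedule(sceneStart, elementCount, stagger = 0.2) {
  if (stagger < 0.1 || stagger > 0.3) throw new RangeError('stagger should be 0.1-0.3s');
  return Array.from({ length: elementCount }, (_, i) => ({
    at: Number((sceneStart + 0.3 + i * stagger).toFixed(3)),
    ease: EASES[i % EASES.length],
  }));
}
```

Each entry can then feed a tween such as `tl.from(el, { y: 40, opacity: 0, duration: 0.6, ease }, at)`, where `at` is the timeline position. For the s1 title, drop `opacity: 0` to respect the first-frame rule.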

Modern Motion Techniques

Half the difference between a mediocre video and a modern one is layout; the other half is motion. Each video must use at least 3 of the 7 techniques below (apply them to specific scenes; do not pile them on everywhere).

1. Kinetic Typography (per-character stagger entrance): Category A hero

```html
<!-- Each character gets its own span so that '.kinetic span' can stagger them.
     The spans need display: inline-block for the transforms to apply. -->
<h1 class="kinetic"><span>维</span><span>修</span><span>工</span></h1>
```
```js
// T is the entrance time in seconds on the scene timeline.
tl.from('.kinetic span', {
  y: 180, opacity: 0, rotateX: -90,
  duration: 0.7, stagger: 0.06,
  ease: 'back.out(1.4)',
  transformOrigin: '50% 100%',
}, T);
```

2. Camera Punch (push in / pull out): A3, Category D

```js
tl.from(scene, { scale: 1.15, opacity: 0, duration: 0.5, ease: 'expo.out' }, sceneStart);
```

3. Mask Reveal (clip-path reveal): Category E quote

```css
.reveal { clip-path: inset(0 100% 0 0); }
```
```js
tl.to('.reveal', { clipPath: 'inset(0 0% 0 0)', duration: 0.9, ease: 'expo.inOut' }, T);
```

4. Number Ticker (rolling number): D1

```html
<div class="ticker" data-end="3600">0</div>
```
```js
const ticker = document.querySelector('.ticker');
const obj = { val: 0 }; // proxy object that GSAP tweens from 0 to the target
tl.to(obj, {
  val: parseInt(ticker.dataset.end, 10),
  duration: 1.8, ease: 'power2.out',
  // write the rounded, locale-formatted value into the element on every tick
  onUpdate: () => { ticker.textContent = Math.round(obj.val).toLocaleString(); },
}, T);
```

5. Outline → Fill (hollow text turns solid): A2

```css
.morph { -webkit-text-stroke: 4px #f5efe5; color: transparent; }
```
```js
tl.to('.morph', { color: '#e87a3e', webkitTextStrokeColor: '#e87a3e', duration: 0.5, ease: 'power2.out' }, T);
```

6. Letter Highlight Sweep (keyword highlight sweep): Category E climax word

html
<span class="sweep"><span class="sweep-bg"></span>搭档</span>
css
.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }
js
tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25);

7. Background Color Punch (background flash): 1-2 times per video

js
tl.to(scene, { backgroundColor: '#e87a3e', duration: 0.08 }, T)
  .to(scene, { backgroundColor: '#0e0b08', duration: 0.4, ease: 'power2.out' }, T + 0.1);
Strike-through animation: use a real DOM `<span class="strike-line">` instead of `::after`. Pseudo-elements combined with CSS variables do not work in some hyperframes rendering paths.
html
<span class="strike">领导<span class="strike-line"></span></span>
css
/* Assumption: .strike must be positioned so the absolutely positioned line anchors to the word; skip this rule if it is already defined elsewhere */
.strike { position: relative; display: inline-block; }
.strike-line { position: absolute; left: -10px; right: -10px; top: 56%; height: 10px; background: #e87a3e; transform: scaleX(0); transform-origin: left; }
js
tl.to('.strike .strike-line', { scaleX: 1, duration: 0.55, ease: 'power2.inOut' }, T);

Step 6: Add SFX

bash
~/.claude/skills/wjs-converting-text-to-video/scripts/synth-sfx.sh <article-folder>/video
Generates `video/sfx/{tick,chime,bell}.mp3`:
  • `tick.mp3`: 80ms 1.2kHz sine, for transitions (0.3s before each scene switch)
  • `chime.mp3`: 220ms 880+1320Hz dual tone, for when a dialogue line or list item lights up (optional)
  • `bell.mp3`: 1.5s low-frequency bell, for when the final climax word appears (at most once per video)
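
For orientation, the tick described above (80ms, 1.2kHz sine) can be approximated with the Python standard library alone. This is only a sketch and not the contents of `synth-sfx.sh`, which uses ffmpeg; the sample rate, amplitude, fade length, WAV output, and the function names `synth_tick` and `write_wav` are all assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumption: the source does not state a sample rate


def synth_tick(duration_s=0.08, freq_hz=1200.0, amplitude=0.5, fade_s=0.005):
    """Return 16-bit mono PCM samples for a short sine tick with linear fades."""
    n = int(round(SAMPLE_RATE * duration_s))
    fade_n = max(1, int(SAMPLE_RATE * fade_s))
    samples = []
    for i in range(n):
        # Linear fade-in and fade-out avoid audible clicks at the edges.
        gain = min(1.0, i / fade_n, (n - 1 - i) / fade_n)
        value = amplitude * gain * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        samples.append(int(value * 32767))
    return samples


def write_wav(path, samples):
    """Write mono 16-bit samples to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Calling `write_wav("tick.wav", synth_tick())` would produce a WAV file; converting it to MP3 is left to ffmpeg, as in the real script.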
Integrate into the timeline:
html
<audio id="aud-narration" src="narration.mp3" data-start="0" data-duration="<total>" data-track-index="0" data-volume="1"></audio>

<audio id="aud-tick-s02" src="sfx/tick.mp3" data-start="<scene2.start - 0.3>" data-duration="0.1" data-track-index="2" data-volume="0.55"></audio>
<!-- Repeat for each scene switch; T2/T4 flash transitions can skip the tick -->

<audio id="aud-chime-s08-1" src="sfx/chime.mp3" data-start="<T>" data-duration="0.3" data-track-index="3" data-volume="0.45"></audio>
<audio id="aud-bell-s12" src="sfx/bell.mp3" data-start="<climax-T>" data-duration="1.6" data-track-index="4" data-volume="0.55"></audio>
⚠️ Every `<audio>` must have an `id`; otherwise the render is silent (a hard hyperframes requirement).
Clips on different `track-index` values do not conflict; clips on the same track must not overlap in time.
SFX discipline: transition ticks are mandatory; chime and bell are decorative, so skip them when the scene content is simple; use the bell at most once per video.
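
The rules above (tick 0.3s before each scene switch, every clip has an `id`, no overlap on one track) lend themselves to a small pre-render check. The following is a minimal sketch under stated assumptions: the clip dictionaries and the helper names `tick_tag` and `find_problems` are hypothetical, and the attribute values are copied from the example markup rather than read from a real composition.

```python
def tick_tag(scene_index, scene_start, lead=0.3):
    """Build the <audio> element for the tick that precedes one scene switch."""
    start = max(0.0, round(scene_start - lead, 3))
    return (
        f'<audio id="aud-tick-s{scene_index:02d}" src="sfx/tick.mp3" '
        f'data-start="{start}" data-duration="0.1" '
        f'data-track-index="2" data-volume="0.55"></audio>'
    )


def find_problems(clips):
    """clips: dicts with id, track, start, duration. Returns a list of messages."""
    problems = []
    for clip in clips:
        if not clip.get("id"):
            problems.append(f"clip at {clip['start']}s has no id")
    by_track = {}
    for clip in clips:
        by_track.setdefault(clip["track"], []).append(clip)
    for track, items in by_track.items():
        items.sort(key=lambda c: c["start"])
        # Two neighbouring clips on one track overlap if the later one
        # starts before the earlier one has finished.
        for prev, cur in zip(items, items[1:]):
            if cur["start"] < prev["start"] + prev["duration"]:
                problems.append(f"track {track}: overlap at {cur['start']}s")
    return problems
```

An empty list from `find_problems` means the two audio rules hold for the clips passed in; it does not replace `npx hyperframes lint`.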

Step 7: Lint + Inspect + Render (in this order)

bash
cd <article-folder>/video

# Mandatory 1: linter (must report 0 errors)
npx hyperframes lint

# Mandatory 2: layout inspection to find overflow (must report 0 errors)
npx hyperframes inspect --at 1,8,15,25,35,45,55,65

# Recommended: snapshot to check the layout
npx hyperframes snapshot --at <t1>,<t2>,<t3> .

# Render (only after lint and inspect both pass)
# ⚠️ Output to the parent directory, parallel to video/; the final MP4 is not stored in video/
npx hyperframes render --quality standard --fps 30 --output ../<slug>.mp4

**Why inspect is mandatory**: a 1080px-wide vertical frame is narrow, so a 3-4 character hero at 280-400px font size is already close to overflowing (4 characters at 280px is about 1120px, wider than the frame). Run inspect every time and **render only when it reports 0 errors**.

**Fix overflow**:
- Reduce the font size (inspect gives specific suggestions)
- Break a long hero into lines ("没法积累" → two lines, "没法" / "积累")
- Use `white-space: nowrap` only after confirming that character count × font size < frame width
- If `.em` overflows inside `reveal-wrap` → add `line-height: 1` to `.em`
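
The "character count × font size < frame width" rule can be estimated before running inspect. The sketch below assumes each CJK glyph is roughly 1em wide, which holds for Noto Sans SC only approximately; the helper names and the padding and letter-spacing parameters are hypothetical, and `npx hyperframes inspect` remains the authority.

```python
FRAME_WIDTH = 1080  # portrait frame width in px


def hero_fits(text, font_size_px, side_padding_px=0, letter_spacing_px=0):
    """Rough single-line fit check, treating every glyph as 1em wide."""
    chars = [c for c in text if not c.isspace()]
    width = len(chars) * font_size_px + max(0, len(chars) - 1) * letter_spacing_px
    return width <= FRAME_WIDTH - 2 * side_padding_px


def max_font_size(text, side_padding_px=0):
    """Largest font size (px) at which the text still fits on one line."""
    chars = [c for c in text if not c.isspace()]
    return (FRAME_WIDTH - 2 * side_padding_px) // max(1, len(chars))
```

For example, `hero_fits("没法积累", 280)` is false (1120px of text in a 1080px frame), while each half of the two-line split fits even at 400px.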

**Render Quality**:
- `--quality draft` ~30s rendering — for iteration
- `--quality standard` ~1.5min — default, sufficient for publishing
- `--quality high` ~3min — for large screens / business use

Step 8: Preview

Output: `<article-folder>/<slug>.mp4` (parallel to `video/`, not inside it; `video/` holds the intermediate files).
Run `open <article-folder>/<slug>.mp4` so the user can preview the result. Do not auto-upload to WeChat Channels (the user may want to trim or adjust first).

Step 9: Publish to YouTube (automatic cron, not part of the render workflow)

Do not upload immediately after a new video is rendered. YouTube enforces a daily quota (6 uploads/day by default, at 1600 quota points per upload), so rendering many videos at once would hit the limit.
Instead, cron runs `daily-upload-batch.sh` at 10:00 every day, picks up to 5 MP4s that have not been uploaded yet (sorted by article date, ascending), and writes a `.youtube.json` record after each upload.
The cron entry is already registered (one-time setup, no need to repeat):
0 10 * * * /Users/jianshuo/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
Manual trigger (do not run this inside the wjs-converting-text-to-video workflow; let cron handle it):
bash
~/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh

# Or upload a single article immediately
~/.claude/skills/wjs-converting-text-to-video/scripts/publish-to-youtube.py <article-folder>

What the script does for each upload:
1. Detects whether the MP4 is portrait or landscape → portrait gets `#shorts` added to the title; landscape is uploaded as a regular video
2. Takes the title from the article.md H1 and the description from the first few paragraphs
3. Checks `<article-folder>/.youtube.json`: if it exists, tries to delete the old video before uploading the new one (this requires the `youtube.force-ssl` scope; the current token lacks it, so the delete is skipped and the new video is uploaded anyway)
4. Writes a record to `.youtube.json`

See memory: [[auto-publish-youtube]]
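
The batch rule in this step (up to 5 not-yet-uploaded MP4s, oldest article first) and the portrait `#shorts` rule can be sketched as plain functions. This is an illustration, not the code of `daily-upload-batch.sh` or `publish-to-youtube.py`: the dictionary fields `date`, `has_mp4`, and `uploaded` are hypothetical stand-ins for what the real scripts read from the folders and `.youtube.json`, and appending the tag at the end of the title is an assumption.

```python
def pick_daily_batch(articles, limit=5):
    """Select up to `limit` rendered, not-yet-uploaded articles, oldest first.

    articles: dicts with 'date' (ISO string), 'has_mp4', 'uploaded'.
    """
    pending = [a for a in articles if a["has_mp4"] and not a["uploaded"]]
    pending.sort(key=lambda a: a["date"])  # ISO dates sort chronologically as strings
    return pending[:limit]


def youtube_title(h1, width, height):
    """Add #shorts to the title when the video is portrait (height > width)."""
    title = h1.strip()
    if height > width and "#shorts" not in title.lower():
        title = f"{title} #shorts"
    return title
```

Keeping the limit at 5 leaves one upload of headroom under the 6-per-day figure quoted above, for example for a single manual upload.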

Directory Structure

<article-folder>/
├── article.md
├── illustration.png            # User's original illustration, not used directly as bg
├── <slug>.mp4                  # ⭐ Final video (parallel to video/, not stored in video/)
└── video/                      # All intermediate artifacts
    ├── narration_chunks.json   # Narration text for 5-10 scenes
    ├── tts_narration.py        # Copied in during bootstrap
    ├── narration.mp3           # Merged full TTS track
    ├── narration/              # Individual segment mp3 (s01..sN)
    ├── timing.json             # Start/end/duration of each segment
    ├── bg.png                  # Abstract watercolor background generated by GPT Image 2
    ├── sfx/{tick,chime,bell}.mp3
    ├── index.html              # HyperFrames composition
    ├── hyperframes.json
    ├── meta.json
    ├── package.json
    └── snapshots/              # Pre-render snapshots

The Skill's Own Files

~/.claude/skills/wjs-converting-text-to-video/
├── SKILL.md
└── scripts/
    ├── bootstrap-project.sh        # Initialize the video/ directory + copy helpers + generate sfx
    ├── generate-bg.sh              # Call GPT Image 2 to generate the abstract watercolor bg.png
    ├── tts.py                      # Generate the Volcano TTS narration
    ├── synth-sfx.sh                # Synthesize tick/chime/bell (ffmpeg)
    ├── retrofit-bg-image.py        # Add a bg-image layer to existing videos
    ├── strip-dark-scene-bgs.py     # Strip scene-level dark backgrounds so the bg-image shows through
    └── publish-to-youtube.py       # Auto-upload MP4 to YouTube (portrait→Shorts); can replace an existing upload

Anti-Patterns

Anti-Monotony (most important: the root cause of "flat narration")

| Do NOT | Reason |
| --- | --- |
| Use B1 two-line strikethrough for every scene | Historically the biggest failure mode. At most 2 B1 scenes per video |
| Center every scene | Lifeless. At least 2 non-centered scenes required |
| Use similar font sizes in every scene | Font-size span must be ≥240px |
| Make every scene 5-7s long | Duration span must be ≥6s |
| Use only blur crossfade throughout | At least 2 transition types per 4 transitions |
| Have no color-flip in the video | ≥1 A3 scene is a hard requirement |
| Have no geometric elements in the video | ≥1 scene must add a thick line / large number / quotation mark |
| Use only `tl.from({y, opacity})` throughout | At least 3 Modern Motion Techniques |
| Fill every scene to the brim | ≥1 scene must keep ≥60% blank space |
| Add `background:` to every scene | It covers the bg-image, so the watercolor was generated for nothing. Regular scenes set no background; only A3 color-flip uses a solid color |
| Always use orange for color-flip / emphasis | Use at least 2-3 accent colors |
| Use gray for secondary text / strike-through / decoration | Gray has too little contrast on the watercolor background and disappears. Use `#f5efe5` cream with reduced opacity instead (see [[no-low-contrast-text]]) |
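
Several of the numeric thresholds in the table above can be checked mechanically before rendering. The sketch below covers only five of them and assumes a hypothetical per-scene record with `template`, `hero_px`, `duration`, and `centered` fields; no such file is described in the skill, so the data shape and the function name are illustrative.

```python
def check_monotony(scenes):
    """Return the anti-monotony rules that a scene list violates.

    scenes: dicts with template (e.g. "A3", "B1"), hero_px, duration, centered.
    """
    if not scenes:
        return ["no scenes"]
    violations = []
    if sum(1 for s in scenes if s["template"] == "B1") > 2:
        violations.append("more than 2 B1 scenes")
    if sum(1 for s in scenes if not s["centered"]) < 2:
        violations.append("fewer than 2 non-centered scenes")
    sizes = [s["hero_px"] for s in scenes]
    if max(sizes) - min(sizes) < 240:
        violations.append("font-size span below 240px")
    durations = [s["duration"] for s in scenes]
    if max(durations) - min(durations) < 6:
        violations.append("duration span below 6s")
    if not any(s["template"] == "A3" for s in scenes):
        violations.append("no A3 color-flip scene")
    return violations
```

Rules that depend on visual judgment, such as blank space or accent color variety, are left to snapshot review.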

Content / Engineering

| Do NOT | Reason |
| --- | --- |
| Use Kokoro for Chinese TTS | Chinese quality is poor; the user explicitly rejects it |
| Pass the `emotion` parameter to Volcano TTS | `_bigtts` voices return `data: null` and fail silently |
| Use `zh_male_jieshuonansheng_mars_bigtts` | Loops and hallucinates when the text contains English proper nouns |
| Use serif fonts (Songti / 宋体 / Noto Serif) | Not enough impact |
| Paste whole paragraphs of the article on screen | That is a PPT. A video shows one visual moment per screen |
| Exceed 10 scenes or 90 seconds | Attention does not last that long |
| Pad a short article to 90 seconds | A short article should be 30-50s; stretching it waters the content down |
| Change font or color style per scene | Style drift. The design system stays fixed; only the templates vary |
| Use `::after` pseudo-element + CSS variables for strike-through | Fails in hyperframes rendering paths. Use a real DOM `<span class="strike-line">` |
| Use `gsap.to({opacity: 0})` outside the last scene | hyperframes forbids exit animations; the transition is the exit |
| Add a chime to every chunk | Too noisy |
| Use `../illustration.png` as the bg url | hyperframes render does not resolve cross-directory paths and renders pure black. `bg.png` must be inside `video/` |
| Omit `id` from `<audio>` | The render will be silent. Every `<audio>` must have `id="..."` |
| Make s1 an A3 color-flip scene | The first frame would not show the bg-image. Put color-flip in s2 or later |
| Use `from({opacity: 0})` on every s1 title element | The first frame is a black screen. s1 main elements default to `opacity: 1`; animate only y/scale |

Common Pitfalls

  • Writing the dash `——` in the narration → TTS reads "破折号" (the word for "dash") aloud. Delete it and use a period or comma
  • A chunk is abnormally long (>3 chars/s) → Volcano hallucinates in a loop. Switch voice or split the chunk
  • Scene duration < narration duration → the voiceover gets cut off by the next scene. A scene must cover its whole narration plus a 0.3s buffer
  • Large text on a black background still visible at opacity: 0 → check that `.scene` defaults to `opacity: 0` (except s1)
  • `.em` overflows slightly (a few px top/bottom) inside `.reveal-wrap` → add `line-height: 1` to `.em`
  • Snapshot glyphs differ from the render → both now use Noto Sans SC, so they should match
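
Two of the pitfalls above (dashes in the narration, scenes shorter than their narration plus the 0.3s buffer) are easy to catch with a script before synthesis and render. This is a minimal sketch; the segment fields `id`, `scene_duration`, and `narration_duration` are hypothetical and do not reflect the actual layout of `timing.json`.

```python
def narration_warnings(text):
    """Flag characters that TTS may read aloud instead of treating as punctuation."""
    warnings = []
    if "—" in text:  # also matches the doubled Chinese dash "——"
        warnings.append("contains a dash that TTS may read aloud")
    return warnings


def short_scenes(segments, buffer_s=0.3):
    """Return ids of scenes that end before their narration plus the buffer.

    segments: dicts with id, scene_duration, narration_duration (seconds).
    """
    return [
        s["id"]
        for s in segments
        if s["scene_duration"] < s["narration_duration"] + buffer_s
    ]
```

Any id returned by `short_scenes` needs a longer scene or a shorter narration chunk before rendering.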

Dependencies

  • HyperFrames CLI (`npx hyperframes`): composition lint / inspect / snapshot / render
  • GPT Image 2 (`~/.claude/skills/gpt-image-2-skill/`): generates bg.png; `--provider codex` uses ChatGPT auth
  • Volcano TTS: `VOLC_TTS_APPID` / `VOLC_TTS_ACCESS_TOKEN` in `~/code/.env`
  • ffmpeg: SFX synthesis, audio concat, aspect-ratio detection
  • YouTube uploader (`~/.claude/skills/wjs-uploading-video/`) + OAuth token at `~/.config/youtube/token.json`: Step 9 auto-publishing