wjs-converting-text-to-video

Convert a Wang Jianshuo-style WeChat Official Account `article.md` into a 1080×1920 vertical, 30-90 second Chinese narrated short video: TTS voiceover + HyperFrames CSS/GSAP animations + abstract watercolor backgrounds + transition SFX. The output is an MP4 for WeChat Channels / Douyin / Xiaohongshu / Reels.

What this skill produces

Dimension	Default
Resolution	1080×1920 vertical (9:16)
Duration	30-90 seconds
Number of Scenes	5-10
Voiceover	Volcano Engine Volcano TTS, default "Ahu Conversation" male voice
Background	Abstract watercolor generated by GPT Image 2 ( `bg.png` ) + blur 30 + warm black semi-transparent overlay
Font	Noto Sans SC, hero weight 900, main text warm cream white
Output	`<article-folder>/<slug>.mp4` (parallel to `video/` , not stored in `video/` )
Publishing	Auto-upload to YouTube — Portrait → Shorts, Landscape → regular video; re-rendering replaces old video (no accumulation)

When this skill fires

The user already has an `article.md` and says 「做成视频」 ("make it into a video"), 「做一个解说」 ("make a narrated explainer"), or 「讲一遍」 ("talk me through it").
The user runs `/wjs-converting-text-to-video <article-folder>`.
The user makes a batch request such as "turn all X articles I posted yesterday into videos".

When NOT to use

No article draft, only an idea → first use `/wjs-publishing-wechat` to write `article.md`, then come back.
The user wants subtitle burning / translation / voiceover replacement → use `/wjs-burning-subtitles`, `/wjs-dubbing-video`, or `/wjs-localizing-video`.
The video must be in a non-Chinese language such as English or Spanish → this skill focuses on Chinese TTS (Volcano Engine, 火山引擎); for other languages use hyperframes' built-in tts command (Kokoro is acceptable for English).
Landscape 16:9 → this skill defaults to vertical; switch to landscape only when the user explicitly asks for it.

Core Principle

Video is not a visual reading of the article, but a visual reconstruction of it.

Each scene is an independent visual moment — a contrast, a parallelism, a number, a metaphor. Text fills the screen, bolded, with key words highlighted in orange. The background is abstract watercolor (softened with blur), with an overall tone that is steady, restrained, and impactful.

Rhythm > Templates. A video with 5-10 scenes that uses the same "two-line comparison" layout throughout is not a video, it's a slideshow. Modernity comes from contrast — extreme font size differences, asymmetric layouts, alternating short and long scenes, alternating text-only and geometric-element scenes, alternating watercolor-background and bright punch scenes.

The default is mediocre. If you only pick the easiest templates from the top of the list, the result will be a flat two-line layout. Following the Step 1b Scene Mix Rule ratios is mandatory.

Workflow

Step 1: Design 5-10 visual moments

Read `<article-folder>/article.md` and split it into 5-10 scenes along its argument structure, keeping the total duration within 30-90 seconds. Short articles (1-2 core points) get 5-6 scenes / 30-50s; long articles get 8-10 scenes / 60-90s. Each scene pairs one narration segment (voiceover) with one clear visual skeleton.

Template library: 6 categories, 18 templates in total, mix as needed:

A. Hero / Punch (High-contrast climax, ≥1 per video, duration ≤4s)

Template	Suitable for
A1. Full-screen single-character hero	1-3 climax words filling the screen, font size 280-400px
A2. Outline hero	Hollow text with `-webkit-text-stroke: 4px #f5efe5; color: transparent;`
A3. Color-flip punch	Full-screen background changes to bright color (orange/red/gold/green etc.), with reversed text color
A4. Gradient text hero	Large text with `background: linear-gradient(...); -webkit-background-clip: text;`

B. Contrast / Comparison (Contrast structure, 1-2 per video, duration 5-8s)

Template	Suitable for
B1. Two-line comparison + strikethrough	"Previously X, now Y" / "Not A, but B" — max 2 per video
B2. Split-screen left-right comparison	Screen divided into two halves (can add vertical separator line)
B3. Diagonal comparison	Top-left ↔ Bottom-right, with large blank space in the middle

C. List / Structure (Parallel items, 1-2 per video, duration 6-10s)

Template	Suitable for
C1. Horizontal row of N cards	3-5 parallel items, using dark warm black + monochrome border
C2. Vertically stacked keywords	6-8 parallel items, can add large numbering 01-08
C3. True grid	2×2 / 3×2 grid, each cell with icon + label (vertical screen width is limited, 4 columns will be crowded)
C4. Stepped / staggered list	Each item has increasing `margin-left`

D. Stat / Data (Number climax, ≥1 per video, duration 4-6s)

Template	Suitable for
D1. Number ticker	0 → N scrolling animation ( `gsap.to({textContent})` )
D2. Number + label	Main number 200-400px + 60-80px explanation
D3. Progress bar / timeline	Horizontal progress bar + nodes

E. Quote / Climax (Key quote conclusion, 1-2 per video, duration 6-10s)

Template	Suitable for
E1. Paragraph-level hero text	A 60-100px key quote, left-aligned + left emphasis bar
E2. Large quotation marks + content	Huge semi-transparent opening quotation marks as background decoration

F. Decoration / Geometry (Rhythm seasoning, optional)

Template	Suitable for
F1. Grid + spinner / progress bar	Multi-concurrent visuals
F2. Dialogue bubble ↔ Response	Character A speaks → Character B acts

Each scene's narration should be 3-12 seconds (short punch scenes 3-4s, long breathing scenes 10-12s, don't make all scenes 5-7s). Total duration of all scenes should be 30-90 seconds, no more than 90 seconds. Short articles should be made short — 5 scenes × 6s = 30s is acceptable.

Step 1b: Scene Mix Rule (Mandatory)

After designing the 5-10 scenes, self-check against the checklist below. If any item fails → go back and adjust.

Ratio Hard Rules

Rhythm Hard Rules

Scene duration span ≥ 6s (shortest ≤ 4s, longest ≥ 9s)
≥2 rhythm switches like "short → long → short" or "long → short"
Font size span ≥ 240px (largest hero ≥ 320px, smallest ≤ 80px)
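The numeric parts of the rhythm rules can be checked mechanically before any HTML is written. The following is a minimal sketch, not part of the skill's scripts: `ScenePlan` and `check_rhythm` are hypothetical names, and the per-scene font sizes are assumed to be known from the scene design. The "short → long → short" switch rule is left to human judgment.

```python
from dataclasses import dataclass


@dataclass
class ScenePlan:
    """One planned scene (hypothetical structure, for illustration only)."""
    scene_id: str
    duration_s: float        # narration length in seconds
    largest_font_px: int     # biggest text in the scene
    smallest_font_px: int    # smallest text in the scene


def check_rhythm(scenes):
    """Return a list of rule violations; an empty list means the plan passes."""
    if not scenes:
        return ["no scenes planned"]
    problems = []
    durations = [s.duration_s for s in scenes]
    largest = max(s.largest_font_px for s in scenes)
    smallest = min(s.smallest_font_px for s in scenes)

    if not 5 <= len(scenes) <= 10:
        problems.append(f"scene count {len(scenes)} is outside 5-10")
    total = sum(durations)
    if not 30 <= total <= 90:
        problems.append(f"total duration {total:.1f}s is outside 30-90s")
    if min(durations) > 4:
        problems.append("shortest scene is longer than 4s")
    if max(durations) < 9:
        problems.append("longest scene is shorter than 9s")
    if max(durations) - min(durations) < 6:
        problems.append("scene duration span is under 6s")
    if largest < 320:
        problems.append("largest hero is under 320px")
    if smallest > 80:
        problems.append("smallest text is over 80px")
    if largest - smallest < 240:
        problems.append("font size span is under 240px")
    return problems
```

A plan of five 6-second scenes with text between 60px and 100px fails five of these checks, which is exactly the flat slideshow the rules are meant to prevent.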

Layout Hard Rules

≥2 scenes with non-centered layout (corner-aligned, diagonal, left-aligned, stepped etc.)
≥1 scene with blank space occupying ≥ 60% of the screen (breathing space)
≥1 scene containing geometric decorations (thick lines, color blocks, arrows, dots, large numbering)

Color Hard Rules

Most scenes set no `background:` color, so the watercolor bg-image shows through; only A3 color-flip scenes use a solid-color background
Color-flip scenes are not limited to orange / blue / white (deep red, deep gold, emerald green, pine green, dark purple and so on all work)
Use at least 2-3 emphasis colors (technical terms in blue, value terms in gold, growth terms in green, warning terms in red)

Anti-Monotony Self-Check

Screenshot all scenes, shrink to thumbnails and arrange side by side — can you tell them apart at a glance? If 8 look the same → redo
Are the visual densities of scenes 1,4,7 different? Some should be dense, some extremely minimal
Is there a "meta-rhythm"? For example: A opening → 3 B/C expansion scenes → D climax → E conclusion — more dramatic than linear layout

Step 2: Write `narration_chunks.json`

json

[
  {"id": "s01", "text": "我们以前，是 AI 的领导。现在，我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]

Narration writing details:

Write more colloquially and in shorter bursts than `article.md`; use plenty of commas and periods so the TTS pauses naturally
Mixed numbers and English are fine ("Claude Code", "100 倍"); Volcano reads both
Do not write parenthetical notes, ellipses (`...`), or Chinese dashes (`——`); the TTS reads the dash aloud as the three characters "破折号"
Strip markdown such as `**bold**` from the `article.md` text; keep plain text only
Remove Baixing (百姓网) facts: if `article.md` mentions "百姓网", "百姓网现在 X 人" ("Baixing now has X people"), "百姓网员工" ("Baixing employees") and the like, strip or generalize them ("百姓网现在 158 个人" → "现实里没几个真人", roughly "in reality there are hardly any real people"). This information is outdated and must not reach the video. Likewise, visuals must not show a "百姓网" label or a "158 人" stat. See [[no-baixing-facts]]
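The narration rules above lend themselves to a quick lint pass before spending TTS calls. This is a minimal sketch under stated assumptions: `lint_chunks` and the `FORBIDDEN` table are hypothetical names, not part of the skill, and the patterns only cover the cases the rules spell out.

```python
import json
import re

# Patterns derived from the narration rules; labels are for reporting only.
FORBIDDEN = {
    "parenthetical note": re.compile(r"[()（）]"),
    "ellipsis": re.compile(r"\.\.\.|…"),
    "dash": re.compile(r"—"),
    "markdown bold": re.compile(r"\*\*"),
    "Baixing reference": re.compile(r"百姓网"),
}


def lint_chunks(chunks):
    """Return (chunk id, rule label) pairs for every rule a chunk breaks."""
    issues = []
    for chunk in chunks:
        for label, pattern in FORBIDDEN.items():
            if pattern.search(chunk["text"]):
                issues.append((chunk["id"], label))
    return issues


if __name__ == "__main__":
    with open("narration_chunks.json", encoding="utf-8") as fh:
        for chunk_id, label in lint_chunks(json.load(fh)):
            print(f"{chunk_id}: {label}")
```

Run from the folder that holds `narration_chunks.json`; no output means every chunk passed. Placeholder texts such as `"..."` are flagged on purpose, since they should never be sent to the TTS.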

Step 3: Generate TTS narration

bash

cd <article-folder>/video
python3 tts_narration.py

The script defaults to `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation), inserts 0.35s of silence between segments, and outputs `narration.mp3` and `timing.json`.

Volcano TTS notes (lessons learned the hard way):

Use resource `volc.service_type.10029` and pick a speaker matching `zh_*_*_bigtts`
Never pass `emotion` / `emotion_scale`: most `_bigtts` voices return `data: null` and fail silently
Never use Kokoro (hyperframes' built-in tts) for Chinese: the quality is poor and the user explicitly rejects it
Avoid `zh_male_jieshuonansheng_mars_bigtts`: text containing English proper nouns (such as "Claude Code") sends it into a hallucination loop

Alternative voices (in recommended order):

`zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation): default, natural and conversational
`zh_male_M392_conversation_wvae_bigtts`: same wvae series
`zh_male_wennuanahu_moon_bigtts` (Warm Ahu): warmer, broadcast-style
`zh_male_silang_mars_bigtts` (Silang): calm and thoughtful, strongly dramatic
`zh_male_baqiqingshu_mars_bigtts` (Domineering): more forceful

Switch voices:

python3 tts_narration.py --voice zh_male_silang_mars_bigtts
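To make the 0.35s gap concrete, here is a sketch of how per-chunk start/end times accumulate. It is an illustration only: the real schema of `timing.json` is not shown in this document, so the `id` / `start` / `end` keys and the `build_timing` helper are assumptions, and the durations would in practice come from the synthesized audio.

```python
GAP_S = 0.35  # silence inserted between segments, as stated above


def build_timing(durations, gap_s=GAP_S):
    """Turn (chunk id, audio seconds) pairs into start/end entries.

    Assumed output shape: [{"id": ..., "start": ..., "end": ...}, ...].
    """
    timing = []
    cursor = 0.0
    for chunk_id, seconds in durations:
        timing.append({
            "id": chunk_id,
            "start": round(cursor, 3),
            "end": round(cursor + seconds, 3),
        })
        cursor += seconds + gap_s  # the next chunk starts after the gap
    return timing


if __name__ == "__main__":
    for entry in build_timing([("s01", 4.0), ("s02", 6.5), ("s03", 3.0)]):
        print(entry)
```

The `end` of the last entry is the narration total; the 30-90 second budget applies to that number, gaps included.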

Step 4: Generate watercolor background image

The bg-image sets the visual tone (a softened abstract watercolor). Do not use the article's `illustration.png`: hand-drawn schematics carry too much detail and turn into uniform dark mud after blurring (visually still pure black). Always use a purpose-generated abstract watercolor.

bash

~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>

Choose `<theme>` based on the article topic:

theme	Color Palette	Suitable for
`personal`	bright warm yellow, soft coral pink, terracotta, sage green, cream	Personal, handcrafted, warm topics
`tech`	cool teal, electric blue, deep purple, mint, white	AI, technology, data topics
`reflection`	sage green, dusty blue, lavender, pearl, cream	Reflection, calm topics
`warning`	burnt orange, deep red, mustard, charcoal	Warning, tension topics
`growth`	fresh green, gold, soft yellow, sky blue	Growth, compound interest topics
`abstract`	lavender, dusty rose, sage, soft amber	Abstract, philosophical topics

Output: `<article-folder>/video/bg.png` (1088×1920, ~3MB).

⚠️ The image must live inside the `video/` directory. Do not reference `../illustration.png`: hyperframes render does not resolve cross-directory relative paths and renders pure black.
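Because a cross-directory reference fails silently (a black frame rather than an error), it can help to check asset paths before rendering. A minimal sketch, assuming asset references are plain relative paths taken from `index.html`; `asset_inside_video_dir` is a hypothetical helper, not part of hyperframes.

```python
from pathlib import Path


def asset_inside_video_dir(video_dir, asset_ref):
    """True if asset_ref, resolved relative to video_dir, stays inside it."""
    root = Path(video_dir).resolve()
    target = (root / asset_ref).resolve()
    # "../illustration.png" resolves to the parent folder, so root is not
    # among its ancestors and the check fails.
    return root in target.parents


if __name__ == "__main__":
    for ref in ["bg.png", "sfx/tick.mp3", "../illustration.png"]:
        status = "ok" if asset_inside_video_dir("article/video", ref) else "OUTSIDE video/"
        print(f"{ref}: {status}")
```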

Step 5: Write HyperFrames composition (`index.html`)

Read `timing.json` and design the scenes around each chunk's start/end times. Structure for the 1080×1920 vertical canvas:

html

<html><head><script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
  html, body {
    width: 1080px; height: 1920px; margin: 0; overflow: hidden;
    background: #0e0b08;
    font-family: 'Noto Sans SC', 'PingFang SC', 'Heiti SC', sans-serif;
    font-weight: 900;
    color: #f5efe5;
    letter-spacing: -0.02em;
    -webkit-font-smoothing: antialiased;
  }
  #bg-image {
    position: absolute; inset: 0;
    background-image: url('bg.png');
    background-size: cover;
    background-position: center;
    filter: blur(30px) brightness(0.65) saturate(0.85);
    z-index: 0;
    transform: scale(1.1);
  }
  #bg-overlay {
    position: absolute; inset: 0;
    background: rgba(14, 11, 8, 0.28);
    z-index: 1;
  }
  .scene { position: absolute; inset: 0; overflow: hidden; opacity: 0; z-index: 2; }
  #s1 { opacity: 1; }
  /* ... scene-specific styles ... */
</style></head>
<body>
  <div id="root" data-composition-id="main" data-start="0" data-duration="<total+2>" data-width="1080" data-height="1920">
    <div id="bg-image"></div>
    <div id="bg-overlay"></div>
    <!-- scene divs s1..sN -->
    <!-- audio: narration + ticks + chimes + bell -->
  </div>
  <script>
    /* GSAP timeline: paused + register to window.__timelines['main'] */
  </script>
</body></html>

🎬 First Frame Rule (Mandatory)

At t=0 the video must show:

bg-image fully visible: always opacity 1, never faded in (it is visible by default in CSS; do not touch its opacity in GSAP)
Title element visible: for the main title element of s1, `tl.from({y:30, scale:0.95})` is fine, but `tl.from({opacity:0})` is not, otherwise t=0 is a black screen
s1 cannot be an A3 color-flip: it would cover the bg-image and hide the watercolor in the first frame. Save color-flip for s2 and later

Color System

Main Text / Anchor Colors (design system, consistent throughout the video):

Role	Value	Usage
Main Text	`#f5efe5` warm cream white	Hero text / main content
Secondary Text (subtitle, caption)	`#f5efe5` + `opacity: 0.7` + smaller font size	Do not use gray ( `#8a7e72` is unreadable on watercolor background). Use opacity + font size to create hierarchy
Strikethrough text itself	`#f5efe5` + `opacity: 0.5` + strikethrough line	Do not use `#6d635a` dark gray — unreadable on watercolor background. Use opacity to weaken + orange strike line instead
Decorative large numbering (01-08)	`#f5efe5` + `opacity: 0.18` or `#e87a3e` + `opacity: 0.35`	Do not use `#2b2620` or other dark grays — completely disappears on watercolor background
Outline stroke	`#f5efe5` 4-8px stroke + `color: transparent`	A2 hollow text
Default fallback bg	`#0e0b08` dark warm black	Covered by bg-image + overlay; not used for color-flip

Core principle: all text uses `#f5efe5` cream or the `#e87a3e` orange family (accent palette); build hierarchy with opacity + size, not hue changes. Gray is a relic of the black-background era and is never used on watercolor backgrounds. See [[no-low-contrast-text]]

Color-flip Background Palette (A3, not limited to orange/blue/white):

hex	Suitable for
`#e87a3e` classic orange	Warning, emphasis, climax punch
`#6b9bc4` bright blue	Data, technology climax
`#f5efe5` warm cream white	Conclusion, quiet contrast
`#c45c3e` deep red	Warning, error climax
`#d4a040` deep gold	Achievement, value climax
`#5a8c6a` emerald green	Growth, compound interest, vitality
`#4a8a8a` pine green	Calm, long-termism
`#7a5a8a` dark purple	Wisdom, mysterious climax
`#c48a8a` dark pink	Soft, humanistic topics

Text on color-flip backgrounds uses `#0e0b08` or `#f5efe5`, whichever inverts against the background.

Emphasis / Accent Palette (not limited to orange):

hex	Suitable for
`#e87a3e` orange	Default emphasis
`#6b9bc4` blue	Data, technical terms, AI
`#d4a040` gold	Value, achievement
`#5a8c6a` emerald green	Growth, positive results
`#4a8a8a` pine green	Long-term, stable
`#c45c3e` deep red	Warning, contrast
`#8a7aaa` dark purple	Abstract, wisdom
`#c48a8a` dark pink	Soft, humanistic

Use at least 2-3 different emphasis colors throughout the video, choose accents based on scene themes.
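The "no gray on watercolor" rule can be sanity-checked with the WCAG 2.x contrast-ratio formula. The sketch below is an illustration, not a measurement of any real render: the blurred, darkened watercolor varies per image, so `MID_TONE` is a hypothetical sample color standing in for a mid-tone patch of background.

```python
def _linear(channel_8bit):
    """sRGB channel (0-255) to linear light, per the WCAG relative-luminance definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


MID_TONE = "#6b6258"  # hypothetical mid-tone sample of a blurred watercolor

if __name__ == "__main__":
    for name, color in [("cream", "#f5efe5"), ("gray", "#8a7e72")]:
        print(f"{name} on mid-tone: {contrast_ratio(color, MID_TONE):.2f}")
```

Under this assumption the gray lands well below the cream, which is consistent with the guidance to weaken text through opacity and size rather than a gray hex.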

Font System (1080px wide vertical screen)

Item	Value
Font Weight	hero 900 / main text 800 / secondary 600-700 / caption 500
Letter Spacing	hero `-0.04em` to `-0.06em` / main text `-0.02em` / caption `0`
Punch hero (A1/A2, 1-3 characters)	280-400px
Short sentence hero (4-6 characters)	160-240px
Long sentence hero (7-10 characters)	100-150px
Card content	56-130px
Subtitle	40-72px
Caption / numbering / label	20-40px
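The size bands in the table follow roughly from how many characters must fit across the 1080px canvas. A back-of-the-envelope sketch, assuming each CJK glyph advances about 1em (real font metrics differ slightly) and ignoring the trailing letter-spacing; `max_font_px` is a hypothetical helper.

```python
def max_font_px(n_chars, padding_px=40, letter_spacing_em=-0.04, canvas_w=1080):
    """Largest font size (px) at which n_chars fit on one line.

    Assumes every glyph is 1em wide, which is approximately true for CJK text.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be at least 1")
    usable = canvas_w - 2 * padding_px
    advance_em = 1 + letter_spacing_em  # width taken by one glyph, in em
    return int(usable / (n_chars * advance_em))


if __name__ == "__main__":
    for n in (1, 3, 6, 10):
        print(n, "chars:", max_font_px(n), "px")
```

With the default 40px padding, a 3-character punch word tops out inside the 280-400px band and a 6-character phrase inside 160-240px, so the table's ranges are close to the physical limit of the canvas rather than arbitrary.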

Layout System (Anti-centering inertia)

Layout	CSS Key Points	Suitable for
Centered	`flex; center; center;`	Category A hero scenes, but ≤50% of total scenes
Left-aligned top	`padding: 80px 80px 0 80px;`	Category E key quotes, long quotes
Bottom-right anchored	`position: absolute; right: 80px; bottom: 80px;`	Signature, climax words
Diagonal	top-left / bottom-right	B3 diagonal comparison
Grid	`display: grid; grid-template-columns: repeat(2, 1fr);`	C3 (2×N instead of 3×N for vertical screen)
Stepped	Each item has `margin-left: calc(60px * var(--i));`	C4 staggered list
Bottom-aligned + top blank space	`position: absolute; bottom: 60px;` with blank space above	Breathing scenes
Corner small element	Small text anchored to one corner, rest blank	Minimalist / blank punch scenes

Padding: 40-80px for full-screen scenes, 120-200px for breathing scenes. Do not use the same padding for all scenes.

Geometric Decorative Elements

Use one every few scenes:

Thick short line, 8-16px × 40-200px, as an emphasis bar, orange
Left emphasis bar, 6px × 100%, paired with a long quote
Large numbering 01-08 for list items (faint via low opacity rather than a gray hex, per the Color System; huge, decorative)
Large quotation mark character `"`, semi-transparent and oversized, placed top-left
Horizontal separator line, 2-4px, cream white at 30% opacity
Dot / square, 12-20px, orange, as a list bullet
Arrow ➜ or a hand-drawn SVG

Scene Transitions (4 types + mixing rules)

Do not use blur crossfade for every transition. Every run of 4 transitions must include at least 2 different types.

T1. Blur crossfade (default, soft)

0.6s, `sine.inOut`
Next scene goes from `opacity: 0, filter: blur(24px)` to `opacity: 1, filter: blur(0)`
Previous scene fades out and blurs at the same time

T2. White flash cut (punch cut, most modern)

0.18s total: 60ms white flash → cut → 40ms new scene scale 1.05 → 1
Suitable for: entering a Category A hero, a Category D stat, or a climax switch

tl.to('.flash', { opacity: 1, duration: 0.06, ease: 'none' }, T - 0.06)
  .set(prevScene, { opacity: 0 }, T)
  .set(nextScene, { opacity: 1 }, T)
  .to('.flash', { opacity: 0, duration: 0.12, ease: 'power2.out' }, T)
  .from(nextScene, { scale: 1.05, duration: 0.25, ease: 'expo.out' }, T);

T3. Scale push (sense of pushing in)

0.55s; previous scene `scale: 1 → 0.85`, next scene `scale: 1.15 → 1`
Suitable for: pushing from an overview into detail

T4. Color flash cut (a quick orange/blue flash, strong rhythm)

0.22s total: 80ms full-screen orange → cut → 40ms settle
Suitable for: entering an A3 color-flip or a key turning point
At most 2 per video

Add the flash overlay to the HTML as `<div class="flash"></div>`: positioned full-screen, default opacity 0, z-index 100.
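The mixing rules are easy to verify once the transition sequence is written down. A minimal sketch, reading "every 4 transitions" as any 4 consecutive transitions (a sliding window); that reading, and the `check_transitions` name, are assumptions rather than something the skill defines.

```python
from collections import Counter


def check_transitions(types, window=4, min_kinds=2, max_color_flash=2):
    """Check a sequence such as ["T1", "T2", ...] against the mixing rules.

    Returns a list of violations; an empty list means the sequence passes.
    """
    problems = []
    for start in range(max(len(types) - window + 1, 0)):
        run = types[start:start + window]
        if len(set(run)) < min_kinds:
            problems.append(
                f"transitions {start + 1}-{start + window} use only {run[0]}"
            )
    flashes = Counter(types)["T4"]
    if flashes > max_color_flash:
        problems.append(f"T4 color flash used {flashes} times (max {max_color_flash})")
    return problems


if __name__ == "__main__":
    print(check_transitions(["T1", "T1", "T1", "T1", "T2"]))
```

A video with N scenes has N-1 transitions, so a 5-scene video has exactly one window to satisfy.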

Entrance Animation Rules

Every element in every scene enters with `tl.from(...)` (y / opacity / scale), subject to the First Frame Rule for s1
Entrance stagger is 0.1-0.3s; the first element starts at t = scene.start + 0.3
Use at least 3 different eases (`power3.out`, `back.out(1.3)`, `expo.out`, `elastic.out(1, 0.5)`)
Do not animate exits with `gsap.to({opacity: 0})`; the transitions already handle exits. Only the last scene may fade to black
The video as a whole must use at least 3 of the Modern Motion Techniques

Modern Motion Techniques

Half the difference between mediocre and modern videos is layout, the other half is motion. Use at least 3 of the following 7 techniques per video (use in specific scenes, don't stack all throughout)

1. Kinetic Typography (character stagger entrance): Category A hero

The `.kinetic span` selector below only matches if each character is wrapped in its own `<span>`; give the spans `display: inline-block` so the transforms apply.

html

<h1 class="kinetic"><span>维</span><span>修</span><span>工</span></h1>

tl.from('.kinetic span', {
  y: 180, opacity: 0, rotateX: -90,
  duration: 0.7, stagger: 0.06,
  ease: 'back.out(1.4)',
  transformOrigin: '50% 100%',
}, T);

2. Camera Punch (Push in / Pull out) — A3, Category D

tl.from(scene, { scale: 1.15, opacity: 0, duration: 0.5, ease: 'expo.out' }, sceneStart);

3. Mask Reveal (clip-path reveal) — Category E quote

css

.reveal { clip-path: inset(0 100% 0 0); }

tl.to('.reveal', { clipPath: 'inset(0 0% 0 0)', duration: 0.9, ease: 'expo.inOut' }, T);

4. Number Ticker (Number scrolling) — D1

html

<div class="ticker" data-end="3600">0</div>

const ticker = document.querySelector('.ticker');
const obj = { val: 0 };
tl.to(obj, {
  val: parseInt(ticker.dataset.end),
  duration: 1.8, ease: 'power2.out',
  onUpdate: () => { ticker.textContent = Math.round(obj.val).toLocaleString(); },
}, T);
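To preview what the ticker shows frame by frame without rendering, the easing can be reproduced offline. A sketch under stated assumptions: GSAP's `power2.out` is, to my understanding, the cubic ease-out `1 - (1 - t)^3`; the 30 fps frame rate is assumed, since this document does not state one; and Python's `,` format stands in for `toLocaleString()`, whose output depends on the locale.

```python
def power2_out(t):
    """Cubic ease-out, assumed to match GSAP's 'power2.out'."""
    return 1 - (1 - t) ** 3


def ticker_frames(end_value, duration_s=1.8, fps=30):
    """Text shown by the ticker on each frame, from 0 up to end_value."""
    n = round(duration_s * fps)  # number of frame intervals in the tween
    return [f"{round(end_value * power2_out(i / n)):,}" for i in range(n + 1)]


if __name__ == "__main__":
    frames = ticker_frames(3600)
    print(frames[:5], "...", frames[-3:])
```

Because the curve decelerates, most of the counting happens early: by the halfway frame the display has already passed 85% of the final value, which is why the number appears to "land" rather than stop abruptly.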

5. Outline → Fill (Hollow text to solid) — A2

css

.morph { -webkit-text-stroke: 4px #f5efe5; color: transparent; }

tl.to('.morph', { color: '#e87a3e', webkitTextStrokeColor: '#e87a3e', duration: 0.5, ease: 'power2.out' }, T);

6. Letter Highlight Sweep (Keyword sweep highlight) — Category E climax word

html

<span class="sweep"><span class="sweep-bg"></span>搭档</span>

css

.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }

tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25);

html

<span class="sweep"><span class="sweep-bg"></span>搭档</span>

css

.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }

tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25);

7. Background Color Punch (background flash): 1-2 times per video

```js
tl.to(scene, { backgroundColor: '#e87a3e', duration: 0.08 }, T)
  .to(scene, { backgroundColor: '#0e0b08', duration: 0.4, ease: 'power2.out' }, T + 0.1);
```

Strike-through animation: use a real DOM element, `<span class="strike-line">`, instead of `::after`. Pseudo-elements combined with CSS variables do not work in some hyperframes rendering paths.

```html
<span class="strike">领导<span class="strike-line"></span></span>
```

```css
.strike-line { position: absolute; left: -10px; right: -10px; top: 56%; height: 10px; background: #e87a3e; transform: scaleX(0); transform-origin: left; }
```

```js
tl.to('.strike .strike-line', { scaleX: 1, duration: 0.55, ease: 'power2.inOut' }, T);
```

Step 6: Add SFX

```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/synth-sfx.sh <article-folder>/video
```

This generates `video/sfx/{tick,chime,bell}.mp3`:

- `tick.mp3`: 80 ms, 1.2 kHz sine; used for transitions (0.3 s before each scene switch)
- `chime.mp3`: 220 ms, 880 + 1320 Hz dual tone; used when a dialogue line or list item lights up (optional)
- `bell.mp3`: 1.5 s low-frequency bell; used when the final climax word appears (at most once per video)

Integrate into the timeline:

```html
<audio id="aud-narration" src="narration.mp3" data-start="0" data-duration="<total>" data-track-index="0" data-volume="1"></audio>

<audio id="aud-tick-s02" src="sfx/tick.mp3" data-start="<scene2.start - 0.3>" data-duration="0.1" data-track-index="2" data-volume="0.55"></audio>
<!-- Repeat for each scene switch; T2/T4 flash transitions may omit the tick -->

<audio id="aud-chime-s08-1" src="sfx/chime.mp3" data-start="<T>" data-duration="0.3" data-track-index="3" data-volume="0.45"></audio>
<audio id="aud-bell-s12" src="sfx/bell.mp3" data-start="<climax-T>" data-duration="1.6" data-track-index="4" data-volume="0.55"></audio>
```

⚠️ Every `<audio>` must have an `id`; otherwise the render comes out silent (hyperframes enforces this).

Clips on different `track-index` values do not conflict; clips on the same track must not overlap in time.

SFX discipline: transition ticks are mandatory; chime and bell are decoration, so leave them out when the scene content is simple; use the bell only once per video.
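The tick placement and the same-track overlap rule above lend themselves to a small generator. The following is a minimal sketch, not part of the skill's scripts: `tick_tags` and `overlaps` are hypothetical helper names, and the list of scene start times is assumed to have been read from `timing.json`, whose exact layout is not shown here.

```python
TICK_LEAD = 0.3      # seconds before each scene switch, per the tick rule
TICK_DURATION = 0.1  # same data-duration as the tick example


def tick_tags(scene_starts, skip_scenes=()):
    """Return one <audio> tag per scene switch; scene 1 has no incoming switch."""
    tags = []
    for number, start in enumerate(scene_starts, start=1):
        if number == 1 or number in skip_scenes:
            continue
        tags.append(
            f'<audio id="aud-tick-s{number:02d}" src="sfx/tick.mp3" '
            f'data-start="{start - TICK_LEAD:.2f}" data-duration="{TICK_DURATION}" '
            f'data-track-index="2" data-volume="0.55"></audio>'
        )
    return tags


def overlaps(clips):
    """Return adjacent (start, duration) pairs on one track that overlap in time."""
    ordered = sorted(clips)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a[0] + a[1] > b[0]]
```

Passing the scene numbers that use a flash transition as `skip_scenes` leaves their ticks out. Running `overlaps` once per `track-index` before rendering would catch two clips that collide on the same track.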

Step 7: Lint + Inspect + Render (in this order)

```bash
cd <article-folder>/video

# Mandatory 1: linter (must report 0 errors)
npx hyperframes lint

# Mandatory 2: layout inspect to find overflow (must report 0 errors)
npx hyperframes inspect --at 1,8,15,25,35,45,55,65

# Recommended: snapshot to check the layout visually
npx hyperframes snapshot --at <t1>,<t2>,<t3> .

# Render (only after lint and inspect both pass)
# ⚠️ Output goes to the parent directory, parallel to video/; the final MP4 is not stored in video/
npx hyperframes render --quality standard --fps 30 --output ../<slug>.mp4
```


**Why inspect is mandatory**: at 1080 px the vertical frame is narrow, and a 3-4 character hero at a 280-400 px font size is already close to overflowing. Run inspect every time, and **render only at 0 errors**.

**Fix overflow**:
- Reduce the font size (inspect gives specific suggestions)
- Break a long hero across lines ("没法积累" → two lines, "没法" / "积累")
- Use `white-space: nowrap` only after confirming that character count × font size < screen width
- If `.em` overflows inside `reveal-wrap` → add `line-height: 1` to `.em`
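The `nowrap` rule (character count × font size < screen width) can be estimated before running inspect. This is a rough sketch that assumes every CJK glyph is about one em wide and ignores letter-spacing; `hero_width` and `fits_on_one_line` are illustrative names, and `npx hyperframes inspect` remains the authority.

```python
FRAME_WIDTH = 1080  # px, width of the vertical 9:16 frame


def hero_width(text, font_size, side_padding=0):
    """Estimate rendered width, treating each glyph as one em wide."""
    return len(text) * font_size + 2 * side_padding


def fits_on_one_line(text, font_size, side_padding=0):
    """True when the estimated width stays strictly below the frame width."""
    return hero_width(text, font_size, side_padding) < FRAME_WIDTH


print(fits_on_one_line("没法积累", 280))  # 4 x 280 = 1120 px, so False
print(fits_on_one_line("没法", 280))      # 2 x 280 = 560 px, so True
```

Under this estimate the four-character hero already overflows at the smallest hero size, which is why the list suggests splitting it into two lines.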

**Render Quality**:
- `--quality draft` ~30s rendering — for iteration
- `--quality standard` ~1.5min — default, sufficient for publishing
- `--quality high` ~3min — for large screens / business use

Step 8: Preview

Output: `<article-folder>/<slug>.mp4` (parallel to `video/`, not inside it; `video/` is reserved for intermediate files).

Run `open <article-folder>/<slug>.mp4` so the user can preview the result. Do not auto-upload to WeChat Channels; the user may want to trim or adjust the video first.

Step 9: Publish to YouTube (automatic cron job, outside the render workflow)

Do not upload right after a render finishes. YouTube enforces a daily quota (6 uploads per day by default, at 1600 quota points per upload), so rendering many videos in one day would exhaust it.

Instead, cron runs `daily-upload-batch.sh` every day at 10:00. It picks up to 5 MP4s that have not been uploaded yet (ascending by article date) and writes a `.youtube.json` record after each upload.

The cron job is already registered (one-time setup, no need to repeat):

```
0 10 * * * /Users/jianshuo/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
```

Manual trigger (do not run this inside the wjs-converting-text-to-video workflow; let cron handle it):

```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh

# Or upload a single article immediately
~/.claude/skills/wjs-converting-text-to-video/scripts/publish-to-youtube.py <article-folder>
```
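The quota figures in this step can be sanity-checked with a few lines of arithmetic. The sketch assumes the default YouTube Data API allocation of 10,000 units per day, which this document does not state; the 1600-unit upload cost and the batch size of 5 are the figures given here.

```python
DAILY_QUOTA_UNITS = 10_000  # assumed default YouTube Data API allocation
UPLOAD_COST_UNITS = 1_600   # per-upload cost quoted in this step
BATCH_SIZE = 5              # uploads per cron run

max_uploads_per_day = DAILY_QUOTA_UNITS // UPLOAD_COST_UNITS
units_left_after_batch = DAILY_QUOTA_UNITS - BATCH_SIZE * UPLOAD_COST_UNITS

print(max_uploads_per_day)     # 6
print(units_left_after_batch)  # 2000
```

Under that assumption, a batch of 5 leaves 2000 units of headroom, enough for one manual single-article upload on the same day.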


What the script does for each upload:
1. Detects whether the MP4 is portrait or landscape → portrait gets `#shorts` added to the title, landscape is uploaded as a regular video
2. Takes the title from the article.md H1 and the description from the first few paragraphs
3. Checks `<article-folder>/.youtube.json`: if it exists, tries to delete the old video before uploading the new one. Deleting requires the `youtube.force-ssl` scope, which the current token lacks, so the delete is skipped and the new video is still uploaded
4. Writes the record to `.youtube.json`

See memory: [[auto-publish-youtube]]
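Steps 1 and 2 amount to a small pure function. The sketch below is illustrative only and is not the code in `publish-to-youtube.py`: the function names are hypothetical, the video dimensions are assumed to have been probed already (for example with ffmpeg), and appending `#shorts` as a suffix is an assumption about where the tag goes.

```python
def is_portrait(width, height):
    """A video counts as portrait when it is taller than it is wide."""
    return height > width


def youtube_title(article_md, width, height):
    """Use the first H1 as the title; portrait videos get a #shorts suffix."""
    title = next(
        (line[2:].strip() for line in article_md.splitlines() if line.startswith("# ")),
        "Untitled",  # fallback when the article has no H1 (assumption)
    )
    return f"{title} #shorts" if is_portrait(width, height) else title


print(youtube_title("# 搭档\n\n正文第一段", 1080, 1920))  # 搭档 #shorts
```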

Directory Structure

```
<article-folder>/
├── article.md
├── illustration.png            # User's original illustration; not used directly as bg
├── <slug>.mp4                  # ⭐ Final video (parallel to video/, not stored in video/)
└── video/                      # All intermediate artifacts
    ├── narration_chunks.json   # Narration text for the 5-10 scenes
    ├── tts_narration.py        # Copied in by bootstrap
    ├── narration.mp3           # Merged full TTS track
    ├── narration/              # Per-segment mp3 (s01..sN)
    ├── timing.json             # start/end/duration of each segment
    ├── bg.png                  # Watercolor background generated by GPT Image 2
    ├── sfx/{tick,chime,bell}.mp3
    ├── index.html              # HyperFrames composition
    ├── hyperframes.json
    ├── meta.json
    ├── package.json
    └── snapshots/              # Pre-render snapshots
```

The Skill's Own Files

```
~/.claude/skills/wjs-converting-text-to-video/
├── SKILL.md
└── scripts/
    ├── bootstrap-project.sh        # Initialize the video/ directory + copy helpers + generate sfx
    ├── generate-bg.sh              # Call GPT Image 2 to generate the abstract watercolor bg.png
    ├── tts.py                      # Generate the Volcano TTS narration
    ├── synth-sfx.sh                # Synthesize tick/chime/bell (ffmpeg)
    ├── retrofit-bg-image.py        # Add a bg-image layer to existing videos
    ├── strip-dark-scene-bgs.py     # Strip scene-level dark backgrounds so bg-image shows through
    └── publish-to-youtube.py       # Auto-upload the MP4 to YouTube (portrait→Shorts); can replace an existing upload
```

Anti-Patterns

Anti-Monotony (most important: the root cause of "flat narration")

| Do NOT | Reason |
| --- | --- |
| Use the B1 two-line strikethrough in every scene | Historically the biggest failure mode. At most 2 B1 scenes per video |
| Center every scene | Lifeless. At least 2 scenes must be non-centered |
| Use similar font sizes in every scene | The font-size span must be ≥240px |
| Make every scene 5-7s long | The duration span must be ≥6s |
| Use only blur crossfade for the whole video | Every 4 transitions must include ≥2 types |
| Ship a video with no color-flip | ≥1 A3 scene is a hard requirement |
| Ship a video with no geometric elements | ≥1 scene needs a thick line / large numeral / quotation mark |
| Use only `tl.from({y, opacity})` for the whole video | Use ≥3 Modern Motion Techniques |
| Pack every scene full | ≥1 scene must be ≥60% whitespace |
| Add a `background:` color to every scene | It covers bg-image, so the watercolor was generated for nothing. Regular scenes set no background; only A3 color-flip uses a solid color |
| Always use orange for color-flip / emphasis | Use at least 2-3 accent colors |
| Use gray for secondary text / strike / decoration | Gray has too little contrast on the watercolor and disappears. Use `#f5efe5` cream weakened with opacity instead (see [[no-low-contrast-text]]) |
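Several rows in this table are numeric and can be checked against a scene plan before any HTML is written. A minimal sketch, assuming each scene is described by a dict with hypothetical `template`, `font_size`, `duration`, and `centered` fields; it covers only four of the rules and is not a substitute for the full self-check.

```python
def monotony_problems(scenes):
    """Check a scene plan against four of the numeric anti-monotony rules."""
    sizes = [s["font_size"] for s in scenes]
    durations = [s["duration"] for s in scenes]
    problems = []
    if max(sizes) - min(sizes) < 240:
        problems.append("font-size span < 240px")
    if max(durations) - min(durations) < 6:
        problems.append("duration span < 6s")
    if sum(s["template"] == "B1" for s in scenes) > 2:
        problems.append("more than 2 B1 scenes")
    if sum(not s["centered"] for s in scenes) < 2:
        problems.append("fewer than 2 non-centered scenes")
    return problems


plan = [
    {"template": "A3", "font_size": 400, "duration": 3, "centered": True},
    {"template": "B1", "font_size": 120, "duration": 7, "centered": False},
    {"template": "D1", "font_size": 160, "duration": 10, "centered": False},
]
print(monotony_problems(plan))  # []
```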

Content / Engineering

| Do NOT | Reason |
| --- | --- |
| Use Kokoro for Chinese TTS | Its Chinese quality is poor, and the user explicitly rejects it |
| Pass the `emotion` parameter to Volcano TTS | `_bigtts` voices return `data: null` and fail silently |
| Use `zh_male_jieshuonansheng_mars_bigtts` | Hallucinates in a loop when the text contains English proper nouns |
| Use serif fonts (Songti / 宋体 / Noto Serif) | Not enough impact |
| Paste whole paragraphs of the article on screen | That is a PPT. A video shows one visual moment per screen |
| Exceed 10 scenes or 90 seconds | More than the viewer's attention can hold |
| Pad a short article out to 90 seconds | A short article should become a 30-50s video; stretching it waters the content down |
| Change font or color style from scene to scene | Style drift. The design system stays fixed; only the templates vary |
| Use an `::after` pseudo-element + CSS variables for the strike | Fails in some hyperframes rendering paths. Use a real DOM element, `<span class="strike-line">` |
| Use `gsap.to({opacity: 0})` anywhere except the last scene | hyperframes prohibits exit animations; the transition is the exit |
| Add a chime to every chunk | Too noisy |
| Use `../illustration.png` as the bg url | hyperframes render does not resolve cross-directory paths, so the background renders pure black. bg.png must be inside `video/` |
| Omit `id` on `<audio>` | The render comes out silent. Every `<audio>` needs `id="..."` |
| Make s1 an A3 color-flip | The first frame would not show bg-image. Put color-flip in s2 or later |
| Give every s1 title element `from({opacity: 0})` | The first frame would be a black screen. s1 main elements default to `opacity: 1` and animate only y/scale |
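The `<audio>` rule is easy to verify mechanically before rendering. The following sketch uses only the Python standard library; `audio_without_id` is a hypothetical helper, and `npx hyperframes lint` may already report the same problem.

```python
from html.parser import HTMLParser


class AudioIdChecker(HTMLParser):
    """Collect the src of every <audio> tag that lacks a non-empty id."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "audio" and not attributes.get("id"):
            self.missing.append(attributes.get("src"))


def audio_without_id(html):
    """Return the src values of all <audio> elements that have no id."""
    checker = AudioIdChecker()
    checker.feed(html)
    return checker.missing


sample = (
    '<audio id="aud-narration" src="narration.mp3"></audio>'
    '<audio src="sfx/tick.mp3"></audio>'
)
print(audio_without_id(sample))  # ['sfx/tick.mp3']
```

Feeding it the contents of `video/index.html` should return an empty list before a render is attempted.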

Common Pitfalls

- Writing a dash (`——`) in the narration → the TTS reads the punctuation name ("破折号") aloud. Delete it and use a period or comma
- A chunk is abnormally long (>3 chars/s) → Volcano hallucinates in a loop. Switch voices, or split the chunk into shorter ones
- Scene duration < narration duration → the voiceover is cut off by the next scene. Each scene must cover its whole narration plus a 0.3s buffer
- Large text on a dark background is still visible when it should be at `opacity: 0` → check that `.scene` defaults to `opacity: 0` (every scene except s1)
- `.em` overflows slightly inside `.reveal-wrap` (a few px at the top/bottom) → add `line-height: 1` to `.em`
- Snapshot glyphs differ from the render → both now use Noto Sans SC, so they should match
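The dash and scene-coverage pitfalls can be guarded against with two small helpers. This is a sketch under stated assumptions: the `id`, `scene_duration`, and `narration_duration` fields are hypothetical, and replacing the dash with a full-width comma is one of the two fixes suggested here (a period works equally well).

```python
BUFFER = 0.3  # seconds of slack a scene needs after its narration ends


def clean_narration(text):
    """Replace the Chinese dash with a comma so the TTS does not read it aloud."""
    return text.replace("——", "，")


def short_scenes(scenes):
    """Return ids of scenes that end before their narration plus the buffer."""
    return [
        s["id"]
        for s in scenes
        if s["scene_duration"] < s["narration_duration"] + BUFFER
    ]


scenes = [
    {"id": "s01", "scene_duration": 5.0, "narration_duration": 4.8},
    {"id": "s02", "scene_duration": 6.0, "narration_duration": 5.5},
]
print(short_scenes(scenes))  # ['s01']
```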

Dependencies

- HyperFrames CLI (`npx hyperframes`): composition lint / inspect / snapshot / render
- GPT Image 2 (`~/.claude/skills/gpt-image-2-skill/`): generates bg.png; `--provider codex` uses ChatGPT auth
- Volcano TTS: `VOLC_TTS_APPID` and `VOLC_TTS_ACCESS_TOKEN`, stored in `~/code/.env`
- ffmpeg: SFX synthesis, audio concat, aspect-ratio detection
- YouTube uploader (`~/.claude/skills/wjs-uploading-video/`) plus an OAuth token at `~/.config/youtube/token.json`: used by Step 9 auto-publishing
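A preflight check along these lines can confirm the dependencies before a run starts. It is a minimal sketch, not one of the skill's scripts: `missing_dependencies` is a hypothetical name, and it assumes the two Volcano variables have already been loaded from `~/code/.env` into the process environment.

```python
import os
import shutil

REQUIRED_ENV = ("VOLC_TTS_APPID", "VOLC_TTS_ACCESS_TOKEN")
REQUIRED_TOOLS = ("npx", "ffmpeg")


def missing_dependencies(env=None, which=shutil.which):
    """Return the names of required env vars and CLI tools that cannot be found."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    missing += [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    return missing
```

Calling `missing_dependencies()` with no arguments checks the real environment and `PATH`; the `env` and `which` parameters exist only so the function can be exercised without touching the machine.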
