wjs-converting-text-to-video

Convert a Wang Jianshuo-style WeChat Official Account `article.md` into a 1080×1920 vertical, 30-90 second short video with Chinese narration: TTS voiceover + HyperFrames CSS/GSAP animations + abstract watercolor backgrounds + transition SFX. Output MP4 for WeChat Channels / Douyin / Xiaohongshu / Reels.

What this skill produces

Dimension	Default
Resolution	1080×1920 vertical (9:16)
Duration	30-90 seconds
Number of Scenes	5-10
Voiceover	Volcano Engine TTS, default "Ahu Conversation" male voice
Background	Abstract watercolor generated by GPT Image 2 ( `bg.png` ) + blur 30 + warm black semi-transparent overlay
Font	Noto Sans SC, hero weight 900, main text warm cream white
Output	`<article-folder>/<slug>.mp4` (parallel to `video/` , not stored in `video/` )
Publishing	Auto-upload to YouTube — Portrait → Shorts, Landscape → regular video; re-rendering replaces old video (no accumulation)

When this skill fires

The user already has `article.md` and says something like 「做成视频」 ("make it a video"), 「做一个解说」 ("make a narrated explainer"), or 「讲一遍」 ("talk me through it")
The user runs `/wjs-converting-text-to-video <article-folder>`
The user requests batch conversion, e.g. "Turn all X articles I posted yesterday into videos"

When NOT to use

No article draft, only an idea → First use `/wjs-publishing-wechat` to write article.md, then proceed
User needs subtitle burning / translation / voiceover replacement → Use `/wjs-burning-subtitles` / `/wjs-dubbing-video` / `/wjs-localizing-video`
Video requires non-Chinese languages like English/Spanish → This skill focuses on Chinese TTS (Volcano Engine); use hyperframes' built-in tts command for non-Chinese (Kokoro works well for English)
Landscape 16:9 format → This skill defaults to vertical; only change to landscape if explicitly requested by user

Core Principle

Video is not a visual reading of the article, but a visual reconstruction of it.

Each scene is an independent visual moment — a contrast, a parallelism, a number, a metaphor. Text fills the screen, bolded, with key words highlighted in orange. The background is abstract watercolor (softened with blur), with an overall tone that is steady, restrained, and impactful.

Rhythm > Templates. A video with 5-10 scenes that uses the same "two-line comparison" layout throughout is not a video, it's a slideshow. Modernity comes from contrast — extreme font size differences, asymmetric layouts, alternating short and long scenes, alternating text-only and geometric-element scenes, alternating watercolor-background and bright punch scenes.

The default is mediocre. If you just pick the easiest templates from the top of the list, the result will inevitably be a "flat two-line format". The Step 1b Scene Mix Rule ratios are mandatory.

Workflow

Step 1: Design 5-10 visual moments

Read `<article-folder>/article.md` and split it into 5-10 scenes according to the argument structure (keep total duration within 30-90 seconds). Short articles (1-2 core points) use 5-6 scenes / 30-50s; long articles use 8-10 scenes / 60-90s. Each scene includes a narration segment + a clear visual framework.

Template Library — 6 categories, 16 templates total, mix as needed:

A. Hero / Punch (High-contrast climax, ≥1 per video, duration ≤4s)

Template	Suitable for
A1. Full-screen single-character hero	1-3 climax words filling the screen, font size 280-400px
A2. Outline hero	Hollow text with `-webkit-text-stroke: 4px #f5efe5; color: transparent;`
A3. Color-flip punch	Full-screen background changes to bright color (orange/red/gold/green etc.), with reversed text color
A4. Gradient text hero	Large text with `background: linear-gradient(...); -webkit-background-clip: text;`

B. Contrast / Comparison (Contrast structure, 1-2 per video, duration 5-8s)

Template	Suitable for
B1. Two-line comparison + strikethrough	"Previously X, now Y" / "Not A, but B" — max 2 per video
B2. Split-screen left-right comparison	Screen divided into two halves (can add vertical separator line)
B3. Diagonal comparison	Top-left ↔ Bottom-right, with large blank space in the middle

C. List / Structure (Parallel items, 1-2 per video, duration 6-10s)

Template	Suitable for
C1. Horizontal row of N cards	3-5 parallel items, using dark warm black + monochrome border
C2. Vertically stacked keywords	6-8 parallel items, can add large numbering 01-08
C3. True grid	2×2 / 3×2 grid, each cell with icon + label (vertical screen width is limited, 4 columns will be crowded)
C4. Stepped / staggered list	Each item has increasing `margin-left`

D. Stat / Data (Number climax, ≥1 per video, duration 4-6s)

Template	Suitable for
D1. Number ticker	0 → N scrolling animation ( `gsap.to({textContent})` )
D2. Number + label	Main number 200-400px + 60-80px explanation
D3. Progress bar / timeline	Horizontal progress bar + nodes

E. Quote / Climax (Key quote conclusion, 1-2 per video, duration 6-10s)

Template	Suitable for
E1. Paragraph-level hero text	A 60-100px key quote, left-aligned + left emphasis bar
E2. Large quotation marks + content	Huge semi-transparent opening quotation marks as background decoration

F. Decoration / Geometry (Rhythm seasoning, optional)

Template	Suitable for
F1. Grid + spinner / progress bar	Multi-concurrent visuals
F2. Dialogue bubble ↔ Response	Character A speaks → Character B acts

Each scene's narration should be 3-12 seconds (short punch scenes 3-4s, long breathing scenes 10-12s, don't make all scenes 5-7s). Total duration of all scenes should be 30-90 seconds, no more than 90 seconds. Short articles should be made short — 5 scenes × 6s = 30s is acceptable.

Step 1b: Scene Mix Rule (Mandatory)

After designing 5-10 scenes, self-check using the following checklist. If any item is not met → go back and adjust.

Ratio Hard Rules

≥1 scene from each of Categories A / C / D / E
≤2 scenes using B1 template (two-line strikethrough — the most overused template in history)
≥1 A3 color-flip scene (bright background with reversed text)
≥4 different template categories (at least 4 from A/B/C/D/E/F)
≤2 consecutive scenes using the same category

Rhythm Hard Rules

Scene duration span ≥ 6s (shortest ≤ 4s, longest ≥ 9s)
≥2 rhythm switches like "short → long → short" or "long → short"
Font size span ≥ 240px (largest hero ≥ 320px, smallest ≤ 80px)

Layout Hard Rules

≥2 scenes with non-centered layout (corner-aligned, diagonal, left-aligned, stepped etc.)
≥1 scene with blank space occupying ≥ 60% of the screen (breathing space)
≥1 scene containing geometric decorations (thick lines, color blocks, arrows, dots, large numbering)

Color Hard Rules

Most scenes do not set a `background:` color, so the watercolor bg-image shows through; only use a solid color bg for A3 color-flip scenes
Color-flip scene colors are not limited to orange/blue/white (deep red / deep gold / emerald green / pine green / dark purple etc. are all acceptable)
At least 2-3 different emphasis colors (technical terms in blue, value terms in gold, growth terms in green, warning terms in red)

Anti-Monotony Self-Check

Screenshot all scenes, shrink to thumbnails and arrange side by side — can you tell them apart at a glance? If 8 look the same → redo
Are the visual densities of scenes 1,4,7 different? Some should be dense, some extremely minimal
Is there a "meta-rhythm"? For example: A opening → 3 B/C expansion scenes → D climax → E conclusion — more dramatic than linear layout
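The mechanical parts of the checklist above (ratio and rhythm rules) can be checked before any HTML is written. The following is a minimal sketch, not part of the skill's scripts: the `Scene` record and its field names are hypothetical, and the layout, color, and thumbnail checks still need a human eye.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    template: str      # template code from the library, e.g. "A3", "B1", "D2"
    duration: float    # narration length in seconds
    max_font_px: int   # largest font size used in the scene
    min_font_px: int   # smallest font size used in the scene


def check_scene_mix(scenes: list[Scene]) -> list[str]:
    """Return the violated rules; an empty list means the plan passes."""
    if not scenes:
        return ["no scenes"]
    problems = []
    templates = [s.template for s in scenes]
    categories = [t[0] for t in templates]          # "A3" -> "A"

    if not 5 <= len(scenes) <= 10:
        problems.append("need 5-10 scenes")
    for required in "ACDE":
        if required not in categories:
            problems.append(f"no scene from category {required}")
    if templates.count("B1") > 2:
        problems.append("more than 2 B1 scenes")
    if "A3" not in templates:
        problems.append("no A3 color-flip scene")
    if len(set(categories)) < 4:
        problems.append("fewer than 4 template categories")
    # Three scenes in a row from one category break the "<=2 consecutive" rule.
    for a, b, c in zip(categories, categories[1:], categories[2:]):
        if a == b == c:
            problems.append(f"3 consecutive scenes from category {a}")
            break

    durations = [s.duration for s in scenes]
    if max(durations) - min(durations) < 6:
        problems.append("scene duration span under 6s")
    if not 30 <= sum(durations) <= 90:
        problems.append("total duration outside 30-90s")
    font_span = max(s.max_font_px for s in scenes) - min(s.min_font_px for s in scenes)
    if font_span < 240:
        problems.append("font size span under 240px")
    return problems
```

Running it on the scene plan right after Step 1 keeps the "go back and adjust" loop cheap, since nothing has been rendered yet.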

Step 2: Write `narration_chunks.json`

json

[
  {"id": "s01", "text": "我们以前，是 AI 的领导。现在，我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]

Narration Writing Details:

More colloquial and concise than article.md, use more commas/periods to allow natural pauses in TTS
Mixed numbers/English is OK ("Claude Code", "100 倍"), Volcano TTS can read them correctly
Do not write parenthetical comments, `...`, or em dashes `——` (TTS will read "em dash" aloud)
Remove `**bold markdown**` from article.md, leave only plain text
Remove Baixing.com-related facts: If article.md contains "百姓网", "百姓网 now has X people", "百姓网 employees" etc., strip or generalize them ("百姓网 now has 158 people" → "There are very few real people in reality"). This is outdated information and should not be included in the video. Similarly, do not include "百姓网" labels or "158 people" stats in visuals. See [[no-baixing-facts]]
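The purely mechanical rules above (bold markers, parenthetical comments, ellipses, em dashes) can be applied with a few regular expressions. This is a sketch under that assumption; `clean_narration` is a hypothetical helper, not one of the skill's scripts, and it deliberately leaves the 百姓网 rule alone because generalizing those facts needs editorial judgment.

```python
import re


def clean_narration(text: str) -> str:
    """Strip markup and punctuation that the narration rules forbid."""
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)      # **bold** -> bold
    text = re.sub(r"[（(][^（）()]*[）)]", "", text)    # drop parenthetical comments
    text = re.sub(r"\.{3,}|…+", "，", text)            # ellipsis -> comma pause
    text = re.sub(r"—+", "，", text)                   # em dash -> comma pause
    text = re.sub(r"，{2,}", "，", text)               # collapse doubled commas
    return text.strip()


if __name__ == "__main__":
    print(clean_narration("我们以前，是 **AI 的领导**——现在（注意）是维修工。"))
```

Replacing dashes and ellipses with a comma rather than deleting them keeps the pause the writer intended, which matters for TTS pacing.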

Step 3: Generate TTS narration

bash

cd <article-folder>/video
python3 tts_narration.py

The script defaults to `zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation), inserts 0.35s of silence between segments, and outputs `narration.mp3` and `timing.json`.
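To make the timing arithmetic concrete: with a fixed 0.35s gap, each segment starts where the previous one ended plus the gap. The sketch below assumes `timing.json` is a list of rows with `id`, `start`, `end`, and `duration` keys; that layout is an assumption for illustration, not a confirmed schema of `tts_narration.py`.

```python
import json

GAP_SECONDS = 0.35  # silence between segments, as described above


def build_timing(durations: dict[str, float], gap: float = GAP_SECONDS) -> list[dict]:
    """Lay segments end to end with a fixed gap; return start/end/duration rows."""
    rows, cursor = [], 0.0
    for chunk_id, duration in durations.items():
        rows.append({
            "id": chunk_id,
            "start": round(cursor, 3),
            "end": round(cursor + duration, 3),
            "duration": round(duration, 3),
        })
        cursor += duration + gap   # next segment starts after the silence
    return rows


if __name__ == "__main__":
    # Example durations in seconds for two narration segments.
    print(json.dumps(build_timing({"s01": 5.2, "s02": 3.8}), indent=2))
```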

Volcano TTS Notes (Lessons Learned):

Use resource `volc.service_type.10029`, select speakers matching `zh_*_*_bigtts`
Never pass `emotion` / `emotion_scale` parameters: most `_bigtts` voices will return `data: null` and fail silently
Never use Kokoro (hyperframes' built-in tts): Chinese quality is poor, users explicitly reject it
Avoid `zh_male_jieshuonansheng_mars_bigtts`: it hallucinates in a loop when the text contains English proper nouns (e.g., "Claude Code")
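Because the `emotion` failure is silent, a guard that runs before any request is sent can save a debugging session. The following is a hedged sketch: `check_tts_request` and the shape of `params` are hypothetical, and the code only encodes the three notes above rather than anything about the actual Volcano API.

```python
from fnmatch import fnmatchcase

VOICE_PATTERN = "zh_*_*_bigtts"
BLOCKED_VOICES = {"zh_male_jieshuonansheng_mars_bigtts"}   # loops on English proper nouns
FORBIDDEN_PARAMS = {"emotion", "emotion_scale"}             # cause a silent data: null


def check_tts_request(voice: str, params: dict) -> list[str]:
    """Return reasons the request breaks the notes above (empty list = OK)."""
    problems = []
    if not fnmatchcase(voice, VOICE_PATTERN):
        problems.append(f"voice {voice!r} does not match {VOICE_PATTERN}")
    if voice in BLOCKED_VOICES:
        problems.append(f"voice {voice!r} loops on English proper nouns")
    forbidden = sorted(FORBIDDEN_PARAMS.intersection(params))
    if forbidden:
        problems.append(f"remove parameters: {', '.join(forbidden)}")
    return problems
```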

Alternative Voices (in recommended order):

`zh_male_ahu_conversation_wvae_bigtts` (Ahu Conversation): default, natural colloquial
`zh_male_M392_conversation_wvae_bigtts`: same wvae series
`zh_male_wennuanahu_moon_bigtts` (Warm Ahu): warmer, broadcast-style
`zh_male_silang_mars_bigtts` (Silang): calm, thoughtful, dramatic
`zh_male_baqiqingshu_mars_bigtts` (Domineering): more powerful

Switch voices:

python3 tts_narration.py --voice zh_male_silang_mars_bigtts

Step 4: Generate watercolor background image

The bg-image is the main visual tone (softened abstract watercolor). Do not use the article's `illustration.png`: hand-drawn schematics have too many details and turn into uniform dark mud after blur (visually almost pure black). Always use a specially generated abstract watercolor.

bash

~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>

Choose `<theme>` (based on article topic):

theme	Color Palette	Suitable for
`personal`	bright warm yellow, soft coral pink, terracotta, sage green, cream	Personal, handcrafted, warm topics
`tech`	cool teal, electric blue, deep purple, mint, white	AI, technology, data topics
`reflection`	sage green, dusty blue, lavender, pearl, cream	Reflection, calm topics
`warning`	burnt orange, deep red, mustard, charcoal	Warning, tension topics
`growth`	fresh green, gold, soft yellow, sky blue	Growth, compound interest topics
`abstract`	lavender, dusty rose, sage, soft amber	Abstract, philosophical topics

Output: `<article-folder>/video/bg.png` (1088×1920, ~3MB).

⚠️ The image must be in the `video/` directory. `../illustration.png` cannot be used: hyperframes render does not resolve cross-directory relative paths and will render pure black.
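A black render caused by a `../` path is easy to miss until the final MP4, so it can help to scan `index.html` for asset references that leave the composition folder. This is an illustrative sketch; `escaping_refs` is a hypothetical helper and the regular expression only covers `url(...)` and `src="..."` references.

```python
import re
from pathlib import PurePosixPath

# Matches the path inside url(...) or src="..." / src='...'
ASSET_REF = re.compile(r"""(?:url\(\s*['"]?|src\s*=\s*['"])([^'")\s]+)""")
HAS_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def escaping_refs(html: str) -> list[str]:
    """Return local asset references that climb out of the composition folder."""
    bad = []
    for ref in ASSET_REF.findall(html):
        if HAS_SCHEME.match(ref) or ref.startswith("//"):
            continue                                  # remote URL or data: URI
        if ".." in PurePosixPath(ref).parts:
            bad.append(ref)                           # e.g. ../illustration.png
    return bad
```

An empty result means every local asset sits inside `video/` or a subfolder of it such as `sfx/`.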

Step 5: Write HyperFrames composition (`index.html`)

Read `timing.json` and design scenes according to each chunk's start/end times. 1080×1920 vertical screen structure:

html

<html><head><script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
  html, body {
    width: 1080px; height: 1920px; margin: 0; overflow: hidden;
    background: #0e0b08;
    font-family: 'Noto Sans SC', 'PingFang SC', 'Heiti SC', sans-serif;
    font-weight: 900;
    color: #f5efe5;
    letter-spacing: -0.02em;
    -webkit-font-smoothing: antialiased;
  }
  #bg-image {
    position: absolute; inset: 0;
    background-image: url('bg.png');
    background-size: cover;
    background-position: center;
    filter: blur(30px) brightness(0.65) saturate(0.85);
    z-index: 0;
    transform: scale(1.1);
  }
  #bg-overlay {
    position: absolute; inset: 0;
    background: rgba(14, 11, 8, 0.28);
    z-index: 1;
  }
  .scene { position: absolute; inset: 0; overflow: hidden; opacity: 0; z-index: 2; }
  #s1 { opacity: 1; }
  /* ... scene-specific styles ... */
</style></head>
<body>
  <div id="root" data-composition-id="main" data-start="0" data-duration="<total+2>" data-width="1080" data-height="1920">
    <div id="bg-image"></div>
    <div id="bg-overlay"></div>
    <!-- scene divs s1..sN -->
    <!-- audio: narration + ticks + chimes + bell -->
  </div>
  <script>
    /* GSAP timeline: paused + register to window.__timelines['main'] */
  </script>
</body></html>

🎬 First Frame Rule (Mandatory)

At video t=0, it must include:

Full visibility of bg-image — always opacity 1, never fade-in (visible by default in CSS, do not change its opacity in GSAP)
Visible title element: s1's main title element can use `tl.from({y:30, scale:0.95})`, but do not use `tl.from({opacity:0})`, otherwise t=0 will be a black screen
s1 cannot be A3 color-flip — otherwise it will cover the bg-image, and the watercolor will not be visible in the first frame. Save color-flip for s2+

Color System

Main Text / Anchor Colors (design system, consistent throughout the video):

Role	Value	Usage
Main Text	`#f5efe5` warm cream white	Hero text / main content
Secondary Text (subtitle, caption)	`#f5efe5` + `opacity: 0.7` + smaller font size	Do not use gray ( `#8a7e72` is unreadable on watercolor background). Use opacity + font size to create hierarchy
Strikethrough text itself	`#f5efe5` + `opacity: 0.5` + strikethrough line	Do not use `#6d635a` dark gray — unreadable on watercolor background. Use opacity to weaken + orange strike line instead
Decorative large numbering (01-08)	`#f5efe5` + `opacity: 0.18` or `#e87a3e` + `opacity: 0.35`	Do not use `#2b2620` or other dark grays — completely disappears on watercolor background
Outline stroke	`#f5efe5` 4-8px stroke + `color: transparent`	A2 hollow text
Default fallback bg	`#0e0b08` dark warm black	Covered by bg-image + overlay; not used for color-flip

Core Principle: All text uses `#f5efe5` cream or the `#e87a3e` orange series (accent palette); use opacity + size for hierarchy, not hue changes. Gray is a relic of the black background era; never use it on watercolor backgrounds. See [[no-low-contrast-text]]

Color-flip Background Palette (A3, not limited to orange/blue/white):

hex	Suitable for
`#e87a3e` classic orange	Warning, emphasis, climax punch
`#6b9bc4` bright blue	Data, technology climax
`#f5efe5` warm cream white	Conclusion, quiet contrast
`#c45c3e` deep red	Warning, error climax
`#d4a040` deep gold	Achievement, value climax
`#5a8c6a` emerald green	Growth, compound interest, vitality
`#4a8a8a` pine green	Calm, long-termism
`#7a5a8a` dark purple	Wisdom, mysterious climax
`#c48a8a` dark pink	Soft, humanistic topics

Text on color-flip backgrounds uses `#0e0b08` or `#f5efe5`, whichever reverses against the background.
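One way to decide which of the two text colors to use on a given color-flip background is to compare contrast ratios. The sketch below uses the WCAG 2 relative-luminance and contrast-ratio formulas; applying them here is a suggestion of this example, not something the skill prescribes, and `text_color_for` is a hypothetical helper.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast(a: str, b: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black/white)."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def text_color_for(background: str) -> str:
    """Pick whichever design-system text color contrasts more with the background."""
    return max(("#0e0b08", "#f5efe5"), key=lambda fg: contrast(fg, background))


if __name__ == "__main__":
    for bg in ("#e87a3e", "#6b9bc4", "#f5efe5", "#7a5a8a"):
        print(bg, "->", text_color_for(bg))
```

The same `contrast` function can also be pointed at a gray such as `#8a7e72` to see why the palette avoids it on mid-tone backgrounds.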

Emphasis / Accent Palette (not limited to orange):

hex	Suitable for
`#e87a3e` orange	Default emphasis
`#6b9bc4` blue	Data, technical terms, AI
`#d4a040` gold	Value, achievement
`#5a8c6a` emerald green	Growth, positive results
`#4a8a8a` pine green	Long-term, stable
`#c45c3e` deep red	Warning, contrast
`#8a7aaa` dark purple	Abstract, wisdom
`#c48a8a` dark pink	Soft, humanistic

Use at least 2-3 different emphasis colors throughout the video, choose accents based on scene themes.

Font System (1080px wide vertical screen)

Item	Value
Font Weight	hero 900 / main text 800 / secondary 600-700 / caption 500
Letter Spacing	hero `-0.04em` to `-0.06em` / main text `-0.02em` / caption `0`
Punch hero (A1/A2, 1-3 characters)	280-400px
Short sentence hero (4-6 characters)	160-240px
Long sentence hero (7-10 characters)	100-150px
Card content	56-130px
Subtitle	40-72px
Caption / numbering / label	20-40px

Layout System (Anti-centering inertia)

Layout	CSS Key Points	Suitable for
Centered	`flex; center; center;`	Category A hero scenes, but ≤50% of total scenes
Left-aligned top	`padding: 80px 80px 0 80px;`	Category E key quotes, long quotes
Bottom-right anchored	`position: absolute; right: 80px; bottom: 80px;`	Signature, climax words
Diagonal	top-left / bottom-right	B3 diagonal comparison
Grid	`display: grid; grid-template-columns: repeat(2, 1fr);`	C3 (2×N instead of 3×N for vertical screen)
Stepped	Each item has `margin-left: calc(60px * var(--i));`	C4 staggered list
Bottom-aligned + top blank space	`position: absolute; bottom: 60px;` with blank space above	Breathing scenes
Corner small element	Small text anchored to one corner, rest blank	Minimalist / blank punch scenes

Padding: 40-80px for full-screen scenes, 120-200px for breathing scenes. Do not use the same padding for all scenes.

Geometric Decorative Elements

Use one every few scenes:

Thick short line 8-16px × 40-200px, emphasis bar, orange
Left emphasis bar 6px × 100%, paired with long quotes
Large numbering 01-08, list item numbering (low-opacity cream, huge, decorative)
Large quotation mark character `"`, semi-transparent, placed top-left
Horizontal separator line 2-4px cream white with 30% transparency
Dot / square 12-20px, orange, list bullet
Arrow ➜ or custom SVG

Scene Transitions (4 types + mixing rules)

Do not use blur crossfade for all transitions. For every 4 transitions, use at least 2 different types.

T1. Blur crossfade (default soft)

0.6s, `sine.inOut`
Next scene transitions from `opacity: 0, filter: blur(24px)` → `opacity: 1, filter: blur(0)`
Previous scene fades out + blurs simultaneously

T2. White flash cut (punch cut, most modern)

Total 0.18s: 60ms white flash → cut → 40ms new scene scale 1.05 → 1
Suitable for: entering Category A hero, Category D stat, climax transitions

tl.to('.flash', { opacity: 1, duration: 0.06, ease: 'none' }, T - 0.06)
  .set(prevScene, { opacity: 0 }, T)
  .set(nextScene, { opacity: 1 }, T)
  .to('.flash', { opacity: 0, duration: 0.12, ease: 'power2.out' }, T)
  .from(nextScene, { scale: 1.05, duration: 0.25, ease: 'expo.out' }, T);

T3. Scale push (sense of advancement)

0.55s, previous scene `scale: 1 → 0.85`, next scene `scale: 1.15 → 1`
Suitable for: pushing from overview to details

T4. Color flash cut (orange/blue flash, strong rhythm)

Total 0.22s: 80ms full-screen orange → cut → 40ms fade out
Suitable for: entering A3 color-flip or key turning points
Max 2 times per video

Add flash overlay in HTML: `<div class="flash">` positioned full-screen, default opacity 0, z-index 100.
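The mixing rules for transitions can be checked against the planned sequence of transition types. This sketch reads "for every 4 transitions" as every window of 4 consecutive transitions, which is one interpretation of the rule; `check_transitions` is a hypothetical helper.

```python
def check_transitions(types: list[str]) -> list[str]:
    """Check a sequence like ["T1", "T2", "T1", "T3"] against the mixing rules."""
    problems = []
    # Every window of 4 consecutive transitions needs at least 2 distinct types.
    for i in range(len(types) - 3):
        window = types[i:i + 4]
        if len(set(window)) < 2:
            problems.append(f"transitions {i + 1}-{i + 4} all use {window[0]}")
    # T4 color flash cut is capped at 2 uses per video.
    if types.count("T4") > 2:
        problems.append("T4 color flash used more than twice")
    return problems
```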

Entrance Animation Rules

Every element in every scene uses a `tl.from(...)` entrance animation (y/opacity/scale)
Entrance stagger 0.1-0.3s; first element starts at scene.start + 0.3s
Use at least 3 different eases (`power3.out`, `back.out(1.3)`, `expo.out`, `elastic.out(1, 0.5)`)
Do not use `gsap.to({opacity: 0})` for exit: transitions handle exit. Only the last scene can fade to black
Use at least 3 of the Modern Motion Techniques throughout the video

Modern Motion Techniques

Half the difference between mediocre and modern videos is layout, the other half is motion. Use at least 3 of the following 7 techniques per video (use in specific scenes, don't stack all throughout)

1. Kinetic Typography (Character stagger entrance) — Category A hero

html

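<!-- One <span> per character so '.kinetic span' has targets; give the spans display: inline-block so transforms apply -->
<h1 class="kinetic"><span>维</span><span>修</span><span>工</span></h1>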

tl.from('.kinetic span', {
  y: 180, opacity: 0, rotateX: -90,
  duration: 0.7, stagger: 0.06,
  ease: 'back.out(1.4)',
  transformOrigin: '50% 100%',
}, T);

2. Camera Punch (Push in / Pull out) — A3, Category D

tl.from(scene, { scale: 1.15, opacity: 0, duration: 0.5, ease: 'expo.out' }, sceneStart);

3. Mask Reveal (clip-path reveal) — Category E quote

css

.reveal { clip-path: inset(0 100% 0 0); }

tl.to('.reveal', { clipPath: 'inset(0 0% 0 0)', duration: 0.9, ease: 'expo.inOut' }, T);

4. Number Ticker (Number scrolling) — D1

html

<div class="ticker" data-end="3600">0</div>

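// Tween a plain object and mirror its value into the DOM on every update.
const ticker = document.querySelector('.ticker');
const obj = { val: 0 };
tl.to(obj, {
  val: parseInt(ticker.dataset.end, 10),  // target read from data-end, explicit radix
  duration: 1.8, ease: 'power2.out',
  onUpdate: () => { ticker.textContent = Math.round(obj.val).toLocaleString(); },
}, T);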

5. Outline → Fill (Hollow text to solid) — A2

css

.morph { -webkit-text-stroke: 4px #f5efe5; color: transparent; }

tl.to('.morph', { color: '#e87a3e', webkitTextStrokeColor: '#e87a3e', duration: 0.5, ease: 'power2.out' }, T);

6. Letter Highlight Sweep (Keyword sweep highlight) — Category E climax word

html

<span class="sweep"><span class="sweep-bg"></span>搭档</span>

css

.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }

tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25);

7. Background Color Punch (Background flash change) — 1-2 times per video

tl.to(scene, { backgroundColor: '#e87a3e', duration: 0.08 }, T)
  .to(scene, { backgroundColor: '#0e0b08', duration: 0.4, ease: 'power2.out' }, T + 0.1);

Strike-through animation: Use a real DOM `<span class="strike-line">` instead of `::after`. Pseudo-elements + CSS variables may fail in some hyperframes rendering paths.

html

<span class="strike">领导<span class="strike-line"></span></span>

css

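.strike { position: relative; display: inline-block; } /* positioning context for the absolutely positioned line */
.strike-line { position: absolute; left: -10px; right: -10px; top: 56%; height: 10px; background: #e87a3e; transform: scaleX(0); transform-origin: left; }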

tl.to('.strike .strike-line', { scaleX: 1, duration: 0.55, ease: 'power2.inOut' }, T);

Step 6: Add SFX

bash

~/.claude/skills/wjs-converting-text-to-video/scripts/synth-sfx.sh <article-folder>/video

Generates `video/sfx/{tick,chime,bell}.mp3`:

`tick.mp3`: 80ms 1.2kHz sine, for transitions (0.3s before each scene switch)
`chime.mp3`: 220ms 880+1320Hz dual-tone, used when dialogue/list items light up (optional)
`bell.mp3`: 1.5s low-frequency bell, used when the final climax word appears (max 1 time per video)

Integrate into timeline:

html

<audio id="aud-narration" src="narration.mp3" data-start="0" data-duration="<total>" data-track-index="0" data-volume="1"></audio>

<audio id="aud-tick-s02" src="sfx/tick.mp3" data-start="<scene2.start - 0.3>" data-duration="0.1" data-track-index="2" data-volume="0.55"></audio>
<!-- Repeat for each scene switch; no tick needed for T2/T4 flash transitions -->

<audio id="aud-chime-s08-1" src="sfx/chime.mp3" data-start="<T>" data-duration="0.3" data-track-index="3" data-volume="0.45"></audio>
<audio id="aud-bell-s12" src="sfx/bell.mp3" data-start="<climax-T>" data-duration="1.6" data-track-index="4" data-volume="0.55"></audio>

⚠️ Every `<audio>` must have an `id`, otherwise the render will be silent (hyperframes mandatory requirement).

Different `track-index` values do not conflict; overlapping times on the same track are not allowed.
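Writing one tick tag per scene switch by hand is where missing `id` attributes tend to creep in, so the tags can be generated instead. The sketch below assumes each scene is a dict with `id`, `start`, and the incoming `transition` type; those keys are hypothetical, while the attribute values mirror the example tags above.

```python
from html import escape

FLASH_CUTS = {"T2", "T4"}   # flash transitions need no tick, per the comment above


def tick_tags(scenes: list[dict]) -> list[str]:
    """Build one tick <audio> tag per scene switch, 0.3s before the switch."""
    tags = []
    for scene in scenes[1:]:                       # s1 has no incoming transition
        if scene["transition"] in FLASH_CUTS:
            continue
        start = max(scene["start"] - 0.3, 0)
        tags.append(
            f'<audio id="aud-tick-{escape(scene["id"])}" src="sfx/tick.mp3" '
            f'data-start="{start:.2f}" data-duration="0.1" '
            f'data-track-index="2" data-volume="0.55"></audio>'
        )
    return tags
```

Deriving each `id` from the scene id keeps the ids unique, and all ticks share track 2 safely because each lasts 0.1s and scene switches are seconds apart.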

SFX Usage Discipline: Transition ticks are mandatory; chimes/bells are decorative, do not add when scene content is simple; bell can only be used once per video.

Step 7: Lint + Inspect + Render (Must follow order)

bash

cd <article-folder>/video

# Mandatory 1: Linter (must have 0 errors)
npx hyperframes lint

# Mandatory 2: Layout inspection to find overflow (must have 0 errors)
npx hyperframes inspect --at 1,8,15,25,35,45,55,65

# Recommended: Snapshot to check layout
npx hyperframes snapshot --at <t1>,<t2>,<t3> .

# Render (only run after lint + inspect pass)
# ⚠️ Output to parent directory, parallel to video/ — final MP4 is not stored in video/
npx hyperframes render --quality standard --fps 30 --output ../<slug>.mp4

Why inspection is mandatory: The 1080px wide vertical screen is narrow, and 3-4 character hero text at 280-400px font size is close to overflow. Must inspect every time, only render when 0 errors.
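Rather than reusing a fixed list of timestamps, the `--at` argument can be derived from the narration timing so that every scene is inspected once. This is a sketch under two assumptions: timing rows carry `start` and `end` in seconds, and whole-second values are acceptable to `--at`, as in the example command. Scene midpoints are used on the guess that entrance animations have settled by then.

```python
def inspect_points(timing: list[dict]) -> str:
    """Return comma-separated whole-second midpoints, one per scene."""
    mids = [(row["start"] + row["end"]) / 2 for row in timing]
    return ",".join(str(int(t)) for t in mids)   # truncate to whole seconds


if __name__ == "__main__":
    rows = [{"start": 0, "end": 5}, {"start": 5.35, "end": 9.35}]
    print(f"npx hyperframes inspect --at {inspect_points(rows)}")
```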

Fix overflow:

Reduce font size (inspect gives specific suggestions)
Wrap long hero text ("没法积累" → two lines "没法" / "积累")
Only use `white-space: nowrap` after confirming (number of characters × font size) < screen width
If `.em` overflows inside `reveal-wrap` → add `line-height: 1` to `.em`
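The (number of characters × font size) check can be done before running inspect. The sketch below treats every character as one em wide, which is a rough approximation that suits CJK glyphs and overestimates Latin text; the 40px default padding and the `fits_on_one_line` name are assumptions of this example.

```python
def fits_on_one_line(text: str, font_px: int, screen_px: int = 1080,
                     padding_px: int = 40, letter_spacing_em: float = 0.0) -> bool:
    """Rough overflow estimate: one em per character plus letter spacing."""
    glyphs = len(text)
    width = glyphs * font_px + max(glyphs - 1, 0) * letter_spacing_em * font_px
    return width <= screen_px - 2 * padding_px


if __name__ == "__main__":
    print(fits_on_one_line("没法积累", 280))   # four characters at 280px
    print(fits_on_one_line("没法", 280))       # wrapped into two-character lines
```

This is only a pre-check; `npx hyperframes inspect` remains the authority because real glyph widths and negative letter spacing shift the result.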

Render Quality:

`--quality draft`: ~30s rendering, for iteration
`--quality standard`: ~1.5min, default, sufficient for publishing
`--quality high`: ~3min, for large screens / business use

Step 8: Preview

Output: `<article-folder>/<slug>.mp4` (parallel to `video/`, not inside it; `video/` is for intermediate files).

Use `open <article-folder>/<slug>.mp4` to let the user preview. Do not auto-upload to WeChat Channels (the user may want to edit/adjust first).

Step 9: Publish to YouTube (Auto cron, not part of render workflow)

Do not upload immediately after rendering a new video. YouTube has daily quota limits (default 6 videos/day at 1600 quota points per upload), so uploading every render as it finishes would exhaust the quota and block later uploads.

Method: cron runs `daily-upload-batch.sh` automatically at 10:00 every day, selects up to 5 MP4s that haven't been uploaded yet (sorted by article date ascending), and writes `.youtube.json` after each upload.
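The selection step described above amounts to "oldest pending first, capped at five". The following sketch illustrates that logic only; it is not the contents of `daily-upload-batch.sh`, and it assumes article folders sit under one root and have names that begin with the article date, so that sorting by name is sorting by date.

```python
from pathlib import Path

DAILY_LIMIT = 5


def pick_uploads(root: Path, limit: int = DAILY_LIMIT) -> list[Path]:
    """Return up to `limit` MP4s whose article folder has no .youtube.json yet."""
    pending = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if (folder / ".youtube.json").exists():
            continue                          # already uploaded
        videos = sorted(folder.glob("*.mp4"))
        if videos:
            pending.append(videos[0])         # final MP4 sits next to video/
    return pending[:limit]
```

Because `glob("*.mp4")` is not recursive, intermediate files inside `video/` are never picked up.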

Cron is already registered (one-time setup, no need to run again):

0 10 * * * /Users/jianshuo/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh

Manual trigger (do not run in wjs-converting-text-to-video workflow — let cron handle it):

bash

~/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
# Or upload single article immediately
~/.claude/skills/wjs-converting-text-to-video/scripts/publish-to-youtube.py <article-folder>

Script behavior for each upload:

Detect MP4 portrait/landscape → add `#shorts` to the title for portrait, upload as a regular video for landscape
Title from article.md H1 / description from first few paragraphs
Check `<article-folder>/.youtube.json`: if it exists → try to delete the old video and upload the new one (deleting requires the `youtube.force-ssl` scope; the current token does not have it, so the delete is skipped and the new video is still uploaded)
Write record to `.youtube.json`

See memory: [[auto-publish-youtube]]

Directory Structure

<article-folder>/
├── article.md
├── illustration.png            # User's original schematic, not directly used as bg
├── <slug>.mp4                  # ⭐ Final video (parallel to video/, not stored in video/)
└── video/                      # All intermediate products
    ├── narration_chunks.json   # Narration text for 5-10 scenes
    ├── tts_narration.py        # Copied during bootstrap
    ├── narration.mp3           # Merged full TTS track
    ├── narration/              # Individual segment mp3 (s01..sN)
    ├── timing.json             # Start/end/duration of each segment
    ├── bg.png                  # Abstract watercolor background generated by GPT Image 2
    ├── sfx/{tick,chime,bell}.mp3
    ├── index.html              # HyperFrames composition
    ├── hyperframes.json
    ├── meta.json
    ├── package.json
    └── snapshots/              # Pre-render snapshots

Skill Own Files

~/.claude/skills/wjs-converting-text-to-video/
├── SKILL.md
└── scripts/
    ├── bootstrap-project.sh        # Initialize video/ directory + copy helpers + generate sfx
    ├── generate-bg.sh              # Call GPT Image 2 to generate abstract watercolor bg.png
    ├── tts.py                      # Generate Volcano TTS narration
    ├── synth-sfx.sh                # Synthesize tick/chime/bell (ffmpeg)
    ├── retrofit-bg-image.py        # Add bg-image layer to existing videos
    ├── strip-dark-scene-bgs.py     # Remove scene-level dark backgrounds to let bg-image show through
    └── publish-to-youtube.py       # Auto-upload MP4 to YouTube (portrait→Shorts), can replace existing uploads

Anti-Patterns

Anti-Monotony (Most important — root cause of "flat narration")

Do NOT	Reason
Use B1 two-line strikethrough for all scenes	The biggest failure pattern in history. Max 2 B1 scenes per video
Center all scenes	Lifeless. ≥2 non-centered scenes required
Use similar font sizes for all scenes	Font size span must be ≥240px
Make all scenes 5-7s long	Duration span must be ≥6s
Use only blur crossfade for all transitions	At least 2 types per 4 transitions
No color-flip scenes in the video	≥1 A3 scene is mandatory
No geometric elements in the video	≥1 scene must have thick lines / large numbering / quotation marks
Only use `tl.from({y, opacity})` animations	At least 3 Modern Motion Techniques required
Fill every scene with content	≥1 scene must have ≥60% blank space
Add `background:` color to every scene	Covers bg-image, making watercolor generation useless. Do not set bg for regular scenes; only use solid color bg for A3 color-flip scenes
Always use orange for color-flip / emphasis	At least 2-3 different accent colors required
Use gray for secondary text / strike-through / decoration	Gray has too low contrast on watercolor background and will disappear. Use `#f5efe5` cream + opacity to weaken instead (see [[no-low-contrast-text]])

Content / Engineering

Do NOT	Reason
Use Kokoro for Chinese TTS	Poor Chinese quality, users explicitly reject it
Pass `emotion` parameter to Volcano TTS	`_bigtts` voices will return `data: null` and fail silently
Use `zh_male_jieshuonansheng_mars_bigtts`	Hallucinates in a loop when the text contains English proper nouns
Use serif fonts (Songti / SimSun / Noto Serif)	Not impactful enough
Paste entire article on screen	That's PPT. Video should have one visual moment per screen
Use more than 10 scenes / exceed 90 seconds	Cannot hold audience attention
Force short articles to 90 seconds	Short articles should be 30-50s; forcing length will make content shallow
Change font/color style for each scene	Style drift. Keep design system fixed, only change templates
Use `::after` pseudo-element + CSS variables for strike-through	Fails in hyperframes rendering paths. Use real DOM `<span class="strike-line">`
Use `gsap.to({opacity: 0})` for scenes other than the last one	Exit animations are prohibited by hyperframes — transitions handle exit
Add chime to every segment	Too noisy
Use `../illustration.png` as bg url	Hyperframes render does not resolve cross-directory paths, will render pure black. bg.png must be inside `video/`
Omit `id` from `<audio>`	Render will be silent. Every `<audio>` must have `id="..."`
Make s1 an A3 color-flip scene	First frame cannot see bg-image. Put color-flip scenes in s2+
Use `from({opacity: 0})` for all s1 title elements	First frame will be black screen. s1 main elements should have default `opacity: 1` , only animate y/scale

Common Pitfalls

Write em dash `——` in narration → TTS will read "em dash" aloud. Replace with period or comma
A chunk is abnormally long (>3 chars/s) → Volcano will hallucinate and loop. Switch voice or split into shorter segments
Scene duration < narration duration → Voiceover will be cut off by next scene. Scene must cover entire narration + 0.3s buffer
Large text still visible on the black background when it should be at opacity: 0 → Check that `.scene` has default `opacity: 0` (except s1)
`.em` slightly overflows (a few px top/bottom) inside `.reveal-wrap` → Add `line-height: 1` to `.em`
Snapshot glyphs differ from render → Now using Noto Sans SC exclusively, should be consistent
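The scene-duration pitfall can be caught from the timing data alone. The sketch below assumes each scene ends where the next narration chunk starts and that timing rows carry `id`, `start`, and `end`; both are assumptions of this example. It also computes the root `data-duration` as total narration length + 2s, matching the `<total+2>` placeholder in the Step 5 skeleton.

```python
BUFFER = 0.3   # a scene should outlast its narration by this much
TAIL = 2.0     # root data-duration is total narration length + 2s


def scene_windows(timing: list[dict]) -> tuple[list[dict], float]:
    """Turn narration rows into scene windows plus the root duration."""
    total = timing[-1]["end"] + TAIL
    windows = []
    for i, row in enumerate(timing):
        is_last = i == len(timing) - 1
        # A scene ends when the next one starts; the last runs through the tail.
        end = total if is_last else timing[i + 1]["start"]
        windows.append({
            "id": row["id"],
            "start": row["start"],
            "end": end,
            "covers_narration": end >= row["end"] + BUFFER,
        })
    return windows, total
```

With the default 0.35s gap between segments every window passes; a scene whose `covers_narration` is false is one where the voiceover would be cut off.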

Dependencies

HyperFrames CLI (`npx hyperframes`): composition lint / inspect / snapshot / render
GPT Image 2 (`~/.claude/skills/gpt-image-2-skill/`): generates bg.png; use `--provider codex` for ChatGPT auth
Volcano TTS: `VOLC_TTS_APPID` and `VOLC_TTS_ACCESS_TOKEN` in `~/code/.env`
ffmpeg: SFX synthesis, audio concat, aspect-ratio detection
YouTube uploader (`~/.claude/skills/wjs-uploading-video/`) + OAuth token at `~/.config/youtube/token.json`: Step 9 auto-publishing
