wjs-converting-text-to-video
Convert a Wang Jianshuo-style WeChat Official Account article into a 1080×1920 vertical, 30-90 second Chinese narrated short video: TTS voiceover + HyperFrames CSS/GSAP animations + abstract watercolor backgrounds + transition SFX. Output MP4 for WeChat Channels / Douyin / Xiaohongshu / Reels.
What this skill produces
| Dimension | Default |
|---|---|
| Resolution | 1080×1920 vertical (9:16) |
| Duration | 30-90 seconds |
| Number of Scenes | 5-10 |
| Voiceover | Volcano Engine TTS, default "Ahu Conversation" male voice |
| Background | Abstract watercolor generated by GPT Image 2 + blur 30 + warm black semi-transparent overlay |
| Font | Noto Sans SC, hero weight 900, main text warm cream white |
| Output | <article-folder>/<slug>.mp4 (parallel to video/, not stored in video/) |
| Publishing | Auto-upload to YouTube: portrait → Shorts, landscape → regular video; re-rendering replaces the old video (no accumulation) |
When this skill fires
- The user already has an article draft and says: 「做成视频」 ("make it a video"), 「做一个解说」 ("make a narrated explainer"), 「讲一遍」 ("talk it through")
- The user runs /wjs-converting-text-to-video <article-folder>
- The user requests batch conversion like "Turn all X articles I posted yesterday into videos"
When NOT to use
- No article draft, only an idea → Write article.md first, then proceed
- User needs subtitle burning / translation / voiceover replacement → Use the dedicated skill for each of those tasks
- Video requires non-Chinese languages like English/Spanish → This skill focuses on Chinese TTS (Volcano Engine); use hyperframes' built-in tts command for non-Chinese (Kokoro works well for English)
- Landscape 16:9 format → This skill defaults to vertical; only change to landscape if explicitly requested by user
Core Principle
Video is not a visual reading of the article, but a visual reconstruction of it.
Each scene is an independent visual moment — a contrast, a parallelism, a number, a metaphor. Text fills the screen, bolded, with key words highlighted in orange. The background is abstract watercolor (softened with blur), with an overall tone that is steady, restrained, and impactful.
Rhythm > Templates. A video with 5-10 scenes that uses the same "two-line comparison" layout throughout is not a video, it's a slideshow. Modernity comes from contrast — extreme font size differences, asymmetric layouts, alternating short and long scenes, alternating text-only and geometric-element scenes, alternating watercolor-background and bright punch scenes.
Default is mediocre. If you just pick the easiest templates from the top of the list, the result will be a "flat two-line format". You must follow the Step 1b Scene Mix Rule ratios.
Workflow
Step 1: Design 5-10 visual moments
Read <article-folder>/article.md and split it into 5-10 scenes according to the argument structure (keep total duration within 30-90 seconds). Short articles (1-2 core points) use 5-6 scenes / 30-50s; long articles use 8-10 scenes / 60-90s. Each scene includes a narration segment + a clear visual framework.
Template Library: 6 categories, 18 templates total, mix as needed:
A. Hero / Punch (High-contrast climax, ≥1 per video, duration ≤4s)
| Template | Suitable for |
|---|---|
| A1. Full-screen single-character hero | 1-3 climax words filling the screen, font size 280-400px |
| A2. Outline hero | Hollow text with -webkit-text-stroke: 4px #f5efe5; color: transparent; |
| A3. Color-flip punch | Full-screen background changes to bright color (orange/red/gold/green etc.), with reversed text color |
| A4. Gradient text hero | Large text with background: linear-gradient(...); -webkit-background-clip: text; |
B. Contrast / Comparison (Contrast structure, 1-2 per video, duration 5-8s)
| Template | Suitable for |
|---|---|
| B1. Two-line comparison + strikethrough | "Previously X, now Y" / "Not A, but B" — max 2 per video |
| B2. Split-screen left-right comparison | Screen divided into two halves (can add vertical separator line) |
| B3. Diagonal comparison | Top-left ↔ Bottom-right, with large blank space in the middle |
C. List / Structure (Parallel items, 1-2 per video, duration 6-10s)
| Template | Suitable for |
|---|---|
| C1. Horizontal row of N cards | 3-5 parallel items, using dark warm black + monochrome border |
| C2. Vertically stacked keywords | 6-8 parallel items, can add large numbering 01-08 |
| C3. True grid | 2×2 / 3×2 grid, each cell with icon + label (vertical screen width is limited, 4 columns will be crowded) |
| C4. Stepped / staggered list | Each item has increasing margin-left |
D. Stat / Data (Number climax, ≥1 per video, duration 4-6s)
| Template | Suitable for |
|---|---|
| D1. Number ticker | 0 → N scrolling animation |
| D2. Number + label | Main number 200-400px + 60-80px explanation |
| D3. Progress bar / timeline | Horizontal progress bar + nodes |
E. Quote / Climax (Key quote conclusion, 1-2 per video, duration 6-10s)
| Template | Suitable for |
|---|---|
| E1. Paragraph-level hero text | A 60-100px key quote, left-aligned + left emphasis bar |
| E2. Large quotation marks + content | Huge semi-transparent opening quotation marks as background decoration |
F. Decoration / Geometry (Rhythm seasoning, optional)
| Template | Suitable for |
|---|---|
| F1. Grid + spinner / progress bar | Multi-concurrent visuals |
| F2. Dialogue bubble ↔ Response | Character A speaks → Character B acts |
Each scene's narration should be 3-12 seconds (short punch scenes 3-4s, long breathing scenes 10-12s, don't make all scenes 5-7s). Total duration of all scenes should be 30-90 seconds, no more than 90 seconds. Short articles should be made short — 5 scenes × 6s = 30s is acceptable.
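The duration limits above are easy to verify mechanically before any audio is generated. The following is a minimal sketch, assuming per-scene narration durations are available as a plain list of seconds; the function name `check_durations` is hypothetical and not part of the skill's scripts.

```python
def check_durations(durations):
    """Return a list of violations of the scene-count and duration limits.

    durations: per-scene narration lengths in seconds (assumed input shape).
    """
    problems = []
    # 5-10 scenes per video
    if not 5 <= len(durations) <= 10:
        problems.append(f"scene count {len(durations)} is outside 5-10")
    # each scene's narration should be 3-12 seconds
    for i, d in enumerate(durations, start=1):
        if not 3 <= d <= 12:
            problems.append(f"s{i:02d} lasts {d}s, outside 3-12s")
    # total duration 30-90 seconds
    total = sum(durations)
    if not 30 <= total <= 90:
        problems.append(f"total {total}s is outside 30-90s")
    return problems
```

An empty result means the plan fits the limits; for example, `check_durations([6, 6, 6, 6, 6])` passes because 5 scenes × 6s = 30s is acceptable.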
Step 1b: Scene Mix Rule (Mandatory)
After designing 5-10 scenes, self-check using the following checklist. If any item is not met → go back and adjust.
Ratio Hard Rules
Rhythm Hard Rules
Layout Hard Rules
Color Hard Rules
Anti-Monotony Self-Check
- Screenshot all scenes, shrink to thumbnails and arrange side by side — can you tell them apart at a glance? If 8 look the same → redo
- Are the visual densities of scenes 1,4,7 different? Some should be dense, some extremely minimal
- Is there a "meta-rhythm"? For example: A opening → 3 B/C expansion scenes → D climax → E conclusion — more dramatic than linear layout
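Part of the self-check can be automated. The sketch below encodes only thresholds that appear elsewhere in this document (max 2 B1 scenes, at least one A3 and one category D scene, at most 50% centered scenes and at least 2 non-centered ones, duration span ≥6s, font size span ≥240px). The scene dictionary keys (`template`, `layout`, `duration`, `max_font_px`) are a hypothetical schema chosen for illustration, and the list is assumed to be non-empty.

```python
def check_scene_mix(scenes):
    """Return violations of the mix thresholds for a non-empty list of scene dicts."""
    problems = []
    templates = [s["template"] for s in scenes]
    if templates.count("B1") > 2:
        problems.append("more than 2 B1 scenes")
    if "A3" not in templates:
        problems.append("no A3 color-flip scene")
    if not any(t.startswith("D") for t in templates):
        problems.append("no category D stat scene")
    centered = sum(1 for s in scenes if s["layout"] == "centered")
    if centered * 2 > len(scenes):
        problems.append("more than 50% centered scenes")
    if len(scenes) - centered < 2:
        problems.append("fewer than 2 non-centered scenes")
    durations = [s["duration"] for s in scenes]
    if max(durations) - min(durations) < 6:
        problems.append("duration span under 6s")
    fonts = [s["max_font_px"] for s in scenes]
    if max(fonts) - min(fonts) < 240:
        problems.append("font size span under 240px")
    return problems
```

The thumbnail comparison and the "meta-rhythm" question still need a human eye; this only catches the countable failures.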
Step 2: Write narration_chunks.json
```json
[
  {"id": "s01", "text": "我们以前,是 AI 的领导。现在,我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]
```
Narration Writing Details:
- More colloquial and concise than article.md, use more commas/periods to allow natural pauses in TTS
- Mixed numbers/English is OK ("Claude Code", "100 倍"), Volcano TTS can read them correctly
- Do not write parenthetical comments or em dashes (TTS will read "em dash" aloud)
- Remove formatting markup carried over from article.md, leaving only plain text
- Remove Baixing.com-related facts: If article.md contains "百姓网", "百姓网 now has X people", "百姓网 employees" etc., strip or generalize them ("百姓网 now has 158 people" → "There are very few real people in reality"). This is outdated information and should not be included in the video. Similarly, do not include "百姓网" labels or "158 people" stats in visuals. See [[no-baixing-facts]]
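Some of the rules above can be applied as a mechanical pre-pass before writing the chunks file. This is a minimal sketch, not the skill's own tooling: `clean_narration` is a hypothetical helper, and treating `*` and backticks as the markup to strip is an assumption. It removes parenthetical asides, turns em dashes into commas so TTS pauses instead of reading them aloud, and collapses whitespace.

```python
import re

EM_DASH = "\u2014"  # the character TTS would otherwise read aloud

def clean_narration(text):
    """Return narration text with asides, em dashes and markup removed."""
    # Drop parenthetical comments, ASCII or full-width brackets
    text = re.sub(r"[（(][^（()）]*[)）]", "", text)
    # One or more em dashes become a single full-width comma (a natural pause)
    text = re.sub(EM_DASH + "+", "，", text)
    # Assumed markup to strip: markdown emphasis and inline-code markers
    text = re.sub(r"[*`]+", "", text)
    # Collapse runs of whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()
```

The Baixing-related rewrite is editorial rather than mechanical, so it is deliberately left out of this helper.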
Step 3: Generate TTS narration
```bash
cd <article-folder>/video
python3 tts_narration.py
```
The script defaults to zh_male_ahu_conversation_wvae_bigtts (Ahu Conversation), inserts 0.35s of silence between segments, and outputs narration.mp3 + timing.json.
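To make the timing arithmetic concrete: with a fixed 0.35s gap between segments, each segment's start is the previous segment's end plus the gap. The sketch below derives start/end/duration records from a list of segment lengths. The record layout and the name `build_timing` are assumptions for illustration; the real timing.json written by the script may be shaped differently.

```python
GAP = 0.35  # seconds of silence inserted between segments

def build_timing(durations, gap=GAP):
    """Return one {id, start, end, duration} record per narration segment."""
    timing = []
    cursor = 0.0
    for i, d in enumerate(durations, start=1):
        start = cursor
        end = start + d
        timing.append({
            "id": f"s{i:02d}",
            "start": round(start, 3),
            "end": round(end, 3),
            "duration": round(d, 3),
        })
        # the next segment begins after the silence gap
        cursor = end + gap
    return timing
```

For segments of 4.0s and 6.5s this places s02 at 4.35s to 10.85s, which is the kind of start time the scene and SFX placement in later steps depend on.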
Volcano TTS Notes (Lessons Learned):
- Use resource , select speakers with
- Never pass / parameters — most voices will return and fail silently
- Never use Kokoro (hyperframes' built-in tts) — Chinese quality is poor, users explicitly reject it
- Avoid zh_male_jieshuonansheng_mars_bigtts: it loops and hallucinates when the text contains English proper nouns (e.g., "Claude Code")
Alternative Voices (in recommended order):
1. zh_male_ahu_conversation_wvae_bigtts (Ahu Conversation): default, natural colloquial
2. zh_male_M392_conversation_wvae_bigtts: same wvae series
3. zh_male_wennuanahu_moon_bigtts (Warm Ahu): warmer, broadcast-style
4. zh_male_silang_mars_bigtts (Silang): calm, thoughtful, dramatic
5. zh_male_baqiqingshu_mars_bigtts (Domineering): more powerful
Switch voices:
```bash
python3 tts_narration.py --voice zh_male_silang_mars_bigtts
```
Step 4: Generate watercolor background image
The bg-image is the main visual tone (softened abstract watercolor).
Do not use the article's illustration.png: hand-drawn schematics have too much detail and turn into uniform dark mud after blur (visually still pure black). Use a purpose-generated abstract watercolor instead.
```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>
```
Choose <theme> (based on article topic):
| theme | Color Palette | Suitable for |
|---|---|---|
| | bright warm yellow, soft coral pink, terracotta, sage green, cream | Personal, handcrafted, warm topics |
| | cool teal, electric blue, deep purple, mint, white | AI, technology, data topics |
| | sage green, dusty blue, lavender, pearl, cream | Reflection, calm topics |
| | burnt orange, deep red, mustard, charcoal | Warning, tension topics |
| | fresh green, gold, soft yellow, sky blue | Growth, compound interest topics |
| | lavender, dusty rose, sage, soft amber | Abstract, philosophical topics |
Output: <article-folder>/video/bg.png (1088×1920, ~3MB).
⚠️ The image must be in the video/ directory. A path that points outside it does not work: hyperframes render does not resolve cross-directory relative paths and will render pure black.
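A quick pre-render check can catch a background path that escapes the composition directory. The sketch below scans CSS text for `url(...)` references and returns those that are absolute or climb out with `../`. The helper name `outside_urls` is hypothetical, and the regular expression is a simplification that covers ordinary quoted and unquoted `url()` values only.

```python
import re

# Matches url('x'), url("x") and url(x); a simplification, not a full CSS parser
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")

def outside_urls(css_or_html):
    """Return url() targets that point outside the composition directory."""
    urls = URL_RE.findall(css_or_html)
    return [
        u for u in urls
        if u.startswith(("../", "/")) or "/../" in u
    ]
```

Running it over the composition source before rendering gives an empty list when every image reference stays inside the directory.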
Step 5: Write HyperFrames composition (index.html)
Read timing.json and design scenes according to each chunk's start/end times. 1080×1920 vertical screen structure:
```html
<html><head><script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
html, body {
  width: 1080px; height: 1920px; margin: 0; overflow: hidden;
  background: #0e0b08;
  font-family: 'Noto Sans SC', 'PingFang SC', 'Heiti SC', sans-serif;
  font-weight: 900;
  color: #f5efe5;
  letter-spacing: -0.02em;
  -webkit-font-smoothing: antialiased;
}
#bg-image {
  position: absolute; inset: 0;
  background-image: url('bg.png');
  background-size: cover;
  background-position: center;
  filter: blur(30px) brightness(0.65) saturate(0.85);
  z-index: 0;
  transform: scale(1.1);
}
#bg-overlay {
  position: absolute; inset: 0;
  background: rgba(14, 11, 8, 0.28);
  z-index: 1;
}
.scene { position: absolute; inset: 0; overflow: hidden; opacity: 0; z-index: 2; }
#s1 { opacity: 1; }
/* ... scene-specific styles ... */
</style></head>
<body>
<div id="root" data-composition-id="main" data-start="0" data-duration="<total+2>" data-width="1080" data-height="1920">
  <div id="bg-image"></div>
  <div id="bg-overlay"></div>
  <!-- scene divs s1..sN -->
  <!-- audio: narration + ticks + chimes + bell -->
</div>
<script>
/* GSAP timeline: paused + register to window.__timelines['main'] */
</script>
</body></html>
```
🎬 First Frame Rule (Mandatory)
At video t=0, it must include:
- Full visibility of bg-image — always opacity 1, never fade-in (visible by default in CSS, do not change its opacity in GSAP)
- Visible title element: s1's main title element can use tl.from({y:30, scale:0.95}), but must not animate in from opacity 0, otherwise t=0 will be a black screen
- s1 cannot be A3 color-flip — otherwise it will cover the bg-image, and the watercolor will not be visible in the first frame. Save color-flip for s2+
Color System
Main Text / Anchor Colors (design system, consistent throughout the video):
| Role | Value | Usage |
|---|---|---|
| Main Text | #f5efe5 warm cream white | Hero text / main content |
| Secondary Text (subtitle, caption) | cream + reduced opacity + smaller font size | Do not use gray (unreadable on a watercolor background). Use opacity + font size to create hierarchy |
| Strikethrough text itself | cream + reduced opacity + orange strike line | Do not use dark gray (unreadable on a watercolor background). Use opacity to weaken the text and an orange strike line instead |
| Decorative large numbering (01-08) | cream or orange + low opacity | Do not use dark grays; they disappear completely on a watercolor background |
| Outline stroke | 4-8px stroke + transparent fill | A2 hollow text |
| Default fallback bg | #0e0b08 dark warm black | Covered by bg-image + overlay; not used for color-flip |
Core Principle: All text uses cream or orange series (accent palette), use opacity + size for hierarchy, do not use hue changes. Gray is a relic of the black background era, never use it on watercolor backgrounds. See [[no-low-contrast-text]]
Color-flip Background Palette (A3, not limited to orange/blue/white):
| Color | Suitable for |
|---|---|
| classic orange | Warning, emphasis, climax punch |
| bright blue | Data, technology climax |
| warm cream white | Conclusion, quiet contrast |
| deep red | Warning, error climax |
| deep gold | Achievement, value climax |
| emerald green | Growth, compound interest, vitality |
| pine green | Calm, long-termism |
| dark purple | Wisdom, mysterious climax |
| dark pink | Soft, humanistic topics |
Text on color-flip backgrounds uses reversed colors.
Emphasis / Accent Palette (not limited to orange):
| Color | Suitable for |
|---|---|
| orange | Default emphasis |
| blue | Data, technical terms, AI |
| gold | Value, achievement |
| emerald green | Growth, positive results |
| pine green | Long-term, stable |
| deep red | Warning, contrast |
| dark purple | Abstract, wisdom |
| dark pink | Soft, humanistic |
Use at least 2-3 different emphasis colors throughout the video, choose accents based on scene themes.
Font System (1080px wide vertical screen)
| Item | Value |
|---|---|
| Font Weight | hero 900 / main text 800 / secondary 600-700 / caption 500 |
| Letter Spacing | hero to / main text / caption |
| Punch hero (A1/A2, 1-3 characters) | 280-400px |
| Short sentence hero (4-6 characters) | 160-240px |
| Long sentence hero (7-10 characters) | 100-150px |
| Card content | 56-130px |
| Subtitle | 40-72px |
| Caption / numbering / label | 20-40px |
Layout System (Anti-centering inertia)
| Layout | CSS Key Points | Suitable for |
|---|---|---|
| Centered | | Category A hero scenes, but ≤50% of total scenes |
| Left-aligned top | padding: 80px 80px 0 80px; | Category E key quotes, long quotes |
| Bottom-right anchored | position: absolute; right: 80px; bottom: 80px; | Signature, climax words |
| Diagonal | top-left / bottom-right | B3 diagonal comparison |
| Grid | display: grid; grid-template-columns: repeat(2, 1fr); | C3 (2×N instead of 3×N for vertical screen) |
| Stepped | Each item has margin-left: calc(60px * var(--i)); | C4 staggered list |
| Bottom-aligned + top blank space | position: absolute; bottom: 60px; with blank space above | Breathing scenes |
| Corner small element | Small text anchored to one corner, rest blank | Minimalist / blank punch scenes |
Padding: 40-80px for full-screen scenes, 120-200px for breathing scenes. Do not use the same padding for all scenes.
Geometric Decorative Elements
Use one every few scenes:
- Thick short line 8-16px × 40-200px, emphasis bar, orange
- Left emphasis bar 6px × 100%, paired with long quotes
- Large numbering 01-08, list item numbering (low-opacity cream, huge, decorative)
- Large quotation mark character semi-transparent, placed top-left
- Horizontal separator line 2-4px cream white with 30% transparency
- Dot / square 12-20px, orange, list bullet
- Arrow ➜ or custom SVG
Scene Transitions (4 types + mixing rules)
Do not use blur crossfade for all transitions. For every 4 transitions, use at least 2 different types.
T1. Blur crossfade (default soft)
- 0.6s
- Next scene transitions from opacity: 0, filter: blur(24px) → opacity: 1, filter: blur(0)
- Previous scene fades out + blurs simultaneously
T2. White flash cut (punch cut, most modern)
- Total 0.18s: 60ms white flash → cut → 40ms new scene scale 1.05 → 1
- Suitable for: entering Category A hero, Category D stat, climax transitions
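```js
// T = transition time in seconds; prevScene / nextScene = outgoing and incoming scene elements
tl.to('.flash', { opacity: 1, duration: 0.06, ease: 'none' }, T - 0.06)   // white flash in
  .set(prevScene, { opacity: 0 }, T)                                      // hard cut at T
  .set(nextScene, { opacity: 1 }, T)
  .to('.flash', { opacity: 0, duration: 0.12, ease: 'power2.out' }, T)    // flash out
  .from(nextScene, { scale: 1.05, duration: 0.25, ease: 'expo.out' }, T); // new scene settles to 1
```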
T3. Scale push (sense of advancement)
- 0.55s, previous scene , next scene
- Suitable for: pushing from overview to details
T4. Color flash cut (orange/blue flash, strong rhythm)
- Total 0.22s: 80ms full-screen orange → cut → 40ms fade out
- Suitable for: entering A3 color-flip or key turning points
- Max 2 times per video
Add a flash overlay element to the HTML (class flash, matching the .flash selector used in the timeline): positioned full-screen, default opacity 0, z-index 100.
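The mixing rules can be checked on the planned transition list before writing the timeline. This is a sketch under one stated assumption: "for every 4 transitions, at least 2 different types" is interpreted as a sliding window over consecutive transitions. The function name `check_transitions` is hypothetical.

```python
def check_transitions(types):
    """Return violations for a list of transition types such as ["T1", "T2", ...]."""
    problems = []
    # Sliding window: any 4 consecutive transitions need at least 2 distinct types
    for i in range(len(types) - 3):
        window = types[i:i + 4]
        if len(set(window)) < 2:
            problems.append(f"transitions {i + 1}-{i + 4} all use {window[0]}")
    # T4 color flash cut: max 2 times per video
    if types.count("T4") > 2:
        problems.append("T4 used more than 2 times")
    return problems
```

A plan such as `["T1", "T2", "T1", "T3", "T1"]` passes, while five blur crossfades in a row are reported once per offending window.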
Entrance Animation Rules
- Every element in every scene uses entrance animation (y/opacity/scale)
- Entrance stagger 0.1-0.3s; first element starts at scene.start + 0.3s
- Use at least 3 different eases
- Do not add exit animations; transitions handle exit. Only the last scene can fade to black
- Must use at least 3 types of Modern Motion Techniques throughout the video
Modern Motion Techniques
Half the difference between mediocre and modern videos is layout, the other half is motion. Use at least 3 of the following 7 techniques per video (use in specific scenes, don't stack all throughout)
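1. Kinetic Typography (Character stagger entrance): Category A hero
```html
<!-- Each character needs its own span, because the timeline targets '.kinetic span' -->
<h1 class="kinetic"><span>维</span><span>修</span><span>工</span></h1>
```
```js
// The spans need display: inline-block in CSS for the transforms to apply
tl.from('.kinetic span', {
  y: 180, opacity: 0, rotateX: -90,
  duration: 0.7, stagger: 0.06,
  ease: 'back.out(1.4)',
  transformOrigin: '50% 100%',
}, T);
```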
2. Camera Punch (Push in / Pull out): A3, Category D
```js
tl.from(scene, { scale: 1.15, opacity: 0, duration: 0.5, ease: 'expo.out' }, sceneStart);
```
3. Mask Reveal (clip-path reveal): Category E quote
```css
.reveal { clip-path: inset(0 100% 0 0); }
```
```js
tl.to('.reveal', { clipPath: 'inset(0 0% 0 0)', duration: 0.9, ease: 'expo.inOut' }, T);
```
4. Number Ticker (Number scrolling): D1
```html
<div class="ticker" data-end="3600">0</div>
```
```js
const ticker = document.querySelector('.ticker');
const obj = { val: 0 };
tl.to(obj, {
  val: parseInt(ticker.dataset.end, 10),
  duration: 1.8, ease: 'power2.out',
  onUpdate: () => { ticker.textContent = Math.round(obj.val).toLocaleString(); },
}, T);
```
5. Outline → Fill (Hollow text to solid): A2
```css
.morph { -webkit-text-stroke: 4px #f5efe5; color: transparent; }
```
```js
tl.to('.morph', { color: '#e87a3e', webkitTextStrokeColor: '#e87a3e', duration: 0.5, ease: 'power2.out' }, T);
```
6. Letter Highlight Sweep (Keyword sweep highlight): Category E climax word
```html
<span class="sweep"><span class="sweep-bg"></span>搭档</span>
```
```css
.sweep { position: relative; display: inline-block; padding: 0 8px; }
.sweep-bg { position: absolute; inset: 0; background: #e87a3e; transform: scaleX(0); transform-origin: left; z-index: -1; }
```
```js
tl.to('.sweep-bg', { scaleX: 1, duration: 0.5, ease: 'power3.inOut' }, T);
tl.to('.sweep', { color: '#0e0b08', duration: 0.1 }, T + 0.25);
```
7. Background Color Punch (Background flash change): 1-2 times per video
```js
tl.to(scene, { backgroundColor: '#e87a3e', duration: 0.08 }, T)
  .to(scene, { backgroundColor: '#0e0b08', duration: 0.4, ease: 'power2.out' }, T + 0.1);
```
Strike-through animation: Use a real DOM <span class="strike-line"> instead of a pseudo-element. Pseudo-elements + CSS variables may fail in some hyperframes rendering paths.
```html
<span class="strike">领导<span class="strike-line"></span></span>
```
```css
/* Assumed wrapper rule: the absolutely positioned line needs a positioned parent */
.strike { position: relative; display: inline-block; }
.strike-line { position: absolute; left: -10px; right: -10px; top: 56%; height: 10px; background: #e87a3e; transform: scaleX(0); transform-origin: left; }
```
```js
tl.to('.strike .strike-line', { scaleX: 1, duration: 0.55, ease: 'power2.inOut' }, T);
```
Step 6: Add SFX
```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/synth-sfx.sh <article-folder>/video
```
Generates video/sfx/{tick,chime,bell}.mp3:
- tick: 80ms 1.2kHz sine, for transitions (0.3s before each scene switch)
- chime: 220ms 880+1320Hz dual-tone, used when dialogue/list items light up (optional)
- bell: 1.5s low-frequency bell, used when the final climax word appears (max 1 time per video)
Integrate into timeline:
```html
<audio id="aud-narration" src="narration.mp3" data-start="0" data-duration="<total>" data-track-index="0" data-volume="1"></audio>
<audio id="aud-tick-s02" src="sfx/tick.mp3" data-start="<scene2.start - 0.3>" data-duration="0.1" data-track-index="2" data-volume="0.55"></audio>
<!-- Repeat for each scene switch; no tick needed for T2/T4 flash transitions -->
<audio id="aud-chime-s08-1" src="sfx/chime.mp3" data-start="<T>" data-duration="0.3" data-track-index="3" data-volume="0.45"></audio>
<audio id="aud-bell-s12" src="sfx/bell.mp3" data-start="<climax-T>" data-duration="1.6" data-track-index="4" data-volume="0.55"></audio>
```
⚠️ Every must have an — otherwise render will be silent (hyperframes mandatory requirement).
Different data-track-index values do not conflict; overlapping times on the same track are not allowed.
SFX Usage Discipline: Transition ticks are mandatory; chimes/bells are decorative, do not add when scene content is simple; bell can only be used once per video.
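Writing one tick tag per scene switch by hand is error-prone, so the tags can be generated from the scene start times. The sketch below mirrors the attribute values shown in the markup above (0.3s lead, 0.1s duration, track 2, volume 0.55) and skips T2/T4 flash transitions. The input shape, a list of (scene id, start time, incoming transition type) tuples, and the name `tick_tags` are assumptions for illustration.

```python
def tick_tags(scenes):
    """Return <audio> tick tags for scene switches.

    scenes: (scene_id, start_seconds, transition_type) for scenes 2..N.
    """
    tags = []
    for scene_id, start, transition in scenes:
        # Flash transitions (T2/T4) carry their own punch, so no tick
        if transition in ("T2", "T4"):
            continue
        tags.append(
            f'<audio id="aud-tick-{scene_id}" src="sfx/tick.mp3" '
            f'data-start="{start - 0.3:.2f}" data-duration="0.1" '
            f'data-track-index="2" data-volume="0.55"></audio>'
        )
    return tags
```

All ticks share one track; with scenes at least 3 seconds long, the 0.1s clips cannot overlap on it.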
Step 7: Lint + Inspect + Render (Must follow order)
```bash
cd <article-folder>/video
# Mandatory 1: Linter (must have 0 errors)
npx hyperframes lint
# Mandatory 2: Layout inspection to find overflow (must have 0 errors)
npx hyperframes inspect --at 1,8,15,25,35,45,55,65
# Recommended: Snapshot to check layout
npx hyperframes snapshot --at <t1>,<t2>,<t3> .
# Render (only run after lint + inspect pass)
# ⚠️ Output to parent directory, parallel to video/; the final MP4 is not stored in video/
npx hyperframes render --quality standard --fps 30 --output ../<slug>.mp4
```
Why inspection is mandatory: The 1080px wide vertical screen is narrow, and 3-4 character hero text at 280-400px font size is close to overflow. Must inspect every time, only render when 0 errors.
Fix overflow:
- Reduce font size (inspect gives specific suggestions)
- Wrap long hero text ("没法积累" → two lines "没法" / "积累")
- Only use when confirming (number of characters × font size) < screen width
- If overflows inside → add to
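The width rule, (number of characters × font size) < screen width, can be estimated before running inspect. The sketch below assumes each CJK glyph occupies roughly one em of width and ignores letter-spacing, so it is a rough pre-check rather than a replacement for the inspect step. The function names and the 40px default padding (the lower bound given for full-screen scenes) are illustrative choices.

```python
SCREEN_WIDTH = 1080  # px, vertical composition width

def fits_one_line(text, font_px, width=SCREEN_WIDTH, padding=40):
    """Estimate whether text fits on one line, assuming 1em per character."""
    return len(text) * font_px <= width - 2 * padding

def max_font_px(n_chars, width=SCREEN_WIDTH, padding=40):
    """Largest font size (px) at which n_chars full-width glyphs fit on one line."""
    return (width - 2 * padding) // n_chars
```

Under these assumptions a 3-character hero tops out at 333px, which is why the 280-400px range for punch heroes sits so close to overflow and why splitting "没法积累" into two lines is often the cleaner fix.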
Render Quality:
- ~30s rendering — for iteration
- ~1.5min — default, sufficient for publishing
- ~3min — for large screens / business use
Step 8: Preview
Output: <article-folder>/<slug>.mp4 (parallel to video/, not inside video/; video/ is for intermediate files).
Use open <article-folder>/<slug>.mp4 to let the user preview.
Do not auto-upload to WeChat Channels (user may want to edit/adjust first).
Step 9: Publish to YouTube (Auto cron, not part of render workflow)
Do not upload immediately after new video rendering — YouTube has daily quota limits (default 6 videos/day @ 1600 quota points/upload), rendering multiple videos will cause quota blocking.
Method: Cron runs daily-upload-batch.sh automatically at 10:00 every day, selects up to 5 MP4s that haven't been uploaded yet (sorted by article date ascending), and writes .youtube.json after upload.
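The selection step amounts to filter, sort, and truncate. A minimal sketch follows, assuming each article is represented as a dictionary with an ISO-format `date` string and an `uploaded` flag (for instance, derived from whether an upload record already exists). These keys and the name `pick_batch` are hypothetical and not taken from the actual script.

```python
def pick_batch(articles, limit=5):
    """Return up to `limit` not-yet-uploaded articles, oldest first."""
    pending = [a for a in articles if not a["uploaded"]]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    return sorted(pending, key=lambda a: a["date"])[:limit]
```

Keeping the limit at 5 leaves headroom under the 6-uploads-per-day quota mentioned above.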
Cron is already registered (one-time setup, no need to run again):
0 10 * * * /Users/jianshuo/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
Manual trigger (do not run in wjs-converting-text-to-video workflow — let cron handle it):
```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/daily-upload-batch.sh
# Or upload single article immediately
~/.claude/skills/wjs-converting-text-to-video/scripts/publish-to-youtube.py <article-folder>
```
Script behavior for each upload:
- Detect MP4 portrait/landscape → mark portrait videos as Shorts in the title, upload landscape as regular video
- Title from article.md H1 / description from first few paragraphs
- Check <article-folder>/.youtube.json: if it exists → try to delete the old video and upload the new one (deletion requires an OAuth scope that the current token does not have → skip delete + upload new)
- Write the upload record to <article-folder>/.youtube.json
See memory: [[auto-publish-youtube]]
Directory Structure
```
<article-folder>/
├── article.md
├── illustration.png            # User's original schematic, not directly used as bg
├── <slug>.mp4                  # ⭐ Final video (parallel to video/, not stored in video/)
└── video/                      # All intermediate products
    ├── narration_chunks.json   # Narration text for 5-10 scenes
    ├── tts_narration.py        # Copied during bootstrap
    ├── narration.mp3           # Merged full TTS track
    ├── narration/              # Individual segment mp3 (s01..sN)
    ├── timing.json             # Start/end/duration of each segment
    ├── bg.png                  # Abstract watercolor background generated by GPT Image 2
    ├── sfx/{tick,chime,bell}.mp3
    ├── index.html              # HyperFrames composition
    ├── hyperframes.json
    ├── meta.json
    ├── package.json
    └── snapshots/              # Pre-render snapshots
```
Skill Own Files
```
~/.claude/skills/wjs-converting-text-to-video/
├── SKILL.md
└── scripts/
    ├── bootstrap-project.sh       # Initialize video/ directory + copy helpers + generate sfx
    ├── generate-bg.sh             # Call GPT Image 2 to generate abstract watercolor bg.png
    ├── tts.py                     # Generate Volcano TTS narration
    ├── synth-sfx.sh               # Synthesize tick/chime/bell (ffmpeg)
    ├── retrofit-bg-image.py       # Add bg-image layer to existing videos
    ├── strip-dark-scene-bgs.py    # Remove scene-level dark backgrounds to let bg-image show through
    └── publish-to-youtube.py      # Auto-upload MP4 to YouTube (portrait→Shorts), can replace existing uploads
```
Anti-Patterns
Anti-Monotony (Most important — root cause of "flat narration")
| Do NOT | Reason |
|---|---|
| Use B1 two-line strikethrough for all scenes | The biggest failure pattern in history. Max 2 B1 scenes per video |
| Center all scenes | Lifeless. ≥2 non-centered scenes required |
| Use similar font sizes for all scenes | Font size span must be ≥240px |
| Make all scenes 5-7s long | Duration span must be ≥6s |
| Use only blur crossfade for all transitions | At least 2 types per 4 transitions |
| No color-flip scenes in the video | ≥1 A3 scene is mandatory |
| No geometric elements in the video | ≥1 scene must have thick lines / large numbering / quotation marks |
| Only use basic y/opacity/scale entrance animations | At least 3 Modern Motion Techniques required |
| Fill every scene with content | ≥1 scene must have ≥60% blank space |
| Add a background color to every scene | Covers bg-image, making watercolor generation useless. Do not set bg for regular scenes; only use solid color bg for A3 color-flip scenes |
| Always use orange for color-flip / emphasis | At least 2-3 different accent colors required |
| Use gray for secondary text / strike-through / decoration | Gray has too low contrast on watercolor background and will disappear. Use cream + opacity to weaken instead (see [[no-low-contrast-text]]) |
Content / Engineering
| Do NOT | Reason |
|---|---|
| Use Kokoro for Chinese TTS | Poor Chinese quality, users explicitly reject it |
| Pass parameter to Volcano TTS | voices will return and fail silently |
| Use zh_male_jieshuonansheng_mars_bigtts | Loops and hallucinates when the text contains English proper nouns |
| Use serif fonts (Songti / SimSun / Noto Serif) | Not impactful enough |
| Paste entire article on screen | That's PPT. Video should have one visual moment per screen |
| Use more than 10 scenes / exceed 90 seconds | Cannot hold audience attention |
| Force short articles to 90 seconds | Short articles should be 30-50s; forcing length will make content shallow |
| Change font/color style for each scene | Style drift. Keep design system fixed, only change templates |
| Use pseudo-element + CSS variables for strike-through | Fails in hyperframes rendering paths. Use real DOM <span class="strike-line"> |
| Use exit animations for scenes other than the last one | Exit animations are prohibited by hyperframes; transitions handle exit |
| Add chime to every segment | Too noisy |
| Use a path outside video/ as bg url | Hyperframes render does not resolve cross-directory paths and will render pure black. bg.png must be inside video/ |
| Omit from | Render will be silent. Every must have |
| Make s1 an A3 color-flip scene | First frame cannot see bg-image. Put color-flip scenes in s2+ |
| Start all s1 title elements at opacity 0 | First frame will be a black screen. s1 main elements should be visible by default; only animate y/scale |
Common Pitfalls
- Write em dash in narration → TTS will read "em dash" aloud. Replace with period or comma
- A chunk is abnormally long (>3 chars/s) → Volcano will hallucinate and loop. Switch voice or split into shorter segments
- Scene duration < narration duration → Voiceover will be cut off by next scene. Scene must cover entire narration + 0.3s buffer
- Large text still visible on the black background when a scene should be at opacity: 0 → Check that .scene has default opacity: 0 (except s1)
- slightly overflows (a few px top/bottom) inside → Add to
- Snapshot glyphs differ from render → Now using Noto Sans SC exclusively, should be consistent
Dependencies
- HyperFrames CLI (npx hyperframes): composition lint / inspect / snapshot / render
- GPT Image 2 (~/.claude/skills/gpt-image-2-skill/): generates bg.png; requires ChatGPT auth
- Volcano TTS — / in
- ffmpeg — SFX synthesis, audio concat, aspect-ratio detection
- YouTube uploader (~/.claude/skills/wjs-uploading-video/) + OAuth token at ~/.config/youtube/token.json: Step 9 auto-publishing