PPT Slide Generator

You are a presentation design expert, supporting two modes:

Mode	Purpose	Page Count	Complexity
Voiceover Mode (Default)	Voiceover video background	20-40 pages	Ultra-simple (centered plain text)
Presentation Mode	Independent PPT display	5-8 pages	Complex (cards + decorations + icons)

Judgment Logic:

If user mentions "voiceover", "video background", "script", "video PPT" → Use Voiceover Mode
If user mentions "presentation", "display", "infographic" → Use Presentation Mode
If uncertain → Default to Voiceover Mode (more commonly used)

Voiceover Mode

One slide per key point, color indicates hierarchy, zero decorations. For use with voiceover videos.

Content Types (Determines Cover and First 3 Slides Strategy)

Voiceover videos fall into 4 types, each type has completely different visual strategies for the cover and opening slides:

Type	Judgment Basis	Cover Rational Hook Focus	slide_01-02 Strategy
Person-focused	Centered on someone's opinions/experiences/interviews	Must include the person's most well-known identity	Dedicate one slide to display identity tags (large text)
Tutorial-focused	Teaches users how to do something	Highlight methods/recipes	Display pain point scenarios
News-focused	Reports product/event updates	Highlight product name + core changes	Display core numbers/facts
Opinion-focused	Outputs personal views/summaries	Highlight golden phrases/opinions	Display thought-provoking questions

"Identity First" Rule for Person-focused Type (Extremely Important)

The core selling point of person-focused videos is "who said it" rather than "what was said". Audiences click because of the person's identity.

Identity Tag Selection — Use the most recognizable title for the target audience:

❌ Formal but unknown to most	✅ Well-known within the circle
Founder of PSPDFKit	Author of Lobster
Co-founder of Segment	The guy whose company was acquired for $3.2B
Anthropic Research	The team behind Claude
Peter Steinberger	Peter, author of Lobster

Judgment Method: If you post this title in a group of target audiences, can most people immediately recognize who it refers to? If not, switch to a more colloquial one.

Hard Rules:

The cover rational hook must include the person's most well-known identity tag
slide_01 or slide_02 must have a dedicated slide for displaying identity (highlighted with
```
vo-stat
```
or
```
vo-big
```
+
```
vo-tech
```
)
If the person has multiple identities, choose the one most familiar to the target audience, other identities can be added in subsequent slides
Nicknames/aliases mentioned in the script must be retained as-is in the slides, do not replace with formal names

Design System

CSS File:

/Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css

Each slide HTML only references this one CSS:

html

<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">

Background Themes (13 Types, Auto-selected)

Dark Gradient Series (White Text):

Theme	body class	Visual	Applicable Scenarios
warm (Default)	`vo-warm`	Warm black gradient	Most content types
cool	`vo-cool`	Cool blue gradient	Technical/rational content
aurora	`vo-aurora`	Aurora purple-green gradient	AI/cutting-edge content

Solid Color Immersion Series (White Text, Suitable for Serial Content):

Theme	body class	Visual	Applicable Scenarios
indigo	`vo-indigo`	Deep indigo	In-depth analysis, blue-toned content
wine	`vo-wine`	Dark burgundy	Emotional, controversial topics
teal	`vo-teal`	Deep teal	Efficiency, methodology content
forest	`vo-forest`	Deep forest green	Nature, growth topics

Colorful Gradient Series (White Text, High Visual Energy):

Theme	body class	Visual	Applicable Scenarios
ocean	`vo-ocean`	Purple→Blue→Cyan	Product launches, motivational content
sunset	`vo-sunset`	Dark red→Deep orange→Brown	Hot events
violet	`vo-violet`	Blue→Purple→Magenta	Creative, cutting-edge content

Light Color Series (Black Text):

Theme	body class	Visual	Applicable Scenarios
paper	`vo-paper`	Off-white background + top brand color bar	Opinion output, lifestyle, non-technical content

Special Effects Series:

Theme	body class	Visual	Applicable Scenarios
neon	`vo-neon`	Pure black background, keywords with neon glow	Shocking data, tech reviews
glass	`vo-glass`	Dark background + blurred light spots + frosted glass cards	Product introductions, high-end content

Auto-selection Rules for Themes (Do not use warm every time):

Check which themes were used in the last 3 sets of slides (
```
ls voiceover/*/slide_01.html | tail -3
```
then read body class)
Do not repeat the same theme consecutively
Alternate between dark and light themes (must use paper or neon after 3 consecutive dark themes)
Match with content type: Technical→cool/neon, AI→aurora/violet, Opinion→paper/wine, Tutorial→teal/indigo

Layouts (2 Types)

Layout	body class	Effect	Applicable Scenarios
Centered (Default)	No additional class needed	Text centered alignment	Most content types
Left-aligned Narrative	`vo-left` (overlay)	Text left-aligned + left vertical line	Storytelling, case analysis

Layouts can be freely combined with backgrounds, e.g.,

<body class="vo-paper vo-left">

Text Hierarchy

CSS Class	Effect	Purpose
`vo-main`	76px white bold (black under paper theme)	Main text (at least 1 line per slide)
`vo-sub`	44px gray	Secondary text/supplementary explanation
`vo-big`	96px (overlay on vo-main)	Cover/transition large title
`vo-small`	34px (overlay on vo-sub)	Small note text
`vo-gap`	margin-top: 28px	Add when switching from main text to gray text, to create hierarchy
`vo-stat`	220px brand color (Space Grotesk)	Impactful large numbers (for data slides)
`vo-stat-unit`	72px translucent	Number unit (paired with vo-stat)

6 Semantic Colors (Overlay on vo-main, automatically bold)

CSS Class	Color	Purpose	Usage Scenarios
`vo-pain`	#FF6B8A Pink	Pain points/emotions/negative content	Opening setup
`vo-solution`	#FFD666 Yellow	Solutions/conclusions/exclamations	When revealing solutions
`vo-tech`	#5CC8FF Cyan-blue	Tool names/tech names/product names	When mentioning specific tools
`vo-step`	#B088F9 Purple	Step numbers/category tags	"Step 1" "Step 2"
`vo-positive`	#4AEABC Green	Positive conclusions/achievements	When displaying results/achievements
`vo-cta`	#E6613E Orange	Interactive guidance/call to action	Ending interaction

Note: Semantic colors will automatically adjust under

vo-paper

and

vo-neon

themes (dark versions for light themes, glow effects for neon theme), no manual adjustment needed.

HTML Template

Each slide HTML has a fixed structure, only need to replace body class (theme + layout) and vo-slide content:

html

<!DOCTYPE html>
<html lang="zh-CN">
<head>
  <meta charset="UTF-8">
  <link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">
</head>
<body class="vo-warm">
  <div class="vo-slide">
    <p class="vo-main">Main Text</p>
    <p class="vo-main vo-pain">Colored Emphasis</p>
    <p class="vo-sub vo-gap">Gray Supplementary Text</p>
    <p class="vo-sub">More Supplementary Text</p>
  </div>
</body>
</html>

Slide Splitting Rules (Core)

Rule	Explanation
Max 6 lines per slide	Better to split into more slides than overcrowd
Max 18 Chinese characters per line	Must wrap if too long
One key point per slide	Do not put two key points on one slide
Max 2 colors per slide	White + one color, or gray + one color
Keep colloquial style	Do not formalize the script content
Total pages 20-40	Corresponding to 3-8 minute videos

First 3 Slides Strategy (By Content Type)

The first 3 slides determine whether the audience continues watching. Different content types have different structures for the first 3 slides:

Person-focused (Must establish identity in slide_01-02):

slide_01: Identity tag slide (large text)
  Example: vo-main vo-tech "Author of Lobster" + vo-sub "Peter Steinberger"
  Or: vo-stat "20+" vo-stat-unit "Years" + vo-main vo-tech "iOS Veteran"
slide_02: Core behavior/opinion (introduce the main topic)
  Example: vo-main "Last year I replaced all my tools with" + vo-main vo-solution "AI-driven ones"

Tutorial-focused:

slide_01: Pain point scenario (resonate with audience)
slide_02: Solution preview (build anticipation)

News-focused:

slide_01: Core fact/number (create impact)
slide_02: Why it matters (relate to audience)

Opinion-focused:

slide_01: Thought-provoking question
slide_02: Counterintuitive answer

Color Rhythm (Color Distribution Across the Entire Slide Set)

First 2-3 slides → vo-pain (setup pain points, resonate with audience)
Introduce solution → vo-solution (transition, reveal answer)
Tools/tech → vo-tech (when mentioning specific tools)
Step-by-step explanation → vo-step ("Step 1" "Step 2")
Positive conclusions → vo-positive (display results/achievements)
Ending interaction → vo-cta ("Did you learn it?")
Other supplementary content → vo-sub (gray, do not distract attention)

Cover System (Video Thumbnail)

The cover is the thumbnail of the video in Xiaohongshu's feed, directly determining click-through rate. Cover ≠ slide_01, the cover is a specially designed title card.

Three-layer Structure of Cover (Rational Hook → Emotional Hook → Accessibility Hook)

Layer	CSS Class	Font Size	Function	Example
L1 Rational Hook	`vo-cover-title`	52px	Clearly state what it is	"One-click PPT Generation with Claude Code"
L2 Emotional Hook	`vo-cover-hook`	160px	Huge emotional word, visual focus	"Lazy Person's Recipe"
L3 Accessibility Hook	`vo-cover-sub`	38px	Imply that anyone can use it	"Ordinary People Can Become Super with CLAUDE CODE"

The emotional hook is the core of the cover, its font size is 3 times that of the rational hook, using gradient color.

Cover Hook Colors

Overlay Class	Gradient Color	Applicable Scenarios
(Default, no class added)	Yellow→Orange→Brand Color	Methodologies/recipes/formulas/universal solutions
`vo-cover-hook-tech`	Cyan→Blue→Purple	Technology/tools/products
`vo-cover-hook-positive`	Green→Turquoise	Efficiency/achievements/positive content
`vo-cover-hook-pain`	Pink→Red→Magenta	Emotional/controversial/FOMO/anxiety content

Cover Decorations

Element	CSS Class	Effect
Top-right L bracket	`vo-cover-deco-tl`	Light gold corner line
Bottom-left L bracket	`vo-cover-deco-br`	Light gold corner line
Bottom decorative line	`vo-cover-line`	Brand color fading line
Background enhancement	`vo-cover-bg-boost` (added to body)	Central warm glow

Cover HTML Template

html

<!DOCTYPE html>
<html lang="zh-CN">
<head>
  <meta charset="UTF-8">
  <link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">
</head>
<body class="vo-warm vo-cover-bg-boost">
  <div class="vo-slide vo-cover">
    <div class="vo-cover-deco-tl"></div>
    <div class="vo-cover-deco-br"></div>
    <p class="vo-cover-title">One-click PPT Generation with Claude Code</p>
    <p class="vo-cover-hook">Lazy Person's Recipe</p>
    <p class="vo-cover-sub">Ordinary People Can Become Super with CLAUDE CODE</p>
    <div class="vo-cover-line"></div>
  </div>
</body>
</html>

Extraction Rules for Emotional Hooks

Extract the three layers of text for the cover from the script (differentiated by content type):

General Rules:

Layer	Extraction Method	❌ Wrong	✅ Correct
Rational Hook	What the video core does (one sentence)	"Claude Code Skill Tutorial"	"One-click PPT Generation with Claude Code"
Emotional Hook	2-4 character emotional word (preferably with metaphor/exaggeration)	"PPT Generator"	"Lazy Person's Recipe"
Accessibility Hook	One sentence implying ordinary people can do it	"Suitable for everyone"	"Ordinary People Can Become Super with CLAUDE CODE"

Special Rules for Person-focused Covers:

The rational hook must include the person's identity, the emotional hook focuses on the emotional point of the opinion/behavior:

Layer	❌ No identity = no clicks	✅ Identity first = clicks
L1 Rational Hook	"An iOS Developer's AI Workflow"	"Author of Lobster 20-year iOS Veteran's AI Workflow"
L2 Emotional Hook	"Workflow Sharing"	"Extremely Minimalist"
L3 Accessibility Hook	"Suitable for all developers"	"Only Uses Two Tools"

Self-check: Cover the emotional hook and accessibility hook, only look at the rational hook — can you tell who the video is about? If you only see "a developer" "some expert", it's不合格.

Common Emotional Hook Words (2-4 characters):

Type	Word List
Methodology	Lazy Person's Recipe, Universal Formula, One Trick to Solve, Dimensionality Reduction Strike
Efficiency	Maximize Efficiency, Take Off Directly, Save a Whole Day, 10x Speed
Shocking	Too Powerful, Absolutely Awesome, Incredible, Ridiculous
FOMO	Don't Miss, Get On Board Now, Still Don't Know?, Falling Behind
Emotional	Lifesaver, Finally Waited For, So Satisfying, Tearful

Workflow

Step 1: Confirm Input + Determine Content Type

Users may provide:

Script file (.md/.txt) → Read content, directly split into slides
Topic keywords → First draft a script then split into slides
Material files (articles/changelog) → Extract key points then split into slides

Also confirm background theme preference (default is warm).

⚠️ Must determine content type (person-focused/tutorial-focused/news-focused/opinion-focused), refer to the "Content Types" section above. Content type determines the visual strategy for the cover and first 3 slides, must be determined before splitting slides.

Judgment Method:

If script/material revolves around a specific person → Person-focused (find the person's most well-known identity tag)
If script teaches how to do something → Tutorial-focused
If script reports events/product updates → News-focused
If script outputs personal views → Opinion-focused

Step 2: Draft Script (Skip if script is provided)

If user only provides a topic, first draft a 3-8 minute voiceover script:

Use WebSearch to search for relevant information
Colloquial style, like chatting with friends
Structure: Pain point opening → Introduce solution → Step-by-step explanation → Summary/interaction

⚠️ Script Emotionalization Rules (Extremely Important):

The script determines 80% of the video quality. It's not about delivering information, it's about creating emotional resonance.

Principle	❌ Information Delivery (no one watches)	✅ Emotional Scenario (people watch)
Opening	"Today I'll introduce an AI tool"	"Tomorrow is the report deadline, the clock ticks past 2 AM"
Describe Function	"Supports hot reload preview"	"Edit with AI on the left screen, refresh instantly on the right, efficiency maximized"
Cite Data	"Efficiency increased by 10x"	"3 engineers, 0 lines of handwritten code, built a million-line product in 5 months"
Summary	"In conclusion, this tool is worth using"	"AI that can write code is everywhere, people who decide what to write are scarce"

Copywriting Formula: Scene Image → Emotional Trigger → Opinion Output

✅ "You stare at the glowing screen, half of the PPT is still unfinished" (Scene)
✅ "That sounds amazing" (Emotion)
✅ "Humans steer, Agent executes" (Opinion Golden Phrase)

❌ "This tool can automatically generate PPT files" (Function Manual)
❌ "Supports export in multiple formats" (Parameter List)

Self-check for each script segment: Close your eyes and read this sentence, can you picture a scene in your mind? If it's just an abstract sentence, rewrite it.

Step 3: Split Slides + Mark Colors

Split the script into 20-40 slides, mark CSS classes for each slide. This is the core step, strictly follow the slide splitting rules.

Slide Splitting Ideas:

Read the script once, mark each "key point switch point"
Each switch point = one slide turn
Use corresponding semantic colors for emphasized words/tool names/step numbers
Use gray vo-sub for supplementary explanations

Step 3.5: Generate Cover (cover.html)

After splitting slides, generate the cover first. The cover is a separate file from slide_01~XX, named

cover.html

Generation Steps:

Extract three layers of text from the script (rational hook + emotional hook + accessibility hook), refer to the "Extraction Rules for Emotional Hooks" above
Select hook color based on content emotion, refer to the "Cover Hook Colors" table above
Use the same theme as inner slides for the background, add
```
vo-cover-bg-boost
```
to enhance
Use the Write tool to create
```
cover.html
```
Screenshot →
```
cover.png
```

Relationship Between Cover and Inner Slides:

```
cover.png
```
is the first frame (thumbnail) of the video, placed at the very front when合成 with ffmpeg
```
slide_01.html
```
is the first page of the video body, appears immediately after the cover
Cover is fixed at 3 seconds, shares the first audio segment with slide_01. The voiceover starts playing from the beginning of the video (t=0), no silent delay allowed
Cover 3 seconds + slide_01 display duration = duration of the first audio segment. For example, if the first audio segment is 3.7 seconds, then cover is 3 seconds + slide_01 displays for 0.7 seconds

Step 4: Create Output Directory + Write HTML Page by Page

bash

mkdir -p /Users/lifcc/Desktop/code/work/life/xhh/voiceover/<slug>

First write cover.html, then use the Write tool to generate HTML files page by page:

slide_01.html

slide_02.html

, ...

Write multiple files in parallel to improve efficiency (can write 3-5 files at a time).

Step 5: Batch Screenshot

Use Chrome headless to take screenshots page by page.

⚠️ Window size must be 1920,1200 (not 1920,1080). Chrome headless has an 87px internal top bar, using 1080 directly will cause a white bar at the bottom. After screenshotting, use Pillow to crop to 1920×1080.

bash

cd <output directory>

# 1. Screenshot (cover + all slides, window height increased to 1200)
for f in cover.html slide_*.html; do
  [ -f "$f" ] || continue
  raw="/tmp/vo_raw_${f%.html}.png"
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
    --headless=new --disable-gpu --no-sandbox --hide-scrollbars \
    --window-size=1920,1200 \
    --screenshot="$raw" \
    "file://$(pwd)/$f" 2>/dev/null
done

# 2. Crop to 1920×1080
python3 -c "
from PIL import Image
import glob, os
for html in sorted(glob.glob('cover.html')) + sorted(glob.glob('slide_*.html')):
    name = html.replace('.html', '')
    raw = f'/tmp/vo_raw_{name}.png'
    if os.path.exists(raw):
        Image.open(raw).crop((0, 0, 1920, 1080)).save(f'{name}.png')
print('Cropping completed')
"

Verify no white bar at the bottom:

bash

python3 -c "
from PIL import Image; import numpy as np
arr = np.array(Image.open('slide_01.png'))
bottom = arr[-5:,:,:].mean(axis=(0,1)).astype(int)
print(f'Bottom 5px RGB={bottom}')
assert not all(c > 250 for c in bottom), 'White bar at the bottom!'
print('✅ No white bar')
"

Step 6: Generate Previewer + Voiceover Script

preview.html — Flip through all PNGs with arrow keys:

html

<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Voiceover Slide Preview</title>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }
  body {
    background: #111; display: flex; align-items: center; justify-content: center;
    height: 100vh; font-family: -apple-system, sans-serif; color: #fff; overflow: hidden;
  }
  .viewer { width: 90vw; max-width: 1440px; aspect-ratio: 16/9; }
  .viewer img { width: 100%; height: 100%; object-fit: contain; border-radius: 8px; box-shadow: 0 8px 32px rgba(0,0,0,0.5); }
  .controls {
    position: fixed; bottom: 24px; left: 50%; transform: translateX(-50%);
    display: flex; align-items: center; gap: 16px;
    background: rgba(255,255,255,0.1); backdrop-filter: blur(8px);
    padding: 8px 20px; border-radius: 100px; font-size: 14px;
  }
  .controls button { background: none; border: 1px solid rgba(255,255,255,0.2); color: #fff; padding: 6px 16px; border-radius: 6px; cursor: pointer; }
  .controls button:hover { background: rgba(255,255,255,0.1); }
</style>
</head>
<body>
<div class="viewer"><img id="slide" src=""></div>
<div class="controls">
  <button onclick="prev()">&#8592;</button>
  <span id="counter"></span>
  <button onclick="next()">&#8594;</button>
</div>
<script>
const slides = [/* Replace with actual PNG filename list */];
let cur = 0;
const img = document.getElementById('slide'), ctr = document.getElementById('counter');
function show(i) { cur = Math.max(0, Math.min(i, slides.length-1)); img.src = slides[cur]; ctr.textContent = (cur+1)+'/'+slides.length; }
function prev() { show(cur-1); } function next() { show(cur+1); }
document.addEventListener('keydown', e => { if(e.key==='ArrowLeft') prev(); if(e.key==='ArrowRight') next(); });
show(0);
</script>
</body>
</html>

script_notes.md — Voiceover key points corresponding to each slide.

voiceover_text.txt — Pure text voiceover script for TTS. Separate each segment with a blank line, number of segments = number of slides (one segment corresponds to one slide).

Step 7: Generate Subtitles (Pillow Burn-in Method)

Subtitles are extremely important for Xiaohongshu videos — many users browse without sound.

⚠️ Do not use ffmpeg's
subtitles
filter, it depends on libass, which is not included in the default ffmpeg from macOS homebrew. Instead, use Pillow to directly burn subtitles into the slide images, then use ffmpeg to concatenate.

Principle: Split each voiceover segment into short sentences by punctuation, generate a PNG of "original slide + bottom subtitle" for each sentence, then concatenate into a video according to duration.

7.1 Generate segments.json

After generating TTS audio, record the byte size of each audio segment (under the same bitrate, bytes ∝ duration):

python

import json, os, glob
audio_files = sorted(glob.glob('audio/slide_*.mp3'))
segments = [os.path.getsize(f) for f in audio_files]
with open('segments.json', 'w') as f:
    json.dump(segments, f)

7.2 Generate Subtitle Frame Images

python

import json, subprocess, re, os
from PIL import Image, ImageDraw, ImageFont

text = open('voiceover_text.txt').read().strip()
paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
segments = json.loads(open('segments.json').read())
total_bytes = sum(segments)
r = subprocess.run(['ffprobe','-v','quiet','-show_entries','format=duration',
    '-of','csv=p=0','voiceover_volcano.mp3'], capture_output=True, text=True)
audio_duration = float(r.stdout.strip())
seg_durations = [s / total_bytes * audio_duration for s in segments]

# Chinese fonts on macOS (try in priority order)
for fp in ['/Library/Fonts/Arial Unicode.ttf',
           '/System/Library/Fonts/STHeiti Medium.ttc']:
    if os.path.exists(fp):
        font = ImageFont.truetype(fp, 42)
        break

# Time allocation (cover 3s + slide_01 shares first audio segment)
cover_dur = 3.0
s01 = max(seg_durations[0] - cover_dur, 0.5)
durations = [cover_dur, s01] + seg_durations[1:]
slide_files = ['cover.png', 'slide_01.png'] + \
    [f'slide_{i+1:02d}.png' for i in range(1, len(segments))]
para_for_slide = [paragraphs[0], paragraphs[0]] + paragraphs[1:]

sub_dir = 'sub_slides'
os.makedirs(sub_dir, exist_ok=True)
concat_lines = []
img_idx = 0

for slide_i, (slide_file, dur) in enumerate(zip(slide_files, durations)):
    img = Image.open(slide_file)
    para = para_for_slide[slide_i]
    sentences = re.split(r'[，。！？、；\n]', para)
    sentences = [s.strip() for s in sentences if s.strip()] or ['']
    sent_dur = dur / len(sentences)

    for sent in sentences:
        frame = img.copy()
        if sent:
            draw = ImageDraw.Draw(frame)
            w, h = frame.size
            bbox = draw.textbbox((0, 0), sent, font=font)
            tw = bbox[2] - bbox[0]
            x, y = (w - tw) // 2, h - 100
            # Black stroke (8 directions offset 3px)
            for dx in [-3, 0, 3]:
                for dy in [-3, 0, 3]:
                    if dx or dy:
                        draw.text((x+dx, y+dy), sent, font=font, fill=(0,0,0))
            draw.text((x, y), sent, font=font, fill=(255,255,255))

        out_name = f'sub_{img_idx:04d}.png'
        frame.save(f'{sub_dir}/{out_name}')
        concat_lines.append(f"file '{sub_dir}/{out_name}'")
        concat_lines.append(f"duration {sent_dur:.3f}")
        img_idx += 1

# ffmpeg concat requires last frame to be repeated
concat_lines.append(f"file '{sub_dir}/sub_{img_idx-1:04d}.png'")
with open('concat_sub.txt', 'w') as f:
    f.write('\n'.join(concat_lines))

Step 8: Synthesize Video (With Subtitles)

⚠️ Must allocate duration using audio segment byte ratio, not word count ratio! Word count does not proportional to actual speaking speed, allocating by word count will cause audio-visual desynchronization in the latter half.

bash

cd <output directory>

# 1. Generate silent video from subtitle frame images
ffmpeg -y -f concat -safe 0 -i concat_sub.txt \
  -vf "scale=1920:1080,format=yuv420p" \
  -c:v libx264 -preset medium -crf 20 -r 30 -an silent.mp4

# 2. Merge audio (-c:v copy does not re-encode, fast speed)
ffmpeg -y -i silent.mp4 -i voiceover_volcano.mp3 \
  -c:v copy -c:a aac -b:a 192k -shortest -movflags +faststart \
  output_sub.mp4

rm -f silent.mp4 concat_sub.txt

Subtitle Style:

Font: Arial Unicode MS 42px (white with 3px black stroke)
Position: Bottom center, 100px from bottom edge
Clear on both dark and light backgrounds

Two Output Files:

```
output.mp4
```
— No subtitles (generated by build_video.py)
```
output_sub.mp4
```
— With subtitles (generated in this step)

Step 9: Show Results

Read a few key PNGs (first slide, middle slide, last slide) for user to preview the effect
Inform user of the output directory path
Prompt user to
```
open preview.html
```
to flip through slides in browser
The video already includes subtitles, can be directly published

File Organization

<working directory>/voiceover/
└── <slug>/
    ├── cover.html + cover.png        # Cover (video thumbnail/first frame)
    ├── slide_01.html ~ slide_XX.html
    ├── slide_01.png ~ slide_XX.png
    ├── script_notes.md
    └── preview.html

Common Issues

Font Loading

CSS loads Noto Sans SC + Space Grotesk via Google Fonts CDN. Chrome headless screenshot requires network connection. If fonts fail to load, screenshots will use default fonts (poor effect).

Content Exceeds Canvas

Max 6 lines × 18 characters per slide. If content cannot fit, split into two slides, do not reduce font size.

Theme Switching

Usually use the same theme for the entire slide set. If you want to switch themes on certain slides, just change the body class of those slides' HTML.

Presentation Mode

High information density, formal PPT with cards, decorative layers, and brand markers. 5-8 pages.

Design System

Each slide HTML references two CSS files:

html

<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system.css">
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-slides.css">

File Organization

<working directory>/slides/
└── YYYYMMDD-<slug>/
    ├── slide_01.html ~ slide_XX.html
    ├── slide_01.png ~ slide_XX.png
    ├── script_notes.md
    └── preview.html

6 Theme Colors

Theme	CSS Class	Applicable Content
Obsidian (Dark)	`.theme-obsidian`	Technical depth, product launches
Paper (Light)	`.theme-paper`	Checklists, summaries, light content
Signal (Accent)	`.theme-signal`	Major news, milestones
Aurora (Aurora)	`.theme-aurora`	AI cutting-edge, large models
Mist (Mist)	`.theme-mist`	Tool recommendations, popular science
Glacier (Glacier)	`.theme-glacier`	Product reviews, app recommendations

4 Slide Types

Type	Layout	CSS Class	Reference Template
title	Centered large title + subtitle + tags	`.slide-title`	`templates/slides/title.html`
content	Left title + right key point cards	`.slide-content`	`templates/slides/content.html`
data	Top title + data card grid	`.slide-data`	`templates/slides/data.html`
ending	Centered summary + CTA + brand	`.slide-ending`	`templates/slides/ending.html`

Template file path prefix:

/Users/lifcc/Desktop/code/work/life/xhh/

Structural Rules

First page must be title, last page must be ending
Freely combine content and data pages in the middle
Each page must have brand marker
```
lif.
```
and decorative layer

Four-layer Visual Structure (Each Page Must Satisfy)

Layer	Function	Implementation
L1 Background Color	Set tone	`.theme-*` class
L2 Decorative Layer	Visual richness	`.deco-noise` + theme decorations
L3 Content Container	Carry information	`.card-*` card components
L4 Text/Icons	Convey information	Theme text color + SVG icons

Decorative layers for each theme:

Theme	L2 Decorative Layer
Obsidian	`deco-noise` + `deco-dots` + `deco-bracket-tl/br`
Paper	`deco-noise` + `deco-hlines` + `deco-top-stripe`
Signal	`deco-noise` + `deco-stripes` + `deco-gradient-bottom`
Aurora	`deco-grain` + `deco-aurora`
Mist	`deco-noise` + `deco-mist-glow` + `deco-soft-dots`
Glacier	`deco-noise` + `deco-shimmer` + `deco-refraction`

Presentation Mode Workflow

Understand Requirements — Topic, materials, theme color preference, number of pages
Topic Research (if no materials) — WebSearch to collect and organize fact lists
Plan Structure — Determine the type and core information of each page
Generate HTML + Screenshot Page by Page — Read reference templates → Replace content → Write and save → Chrome screenshot
Generate Voiceover Script —
```
script_notes.md
```
Generate Previewer —
```
preview.html
```
(same previewer template as voiceover mode)
Show Results — Read PNG previews + inform path

Screenshot command is the same as voiceover mode (

--window-size=1920,1200

+ crop to 1920×1080), refer to Step 5 of voiceover mode.

SVG Icon Library (For Presentation Mode)

html

<svg xmlns="http://www.w3.org/2000/svg" style="display:none">
  <symbol id="icon-bolt" viewBox="0 0 24 24"><path d="M13 2L3 14h9l-1 8 10-12h-9l1-8z"/></symbol>
  <symbol id="icon-code" viewBox="0 0 24 24"><polyline points="16 18 22 12 16 6"/><polyline points="8 6 2 12 8 18"/></symbol>
  <symbol id="icon-chart" viewBox="0 0 24 24"><line x1="18" y1="20" x2="18" y2="10"/><line x1="12" y1="20" x2="12" y2="4"/><line x1="6" y1="20" x2="6" y2="14"/></symbol>
  <symbol id="icon-rocket" viewBox="0 0 24 24"><path d="M4.5 16.5c-1.5 1.26-2 5-2 5s3.74-.5 5-2c.71-.84.7-2.13-.09-2.91a2.18 2.18 0 0 0-2.91-.09z"/><path d="M12 15l-3-3 7.5-7.5A12.71 12.71 0 0 1 22 2c0 2.35-1.1 6.58-4.5 10L15 12"/></symbol>
  <symbol id="icon-shield" viewBox="0 0 24 24"><path d="M12 22s8-4 8-10V5l-8-3-8 3v7c0 6 8 10 8 10z"/></symbol>
  <symbol id="icon-terminal" viewBox="0 0 24 24"><polyline points="4 17 10 11 4 5"/><line x1="12" y1="19" x2="20" y2="19"/></symbol>
  <symbol id="icon-brain" viewBox="0 0 24 24"><path d="M12 2a6 6 0 0 0-6 6c0 2.2 1.2 4.1 3 5.2V20h6v-6.8c1.8-1.1 3-3 3-5.2a6 6 0 0 0-6-6z"/><line x1="9" y1="14" x2="9" y2="20"/><line x1="15" y1="14" x2="15" y2="20"/></symbol>
  <symbol id="icon-users" viewBox="0 0 24 24"><path d="M17 21v-2a4 4 0 0 0-4-4H5a4 4 0 0 0-4 4v2"/><circle cx="9" cy="7" r="4"/><path d="M23 21v-2a4 4 0 0 0-3-3.87"/><path d="M16 3.13a4 4 0 0 1 0 7.75"/></symbol>
  <symbol id="icon-fire" viewBox="0 0 24 24"><path d="M12 23c-4.97 0-9-2.69-9-6 0-4 4-8 4-8s.5 2 2 3c.47-.8 1.5-3 1-6 3.5 2.5 6 6 6 10 1.5-1 2-3.5 2-5 2.5 2.5 3 4.5 3 6 0 3.31-4.03 6-9 6z"/></symbol>
  <symbol id="icon-star" viewBox="0 0 24 24"><polygon points="12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2"/></symbol>
  <symbol id="icon-lightbulb" viewBox="0 0 24 24"><path d="M9 21h6m-6-3h6m-3-18a7 7 0 0 0-4 12.7V17h8v-4.3A7 7 0 0 0 12 0z"/></symbol>
  <symbol id="icon-search" viewBox="0 0 24 24"><circle cx="11" cy="11" r="8"/><line x1="21" y1="21" x2="16.65" y2="16.65"/></symbol>
  <symbol id="icon-download" viewBox="0 0 24 24"><path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/></symbol>
  <symbol id="icon-clock" viewBox="0 0 24 24"><circle cx="12" cy="12" r="10"/><polyline points="12 6 12 12 16 14"/></symbol>
  <symbol id="icon-link" viewBox="0 0 24 24"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"/><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"/></symbol>
</svg>

Usage:

<svg class="icon icon-sm"><use href="#icon-bolt"/></svg>

slides

NPX Install

Tags

SKILL.md Content (Chinese)

PPT Slide Generator

Voiceover Mode

Content Types (Determines Cover and First 3 Slides Strategy)

"Identity First" Rule for Person-focused Type (Extremely Important)

Design System

Background Themes (13 Types, Auto-selected)

Layouts (2 Types)

Text Hierarchy

6 Semantic Colors (Overlay on vo-main, automatically bold)

HTML Template

Slide Splitting Rules (Core)

First 3 Slides Strategy (By Content Type)

Color Rhythm (Color Distribution Across the Entire Slide Set)

Cover System (Video Thumbnail)

Three-layer Structure of Cover (Rational Hook → Emotional Hook → Accessibility Hook)

Cover Hook Colors

Cover Decorations

Cover HTML Template

Extraction Rules for Emotional Hooks

Workflow

Step 1: Confirm Input + Determine Content Type

Step 2: Draft Script (Skip if script is provided)

Step 3: Split Slides + Mark Colors

Step 3.5: Generate Cover (cover.html)

Step 4: Create Output Directory + Write HTML Page by Page

Step 5: Batch Screenshot

Step 6: Generate Previewer + Voiceover Script

Step 7: Generate Subtitles (Pillow Burn-in Method)

7.1 Generate segments.json

7.2 Generate Subtitle Frame Images

Step 8: Synthesize Video (With Subtitles)

Step 9: Show Results

File Organization

Common Issues

Font Loading

Content Exceeds Canvas

Theme Switching

Presentation Mode

Design System

File Organization

6 Theme Colors

4 Slide Types

Structural Rules

Four-layer Visual Structure (Each Page Must Satisfy)

Presentation Mode Workflow

SVG Icon Library (For Presentation Mode)