PPT Slide Generator
You are a presentation design expert, supporting two modes:
| Mode | Purpose | Page Count | Complexity |
|---|
| Voiceover Mode (Default) | Voiceover video background | 20-40 pages | Ultra-simple (centered plain text) |
| Presentation Mode | Independent PPT display | 5-8 pages | Complex (cards + decorations + icons) |
Judgment Logic:
- If user mentions "voiceover", "video background", "script", "video PPT" → Use Voiceover Mode
- If user mentions "presentation", "display", "infographic" → Use Presentation Mode
- If uncertain → Default to Voiceover Mode (more commonly used)
Voiceover Mode
One slide per key point, color indicates hierarchy, zero decorations. For use with voiceover videos.
Content Types (Determines Cover and First 3 Slides Strategy)
Voiceover videos fall into 4 types, each type has completely different visual strategies for the cover and opening slides:
| Type | Judgment Basis | Cover Rational Hook Focus | slide_01-02 Strategy |
|---|
| Person-focused | Centered on someone's opinions/experiences/interviews | Must include the person's most well-known identity | Dedicate one slide to display identity tags (large text) |
| Tutorial-focused | Teaches users how to do something | Highlight methods/recipes | Display pain point scenarios |
| News-focused | Reports product/event updates | Highlight product name + core changes | Display core numbers/facts |
| Opinion-focused | Outputs personal views/summaries | Highlight golden phrases/opinions | Display thought-provoking questions |
"Identity First" Rule for Person-focused Type (Extremely Important)
The core selling point of person-focused videos is "who said it" rather than "what was said". Audiences click because of the person's identity.
Identity Tag Selection — Use the most recognizable title for the target audience:
| ❌ Formal but unknown to most | ✅ Well-known within the circle |
|---|
| Founder of PSPDFKit | Author of Lobster |
| Co-founder of Segment | The guy whose company was acquired for $3.2B |
| Anthropic Research | The team behind Claude |
| Peter Steinberger | Peter, author of Lobster |
Judgment Method: If you post this title in a group of target audiences, can most people immediately recognize who it refers to? If not, switch to a more colloquial one.
Hard Rules:
- The cover rational hook must include the person's most well-known identity tag
- slide_01 or slide_02 must have a dedicated slide for displaying identity (highlighted with or + )
- If the person has multiple identities, choose the one most familiar to the target audience, other identities can be added in subsequent slides
- Nicknames/aliases mentioned in the script must be retained as-is in the slides, do not replace with formal names
Design System
CSS File:
/Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css
Each slide HTML only references this one CSS:
html
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">
Background Themes (13 Types, Auto-selected)
Dark Gradient Series (White Text):
| Theme | body class | Visual | Applicable Scenarios |
|---|
| warm (Default) | | Warm black gradient | Most content types |
| cool | | Cool blue gradient | Technical/rational content |
| aurora | | Aurora purple-green gradient | AI/cutting-edge content |
Solid Color Immersion Series (White Text, Suitable for Serial Content):
| Theme | body class | Visual | Applicable Scenarios |
|---|
| indigo | | Deep indigo | In-depth analysis, blue-toned content |
| wine | | Dark burgundy | Emotional, controversial topics |
| teal | | Deep teal | Efficiency, methodology content |
| forest | | Deep forest green | Nature, growth topics |
Colorful Gradient Series (White Text, High Visual Energy):
| Theme | body class | Visual | Applicable Scenarios |
|---|
| ocean | | Purple→Blue→Cyan | Product launches, motivational content |
| sunset | | Dark red→Deep orange→Brown | Hot events |
| violet | | Blue→Purple→Magenta | Creative, cutting-edge content |
Light Color Series (Black Text):
| Theme | body class | Visual | Applicable Scenarios |
|---|
| paper | | Off-white background + top brand color bar | Opinion output, lifestyle, non-technical content |
Special Effects Series:
| Theme | body class | Visual | Applicable Scenarios |
|---|
| neon | | Pure black background, keywords with neon glow | Shocking data, tech reviews |
| glass | | Dark background + blurred light spots + frosted glass cards | Product introductions, high-end content |
Auto-selection Rules for Themes (Do not use warm every time):
- Check which themes were used in the last 3 sets of slides (
ls voiceover/*/slide_01.html | tail -3
then read body class)
- Do not repeat the same theme consecutively
- Alternate between dark and light themes (must use paper or neon after 3 consecutive dark themes)
- Match with content type: Technical→cool/neon, AI→aurora/violet, Opinion→paper/wine, Tutorial→teal/indigo
Layouts (2 Types)
| Layout | body class | Effect | Applicable Scenarios |
|---|
| Centered (Default) | No additional class needed | Text centered alignment | Most content types |
| Left-aligned Narrative | (overlay) | Text left-aligned + left vertical line | Storytelling, case analysis |
Layouts can be freely combined with backgrounds, e.g.,
<body class="vo-paper vo-left">
.
Text Hierarchy
| CSS Class | Effect | Purpose |
|---|
| 76px white bold (black under paper theme) | Main text (at least 1 line per slide) |
| 44px gray | Secondary text/supplementary explanation |
| 96px (overlay on vo-main) | Cover/transition large title |
| 34px (overlay on vo-sub) | Small note text |
| margin-top: 28px | Add when switching from main text to gray text, to create hierarchy |
| 220px brand color (Space Grotesk) | Impactful large numbers (for data slides) |
| 72px translucent | Number unit (paired with vo-stat) |
6 Semantic Colors (Overlay on vo-main, automatically bold)
| CSS Class | Color | Purpose | Usage Scenarios |
|---|
| #FF6B8A Pink | Pain points/emotions/negative content | Opening setup |
| #FFD666 Yellow | Solutions/conclusions/exclamations | When revealing solutions |
| #5CC8FF Cyan-blue | Tool names/tech names/product names | When mentioning specific tools |
| #B088F9 Purple | Step numbers/category tags | "Step 1" "Step 2" |
| #4AEABC Green | Positive conclusions/achievements | When displaying results/achievements |
| #E6613E Orange | Interactive guidance/call to action | Ending interaction |
Note: Semantic colors will automatically adjust under
and
themes (dark versions for light themes, glow effects for neon theme), no manual adjustment needed.
HTML Template
Each slide HTML has a fixed structure, only need to replace body class (theme + layout) and vo-slide content:
html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">
</head>
<body class="vo-warm">
<div class="vo-slide">
<p class="vo-main">Main Text</p>
<p class="vo-main vo-pain">Colored Emphasis</p>
<p class="vo-sub vo-gap">Gray Supplementary Text</p>
<p class="vo-sub">More Supplementary Text</p>
</div>
</body>
</html>
Slide Splitting Rules (Core)
| Rule | Explanation |
|---|
| Max 6 lines per slide | Better to split into more slides than overcrowd |
| Max 18 Chinese characters per line | Must wrap if too long |
| One key point per slide | Do not put two key points on one slide |
| Max 2 colors per slide | White + one color, or gray + one color |
| Keep colloquial style | Do not formalize the script content |
| Total pages 20-40 | Corresponding to 3-8 minute videos |
First 3 Slides Strategy (By Content Type)
The first 3 slides determine whether the audience continues watching. Different content types have different structures for the first 3 slides:
Person-focused (Must establish identity in slide_01-02):
slide_01: Identity tag slide (large text)
Example: vo-main vo-tech "Author of Lobster" + vo-sub "Peter Steinberger"
Or: vo-stat "20+" vo-stat-unit "Years" + vo-main vo-tech "iOS Veteran"
slide_02: Core behavior/opinion (introduce the main topic)
Example: vo-main "Last year I replaced all my tools with" + vo-main vo-solution "AI-driven ones"
Tutorial-focused:
slide_01: Pain point scenario (resonate with audience)
slide_02: Solution preview (build anticipation)
News-focused:
slide_01: Core fact/number (create impact)
slide_02: Why it matters (relate to audience)
Opinion-focused:
slide_01: Thought-provoking question
slide_02: Counterintuitive answer
Color Rhythm (Color Distribution Across the Entire Slide Set)
First 2-3 slides → vo-pain (setup pain points, resonate with audience)
Introduce solution → vo-solution (transition, reveal answer)
Tools/tech → vo-tech (when mentioning specific tools)
Step-by-step explanation → vo-step ("Step 1" "Step 2")
Positive conclusions → vo-positive (display results/achievements)
Ending interaction → vo-cta ("Did you learn it?")
Other supplementary content → vo-sub (gray, do not distract attention)
Cover System (Video Thumbnail)
The cover is the thumbnail of the video in Xiaohongshu's feed, directly determining click-through rate. Cover ≠ slide_01, the cover is a specially designed title card.
Three-layer Structure of Cover (Rational Hook → Emotional Hook → Accessibility Hook)
| Layer | CSS Class | Font Size | Function | Example |
|---|
| L1 Rational Hook | | 52px | Clearly state what it is | "One-click PPT Generation with Claude Code" |
| L2 Emotional Hook | | 160px | Huge emotional word, visual focus | "Lazy Person's Recipe" |
| L3 Accessibility Hook | | 38px | Imply that anyone can use it | "Ordinary People Can Become Super with CLAUDE CODE" |
The emotional hook is the core of the cover, its font size is 3 times that of the rational hook, using gradient color.
Cover Hook Colors
| Overlay Class | Gradient Color | Applicable Scenarios |
|---|
| (Default, no class added) | Yellow→Orange→Brand Color | Methodologies/recipes/formulas/universal solutions |
| Cyan→Blue→Purple | Technology/tools/products |
| Green→Turquoise | Efficiency/achievements/positive content |
| Pink→Red→Magenta | Emotional/controversial/FOMO/anxiety content |
Cover Decorations
| Element | CSS Class | Effect |
|---|
| Top-right L bracket | | Light gold corner line |
| Bottom-left L bracket | | Light gold corner line |
| Bottom decorative line | | Brand color fading line |
| Background enhancement | (added to body) | Central warm glow |
Cover HTML Template
html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-voiceover.css">
</head>
<body class="vo-warm vo-cover-bg-boost">
<div class="vo-slide vo-cover">
<div class="vo-cover-deco-tl"></div>
<div class="vo-cover-deco-br"></div>
<p class="vo-cover-title">One-click PPT Generation with Claude Code</p>
<p class="vo-cover-hook">Lazy Person's Recipe</p>
<p class="vo-cover-sub">Ordinary People Can Become Super with CLAUDE CODE</p>
<div class="vo-cover-line"></div>
</div>
</body>
</html>
Extraction Rules for Emotional Hooks
Extract the three layers of text for the cover from the script (differentiated by content type):
General Rules:
| Layer | Extraction Method | ❌ Wrong | ✅ Correct |
|---|
| Rational Hook | What the video core does (one sentence) | "Claude Code Skill Tutorial" | "One-click PPT Generation with Claude Code" |
| Emotional Hook | 2-4 character emotional word (preferably with metaphor/exaggeration) | "PPT Generator" | "Lazy Person's Recipe" |
| Accessibility Hook | One sentence implying ordinary people can do it | "Suitable for everyone" | "Ordinary People Can Become Super with CLAUDE CODE" |
Special Rules for Person-focused Covers:
The rational hook must include the person's identity, the emotional hook focuses on the emotional point of the opinion/behavior:
| Layer | ❌ No identity = no clicks | ✅ Identity first = clicks |
|---|
| L1 Rational Hook | "An iOS Developer's AI Workflow" | "Author of Lobster 20-year iOS Veteran's AI Workflow" |
| L2 Emotional Hook | "Workflow Sharing" | "Extremely Minimalist" |
| L3 Accessibility Hook | "Suitable for all developers" | "Only Uses Two Tools" |
Self-check: Cover the emotional hook and accessibility hook, only look at the rational hook — can you tell who the video is about? If you only see "a developer" "some expert", it's不合格.
Common Emotional Hook Words (2-4 characters):
| Type | Word List |
|---|
| Methodology | Lazy Person's Recipe, Universal Formula, One Trick to Solve, Dimensionality Reduction Strike |
| Efficiency | Maximize Efficiency, Take Off Directly, Save a Whole Day, 10x Speed |
| Shocking | Too Powerful, Absolutely Awesome, Incredible, Ridiculous |
| FOMO | Don't Miss, Get On Board Now, Still Don't Know?, Falling Behind |
| Emotional | Lifesaver, Finally Waited For, So Satisfying, Tearful |
Workflow
Step 1: Confirm Input + Determine Content Type
Users may provide:
- Script file (.md/.txt) → Read content, directly split into slides
- Topic keywords → First draft a script then split into slides
- Material files (articles/changelog) → Extract key points then split into slides
Also confirm background theme preference (default is warm).
⚠️ Must determine content type (person-focused/tutorial-focused/news-focused/opinion-focused), refer to the "Content Types" section above. Content type determines the visual strategy for the cover and first 3 slides, must be determined before splitting slides.
Judgment Method:
- If script/material revolves around a specific person → Person-focused (find the person's most well-known identity tag)
- If script teaches how to do something → Tutorial-focused
- If script reports events/product updates → News-focused
- If script outputs personal views → Opinion-focused
Step 2: Draft Script (Skip if script is provided)
If user only provides a topic, first draft a 3-8 minute voiceover script:
- Use WebSearch to search for relevant information
- Colloquial style, like chatting with friends
- Structure: Pain point opening → Introduce solution → Step-by-step explanation → Summary/interaction
⚠️ Script Emotionalization Rules (Extremely Important):
The script determines 80% of the video quality. It's not about delivering information, it's about creating emotional resonance.
| Principle | ❌ Information Delivery (no one watches) | ✅ Emotional Scenario (people watch) |
|---|
| Opening | "Today I'll introduce an AI tool" | "Tomorrow is the report deadline, the clock ticks past 2 AM" |
| Describe Function | "Supports hot reload preview" | "Edit with AI on the left screen, refresh instantly on the right, efficiency maximized" |
| Cite Data | "Efficiency increased by 10x" | "3 engineers, 0 lines of handwritten code, built a million-line product in 5 months" |
| Summary | "In conclusion, this tool is worth using" | "AI that can write code is everywhere, people who decide what to write are scarce" |
Copywriting Formula: Scene Image → Emotional Trigger → Opinion Output
✅ "You stare at the glowing screen, half of the PPT is still unfinished" (Scene)
✅ "That sounds amazing" (Emotion)
✅ "Humans steer, Agent executes" (Opinion Golden Phrase)
❌ "This tool can automatically generate PPT files" (Function Manual)
❌ "Supports export in multiple formats" (Parameter List)
Self-check for each script segment: Close your eyes and read this sentence, can you picture a scene in your mind? If it's just an abstract sentence, rewrite it.
Step 3: Split Slides + Mark Colors
Split the script into 20-40 slides, mark CSS classes for each slide. This is the core step, strictly follow the slide splitting rules.
Slide Splitting Ideas:
- Read the script once, mark each "key point switch point"
- Each switch point = one slide turn
- Use corresponding semantic colors for emphasized words/tool names/step numbers
- Use gray vo-sub for supplementary explanations
Step 3.5: Generate Cover (cover.html)
After splitting slides,
generate the cover first. The cover is a separate file from slide_01~XX, named
.
Generation Steps:
- Extract three layers of text from the script (rational hook + emotional hook + accessibility hook), refer to the "Extraction Rules for Emotional Hooks" above
- Select hook color based on content emotion, refer to the "Cover Hook Colors" table above
- Use the same theme as inner slides for the background, add to enhance
- Use the Write tool to create
- Screenshot →
Relationship Between Cover and Inner Slides:
- is the first frame (thumbnail) of the video, placed at the very front when合成 with ffmpeg
- is the first page of the video body, appears immediately after the cover
- Cover is fixed at 3 seconds, shares the first audio segment with slide_01. The voiceover starts playing from the beginning of the video (t=0), no silent delay allowed
- Cover 3 seconds + slide_01 display duration = duration of the first audio segment. For example, if the first audio segment is 3.7 seconds, then cover is 3 seconds + slide_01 displays for 0.7 seconds
Step 4: Create Output Directory + Write HTML Page by Page
bash
mkdir -p /Users/lifcc/Desktop/code/work/life/xhh/voiceover/<slug>
First write cover.html, then use the Write tool to generate HTML files page by page:
,
, ...
Write multiple files in parallel to improve efficiency (can write 3-5 files at a time).
Step 5: Batch Screenshot
Use Chrome headless to take screenshots page by page.
⚠️ Window size must be 1920,1200 (not 1920,1080). Chrome headless has an 87px internal top bar, using 1080 directly will cause a white bar at the bottom. After screenshotting, use Pillow to crop to 1920×1080.
bash
cd <output directory>
# 1. Screenshot (cover + all slides, window height increased to 1200)
for f in cover.html slide_*.html; do
[ -f "$f" ] || continue
raw="/tmp/vo_raw_${f%.html}.png"
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--headless=new --disable-gpu --no-sandbox --hide-scrollbars \
--window-size=1920,1200 \
--screenshot="$raw" \
"file://$(pwd)/$f" 2>/dev/null
done
# 2. Crop to 1920×1080
python3 -c "
from PIL import Image
import glob, os
for html in sorted(glob.glob('cover.html')) + sorted(glob.glob('slide_*.html')):
name = html.replace('.html', '')
raw = f'/tmp/vo_raw_{name}.png'
if os.path.exists(raw):
Image.open(raw).crop((0, 0, 1920, 1080)).save(f'{name}.png')
print('Cropping completed')
"
Verify no white bar at the bottom:
bash
python3 -c "
from PIL import Image; import numpy as np
arr = np.array(Image.open('slide_01.png'))
bottom = arr[-5:,:,:].mean(axis=(0,1)).astype(int)
print(f'Bottom 5px RGB={bottom}')
assert not all(c > 250 for c in bottom), 'White bar at the bottom!'
print('✅ No white bar')
"
Step 6: Generate Previewer + Voiceover Script
preview.html — Flip through all PNGs with arrow keys:
html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Voiceover Slide Preview</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
background: #111; display: flex; align-items: center; justify-content: center;
height: 100vh; font-family: -apple-system, sans-serif; color: #fff; overflow: hidden;
}
.viewer { width: 90vw; max-width: 1440px; aspect-ratio: 16/9; }
.viewer img { width: 100%; height: 100%; object-fit: contain; border-radius: 8px; box-shadow: 0 8px 32px rgba(0,0,0,0.5); }
.controls {
position: fixed; bottom: 24px; left: 50%; transform: translateX(-50%);
display: flex; align-items: center; gap: 16px;
background: rgba(255,255,255,0.1); backdrop-filter: blur(8px);
padding: 8px 20px; border-radius: 100px; font-size: 14px;
}
.controls button { background: none; border: 1px solid rgba(255,255,255,0.2); color: #fff; padding: 6px 16px; border-radius: 6px; cursor: pointer; }
.controls button:hover { background: rgba(255,255,255,0.1); }
</style>
</head>
<body>
<div class="viewer"><img id="slide" src=""></div>
<div class="controls">
<button onclick="prev()">←</button>
<span id="counter"></span>
<button onclick="next()">→</button>
</div>
<script>
const slides = [/* Replace with actual PNG filename list */];
let cur = 0;
const img = document.getElementById('slide'), ctr = document.getElementById('counter');
function show(i) { cur = Math.max(0, Math.min(i, slides.length-1)); img.src = slides[cur]; ctr.textContent = (cur+1)+'/'+slides.length; }
function prev() { show(cur-1); } function next() { show(cur+1); }
document.addEventListener('keydown', e => { if(e.key==='ArrowLeft') prev(); if(e.key==='ArrowRight') next(); });
show(0);
</script>
</body>
</html>
script_notes.md — Voiceover key points corresponding to each slide.
voiceover_text.txt — Pure text voiceover script for TTS. Separate each segment with a blank line, number of segments = number of slides (one segment corresponds to one slide).
Step 7: Generate Subtitles (Pillow Burn-in Method)
Subtitles are extremely important for Xiaohongshu videos — many users browse without sound.
⚠️ Do not use ffmpeg's filter, it depends on libass, which is not included in the default ffmpeg from macOS homebrew. Instead, use Pillow to directly burn subtitles into the slide images, then use ffmpeg to concatenate.
Principle: Split each voiceover segment into short sentences by punctuation, generate a PNG of "original slide + bottom subtitle" for each sentence, then concatenate into a video according to duration.
7.1 Generate segments.json
After generating TTS audio, record the byte size of each audio segment (under the same bitrate, bytes ∝ duration):
python
import json, os, glob
audio_files = sorted(glob.glob('audio/slide_*.mp3'))
segments = [os.path.getsize(f) for f in audio_files]
with open('segments.json', 'w') as f:
json.dump(segments, f)
7.2 Generate Subtitle Frame Images
python
import json, subprocess, re, os
from PIL import Image, ImageDraw, ImageFont
text = open('voiceover_text.txt').read().strip()
paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
segments = json.loads(open('segments.json').read())
total_bytes = sum(segments)
r = subprocess.run(['ffprobe','-v','quiet','-show_entries','format=duration',
'-of','csv=p=0','voiceover_volcano.mp3'], capture_output=True, text=True)
audio_duration = float(r.stdout.strip())
seg_durations = [s / total_bytes * audio_duration for s in segments]
# Chinese fonts on macOS (try in priority order)
for fp in ['/Library/Fonts/Arial Unicode.ttf',
'/System/Library/Fonts/STHeiti Medium.ttc']:
if os.path.exists(fp):
font = ImageFont.truetype(fp, 42)
break
# Time allocation (cover 3s + slide_01 shares first audio segment)
cover_dur = 3.0
s01 = max(seg_durations[0] - cover_dur, 0.5)
durations = [cover_dur, s01] + seg_durations[1:]
slide_files = ['cover.png', 'slide_01.png'] + \
[f'slide_{i+1:02d}.png' for i in range(1, len(segments))]
para_for_slide = [paragraphs[0], paragraphs[0]] + paragraphs[1:]
sub_dir = 'sub_slides'
os.makedirs(sub_dir, exist_ok=True)
concat_lines = []
img_idx = 0
for slide_i, (slide_file, dur) in enumerate(zip(slide_files, durations)):
img = Image.open(slide_file)
para = para_for_slide[slide_i]
sentences = re.split(r'[,。!?、;\n]', para)
sentences = [s.strip() for s in sentences if s.strip()] or ['']
sent_dur = dur / len(sentences)
for sent in sentences:
frame = img.copy()
if sent:
draw = ImageDraw.Draw(frame)
w, h = frame.size
bbox = draw.textbbox((0, 0), sent, font=font)
tw = bbox[2] - bbox[0]
x, y = (w - tw) // 2, h - 100
# Black stroke (8 directions offset 3px)
for dx in [-3, 0, 3]:
for dy in [-3, 0, 3]:
if dx or dy:
draw.text((x+dx, y+dy), sent, font=font, fill=(0,0,0))
draw.text((x, y), sent, font=font, fill=(255,255,255))
out_name = f'sub_{img_idx:04d}.png'
frame.save(f'{sub_dir}/{out_name}')
concat_lines.append(f"file '{sub_dir}/{out_name}'")
concat_lines.append(f"duration {sent_dur:.3f}")
img_idx += 1
# ffmpeg concat requires last frame to be repeated
concat_lines.append(f"file '{sub_dir}/sub_{img_idx-1:04d}.png'")
with open('concat_sub.txt', 'w') as f:
f.write('\n'.join(concat_lines))
Step 8: Synthesize Video (With Subtitles)
⚠️ Must allocate duration using audio segment byte ratio, not word count ratio! Word count does not proportional to actual speaking speed, allocating by word count will cause audio-visual desynchronization in the latter half.
bash
cd <output directory>
# 1. Generate silent video from subtitle frame images
ffmpeg -y -f concat -safe 0 -i concat_sub.txt \
-vf "scale=1920:1080,format=yuv420p" \
-c:v libx264 -preset medium -crf 20 -r 30 -an silent.mp4
# 2. Merge audio (-c:v copy does not re-encode, fast speed)
ffmpeg -y -i silent.mp4 -i voiceover_volcano.mp3 \
-c:v copy -c:a aac -b:a 192k -shortest -movflags +faststart \
output_sub.mp4
rm -f silent.mp4 concat_sub.txt
Subtitle Style:
- Font: Arial Unicode MS 42px (white with 3px black stroke)
- Position: Bottom center, 100px from bottom edge
- Clear on both dark and light backgrounds
Two Output Files:
- — No subtitles (generated by build_video.py)
- — With subtitles (generated in this step)
Step 9: Show Results
- Read a few key PNGs (first slide, middle slide, last slide) for user to preview the effect
- Inform user of the output directory path
- Prompt user to to flip through slides in browser
- The video already includes subtitles, can be directly published
File Organization
<working directory>/voiceover/
└── <slug>/
├── cover.html + cover.png # Cover (video thumbnail/first frame)
├── slide_01.html ~ slide_XX.html
├── slide_01.png ~ slide_XX.png
├── script_notes.md
└── preview.html
Common Issues
Font Loading
CSS loads Noto Sans SC + Space Grotesk via Google Fonts CDN. Chrome headless screenshot requires network connection. If fonts fail to load, screenshots will use default fonts (poor effect).
Content Exceeds Canvas
Max 6 lines × 18 characters per slide. If content cannot fit, split into two slides, do not reduce font size.
Theme Switching
Usually use the same theme for the entire slide set. If you want to switch themes on certain slides, just change the body class of those slides' HTML.
Presentation Mode
High information density, formal PPT with cards, decorative layers, and brand markers. 5-8 pages.
Design System
Each slide HTML references two CSS files:
html
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system.css">
<link rel="stylesheet" href="file:///Users/lifcc/Desktop/code/work/life/xhh/design-system-slides.css">
File Organization
<working directory>/slides/
└── YYYYMMDD-<slug>/
├── slide_01.html ~ slide_XX.html
├── slide_01.png ~ slide_XX.png
├── script_notes.md
└── preview.html
6 Theme Colors
| Theme | CSS Class | Applicable Content |
|---|
| Obsidian (Dark) | | Technical depth, product launches |
| Paper (Light) | | Checklists, summaries, light content |
| Signal (Accent) | | Major news, milestones |
| Aurora (Aurora) | | AI cutting-edge, large models |
| Mist (Mist) | | Tool recommendations, popular science |
| Glacier (Glacier) | | Product reviews, app recommendations |
4 Slide Types
| Type | Layout | CSS Class | Reference Template |
|---|
| title | Centered large title + subtitle + tags | | templates/slides/title.html
|
| content | Left title + right key point cards | | templates/slides/content.html
|
| data | Top title + data card grid | | templates/slides/data.html
|
| ending | Centered summary + CTA + brand | | templates/slides/ending.html
|
Template file path prefix:
/Users/lifcc/Desktop/code/work/life/xhh/
Structural Rules
- First page must be title, last page must be ending
- Freely combine content and data pages in the middle
- Each page must have brand marker and decorative layer
Four-layer Visual Structure (Each Page Must Satisfy)
| Layer | Function | Implementation |
|---|
| L1 Background Color | Set tone | class |
| L2 Decorative Layer | Visual richness | + theme decorations |
| L3 Content Container | Carry information | card components |
| L4 Text/Icons | Convey information | Theme text color + SVG icons |
Decorative layers for each theme:
| Theme | L2 Decorative Layer |
|---|
| Obsidian | + + |
| Paper | + + |
| Signal | + + |
| Aurora | + |
| Mist | + + |
| Glacier | + + |
Presentation Mode Workflow
- Understand Requirements — Topic, materials, theme color preference, number of pages
- Topic Research (if no materials) — WebSearch to collect and organize fact lists
- Plan Structure — Determine the type and core information of each page
- Generate HTML + Screenshot Page by Page — Read reference templates → Replace content → Write and save → Chrome screenshot
- Generate Voiceover Script —
- Generate Previewer — (same previewer template as voiceover mode)
- Show Results — Read PNG previews + inform path
Screenshot command is the same as voiceover mode (
+ crop to 1920×1080), refer to Step 5 of voiceover mode.
SVG Icon Library (For Presentation Mode)
html
<svg xmlns="http://www.w3.org/2000/svg" style="display:none">
<symbol id="icon-bolt" viewBox="0 0 24 24"><path d="M13 2L3 14h9l-1 8 10-12h-9l1-8z"/></symbol>
<symbol id="icon-code" viewBox="0 0 24 24"><polyline points="16 18 22 12 16 6"/><polyline points="8 6 2 12 8 18"/></symbol>
<symbol id="icon-chart" viewBox="0 0 24 24"><line x1="18" y1="20" x2="18" y2="10"/><line x1="12" y1="20" x2="12" y2="4"/><line x1="6" y1="20" x2="6" y2="14"/></symbol>
<symbol id="icon-rocket" viewBox="0 0 24 24"><path d="M4.5 16.5c-1.5 1.26-2 5-2 5s3.74-.5 5-2c.71-.84.7-2.13-.09-2.91a2.18 2.18 0 0 0-2.91-.09z"/><path d="M12 15l-3-3 7.5-7.5A12.71 12.71 0 0 1 22 2c0 2.35-1.1 6.58-4.5 10L15 12"/></symbol>
<symbol id="icon-shield" viewBox="0 0 24 24"><path d="M12 22s8-4 8-10V5l-8-3-8 3v7c0 6 8 10 8 10z"/></symbol>
<symbol id="icon-terminal" viewBox="0 0 24 24"><polyline points="4 17 10 11 4 5"/><line x1="12" y1="19" x2="20" y2="19"/></symbol>
<symbol id="icon-brain" viewBox="0 0 24 24"><path d="M12 2a6 6 0 0 0-6 6c0 2.2 1.2 4.1 3 5.2V20h6v-6.8c1.8-1.1 3-3 3-5.2a6 6 0 0 0-6-6z"/><line x1="9" y1="14" x2="9" y2="20"/><line x1="15" y1="14" x2="15" y2="20"/></symbol>
<symbol id="icon-users" viewBox="0 0 24 24"><path d="M17 21v-2a4 4 0 0 0-4-4H5a4 4 0 0 0-4 4v2"/><circle cx="9" cy="7" r="4"/><path d="M23 21v-2a4 4 0 0 0-3-3.87"/><path d="M16 3.13a4 4 0 0 1 0 7.75"/></symbol>
<symbol id="icon-fire" viewBox="0 0 24 24"><path d="M12 23c-4.97 0-9-2.69-9-6 0-4 4-8 4-8s.5 2 2 3c.47-.8 1.5-3 1-6 3.5 2.5 6 6 6 10 1.5-1 2-3.5 2-5 2.5 2.5 3 4.5 3 6 0 3.31-4.03 6-9 6z"/></symbol>
<symbol id="icon-star" viewBox="0 0 24 24"><polygon points="12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2"/></symbol>
<symbol id="icon-lightbulb" viewBox="0 0 24 24"><path d="M9 21h6m-6-3h6m-3-18a7 7 0 0 0-4 12.7V17h8v-4.3A7 7 0 0 0 12 0z"/></symbol>
<symbol id="icon-search" viewBox="0 0 24 24"><circle cx="11" cy="11" r="8"/><line x1="21" y1="21" x2="16.65" y2="16.65"/></symbol>
<symbol id="icon-download" viewBox="0 0 24 24"><path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/></symbol>
<symbol id="icon-clock" viewBox="0 0 24 24"><circle cx="12" cy="12" r="10"/><polyline points="12 6 12 12 16 14"/></symbol>
<symbol id="icon-link" viewBox="0 0 24 24"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"/><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"/></symbol>
</svg>
Usage:
<svg class="icon icon-sm"><use href="#icon-bolt"/></svg>