Use when the user wants AI-generated short-form video — knowledge cards (picstory / 小红书 / TikTok / Reels), narrated explainers, presentations, AI clips, or slides — covering picstory, present, slides, explain, and image generation. For PaperSlide / paper-textured article-to-card reels, use voxflow:paper-slide.
Source: voxflowstudio/skills
NPX Install
npx skill4agent add voxflowstudio/skills video
VoxFlow Video Skill
Generate short-form videos with AI: LLM writes the script, AI draws cards or scenes, TTS narrates, FFmpeg / Remotion renders the final MP4.
For PaperSlide / paper-textured article-to-card reels, switch to `voxflow:paper-slide`.
Five entry points, pick by what the user wants:
| Command | Output | Use when |
|---|---|---|
| `voxflow picstory` | Vertical/landscape MP4 with hand-drawn cards or cinematic scenes | "知识卡片视频" (knowledge-card video), 小红书 / Twitter edu, sketchnote tutorials |
| `voxflow present` | 1080×1920 narrated card video, 5 visual schemes | Pitch decks, explainer reels, branded short-form |
| `voxflow explain` | MP4 explainer with title / bullets / summary scenes | "What is X?" tutorials, course intros |
| `voxflow slides` | Self-contained HTML deck with embedded TTS audio | Product launches, talks, share-as-link |
| `voxflow image` | Single PNG | One-off illustrations / thumbnails (Hunyuan TextToImage) |
Prerequisites
- `npm install -g voxflow` and `voxflow login`
- `ffmpeg` installed (`brew install ffmpeg` / `sudo apt install ffmpeg`), required for MP4 render
- For `present` / `explain` (Remotion-backed): the local plugin install includes `remotion-cards/`. If `present` says "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory. For `explain`, you can skip local Remotion with `--cloud`.
🎴 picstory — knowledge-card video
LLM writes a structured script, AI draws one card per scene, TTS narrates, FFmpeg assembles. Best for 小红书 / Twitter edu / TikTok.
Quick start
bash
# Default: Chinese, sketchnote style, portrait, 5 scenes
voxflow picstory --topic "AI Agent 入门指南"
# 2-scene quick test (no full video render)
voxflow picstory --topic "AI 入门" --scenes 2 --image-only
# English landscape video
voxflow picstory --topic "How React Hooks Work" --language en --ratio landscape --style photo
Output: `picstory-<timestamp>.mp4` + `.json` (script).
Card-type styles (structured heading + key points)
| Style | Look | Best for |
|---|---|---|
| `sketchnote` | Colorful hand-drawn bullet journal | Knowledge sharing, tutorials |
| | Cyberpunk dark with neon glow | Tech, startup |
| | Soft 3D clay on pastel gradients | 小红书 lifestyle |
| | White chalk on dark green | Science, academic |
Scene-type styles (free-form illustration)
| Style | Look | Best for |
|---|---|---|
| `photo` | Cinematic / photo-real | Storytelling, travel |
| | Japanese manga ink linework | Drama, step-by-step guides |
| | 1940s broadsheet | History, factual stories |
Ratios
| Ratio | Pixels | Platform |
|---|---|---|
| `portrait` | 1080×1920 | 小红书, TikTok, Reels, 抖音 (Douyin) |
| `landscape` | 1920×1080 | YouTube, B站 (Bilibili) |
| | 1080×1080 | Instagram, Twitter |
Script-model presets (`--script-model`)
| Preset | Provider | Strength |
|---|---|---|
| omitted | server config | Balanced default (gpt-4o-mini) |
| | OpenRouter | Multilingual, good Chinese |
| | DeepSeek | Cheapest, excellent Chinese |
| | Tencent Hunyuan (腾讯混元) | Chinese-native |
| | Moonshot | Chinese long context |
Server enforces an allowlist — only the preset names above work; arbitrary model IDs are rejected.
Image quality (`--quality`)
Image generation is the only meaningful cost (~$0.005-0.08 per image). The LLM script (~2K tokens) is negligible.
| Quality | Provider | Strength | 5-image cost |
|---|---|---|---|
| `fast` | OpenRouter Gemini Flash | Cheapest, balanced | ~$0.025 |
| | OpenRouter Gemini Pro | Higher detail | mid |
| `ultra` | OpenRouter gpt-5.4-image-2 | Best overall, ~16× cost | ~$0.40 |
| | Aiberm Gemini Flash | Cheap Aiberm tier | low |
| `hd-aiberm` | Aiberm Gemini Pro | Strongest Chinese text rendering, best for 小红书 cards with Chinese headers | mid |
Use `fast` for iteration; `hd-aiberm` when cards must contain accurate Chinese characters; `ultra` for hero exports.
Full options
| Flag | Default | Description |
|---|---|---|
| `--topic` | required (or …) | Story topic |
| | — | Paste full article instead of a topic |
| `--style` | | See styles above |
| | | |
| | | |
| `--scenes` | | 2–10. Use … |
| `--script-model` | server default | See presets above |
| `--quality` | | See table above |
| `--voice` | default | TTS voice from `voxflow voices` |
| | | TTS speed 0.5–2.0 |
| | — | Background music (mp3/wav) mixed under narration |
| | | BGM volume 0–1 |
| | | Per-scene fade in/out seconds (…) |
| `--image-only` | false | Save images + audio without final video render |
| | — | Directory for all outputs |
| | auto | Final MP4 path |
Quota cost
| Operation | Quota |
|---|---|
| LLM script | 100 |
| TTS / scene | 100 |
| Image / scene | 500 |
| 2-scene test | ~1,300 |
| 5-scene full | ~3,100 |
Free tier (10K/month) ≈ 3 full picstory videos.
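The totals in the table follow from one script call plus one TTS call and one image per scene. A minimal sketch of that arithmetic, assuming the per-operation quotas listed above stay fixed (the function name `estimate_picstory_quota` is hypothetical and not part of the voxflow CLI):

```shell
# Hypothetical helper, not part of the voxflow CLI.
# Quota figures are copied from the table above.
estimate_picstory_quota() {
  local scenes="$1"
  local script=100 tts=100 image=500
  # One script call, then one TTS call and one image per scene.
  echo $(( script + scenes * (tts + image) ))
}

estimate_picstory_quota 2   # prints 1300
estimate_picstory_quota 5   # prints 3100
```

Three full 5-scene videos come to 9,300, which is why the 10K free tier covers roughly three of them.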
Pipeline
Topic / Text
├─[1] LLM script → { title, scenes: [{ heading, keyPoints, narration }] }
├─[2] TTS per scene → PCM audio
├─[3] Image per scene (parallel) → JPEG via /api/image/generate
└─[4] FFmpeg: image + WAV → Ken Burns zoompan MP4 → concat → +BGM → final.mp4
Troubleshooting
| Problem | Fix |
|---|---|
| | |
| | Retry; if stuck on … |
| Render too slow | Ken Burns is CPU-heavy. Use … |
| | Use a … |
| Chinese text mangled in cards | Switch to `--quality hd-aiberm` |
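Step [4] of the pipeline pairs each card image with its narration and applies a slow zoom, which is also where the CPU cost comes from. A rough sketch of what one scene render could look like with stock FFmpeg, assuming one `card.jpg` and one `narration.wav` per scene. The function name, file names, zoom rate, and frame count are illustrative assumptions, not the command VoxFlow actually runs. The function only prints the command so it can be inspected before running it:

```shell
# Hypothetical sketch: prints (does not run) an FFmpeg command that turns one
# still image plus one WAV into a Ken Burns style clip via the zoompan filter.
# File names are inserted unquoted, so avoid paths containing spaces.
kenburns_cmd() {
  local image="$1" audio="$2" out="$3"
  local frames="${4:-125}"   # 125 frames at 25 fps = 5 seconds (assumed length)
  printf '%s\n' "ffmpeg -y -i $image -i $audio -filter_complex \"[0:v]zoompan=z='min(zoom+0.0015,1.5)':d=$frames:s=1080x1920:fps=25[v]\" -map \"[v]\" -map 1:a -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest $out"
}

kenburns_cmd card.jpg narration.wav scene1.mp4
```

In a real pipeline the frame count would be derived from the narration length, since `-shortest` cuts the clip at whichever stream ends first. The per-scene clips could then be joined with FFmpeg's concat demuxer before mixing in background music.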
📑 present — narrated card video (Remotion local)
Text or URL → LLM cards → TTS → Remotion render. 1080×1920, 5 visual schemes.
bash
voxflow present --text "Claude Code 是一个 AI 编程工具" --style aurora
voxflow present --url https://example.com/article --style noir
voxflow present --text "2025 AI 芯片格局" --web-search --style neon
voxflow present --cards pre-generated.json --no-audio
| Flag | Default | Notes |
|---|---|---|
| `--text` / `--url` / `--cards` | one required | input source |
| | | |
| `--voice` | | TTS voice |
| | | 0.5–2.0 |
| `--no-audio` | false | Silent video only |
| `--web-search` | false | Augment LLM with up-to-date web facts |
| | | |
If `present` reports "Remotion not ready", run `npm install` inside the bundled `remotion-cards/` directory to set up the local renderer.
🧠 explain — AI explainer video
Title / bullets / summary scene flow. Best for "What is X?" tutorials.
bash
voxflow explain --topic "What is React?"
voxflow explain --topic demo --output demo.mp4 # built-in demo (no API call)
voxflow explain --topic "区块链入门" --style chalkboard --voice v-male-Bk7vD3xP
voxflow explain --topic "Machine Learning" --audio-only
voxflow explain --topic "AI Agent 入门" --cloud # render on server
| Flag | Default | Notes |
|---|---|---|
| `--topic` | required | Use `demo` for the built-in demo (no API call) |
| | | |
| | | |
| | | |
| | | 3–12 |
| `--audio-only` | false | Skip render, output WAV only |
| `--cloud` | false | Use cloud Remotion instead of local |
| | | |
📊 slides — HTML presentation with TTS
Generates a self-contained HTML deck with embedded base64 audio per slide. Open in any browser, no server needed.
bash
voxflow slides "AI in Healthcare"
voxflow slides "Q4 Revenue Report" --template report --theme paper
voxflow slides "React Tutorial" --template tutorial --model balanced
voxflow slides "Startup Pitch" --template pitch --theme ocean --no-audio
| Flag | Default | Notes |
|---|---|---|
| | required | Topic |
| | | |
| | | |
| | | |
| | | |
| `--no-audio` | false | Skip TTS, slides only |
| | | |
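"Self-contained" here means each slide's narration travels inside the HTML file itself. One common way to do that is a base64 data URI. The sketch below illustrates that general technique and is an assumption, not VoxFlow's actual output format; the function name `audio_tag` and the WAV MIME type are illustrative:

```shell
# Hypothetical helper: wrap an audio file in an <audio> tag whose src is a
# base64 data URI, so the HTML needs no external files.
audio_tag() {
  local file="$1"
  local b64
  b64="$(base64 < "$file" | tr -d '\n')"   # strip line wraps added by base64
  printf '<audio controls src="data:audio/wav;base64,%s"></audio>\n' "$b64"
}
```

Something like `audio_tag slide1.wav >> deck.html` would then append a playable clip to a deck, at the cost of an HTML file roughly a third larger than the raw audio.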
10 layouts: `hero`, `title-bullets`, `two-column`, `three-cards`, `image-left`, `image-right`, `quote`, `timeline`, `stats`, `section`. Templates auto-pick a layout sequence.
🖼 image — single Hunyuan illustration
Synchronous text → image (PNG). Useful for thumbnails or one-off art.
bash
voxflow image "a sleeping cat in a sunlit window" --resolution 1024:1024 -o cat.png
Resolutions: `768:768`, `768:1024`, `1024:768`, `1024:1024`, `720:1280`, `1280:720`, `768:1280`, `1280:768`, `1080:1920`, `1920:1080`.
Prompt max 1000 chars. Output: local PNG + COS URL.
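Because the resolution list and the prompt limit are fixed, both can be checked locally before spending a call. A minimal sketch, assuming the limits above are exactly what the server enforces; `check_image_args` is a hypothetical helper, not a voxflow subcommand:

```shell
# Hypothetical pre-flight check; the limits are the ones listed above.
ALLOWED_RESOLUTIONS="768:768 768:1024 1024:768 1024:1024 720:1280 1280:720 768:1280 1280:768 1080:1920 1920:1080"

check_image_args() {
  local prompt="$1" resolution="$2" r
  # ${#prompt} counts characters in a UTF-8 locale (bytes in the C locale).
  if [ "${#prompt}" -gt 1000 ]; then
    echo "prompt too long: ${#prompt} chars (max 1000)" >&2
    return 1
  fi
  for r in $ALLOWED_RESOLUTIONS; do
    [ "$r" = "$resolution" ] && return 0
  done
  echo "unsupported resolution: $resolution" >&2
  return 1
}

check_image_args "a sleeping cat in a sunlit window" 1024:1024 && echo ok
```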
Pick-the-right-tool checklist
"小红书风格知识卡片" (小红书-style knowledge cards) → picstory
"AI 短视频 + caption" (AI short video + caption) → picstory --style sketchnote
"explainer / What is X?" → explain
"branded short with my text" → present
"already have cards/script" → present --cards
"shareable HTML deck w/ audio" → slides
"single illustration" → image
Rules
- Search voices with `voxflow voices` before passing `--voice`. Never guess IDs.
- Check quota before video calls (`voxflow status`): picstory ≈ 3K, present/explain ≈ 500–2K.
- Test cheap first: `picstory --scenes 2 --image-only` validates the script before paying for full render.
- After render finishes, auto-play: `open output.mp4` (macOS).
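The quota check, cheap test, full render, and auto-play steps can be chained in one wrapper. This is a sketch under assumptions: `run` and `render_picstory` are hypothetical names, only commands and flags that appear elsewhere in this document are used, and the final step guesses the output file by picking the newest `picstory-*.mp4` in the current directory. With `DRY_RUN=1` the commands are printed instead of executed:

```shell
# Hypothetical wrapper around the rules above.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"          # dry run: show the command only
  else
    "$@"
  fi
}

render_picstory() {
  local topic="$1" latest
  run voxflow status || return 1                                             # check quota first
  run voxflow picstory --topic "$topic" --scenes 2 --image-only || return 1  # cheap test
  run voxflow picstory --topic "$topic" || return 1                          # full render
  # Assumption: the render lands in the current directory as picstory-<timestamp>.mp4.
  latest="$(ls -t picstory-*.mp4 2>/dev/null | head -n 1)"
  [ -n "$latest" ] && run open "$latest"                                     # macOS auto-play
  return 0
}

DRY_RUN=1 render_picstory "AI Agent 入门指南"
```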