image-to-video
Image to Video Skill
Operator Context
This skill acts as an operator for CLI-based video creation, configuring Claude's behavior for deterministic FFmpeg script execution. It implements the Sequential Pipeline architectural pattern (Validate, Prepare, Encode, Verify), with Domain Intelligence embedded in FFmpeg filter selection and resolution matching.
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md before creating video
- Over-Engineering Prevention: Only implement what is directly requested. No extra visualizations, no format conversions beyond MP4
- FFmpeg Validation: Always verify FFmpeg is installed before attempting video creation
- Input Validation: Check that both image and audio files exist before processing
- Absolute Paths Only: Always use absolute paths for image, audio, and output arguments
Default Behaviors (ON unless disabled)
- Resolution Default: Use 1080p (1920x1080) unless user specifies otherwise
- Static Mode: No visualization overlay unless user requests one
- AAC Audio: Encode audio as 192k AAC for broad compatibility
- H.264 Video: Encode with libx264 preset medium, CRF 23, yuv420p pixel format
- Output Verification: Run ffprobe on output and report file size after creation
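As a rough illustration, the encoding defaults above map onto an FFmpeg argument list like the one below. This is a sketch under assumptions, not the command the bundled script actually runs: the function name `build_static_command` and the scale-then-pad filter chain are hypothetical choices, and only the codec settings come from the list above.

```python
def build_static_command(image: str, audio: str, output: str,
                         width: int = 1920, height: int = 1080) -> list[str]:
    """Build an FFmpeg argument list for a still image plus audio (static mode)."""
    # Assumption: fit the image inside the frame, then pad to the exact size.
    fit = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg",
        "-loop", "1", "-i", image,   # repeat the single image as video frames
        "-i", audio,
        "-vf", fit,
        "-c:v", "libx264", "-preset", "medium", "-crf", "23",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                 # stop when the shortest input (the audio) ends
        output,
    ]
```

Returning a list rather than a shell string keeps paths with spaces intact when the list is handed to `subprocess.run`.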
Optional Behaviors (OFF unless enabled)
- Waveform Visualization: Neon waveform overlay with `--visualization waveform`
- Spectrum Visualization: Scrolling frequency spectrum with `--visualization spectrum`
- CQT Visualization: Piano-roll style bars with `--visualization cqt`
- Bars Visualization: Frequency bar graph with `--visualization bars`
- Custom Resolution: Override with `--resolution` preset (720p, square, vertical)
- Workspace Mode: Batch process paired files with `--process-workspace`
What This Skill CAN Do
- Combine a static image with audio to produce an MP4 video
- Scale images to target resolution while preserving aspect ratio
- Add audio visualization overlays (waveform, spectrum, cqt, bars)
- Support multiple resolution presets (1080p, 720p, square, vertical)
- Batch process matching image+audio pairs from workspace directory
- Validate FFmpeg availability and report actionable install instructions
What This Skill CANNOT Do
- Generate images (use `gemini-image-generator` for that)
- Edit existing videos or trim/split audio
- Stream live video or produce non-MP4 formats
- Add text overlays, captions, or transitions
- Work without FFmpeg installed on the system
Instructions
Phase 1: VALIDATE
Goal: Confirm all prerequisites before attempting video creation.
Step 1: Check FFmpeg installation
```bash
ffmpeg -version
```
If FFmpeg is not installed, provide platform-specific install instructions and stop.
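The same check can be sketched programmatically. The helper names `ffmpeg_path` and `install_hint` are hypothetical; the install commands are the ones listed under Error Handling, and mapping every Linux system to `apt` is an assumption that only holds on Debian or Ubuntu.

```python
import shutil
import sys
from typing import Optional

# Install commands from the Error Handling section; other platforms are not covered there.
INSTALL_HINTS = {
    "darwin": "brew install ffmpeg",
    "linux": "sudo apt install ffmpeg",  # assumption: Debian/Ubuntu
}


def ffmpeg_path() -> Optional[str]:
    """Return the full path of the ffmpeg binary, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def install_hint(platform_name: str = sys.platform) -> str:
    """Return a platform-specific install command, or a generic fallback."""
    return INSTALL_HINTS.get(platform_name, "install FFmpeg and add it to PATH")
```

If `ffmpeg_path()` returns `None`, report `install_hint()` to the user and stop before any encoding attempt.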
Step 2: Verify input files exist
```bash
ls -la /absolute/path/to/image.png /absolute/path/to/audio.mp3
```
Confirm both files exist and have non-zero size. Supported formats:
- Images: PNG, JPG, JPEG, GIF, WEBP, BMP
- Audio: MP3, WAV, M4A, OGG, FLAC
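A minimal sketch of these input checks, assuming the format lists above translate directly into file extensions. The function name `input_problems` is hypothetical and not part of the bundled script.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def input_problems(path: Path, allowed_exts: set) -> list:
    """Return human-readable problems for one input file (empty list means OK)."""
    problems = []
    if not path.is_absolute():
        problems.append(f"{path}: path is not absolute")
    if path.suffix.lower() not in allowed_exts:
        problems.append(f"{path}: unsupported extension {path.suffix!r}")
    if not path.is_file():
        problems.append(f"{path}: file not found")
    elif path.stat().st_size == 0:
        problems.append(f"{path}: file is empty")
    return problems
```

Collecting every problem instead of stopping at the first lets a single report cover both the image and the audio file.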
Step 3: Determine parameters
Resolve resolution preset and visualization mode from user request. If the user did not specify, use defaults (1080p, static).
| Preset | Dimensions | Platform |
|---|---|---|
| `1080p` | 1920x1080 | YouTube HD (default) |
| `720p` | 1280x720 | Standard HD, smaller files |
| `square` | 1080x1080 | Instagram, social media |
| `vertical` | 1080x1920 | Stories, Reels, TikTok |
Gate: FFmpeg installed, both input files exist, parameters resolved. Proceed only when gate passes.
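Parameter resolution can be pictured as a small lookup. This sketch assumes the preset names `1080p`, `720p`, `square`, and `vertical` used elsewhere in this document; the function name `resolve_resolution` is hypothetical.

```python
from typing import Optional, Tuple

# Preset name -> (width, height), following the dimensions in the table above.
PRESETS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "square": (1080, 1080),
    "vertical": (1080, 1920),
}


def resolve_resolution(preset: Optional[str] = None) -> Tuple[int, int]:
    """Return (width, height) for a preset, defaulting to 1080p when none is given."""
    name = preset or "1080p"
    if name not in PRESETS:
        raise ValueError(f"unknown preset {name!r}; expected one of {sorted(PRESETS)}")
    return PRESETS[name]
```

Raising on an unknown name, rather than silently falling back to the default, keeps a mistyped preset from producing a video in the wrong aspect ratio.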
Phase 2: PREPARE
Goal: Set up output path and confirm no conflicts.
Step 1: Determine output path
Use the path provided by the user. If none is given, derive it from the audio filename:
```
/same/directory/as/audio/filename.mp4
```
Step 2: Ensure output directory exists
The script creates parent directories automatically. Verify the target directory is writable.
Gate: Output path determined, directory accessible. Proceed only when gate passes.
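The two preparation steps can be sketched as follows. Both helper names are hypothetical. Because the script is said to create missing parent directories, the writability check here looks at the nearest ancestor that already exists, which is an assumption about what "writable" should mean in that case.

```python
import os
from pathlib import Path


def default_output_path(audio: Path) -> Path:
    """Derive the output path from the audio file: same directory, .mp4 extension."""
    return audio.with_suffix(".mp4")


def is_writable_target(output: Path) -> bool:
    """Check that the nearest existing ancestor of the output path is writable."""
    parent = output.parent
    # Walk upward until an existing directory (or the filesystem root) is reached.
    while not parent.exists() and parent != parent.parent:
        parent = parent.parent
    return os.access(parent, os.W_OK)
```

For example, `default_output_path(Path("/music/album/song.mp3"))` yields `/music/album/song.mp4`.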
Phase 3: ENCODE
Goal: Execute FFmpeg to produce the video.
Step 1: Run the script
```bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --image /absolute/path/to/image.png \
  --audio /absolute/path/to/audio.mp3 \
  --output /absolute/path/to/output.mp4 \
  --resolution 1080p \
  --visualization static
```
For workspace batch mode (processes all matched pairs in `workspace/input/`):
```bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --process-workspace \
  --visualization waveform
```
Step 2: Monitor output
The script prints progress including input paths, resolution, visualization mode, and duration. Watch for ERROR lines in output.
Gate: Script exits with code 0. Proceed only when gate passes.
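A hedged sketch of driving the script and evaluating this gate from Python. The exit-code meanings are the ones listed under Reference Files; the helper names are hypothetical, and scanning both stdout and stderr for `ERROR` is an assumption, since the document does not say which stream the script writes to.

```python
import subprocess
import sys
from pathlib import Path

# Script location as given in the commands above.
SCRIPT = Path.home() / "claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py"

# Exit codes documented under Reference Files.
EXIT_MEANINGS = {0: "success", 1: "no FFmpeg", 2: "encode failed", 3: "missing args"}


def describe_exit(code: int) -> str:
    """Translate a script exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def error_lines(output: str) -> list:
    """Return the lines of script output that contain ERROR."""
    return [line for line in output.splitlines() if "ERROR" in line]


def run_script(args: list) -> tuple:
    """Run the script with the given arguments; return (exit code, ERROR lines)."""
    result = subprocess.run([sys.executable, str(SCRIPT), *args],
                            capture_output=True, text=True)
    return result.returncode, error_lines(result.stdout + result.stderr)
```

The gate passes only when the returned code is 0; any other value can be reported to the user through `describe_exit`.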
Phase 4: VERIFY
Goal: Confirm the output video is valid and report results.
Step 1: Check file exists and has reasonable size
```bash
ls -la /absolute/path/to/output.mp4
```
Step 2: Probe video metadata
```bash
ffprobe -v error -show_entries format=duration,size -show_entries stream=codec_name,width,height \
  -of default=noprint_wrappers=1 /absolute/path/to/output.mp4
```
Confirm video duration matches audio duration (within 1 second tolerance).
Step 3: Report to user
Provide: output file path, file size, duration, resolution, and visualization mode used.
Gate: Output file exists, duration matches audio, metadata is valid. Task complete.
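The duration comparison behind this gate might be automated along these lines. This is a sketch, not the script's own verification: the helper names are hypothetical, and it asks ffprobe for JSON output (`-of json`) instead of the `default` writer shown above because JSON is simpler to parse.

```python
import json
import subprocess


def parse_duration(ffprobe_json: str) -> float:
    """Extract the container duration in seconds from ffprobe JSON output."""
    return float(json.loads(ffprobe_json)["format"]["duration"])


def probe_duration(path: str) -> float:
    """Run ffprobe on a media file and return its duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_duration(result.stdout)


def durations_match(video_s: float, audio_s: float, tolerance_s: float = 1.0) -> bool:
    """True when the video is non-empty and within tolerance of the audio length."""
    return video_s > 0 and abs(video_s - audio_s) <= tolerance_s
```

Calling `durations_match(probe_duration(output), probe_duration(audio))` then yields a single pass/fail value for the gate.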
Error Handling
Error: "FFmpeg is not installed or not in PATH"
Cause: FFmpeg binary not found on system
Solution:
- Install via package manager: `brew install ffmpeg` (macOS), `sudo apt install ffmpeg` (Ubuntu)
- Verify with `ffmpeg -version` after install
- Ensure FFmpeg is in system PATH
Error: "Image file not found" or "Audio file not found"
Cause: Path is incorrect, relative, or file does not exist
Solution:
- Verify the path is absolute, not relative
- Check file permissions with `ls -la`
- Confirm the file extension matches a supported format
Error: "FFmpeg failed" with filter errors
Cause: FFmpeg build lacks filter support (showwaves, showspectrum, showcqt)
Solution:
- Install the full FFmpeg build, not a minimal variant
- On Ubuntu: `sudo apt install ffmpeg` (full package)
- Fall back to `--visualization static`, which requires no special filters
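One way to tell ahead of time whether a build has the visualization filters is to inspect the output of `ffmpeg -filters`. The sketch below assumes that listing keeps its usual column layout (flags, filter name, then an `in->out` signature); the helper names are hypothetical.

```python
import subprocess

# Filters named in the cause above.
REQUIRED_FILTERS = {"showwaves", "showspectrum", "showcqt"}


def available_filters(listing: str) -> set:
    """Parse filter names from the text printed by `ffmpeg -filters`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        # Filter rows look like: "T.. showwaves A->V description ..."
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names


def missing_filters(listing: str, required: set = REQUIRED_FILTERS) -> set:
    """Return the required filters that the listing does not contain."""
    return set(required) - available_filters(listing)


def ffmpeg_filter_listing() -> str:
    """Capture the filter listing from the local ffmpeg binary."""
    result = subprocess.run(["ffmpeg", "-hide_banner", "-filters"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A non-empty result from `missing_filters(ffmpeg_filter_listing())` suggests falling back to static mode before attempting the encode.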
Error: "Could not determine audio duration"
Cause: Audio file is corrupted or uses an unsupported container format
Solution:
- Test the audio independently: `ffprobe /path/to/audio.mp3`
- Convert to a known format: `ffmpeg -i input.audio -acodec pcm_s16le output.wav`
- Re-run with the converted file
Anti-Patterns
Anti-Pattern 1: Using Relative Paths
What it looks like: `python3 image_to_video.py -i ../cover.png -a song.mp3`
Why wrong: The script may execute from a different working directory, breaking all paths silently.
Do instead: Always use absolute paths for every argument.
Anti-Pattern 2: Skipping FFmpeg Verification
What it looks like: Running the script directly without checking `ffmpeg -version` first.
Why wrong: Produces confusing subprocess errors instead of clear install instructions.
Do instead: Complete Phase 1 validation before any encoding attempt.
Anti-Pattern 3: Wrong Resolution for Target Platform
What it looks like: Using 1080p landscape for TikTok, or vertical for YouTube.
Why wrong: Content gets cropped or displays with large black bars on the target platform.
Do instead: Ask the user what platform the video targets, then select the matching preset.
Anti-Pattern 4: Skipping Output Verification
What it looks like: Reporting success based on script exit code alone without probing the output.
Why wrong: FFmpeg can exit 0 but produce a corrupt or zero-duration file.
Do instead: Complete Phase 4 -- probe the output, confirm duration matches audio.
References
This skill uses these shared patterns:
- Anti-Rationalization - Prevents shortcut rationalizations
- Verification Checklist - Pre-completion checks
Domain-Specific Anti-Rationalization
| Rationalization | Why It Is Wrong | Required Action |
|---|---|---|
| "FFmpeg is always installed" | Many systems lack it or have minimal builds | Run `ffmpeg -version` every time |
| "The script handles everything" | Script can fail silently with bad inputs | Validate inputs in Phase 1 |
| "File size looks right" | Size alone does not prove video integrity | Probe with ffprobe, check duration |
| "Static mode is fine" | User may have requested visualization | Re-read the request before defaulting |
Reference Files
- `${CLAUDE_SKILL_DIR}/references/ffmpeg-filters.md`: FFmpeg filter documentation for visualization modes
- `${CLAUDE_SKILL_DIR}/scripts/image_to_video.py`: Python CLI script (exit codes: 0=success, 1=no FFmpeg, 2=encode failed, 3=missing args)