image-to-video

Image to Video Skill

Operator Context

This skill acts as an operator for CLI-based video creation, configuring Claude's behavior for deterministic FFmpeg script execution. It implements the Sequential Pipeline architectural pattern (Validate, Prepare, Encode, Verify), with Domain Intelligence embedded in FFmpeg filter selection and resolution matching.

Hardcoded Behaviors (Always Apply)

CLAUDE.md Compliance: Read and follow repository CLAUDE.md before creating video
Over-Engineering Prevention: Only implement what is directly requested. No extra visualizations, no format conversions beyond MP4
FFmpeg Validation: Always verify FFmpeg is installed before attempting video creation
Input Validation: Check that both image and audio files exist before processing
Absolute Paths Only: Always use absolute paths for image, audio, and output arguments

Default Behaviors (ON unless disabled)

Resolution Default: Use 1080p (1920x1080) unless user specifies otherwise
Static Mode: No visualization overlay unless user requests one
AAC Audio: Encode audio as 192k AAC for broad compatibility
H.264 Video: Encode with libx264 preset medium, CRF 23, yuv420p pixel format
Output Verification: Run ffprobe on output and report file size after creation
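
The encoding defaults above correspond to standard FFmpeg options. As a rough sketch only (the function name `build_static_command` and the scale/pad filter chain are illustrative assumptions, not taken from the skill's script), a static-mode command line could be assembled like this:

```python
def build_static_command(image, audio, output, width=1920, height=1080):
    """Build an FFmpeg argument list for one still image plus an audio track.

    Hypothetical helper: the real script may construct its command differently.
    """
    # Fit the image inside the frame while keeping its aspect ratio,
    # then pad with borders to reach the exact target size.
    video_filter = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg",
        "-loop", "1", "-i", image,   # repeat the single image as video frames
        "-i", audio,
        "-vf", video_filter,
        "-c:v", "libx264", "-preset", "medium", "-crf", "23",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                 # stop encoding when the audio ends
        output,
    ]
```

The list form can be passed directly to `subprocess.run` without shell quoting concerns.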

Optional Behaviors (OFF unless enabled)

Waveform Visualization: Neon waveform overlay with `--visualization waveform`
Spectrum Visualization: Scrolling frequency spectrum with `--visualization spectrum`
CQT Visualization: Piano-roll style bars with `--visualization cqt`
Bars Visualization: Frequency bar graph with `--visualization bars`
Custom Resolution: Override with a `--resolution` preset (720p, square, vertical)
Workspace Mode: Batch process paired files with `--process-workspace`

What This Skill CAN Do

Combine a static image with audio to produce an MP4 video
Scale images to target resolution while preserving aspect ratio
Add audio visualization overlays (waveform, spectrum, cqt, bars)
Support multiple resolution presets (1080p, 720p, square, vertical)
Batch process matching image+audio pairs from workspace directory
Validate FFmpeg availability and report actionable install instructions

What This Skill CANNOT Do

Generate images (use `gemini-image-generator` for that)
Edit existing videos or trim/split audio
Stream live video or produce non-MP4 formats
Add text overlays, captions, or transitions
Work without FFmpeg installed on the system

Instructions

Phase 1: VALIDATE

Goal: Confirm all prerequisites before attempting video creation.

Step 1: Check FFmpeg installation

bash

ffmpeg -version

If FFmpeg is not installed, provide platform-specific install instructions and stop.
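
For callers that prefer to run this check from Python, the following is a minimal sketch. The helper names `ffmpeg_install_hint` and `check_ffmpeg` are hypothetical, and the install commands are the two listed under Error Handling in this document; other platforms get a generic message.

```python
import shutil
import sys

# Install commands quoted from this document (macOS and Ubuntu only).
INSTALL_HINTS = {
    "darwin": "brew install ffmpeg",
    "linux": "sudo apt install ffmpeg",  # assumes an apt-based distribution
}


def ffmpeg_install_hint(platform_name):
    """Return an install command for the platform, or a generic pointer."""
    for prefix, command in INSTALL_HINTS.items():
        if platform_name.startswith(prefix):
            return command
    return "install FFmpeg and make sure it is on PATH"


def check_ffmpeg():
    """Return (ok, message); ok is False when no ffmpeg binary is on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        hint = ffmpeg_install_hint(sys.platform)
        return False, f"FFmpeg not found. Try: {hint}"
    return True, f"FFmpeg found at {path}"
```

If `check_ffmpeg()` returns `False`, the pipeline should report the message and stop, as the step above requires.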

Step 2: Verify input files exist

bash

ls -la /absolute/path/to/image.png /absolute/path/to/audio.mp3

Confirm both files exist and have non-zero size. Supported formats:

Images: PNG, JPG, JPEG, GIF, WEBP, BMP
Audio: MP3, WAV, M4A, OGG, FLAC
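
The checks in this step (absolute path, supported extension, existence, non-zero size) can be sketched in Python as follows. `validate_input` is a hypothetical helper written for illustration; the extension sets simply mirror the lists above.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def validate_input(path_str, allowed_extensions):
    """Return a list of problems; an empty list means the file passed."""
    path = Path(path_str)
    problems = []
    if not path.is_absolute():
        problems.append("path is not absolute")
    if path.suffix.lower() not in allowed_extensions:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file does not exist")
    elif path.stat().st_size == 0:
        problems.append("file is empty")
    return problems
```

Returning every problem at once, rather than raising on the first, lets the operator report all issues to the user in a single message.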

Step 3: Determine parameters

Resolve resolution preset and visualization mode from user request. If the user did not specify, use defaults (1080p, static).

| Preset | Dimensions | Platform |
| --- | --- | --- |
| `1080p` | 1920x1080 | YouTube HD (default) |
| `720p` | 1280x720 | Standard HD, smaller files |
| `square` | 1080x1080 | Instagram, social media |
| `vertical` | 1080x1920 | Stories, Reels, TikTok |

Gate: FFmpeg installed, both input files exist, parameters resolved. Proceed only when gate passes.
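
Parameter resolution with defaults can be expressed as a small lookup. This is a sketch under the assumption that presets and modes are plain strings; `resolve_parameters` is a hypothetical name, while the preset dimensions and mode names come from this document.

```python
PRESETS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "square": (1080, 1080),
    "vertical": (1080, 1920),
}
VISUALIZATIONS = {"static", "waveform", "spectrum", "cqt", "bars"}


def resolve_parameters(resolution=None, visualization=None):
    """Apply the documented defaults and reject unknown values."""
    resolution = resolution or "1080p"
    visualization = visualization or "static"
    if resolution not in PRESETS:
        raise ValueError(f"unknown resolution preset: {resolution}")
    if visualization not in VISUALIZATIONS:
        raise ValueError(f"unknown visualization mode: {visualization}")
    width, height = PRESETS[resolution]
    return {
        "resolution": resolution,
        "width": width,
        "height": height,
        "visualization": visualization,
    }
```

Rejecting unknown values here keeps the gate honest: parameters count as resolved only when both map to something the script documents.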

Phase 2: PREPARE

Goal: Set up output path and confirm no conflicts.

Step 1: Determine output path

Use the path provided by the user. If none given, derive from the audio filename:

/same/directory/as/audio/filename.mp4
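
A minimal sketch of that derivation, assuming the output simply swaps the audio file's extension for `.mp4` (`derive_output_path` is a hypothetical helper name):

```python
from pathlib import Path


def derive_output_path(audio_path, output_path=None):
    """Prefer the user's path; otherwise place the MP4 next to the audio."""
    if output_path:
        return Path(output_path)
    # Same directory and base name as the audio file, with an .mp4 suffix.
    return Path(audio_path).with_suffix(".mp4")
```

For example, `/music/song.mp3` yields `/music/song.mp4` when no output path is given.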

Step 2: Ensure output directory exists

The script creates parent directories automatically. Verify the target directory is writable.

Gate: Output path determined, directory accessible. Proceed only when gate passes.

Phase 3: ENCODE

Goal: Execute FFmpeg to produce the video.

Step 1: Run the script

bash

python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --image /absolute/path/to/image.png \
  --audio /absolute/path/to/audio.mp3 \
  --output /absolute/path/to/output.mp4 \
  --resolution 1080p \
  --visualization static

For workspace batch mode (processes all matched pairs in `workspace/input/`):

bash

python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --process-workspace \
  --visualization waveform
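
This document does not say how the script decides which image and audio files form a pair. The sketch below assumes, purely for illustration, that files pair when they share a base name (for example `track.png` with `track.mp3`); `pair_by_stem` is a hypothetical helper and the real matching rule may differ.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def pair_by_stem(filenames):
    """Pair image and audio filenames that share a base name (assumed rule)."""
    images, audio = {}, {}
    for name in sorted(filenames):
        path = Path(name)
        extension = path.suffix.lower()
        if extension in IMAGE_EXTENSIONS:
            images.setdefault(path.stem, name)   # first image per stem wins
        elif extension in AUDIO_EXTENSIONS:
            audio.setdefault(path.stem, name)    # first audio per stem wins
    shared = sorted(images.keys() & audio.keys())
    return [(images[stem], audio[stem]) for stem in shared]
```

Files without a partner are skipped rather than treated as errors, so a partially filled workspace still produces videos for the complete pairs.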

Step 2: Monitor output

The script prints progress including input paths, resolution, visualization mode, and duration. Watch for ERROR lines in output.

Gate: Script exits with code 0. Proceed only when gate passes.

Phase 4: VERIFY

Goal: Confirm the output video is valid and report results.

Step 1: Check file exists and has reasonable size

bash

ls -la /absolute/path/to/output.mp4

Step 2: Probe video metadata

bash

ffprobe -v error -show_entries format=duration,size -show_entries stream=codec_name,width,height \
  -of default=noprint_wrappers=1 /absolute/path/to/output.mp4

Confirm video duration matches audio duration (within 1 second tolerance).
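
With `-of default=noprint_wrappers=1`, ffprobe prints one `key=value` pair per line, so the duration check can be automated. The following is a sketch with hypothetical helper names; note that keys repeated across streams (such as `codec_name`) keep only the last value in this simple parser.

```python
def parse_ffprobe_output(text):
    """Parse key=value lines from ffprobe's default writer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that are not key=value pairs
            fields[key.strip()] = value.strip()
    return fields


def durations_match(video_seconds, audio_seconds, tolerance=1.0):
    """True when the two durations differ by at most `tolerance` seconds."""
    return abs(float(video_seconds) - float(audio_seconds)) <= tolerance
```

The audio duration has to be probed separately from the source file; the parsed `duration` field of the output video is then compared against it.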

Step 3: Report to user

Provide: output file path, file size, duration, resolution, and visualization mode used.

Gate: Output file exists, duration matches audio, metadata is valid. Task complete.

Error Handling

Error: "FFmpeg is not installed or not in PATH"

Cause: FFmpeg binary not found on system
Solution:

Install via package manager: `brew install ffmpeg` (macOS), `sudo apt install ffmpeg` (Ubuntu)
Verify with `ffmpeg -version` after install
Ensure FFmpeg is in system PATH

Error: "Image file not found" or "Audio file not found"

Cause: Path is incorrect, relative, or file does not exist
Solution:

Verify the path is absolute, not relative
Check file permissions with `ls -la`
Confirm the file extension matches a supported format

Error: "FFmpeg failed" with filter errors

Cause: FFmpeg build lacks filter support (showwaves, showspectrum, showcqt)
Solution:

Install the full FFmpeg build, not a minimal variant
On Ubuntu: `sudo apt install ffmpeg` (full package)
Fall back to `--visualization static`, which requires no special filters

Error: "Could not determine audio duration"

Cause: Audio file is corrupted or uses an unsupported container format
Solution:

Test the audio independently: `ffprobe /path/to/audio.mp3`
Convert to a known format: `ffmpeg -i input.audio -acodec pcm_s16le output.wav`
Re-run with the converted file

Anti-Patterns

Anti-Pattern 1: Using Relative Paths

What it looks like:

python3 image_to_video.py -i ../cover.png -a song.mp3

Why wrong: The script may execute from a different working directory, breaking all paths silently.

Do instead: Always use absolute paths for every argument.

Anti-Pattern 2: Skipping FFmpeg Verification

What it looks like: Running the script directly without checking `ffmpeg -version` first.

Why wrong: Produces confusing subprocess errors instead of clear install instructions.

Do instead: Complete Phase 1 validation before any encoding attempt.

Anti-Pattern 3: Wrong Resolution for Target Platform

What it looks like: Using 1080p landscape for TikTok, or vertical for YouTube.

Why wrong: Content gets cropped or displays with large black bars on the target platform.

Do instead: Ask the user what platform the video targets, then select the matching preset.

Anti-Pattern 4: Skipping Output Verification

What it looks like: Reporting success based on script exit code alone without probing the output.

Why wrong: FFmpeg can exit 0 but produce a corrupt or zero-duration file.

Do instead: Complete Phase 4: probe the output and confirm its duration matches the audio.

References

This skill uses these shared patterns:

Anti-Rationalization - Prevents shortcut rationalizations
Verification Checklist - Pre-completion checks

Domain-Specific Anti-Rationalization

| Rationalization | Why It Is Wrong | Required Action |
| --- | --- | --- |
| "FFmpeg is always installed" | Many systems lack it or have minimal builds | Run `ffmpeg -version` every time |
| "The script handles everything" | Script can fail silently with bad inputs | Validate inputs in Phase 1 |
| "File size looks right" | Size alone does not prove video integrity | Probe with ffprobe, check duration |
| "Static mode is fine" | User may have requested visualization | Re-read the request before defaulting |

Reference Files

`${CLAUDE_SKILL_DIR}/references/ffmpeg-filters.md`: FFmpeg filter documentation for visualization modes
`${CLAUDE_SKILL_DIR}/scripts/image_to_video.py`: Python CLI script (exit codes: 0=success, 1=no FFmpeg, 2=encode failed, 3=missing args)
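
When wrapping the script, the documented exit codes can be turned into readable messages. This is a small sketch; `describe_exit_code` is a hypothetical helper, and only the four codes listed above are taken from the document.

```python
# Exit codes as documented for image_to_video.py.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "no FFmpeg",
    2: "encode failed",
    3: "missing args",
}


def describe_exit_code(code):
    """Translate a script exit code into a short human-readable label."""
    return EXIT_CODE_MEANINGS.get(code, f"unknown exit code {code}")
```

A code of 0 still calls for the Phase 4 probe, since a clean exit alone does not prove the output is valid.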
