image-to-video

Image to Video Skill

Operator Context

This skill acts as an operator for CLI-based video creation, configuring Claude's behavior for deterministic FFmpeg script execution. It implements the Sequential Pipeline architectural pattern (Validate, Prepare, Encode, Verify), with Domain Intelligence embedded in FFmpeg filter selection and resolution matching.

Hardcoded Behaviors (Always Apply)

  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md before creating video
  • Over-Engineering Prevention: Only implement what is directly requested. No extra visualizations, no format conversions beyond MP4
  • FFmpeg Validation: Always verify FFmpeg is installed before attempting video creation
  • Input Validation: Check that both image and audio files exist before processing
  • Absolute Paths Only: Always use absolute paths for image, audio, and output arguments

Default Behaviors (ON unless disabled)

  • Resolution Default: Use 1080p (1920x1080) unless user specifies otherwise
  • Static Mode: No visualization overlay unless user requests one
  • AAC Audio: Encode audio as 192k AAC for broad compatibility
  • H.264 Video: Encode with libx264 preset medium, CRF 23, yuv420p pixel format
  • Output Verification: Run ffprobe on output and report file size after creation
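The encoding defaults above correspond to standard FFmpeg options. As a minimal sketch, assuming a single image looped for the length of the audio, the argument list might be assembled as follows. The function name `build_ffmpeg_args` is hypothetical, and the bundled script's actual invocation is not shown in this document and may differ.

```python
def build_ffmpeg_args(image: str, audio: str, output: str) -> list[str]:
    """Assemble an FFmpeg command that applies the default encoding settings.

    Hypothetical helper: it illustrates the documented defaults only and is
    not taken from image_to_video.py.
    """
    return [
        "ffmpeg",
        "-loop", "1",           # repeat the single image as a video stream
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",      # H.264 video
        "-preset", "medium",
        "-crf", "23",
        "-pix_fmt", "yuv420p",  # widely supported pixel format
        "-c:a", "aac",          # AAC audio
        "-b:a", "192k",
        "-shortest",            # stop encoding when the audio ends
        output,
    ]
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.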

Optional Behaviors (OFF unless enabled)

  • Waveform Visualization: Neon waveform overlay with --visualization waveform
  • Spectrum Visualization: Scrolling frequency spectrum with --visualization spectrum
  • CQT Visualization: Piano-roll style bars with --visualization cqt
  • Bars Visualization: Frequency bar graph with --visualization bars
  • Custom Resolution: Override with --resolution preset (720p, square, vertical)
  • Workspace Mode: Batch process paired files with --process-workspace

What This Skill CAN Do

  • Combine a static image with audio to produce an MP4 video
  • Scale images to target resolution while preserving aspect ratio
  • Add audio visualization overlays (waveform, spectrum, cqt, bars)
  • Support multiple resolution presets (1080p, 720p, square, vertical)
  • Batch process matching image+audio pairs from workspace directory
  • Validate FFmpeg availability and report actionable install instructions
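Scaling to a target resolution while preserving aspect ratio is commonly done in FFmpeg by chaining the `scale` and `pad` filters. The sketch below builds such a filter string. It assumes the image is letterboxed (padded) rather than cropped, which this document does not specify, and `scale_pad_filter` is a hypothetical name.

```python
def scale_pad_filter(width: int, height: int) -> str:
    """Return an FFmpeg -vf string that fits an image inside width x height.

    scale resizes the image until it fits within the target box without
    distortion; pad then centers it on a canvas of the exact target size.
    Assumption: padding (not cropping) is the desired behavior.
    """
    return (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
```

For the 1080p preset this would be passed as `-vf` followed by `scale_pad_filter(1920, 1080)`.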

What This Skill CANNOT Do

  • Generate images (use gemini-image-generator for that)
  • Edit existing videos or trim/split audio
  • Stream live video or produce non-MP4 formats
  • Add text overlays, captions, or transitions
  • Work without FFmpeg installed on the system


Instructions

Phase 1: VALIDATE

Goal: Confirm all prerequisites before attempting video creation.
Step 1: Check FFmpeg installation
bash
ffmpeg -version
If FFmpeg is not installed, provide platform-specific install instructions and stop.
Step 2: Verify input files exist
bash
ls -la /absolute/path/to/image.png /absolute/path/to/audio.mp3
Confirm both files exist and have non-zero size. Supported formats:
  • Images: PNG, JPG, JPEG, GIF, WEBP, BMP
  • Audio: MP3, WAV, M4A, OGG, FLAC
Step 3: Determine parameters
Resolve resolution preset and visualization mode from user request. If the user did not specify, use defaults (1080p, static).
| Preset | Dimensions | Platform |
|---|---|---|
| 1080p | 1920x1080 | YouTube HD (default) |
| 720p | 1280x720 | Standard HD, smaller files |
| square | 1080x1080 | Instagram, social media |
| vertical | 1080x1920 | Stories, Reels, TikTok |
Gate: FFmpeg installed, both input files exist, parameters resolved. Proceed only when gate passes.
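The Phase 1 gate can also be expressed programmatically. The following is a minimal sketch, not the script's own validation logic: the helper names `check_input` and `validate` are hypothetical, and the extension sets are copied from the supported-format list above.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def check_input(path: str, allowed_exts: set[str]) -> list[str]:
    """Return a list of problems with one input file (empty list means OK)."""
    p = Path(path)
    problems = []
    if not p.is_absolute():
        problems.append(f"{path}: path is not absolute")
    if not p.is_file():
        problems.append(f"{path}: file not found")
    elif p.stat().st_size == 0:
        problems.append(f"{path}: file is empty")
    if p.suffix.lower() not in allowed_exts:
        problems.append(f"{path}: unsupported extension {p.suffix!r}")
    return problems


def validate(image: str, audio: str) -> list[str]:
    """Run the Phase 1 checks and collect every failure instead of stopping early."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    problems += check_input(image, IMAGE_EXTS)
    problems += check_input(audio, AUDIO_EXTS)
    return problems
```

An empty list from `validate` corresponds to the gate passing; any entries can be reported to the user verbatim.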

Phase 2: PREPARE

Goal: Set up output path and confirm no conflicts.
Step 1: Determine output path
Use the path provided by the user. If none given, derive from the audio filename:
/same/directory/as/audio/filename.mp4
Step 2: Ensure output directory exists
The script creates parent directories automatically. Verify the target directory is writable.
Gate: Output path determined, directory accessible. Proceed only when gate passes.
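As an illustration of Phase 2, the sketch below derives the default output path from the audio filename and checks write access. Because the script is described as creating parent directories itself, the check walks up to the nearest directory that already exists. All three function names are hypothetical.

```python
import os
from pathlib import Path


def default_output_path(audio: str) -> Path:
    """Derive the output path: same directory and stem as the audio, .mp4 suffix."""
    return Path(audio).with_suffix(".mp4")


def nearest_existing_dir(path: Path) -> Path:
    """Walk up from the file's parent to the first directory that already exists."""
    current = path.parent
    while not current.exists() and current != current.parent:
        current = current.parent
    return current


def output_dir_writable(output: Path) -> bool:
    """Check write access where the output (or its missing parents) would be created."""
    return os.access(nearest_existing_dir(output), os.W_OK)
```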

Phase 3: ENCODE

Goal: Execute FFmpeg to produce the video.
Step 1: Run the script
bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --image /absolute/path/to/image.png \
  --audio /absolute/path/to/audio.mp3 \
  --output /absolute/path/to/output.mp4 \
  --resolution 1080p \
  --visualization static
For workspace batch mode (processes all matched pairs in workspace/input/):
bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
  --process-workspace \
  --visualization waveform
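Workspace batch mode processes matched image and audio pairs, but the matching rule is not spelled out in this document. The sketch below assumes that files are paired when they share a filename stem (for example `cover.png` with `cover.mp3`); `find_pairs` is a hypothetical helper, not a function from the script.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def find_pairs(input_dir: str) -> list[tuple[Path, Path]]:
    """Pair each image with the audio file that shares its stem.

    Assumption: "matched" means identical filename stems. When several files
    share a stem, the first in sorted order is kept.
    """
    images: dict[str, Path] = {}
    audio: dict[str, Path] = {}
    for entry in sorted(Path(input_dir).iterdir()):
        ext = entry.suffix.lower()
        if ext in IMAGE_EXTS:
            images.setdefault(entry.stem, entry)
        elif ext in AUDIO_EXTS:
            audio.setdefault(entry.stem, entry)
    return [(images[stem], audio[stem]) for stem in sorted(images) if stem in audio]
```

Unpaired files are skipped silently in this sketch; a real batch run would probably want to report them.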
Step 2: Monitor output
The script prints progress including input paths, resolution, visualization mode, and duration. Watch for ERROR lines in output.
Gate: Script exits with code 0. Proceed only when gate passes.

Phase 4: VERIFY

Goal: Confirm the output video is valid and report results.
Step 1: Check file exists and has reasonable size
bash
ls -la /absolute/path/to/output.mp4
Step 2: Probe video metadata
bash
ffprobe -v error -show_entries format=duration,size -show_entries stream=codec_name,width,height \
  -of default=noprint_wrappers=1 /absolute/path/to/output.mp4
Confirm video duration matches audio duration (within 1 second tolerance).
Step 3: Report to user
Provide: output file path, file size, duration, resolution, and visualization mode used.
Gate: Output file exists, duration matches audio, metadata is valid. Task complete.
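The duration check in this phase can be sketched in code. With `-of default=noprint_wrappers=1`, ffprobe prints one `key=value` pair per line, so a small parser is enough. The sample text below is illustrative rather than captured from a real run, and both function names are hypothetical.

```python
def parse_ffprobe(text: str) -> dict[str, list[str]]:
    """Collect key=value lines from ffprobe output, keeping repeated keys in order."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            fields.setdefault(key, []).append(value)
    return fields


def duration_matches(video_seconds: float, audio_seconds: float,
                     tolerance: float = 1.0) -> bool:
    """Apply the Phase 4 rule: durations must agree within the tolerance."""
    return abs(video_seconds - audio_seconds) <= tolerance


# Illustrative output only; real values depend on the encoded file.
sample = """codec_name=h264
width=1920
height=1080
codec_name=aac
duration=183.456000
size=4521873
"""
fields = parse_ffprobe(sample)
video_duration = float(fields["duration"][0])
print(duration_matches(video_duration, 183.4))  # True
```

Repeated keys such as `codec_name` are kept as a list because the video and audio streams each report one.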


Error Handling

Error: "FFmpeg is not installed or not in PATH"

Cause: FFmpeg binary not found on system
Solution:
  1. Install via package manager: brew install ffmpeg (macOS), sudo apt install ffmpeg (Ubuntu)
  2. Verify with ffmpeg -version after install
  3. Ensure FFmpeg is in system PATH

Error: "Image file not found" or "Audio file not found"

Cause: Path is incorrect, relative, or file does not exist
Solution:
  1. Verify the path is absolute, not relative
  2. Check file permissions with ls -la
  3. Confirm the file extension matches a supported format

Error: "FFmpeg failed" with filter errors

Cause: FFmpeg build lacks filter support (showwaves, showspectrum, showcqt)
Solution:
  1. Install the full FFmpeg build, not a minimal variant
  2. On Ubuntu: sudo apt install ffmpeg (full package)
  3. Fall back to --visualization static, which requires no special filters
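Filter support can be checked before encoding by inspecting the list that `ffmpeg -filters` prints. The sketch below parses that text and falls back to static mode when a filter is missing. The mode-to-filter mapping is an assumption inferred from the filter names in the cause above, and the filter behind the bars mode is not named in this document, so it is left out.

```python
# Assumed mapping from visualization mode to the FFmpeg filter it needs.
REQUIRED_FILTER = {
    "waveform": "showwaves",
    "spectrum": "showspectrum",
    "cqt": "showcqt",
}


def available_filters(filters_output: str) -> set[str]:
    """Extract filter names from the text printed by `ffmpeg -filters`."""
    names = set()
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like "<flags> <name> <io> <description>",
        # where <io> contains "->" (for example "A->V").
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names


def pick_mode(requested: str, filters_output: str) -> str:
    """Return the requested mode, or "static" if its filter is unavailable."""
    needed = REQUIRED_FILTER.get(requested)
    if needed is not None and needed not in available_filters(filters_output):
        return "static"
    return requested
```

The input text could come from `subprocess.run(["ffmpeg", "-hide_banner", "-filters"], capture_output=True, text=True).stdout`.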

Error: "Could not determine audio duration"

Cause: Audio file is corrupted or uses an unsupported container format
Solution:
  1. Test the audio independently: ffprobe /path/to/audio.mp3
  2. Convert to a known format: ffmpeg -i input.audio -acodec pcm_s16le output.wav
  3. Re-run with the converted file


Anti-Patterns

Anti-Pattern 1: Using Relative Paths

What it looks like:
python3 image_to_video.py -i ../cover.png -a song.mp3
Why wrong: The script may execute from a different working directory, breaking all paths silently.
Do instead: Always use absolute paths for every argument.
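One way to enforce this rule is to normalize or reject paths before they reach the script. A minimal sketch using `pathlib` follows; both helper names are hypothetical.

```python
from pathlib import Path


def require_absolute(path: str) -> Path:
    """Reject relative paths instead of guessing the working directory."""
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    return p


def to_absolute(path: str) -> Path:
    """Expand ~ and resolve a user-supplied path against the current directory."""
    return Path(path).expanduser().resolve()
```

`to_absolute` is only safe when called from the directory the user actually meant, so resolving once at the point of user input and passing the result onward is the more reliable choice.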

Anti-Pattern 2: Skipping FFmpeg Verification

What it looks like: Running the script directly without checking ffmpeg -version first.
Why wrong: Produces confusing subprocess errors instead of clear install instructions.
Do instead: Complete Phase 1 validation before any encoding attempt.

Anti-Pattern 3: Wrong Resolution for Target Platform

What it looks like: Using 1080p landscape for TikTok, or vertical for YouTube.
Why wrong: Content gets cropped or displays with large black bars on the target platform.
Do instead: Ask the user what platform the video targets, then select the matching preset.
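The platform-to-preset choice can be captured in a small lookup. In this sketch the dimensions and platform names come from the preset table in Phase 1, while the dictionary keys and the `preset_for_platform` helper are hypothetical and are not options of the script.

```python
# Dimensions copied from the preset table in Phase 1.
PRESETS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "square": (1080, 1080),
    "vertical": (1080, 1920),
}

# Hypothetical lookup; the platform keys are illustrative, not script options.
PLATFORM_PRESET = {
    "youtube": "1080p",
    "instagram": "square",
    "stories": "vertical",
    "reels": "vertical",
    "tiktok": "vertical",
}


def preset_for_platform(platform: str) -> str:
    """Return the preset for a platform, defaulting to 1080p when unknown."""
    return PLATFORM_PRESET.get(platform.strip().lower(), "1080p")
```

For example, `PRESETS[preset_for_platform("TikTok")]` gives `(1080, 1920)`.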

Anti-Pattern 4: Skipping Output Verification

What it looks like: Reporting success based on script exit code alone without probing the output.
Why wrong: FFmpeg can exit 0 but produce a corrupt or zero-duration file.
Do instead: Complete Phase 4: probe the output and confirm duration matches audio.


References

This skill uses these shared patterns:
  • Anti-Rationalization - Prevents shortcut rationalizations
  • Verification Checklist - Pre-completion checks

Domain-Specific Anti-Rationalization

| Rationalization | Why It Is Wrong | Required Action |
|---|---|---|
| "FFmpeg is always installed" | Many systems lack it or have minimal builds | Run ffmpeg -version every time |
| "The script handles everything" | Script can fail silently with bad inputs | Validate inputs in Phase 1 |
| "File size looks right" | Size alone does not prove video integrity | Probe with ffprobe, check duration |
| "Static mode is fine" | User may have requested visualization | Re-read the request before defaulting |

Reference Files

  • ${CLAUDE_SKILL_DIR}/references/ffmpeg-filters.md: FFmpeg filter documentation for visualization modes
  • ${CLAUDE_SKILL_DIR}/scripts/image_to_video.py: Python CLI script (exit codes: 0=success, 1=no FFmpeg, 2=encode failed, 3=missing args)
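The documented exit codes make it straightforward to turn a script run into a readable status. A minimal sketch, with hypothetical helper names and the code meanings taken from the list above:

```python
import subprocess

# Exit codes as documented for image_to_video.py.
EXIT_MEANINGS = {
    0: "success",
    1: "FFmpeg not installed",
    2: "encode failed",
    3: "missing arguments",
}


def describe_exit(code: int) -> str:
    """Translate a script exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_script(args: list[str]) -> str:
    """Run a command given as an argument list and describe its exit code."""
    result = subprocess.run(args, capture_output=True, text=True)
    return describe_exit(result.returncode)
```

A "success" result still calls for the Phase 4 probe, since an exit code of 0 does not by itself prove the output file is valid.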