ai-video-gen

AI Video Generation Skill

Generate complete videos from text descriptions using AI.

Capabilities

  1. Image Generation - DALL-E 3, Stable Diffusion, Flux
  2. Video Generation - LumaAI, Runway, Replicate models
  3. Voice-over - OpenAI TTS, ElevenLabs
  4. Video Editing - FFmpeg assembly, transitions, overlays

Quick Start

Generate a complete video

python skills/ai-video-gen/generate_video.py --prompt "A sunset over mountains" --output sunset.mp4

Just images to video

python skills/ai-video-gen/images_to_video.py --images img1.png img2.png --output result.mp4

Add voiceover

python skills/ai-video-gen/add_voiceover.py --video input.mp4 --text "Your narration" --output final.mp4
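
For orientation, adding narration usually comes down to muxing an audio track onto the existing video with FFmpeg. The sketch below only builds such a command; it is not taken from add_voiceover.py. The helper name `build_mux_command` and the file `narration.mp3` are hypothetical, and the narration audio is assumed to have been synthesized already by one of the TTS providers.

```python
import shlex


def build_mux_command(video_path, audio_path, output_path):
    """Build an FFmpeg command that puts a narration track on a video.

    Hypothetical helper for illustration; not the code in add_voiceover.py.
    """
    return [
        "ffmpeg", "-y",        # -y: overwrite the output file if it exists
        "-i", video_path,      # input 0: the existing video
        "-i", audio_path,      # input 1: the narration audio
        "-map", "0:v:0",       # take the video stream from input 0
        "-map", "1:a:0",       # take the audio stream from input 1
        "-c:v", "copy",        # copy the video stream without re-encoding
        "-c:a", "aac",         # encode the audio as AAC for the MP4 container
        "-shortest",           # stop at the end of the shorter stream
        output_path,
    ]


if __name__ == "__main__":
    cmd = build_mux_command("input.mp4", "narration.mp3", "final.mp4")
    # Print the command; run it with subprocess.run(cmd, check=True) if desired.
    print(shlex.join(cmd))
```

Because the video stream is copied rather than re-encoded, this step is fast and does not reduce video quality. Any audio already in the video is dropped in this sketch.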

Setup

Required API Keys

Add to your environment or .env file:

Image Generation (pick one)

OPENAI_API_KEY=sk-...         # DALL-E 3
REPLICATE_API_TOKEN=r8_...    # Stable Diffusion, Flux

Video Generation (pick one)

LUMAAI_API_KEY=luma_...       # LumaAI Dream Machine
RUNWAY_API_KEY=...            # Runway ML
REPLICATE_API_TOKEN=r8_...    # Multiple models

Voice (optional)

OPENAI_API_KEY=sk-...         # OpenAI TTS
ELEVENLABS_API_KEY=...        # ElevenLabs

Or use FREE local options (no API needed)
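
Since each stage only needs one of the listed keys, it can help to check which providers are usable before starting a run. The following is a minimal sketch, assuming the environment variable names listed above; the provider labels, the preference order, and the helper `pick_provider` are hypothetical and not part of the skill's scripts. If the keys live in a .env file, they would need to be loaded into the environment first (for example with python-dotenv's `load_dotenv()`).

```python
import os

# (provider label, required environment variable) in an assumed preference order.
IMAGE_PROVIDERS = [("dalle3", "OPENAI_API_KEY"), ("replicate", "REPLICATE_API_TOKEN")]
VIDEO_PROVIDERS = [
    ("lumaai", "LUMAAI_API_KEY"),
    ("runway", "RUNWAY_API_KEY"),
    ("replicate", "REPLICATE_API_TOKEN"),
]
VOICE_PROVIDERS = [("openai_tts", "OPENAI_API_KEY"), ("elevenlabs", "ELEVENLABS_API_KEY")]


def pick_provider(candidates, env=None):
    """Return the first provider whose key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for name, variable in candidates:
        if env.get(variable):
            return name
    return None


if __name__ == "__main__":
    stages = [
        ("image", IMAGE_PROVIDERS),
        ("video", VIDEO_PROVIDERS),
        ("voice", VOICE_PROVIDERS),
    ]
    for stage, candidates in stages:
        print(f"{stage}: {pick_provider(candidates) or 'no key found'}")
```

A result of `None` for the voice stage is not an error, since voice-over is optional.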

Install Dependencies

bash
pip install openai requests pillow replicate python-dotenv

FFmpeg

FFmpeg is already installed via winget (the Windows package manager). Run ffmpeg -version to confirm it is on your PATH.

Usage Examples

1. Text to Video (Full Pipeline)

bash
python skills/ai-video-gen/generate_video.py \
  --prompt "A futuristic city at night with flying cars" \
  --duration 5 \
  --voiceover "Welcome to the future" \
  --output future_city.mp4

2. Multiple Scenes

bash
python skills/ai-video-gen/multi_scene.py \
  --scenes "Morning sunrise" "Busy city street" "Peaceful night" \
  --duration 3 \
  --output day_in_life.mp4
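
One common way to join per-scene clips is FFmpeg's concat demuxer, which reads a text file listing the clips in order. The sketch below illustrates that technique under the assumption that each scene has already been rendered to its own file; it does not describe how multi_scene.py is actually implemented, and the helper names and file names are hypothetical.

```python
from pathlib import Path


def write_concat_list(clip_paths, list_path):
    """Write a file list in the format the FFmpeg concat demuxer reads."""
    lines = []
    for clip in clip_paths:
        # A single quote inside a path is written as '\'' in the list file.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def build_concat_command(list_path, output_path):
    """Build the FFmpeg command that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat",      # use the concat demuxer
        "-safe", "0",        # accept absolute or otherwise "unsafe" paths
        "-i", str(list_path),
        "-c", "copy",        # stream copy: clips must share codec and resolution
        output_path,
    ]


if __name__ == "__main__":
    scenes = ["scene_1.mp4", "scene_2.mp4", "scene_3.mp4"]  # hypothetical clip names
    list_file = write_concat_list(scenes, "scenes.txt")
    print(" ".join(build_concat_command(list_file, "day_in_life.mp4")))
```

Stream copy only works when all clips share the same codec, resolution, and frame rate. Clips from different providers would likely need re-encoding first.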

3. Image Sequence to Video

bash
python skills/ai-video-gen/images_to_video.py \
  --images frame1.png frame2.png frame3.png \
  --fps 24 \
  --output animation.mp4
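
As a rough illustration of what the --fps value means, the sketch below builds an FFmpeg command for a numbered image sequence and computes the resulting clip length. It assumes the frames follow a numbered pattern such as frame1.png, frame2.png, and so on; the helper names are hypothetical and the real images_to_video.py may work differently.

```python
def build_image_sequence_command(pattern, fps, output_path):
    """Build an FFmpeg command that turns numbered images into an H.264 video."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate; must come before -i
        "-start_number", "1",     # the first file is frame1.png, not frame0.png
        "-i", pattern,            # e.g. "frame%d.png"
        "-c:v", "libx264",        # encode with H.264
        "-pix_fmt", "yuv420p",    # pixel format most players can decode
        output_path,
    ]


def clip_duration(frame_count, fps):
    """Length of the resulting clip in seconds: one frame lasts 1/fps seconds."""
    return frame_count / fps


if __name__ == "__main__":
    print(" ".join(build_image_sequence_command("frame%d.png", 24, "animation.mp4")))
    print(clip_duration(3, 24))  # three frames at 24 fps
```

Three frames at 24 fps last only 3/24 = 0.125 seconds, so a slideshow-style video needs a much lower frame rate or many more frames.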

Workflow Options

Budget Mode (FREE)

  • Image: Stable Diffusion (local or free API)
  • Video: Open source models
  • Voice: OpenAI TTS (cheap) or free TTS
  • Edit: FFmpeg

Quality Mode (Paid)

  • Image: DALL-E 3 or Midjourney
  • Video: Runway Gen-3 or LumaAI
  • Voice: ElevenLabs
  • Edit: FFmpeg + effects

Scripts Reference

  • generate_video.py - Main end-to-end generator
  • images_to_video.py - Convert image sequence to video
  • add_voiceover.py - Add narration to existing video
  • multi_scene.py - Create multi-scene videos
  • edit_video.py - Apply effects, transitions, overlays

API Cost Estimates

  • DALL-E 3: ~$0.04-0.08 per image
  • Replicate: ~$0.01-0.10 per generation
  • LumaAI: ~$0-0.50 per 5 seconds of video (free tier available)
  • Runway: ~$0.05 per second
  • OpenAI TTS: ~$0.015 per 1K characters
  • ElevenLabs: ~$0.30 per 1K characters (better quality)
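
To turn these figures into a per-video estimate, the arithmetic can be sketched as below. The rates are the approximate values listed above and will drift over time; the function name, the provider labels, and the choice to bill LumaAI in whole 5-second blocks are assumptions made for illustration. Replicate is left out because its per-generation price varies by model.

```python
import math

IMAGE_RATE = (0.04, 0.08)  # DALL-E 3, USD per image (low, high)
VIDEO_RATES = {
    "runway": ("per_second", 0.05, 0.05),  # USD per second
    "lumaai": ("per_5s", 0.0, 0.50),       # USD per 5-second block (assumed rounding up)
}
VOICE_RATES = {"openai_tts": 0.015, "elevenlabs": 0.30}  # USD per 1K characters


def estimate_cost(images, video_seconds, narration_chars,
                  video="runway", voice="openai_tts"):
    """Return a (low, high) USD estimate for one video."""
    unit, video_low, video_high = VIDEO_RATES[video]
    if unit == "per_second":
        units = video_seconds
    else:
        units = math.ceil(video_seconds / 5)  # number of 5-second blocks
    voice_cost = VOICE_RATES[voice] * narration_chars / 1000
    low = images * IMAGE_RATE[0] + units * video_low + voice_cost
    high = images * IMAGE_RATE[1] + units * video_high + voice_cost
    return round(low, 4), round(high, 4)


if __name__ == "__main__":
    # Three scenes of 5 seconds each on Runway, with 500 characters of OpenAI TTS.
    print(estimate_cost(images=3, video_seconds=15, narration_chars=500))
```

With these rates, a three-scene, 15-second Runway video with 500 characters of OpenAI TTS narration comes to roughly $0.88 to $1.00, most of it from video generation.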

Examples

See the examples/ folder for sample outputs and prompts.