ai-video-gen

AI Video Generation Skill

Generate complete videos from text descriptions using AI.

Capabilities

Image Generation - DALL-E 3, Stable Diffusion, Flux
Video Generation - LumaAI, Runway, Replicate models
Voice-over - OpenAI TTS, ElevenLabs
Video Editing - FFmpeg assembly, transitions, overlays

Quick Start

Generate a complete video

python skills/ai-video-gen/generate_video.py --prompt "A sunset over mountains" --output sunset.mp4

Convert existing images to video

python skills/ai-video-gen/images_to_video.py --images img1.png img2.png --output result.mp4

Add voiceover

python skills/ai-video-gen/add_voiceover.py --video input.mp4 --text "Your narration" --output final.mp4

Setup

Required API Keys

Add these to your environment or to a `.env` file:

Image Generation (pick one)

OPENAI_API_KEY=sk-...        # DALL-E 3
REPLICATE_API_TOKEN=r8_...   # Stable Diffusion, Flux

Video Generation (pick one)

LUMAAI_API_KEY=luma_...      # LumaAI Dream Machine
RUNWAY_API_KEY=...           # Runway ML
REPLICATE_API_TOKEN=r8_...   # Multiple models

Voice (optional)

OPENAI_API_KEY=sk-...        # OpenAI TTS
ELEVENLABS_API_KEY=...       # ElevenLabs

Or use FREE local options (no API needed)
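
Because each stage accepts more than one provider, it can help to see which ones your current keys enable before starting a run. The following is a minimal sketch, not part of the skill's scripts: the `PROVIDER_KEYS` table and the `available_providers` helper are hypothetical names, and only the environment variable names come from the list above.

```python
import os

# Pipeline stage -> {provider label: environment variable}.
# The variable names come from the key list above; the table itself
# is a hypothetical helper, not something the skill ships.
PROVIDER_KEYS = {
    "image": {
        "DALL-E 3": "OPENAI_API_KEY",
        "Replicate": "REPLICATE_API_TOKEN",
    },
    "video": {
        "LumaAI": "LUMAAI_API_KEY",
        "Runway": "RUNWAY_API_KEY",
        "Replicate": "REPLICATE_API_TOKEN",
    },
    "voice": {
        "OpenAI TTS": "OPENAI_API_KEY",
        "ElevenLabs": "ELEVENLABS_API_KEY",
    },
}


def available_providers(env=None):
    """Return {stage: [providers whose key is set and non-empty]}."""
    env = os.environ if env is None else env
    return {
        stage: [name for name, var in providers.items() if env.get(var)]
        for stage, providers in PROVIDER_KEYS.items()
    }


if __name__ == "__main__":
    for stage, names in available_providers().items():
        print(f"{stage}: {', '.join(names) or 'none configured'}")
```

This sketch reads only the process environment. If your keys live in a `.env` file, calling `load_dotenv()` from `python-dotenv` first would load them into `os.environ`.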

Install Dependencies

pip install openai requests pillow replicate python-dotenv

FFmpeg

Already installed via winget.
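
If you are unsure whether FFmpeg is reachable from the shell that runs the scripts, a quick check along these lines may help. This is a small sketch using only the Python standard library; `find_tool` and `tool_version` are hypothetical helper names, not part of the skill.

```python
import shutil
import subprocess


def find_tool(name="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def tool_version(name="ffmpeg"):
    """Return the first line of `<name> -version`, or None if unavailable."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(tool_version() or "ffmpeg not found on PATH")
```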

Usage Examples

1. Text to Video (Full Pipeline)

python skills/ai-video-gen/generate_video.py \
  --prompt "A futuristic city at night with flying cars" \
  --duration 5 \
  --voiceover "Welcome to the future" \
  --output future_city.mp4
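
The same command can be driven from Python when the pipeline is part of a larger script. The sketch below only assembles the flags shown in the command above and hands them to `subprocess`. The `build_generate_command` and `generate` helpers are hypothetical, and the sketch assumes it runs from the directory that contains `skills/`.

```python
import subprocess
import sys

# Script path as used in the shell examples; relative to the working directory.
SCRIPT = "skills/ai-video-gen/generate_video.py"


def build_generate_command(prompt, output, duration=None, voiceover=None):
    """Assemble the argument list for generate_video.py.

    Only flags that appear in the usage examples are emitted.
    """
    cmd = [sys.executable, SCRIPT, "--prompt", prompt]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if voiceover:
        cmd += ["--voiceover", voiceover]
    cmd += ["--output", output]
    return cmd


def generate(prompt, output, **options):
    """Run the generator; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_generate_command(prompt, output, **options), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a prompt contains spaces or quotation marks.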

2. Multiple Scenes

python skills/ai-video-gen/multi_scene.py \
  --scenes "Morning sunrise" "Busy city street" "Peaceful night" \
  --duration 3 \
  --output day_in_life.mp4

3. Image Sequence to Video

python skills/ai-video-gen/images_to_video.py \
  --images frame1.png frame2.png frame3.png \
  --fps 24 \
  --output animation.mp4
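
The internals of `images_to_video.py` are not documented here, so the following is only one plausible way to turn an explicit list of frames into a video: FFmpeg's concat demuxer, with each frame shown for `1/fps` seconds. The helper names and the `frames.txt` list file are assumptions made for illustration.

```python
def build_concat_list(images, fps):
    """Return the text of a concat-demuxer list file for the given frames."""
    frame_duration = 1.0 / fps
    lines = []
    for path in images:
        lines.append(f"file '{path}'")
        lines.append(f"duration {frame_duration:.6f}")
    # The concat demuxer only honors the last duration if the final
    # file is listed once more, so repeat it.
    lines.append(f"file '{images[-1]}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file, output, fps):
    """Return the FFmpeg argument list that renders the list file."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # read the list file; allow arbitrary paths
        "-i", list_file,
        "-r", str(fps),                # output frame rate
        "-pix_fmt", "yuv420p",         # widely compatible pixel format
        output,
    ]


if __name__ == "__main__":
    frames = ["frame1.png", "frame2.png", "frame3.png"]
    print(build_concat_list(frames, 24))
    print(" ".join(build_ffmpeg_command("frames.txt", "animation.mp4", 24)))
```

For frames that follow a numbered naming pattern, FFmpeg's image sequence input (`-framerate 24 -i frame%d.png`) is a simpler alternative that needs no list file.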

Workflow Options

Budget Mode (free or low-cost)

Image: Stable Diffusion (local or free API)
Video: Open source models
Voice: OpenAI TTS (cheap) or free TTS
Edit: FFmpeg

Quality Mode (Paid)

Image: DALL-E 3 or Midjourney
Video: Runway Gen-3 or LumaAI
Voice: ElevenLabs
Edit: FFmpeg + effects

Scripts Reference

`generate_video.py` - Main end-to-end generator
`images_to_video.py` - Convert image sequence to video
`add_voiceover.py` - Add narration to existing video
`multi_scene.py` - Create multi-scene videos
`edit_video.py` - Apply effects, transitions, overlays
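
As an illustration of the kind of FFmpeg step a script such as `add_voiceover.py` might perform once the narration audio exists, the sketch below muxes an audio file onto a video without re-encoding the picture. It is an assumption about the approach, not the script's actual code, and `build_mux_command` and the `narration.mp3` file name are hypothetical.

```python
def build_mux_command(video, audio, output):
    """Return FFmpeg arguments that attach `audio` to `video`."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",   # video stream from the first input
        "-map", "1:a:0",   # audio stream from the second input
        "-c:v", "copy",    # keep the video stream as-is (no re-encode)
        "-c:a", "aac",     # encode narration to AAC for MP4 output
        "-shortest",       # stop at the end of the shorter stream
        output,
    ]


if __name__ == "__main__":
    print(" ".join(build_mux_command("input.mp4", "narration.mp3", "final.mp4")))
```

Copying the video stream keeps this step fast and lossless. The trade-off of `-shortest` is that narration longer than the clip gets cut off.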

API Cost Estimates

DALL-E 3: ~$0.04-0.08 per image
Replicate: ~$0.01-0.10 per generation
LumaAI: $0-0.50 per 5-second clip (free tier available)
Runway: ~$0.05 per second
OpenAI TTS: ~$0.015 per 1K characters
ElevenLabs: ~$0.30 per 1K characters (better quality)
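
To see how these figures combine for a single run, here is a small worked estimate. It assumes one DALL-E 3 image, Runway for the clip, and OpenAI TTS for narration. The prices are copied from the list above, while the `estimate_cost` helper and the one-image-per-clip assumption are hypothetical.

```python
# (low, high) price per unit in USD, copied from the estimates above.
PRICES = {
    "dalle3_per_image": (0.04, 0.08),
    "runway_per_second": (0.05, 0.05),
    "openai_tts_per_1k_chars": (0.015, 0.015),
}


def estimate_cost(images, video_seconds, narration):
    """Return a (low, high) USD estimate for one generation run."""
    units = {
        "dalle3_per_image": images,
        "runway_per_second": video_seconds,
        "openai_tts_per_1k_chars": len(narration) / 1000,
    }
    low = sum(PRICES[key][0] * count for key, count in units.items())
    high = sum(PRICES[key][1] * count for key, count in units.items())
    return low, high


if __name__ == "__main__":
    low, high = estimate_cost(1, 5, "Welcome to the future")
    print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

For the 5-second example with a short narration, this comes to roughly $0.29 to $0.33, almost all of it from video generation.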

Examples

See the `examples/` folder for sample outputs and prompts.