ai-avatar-video
AI Avatar & Talking Head Videos
Create AI avatars and talking head videos with the inference.sh CLI.

Quick Start
Requires the inference.sh CLI (`belt`); see the install instructions, then log in:

```bash
belt login

# Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS)
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Hello, welcome to our product demo!",
  "voice": "Zephyr (Female)"
}'
```

Available Models
Start with P-Video-Avatar. Per second of output, it generates roughly 15-18x faster and costs 5.6-6.4x less than OmniHuman 1.5 and Fabric 1.0, and it offers built-in TTS, dynamic backgrounds, and 1080p output.
| Model | App ID | Best For | Built-in TTS |
|---|---|---|---|
| P-Video-Avatar | `pruna/p-video-avatar` | Best overall: speed, cost, quality, control | Yes (30 voices, 10 languages) |
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, audio-driven | No |
| Fabric 1.0 | `falai/fabric-1-0` | Making a still image talk, with lipsync | Yes |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync | No |
Cost & Speed Comparison
| Model | Generation time per second of video | Cost per second of video |
|---|---|---|
| P-Video-Avatar | ~1.83 s | $0.025 |
| OmniHuman 1.5 | ~28 s (15x slower) | $0.16 (6.4x more) |
| Fabric 1.0 | ~34 s (18x slower) | $0.14 (5.6x more) |
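To turn the per-second figures into a budget for a specific clip, multiply them by the clip length. The sketch below is a hypothetical helper (`estimate` is not a `belt` command) that assumes the approximate rates in the table above scale linearly with the length of the output video.

```bash
# Hypothetical helper (not a belt command): estimate generation time and cost
# for a clip, using the approximate per-second rates from the table above.
# Assumes both scale linearly with the length of the output video.
estimate() {
  app="$1"       # app ID, e.g. pruna/p-video-avatar
  seconds="$2"   # length of the output video in seconds
  case "$app" in
    pruna/p-video-avatar)    gen_rate=1.83; cost_rate=0.025 ;;
    bytedance/omnihuman-1-5) gen_rate=28;   cost_rate=0.16 ;;
    falai/fabric-1-0)        gen_rate=34;   cost_rate=0.14 ;;
    *) echo "no rate data for: $app" >&2; return 1 ;;
  esac
  # POSIX sh has no floating-point arithmetic, so awk does the math.
  awk -v a="$app" -v s="$seconds" -v g="$gen_rate" -v c="$cost_rate" \
    'BEGIN { printf "%s: ~%.0f s to generate, ~$%.2f\n", a, s * g, s * c }'
}

# Compare the three models for a 60-second clip
for m in pruna/p-video-avatar bytedance/omnihuman-1-5 falai/fabric-1-0; do
  estimate "$m" 60
done
```

For a 60-second clip this works out to about 110 s and $1.50 for P-Video-Avatar, against about 1680 s and $9.60 for OmniHuman 1.5 and 2040 s and $8.40 for Fabric 1.0.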
Examples
P-Video-Avatar (Recommended)
Generate an avatar video from a portrait and a text script using the built-in TTS:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Welcome to our product walkthrough. Today I will show you three key features.",
  "voice": "Puck (Male)",
  "voice_language": "English (US)",
  "resolution": "720p"
}'
```

With custom style control:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "This is exciting news!",
  "voice": "Aoede (Female)",
  "voice_prompt": "Enthusiastic and energetic tone",
  "video_prompt": "The person is presenting on stage with dramatic lighting",
  "resolution": "1080p"
}'
```

With an audio file instead of TTS:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "audio": "https://speech.mp3"
}'
```
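The `--input` JSON in the examples above is wrapped in single quotes, so a script containing an apostrophe (for example "I'll") ends the shell string early, and an unescaped double quote breaks the JSON. One workaround is to build the JSON from shell variables. This is a minimal sketch assuming a single-line script; `json_escape` and `avatar_input` are hypothetical helper names, and only the `image`, `voice_script`, and `voice` fields are taken from the examples.

```bash
# Hypothetical helper: escape a single-line string for a JSON string literal
# (backslashes first, then double quotes). Newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a P-Video-Avatar input object (image, script, voice).
avatar_input() {
  printf '{"image": "%s", "voice_script": "%s", "voice": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

script="I'll show you the \"three\" key features."
input=$(avatar_input "https://portrait.jpg" "$script" "Puck (Male)")
printf '%s\n' "$input"

# Double quotes keep the JSON a single argument, apostrophes included:
# belt app run pruna/p-video-avatar --input "$input"
```

If `jq` is installed, `jq -n --arg s "$script" '{voice_script: $s}'` performs the same escaping and also handles newlines.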
Full Workflow: Generate Portrait + Avatar
Use Pruna P-Image to generate the portrait, then create the avatar:

```bash
# 1. Generate a portrait image
belt app run pruna/p-image --input '{
  "prompt": "professional headshot portrait of a young woman, neutral background, looking at camera, studio lighting, photorealistic",
  "aspect_ratio": "9:16"
}'

# 2. Create an avatar video with built-in TTS
belt app run pruna/p-video-avatar --input '{
  "image": "<image-url-from-step-1>",
  "voice_script": "Hi there! Let me walk you through our latest features.",
  "voice": "Zephyr (Female)"
}'
```
OmniHuman 1.5 (Multi-Character)
```bash
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Supports specifying which character to drive in multi-person images.
Fabric 1.0 (Image Talks)
```bash
belt app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

PixVerse Lipsync
```bash
belt app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Full Workflow: TTS + Avatar (Non-TTS Models)
For models without built-in TTS, generate speech first:
```bash
# 1. Generate speech from text
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create an avatar video with the speech
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
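Step 2 needs the audio URL produced by step 1. The exact shape of the saved `speech.json` is not documented here, so this sketch makes no assumption about field names: it takes the first http(s) URL that appears in the file. `first_url` is a hypothetical helper, and it assumes the output contains the generated audio as a plain URL string; if the file holds several URLs, inspect it and pick the right field instead.

```bash
# Hypothetical helper: print the first http(s) URL found in a saved output file.
# It does not depend on the JSON field name, only on the URL appearing as text.
first_url() {
  grep -Eo 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Usage with the workflow above (needs a real speech.json from step 1):
# audio_url=$(first_url speech.json)
# belt app run bytedance/omnihuman-1-5 --input "{
#   \"image_url\": \"https://presenter-photo.jpg\",
#   \"audio_url\": \"$audio_url\"
# }"
```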
Full Workflow: Dub Video in Another Language
```bash
# 1. Transcribe the original video
belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate the text (manually or with an LLM)

# 3. Generate speech in the new language
belt app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with the new audio
belt app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```
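The dubbing steps above can be chained in one script. This is a hypothetical sketch (`step`, `dub`, and `DRY_RUN` are illustrative names, not `belt` features). By default it only prints each command so the sequence can be reviewed, and setting `DRY_RUN=0` runs it for real, which requires `belt` and a login. The values the workflow leaves as placeholders (the translated text and the new audio URL) are passed in by hand rather than parsed from the JSON outputs. They are inserted without JSON escaping, so they must not contain double quotes.

```bash
# Hypothetical sketch of the dubbing workflow.
set -e  # stop at the first failing step when running for real

# step OUTFILE CMD...: run CMD and save stdout to OUTFILE ("-" means no file).
# With DRY_RUN=1 (the default) the command is printed instead of executed.
step() {
  out="$1"; shift
  if [ "${DRY_RUN:-1}" = 1 ]; then
    if [ "$out" = "-" ]; then
      printf '%s\n' "$*"
    else
      printf '%s > %s\n' "$*" "$out"
    fi
  elif [ "$out" = "-" ]; then
    "$@"
  else
    "$@" > "$out"
  fi
}

dub() {
  video_url="$1"; translated_text="$2"; new_audio_url="$3"
  # 1. Transcribe the original video
  step transcript.json belt app run infsh/fast-whisper-large-v3 \
    --input "{\"audio_url\": \"$video_url\"}"
  # 2. Translating transcript.json is done manually or with an LLM (not scripted)
  # 3. Generate speech in the new language
  step new_speech.json belt app run infsh/kokoro-tts \
    --input "{\"text\": \"$translated_text\"}"
  # 4. Lipsync the original video with the new audio
  step - belt app run infsh/latentsync-1-6 \
    --input "{\"video_url\": \"$video_url\", \"audio_url\": \"$new_audio_url\"}"
}
```

For example, `dub "https://video.mp4" "Hola, bienvenidos" "https://new-audio.mp3"` prints the three `belt` commands in order. The printed form is meant for review and is not shell-quoted for copy-pasting.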
Use Cases
- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements
- Gaming: Character avatars, NPC dialogue
Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- P-Video-Avatar has built-in TTS, so no separate speech-generation step is needed
- P-Video-Avatar output aspect ratio matches the input image
- For vertical videos, generate portraits with `pruna/p-image` using the `9:16` aspect ratio
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
Related Skills
```bash
# Dedicated P-Video-Avatar skill
npx skills add inference-sh/skills@p-video-avatar

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (generate audio for non-TTS avatar models)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
```

Browse all video apps: `belt app list --category video`

Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates