ai-avatar-video


AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`belt`); see its install instructions. Then log in:

```bash
belt login
```

Recommended: P-Video-Avatar (fastest, cheapest, built-in TTS)

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Hello, welcome to our product demo!",
  "voice": "Zephyr (Female)"
}'
```
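Every command in this guide assumes `belt` is on your `PATH`. A minimal preflight sketch follows; the helper name `require_cli` is hypothetical and not part of the CLI, and it only checks that the command exists, not that you are logged in.

```bash
# Return 0 if the named command is on PATH; otherwise explain and return 1.
# require_cli is a hypothetical helper, not something belt provides.
require_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1 (see the install instructions)" >&2
  return 1
}

# Only attempt to log in when the CLI is actually available.
if require_cli belt; then
  belt login
fi
```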

Available Models

Start with P-Video-Avatar: it is up to 18x faster and roughly 6x cheaper than the alternatives, with built-in TTS, dynamic backgrounds, and 1080p support.

| Model | App ID | Best For | Built-in TTS |
|---|---|---|---|
| P-Video-Avatar | `pruna/p-video-avatar` | Best overall: speed, cost, quality, control | Yes (30 voices, 10 languages) |
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, audio-driven | No |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync | No |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync | No |
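The "Best For" column can be read as a simple lookup. The sketch below encodes it as a shell function; the function name `pick_avatar_app` and the task keywords (`script`, `multi-person`, `image-talks`, `realism`) are made up for illustration, while the app IDs come from the table.

```bash
# Map a task keyword to an app ID, following the "Best For" column.
# The keywords are hypothetical labels chosen for this sketch.
pick_avatar_app() {
  case "$1" in
    script)       echo "pruna/p-video-avatar" ;;     # best overall, built-in TTS
    multi-person) echo "bytedance/omnihuman-1-5" ;;  # multi-character, audio-driven
    image-talks)  echo "falai/fabric-1-0" ;;         # image talks with lipsync
    realism)      echo "falai/pixverse-lipsync" ;;   # highly realistic lipsync
    *)            echo "unknown task: $1" >&2; return 1 ;;
  esac
}

pick_avatar_app multi-person   # prints: bytedance/omnihuman-1-5
```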

Cost & Speed Comparison

| Model | Speed (per sec of video) | Cost per second |
|---|---|---|
| P-Video-Avatar | ~1.83s/s | $0.025 |
| OmniHuman 1.5 | ~28s/s (15x slower) | $0.16 (6.4x more) |
| Fabric 1.0 | ~34s/s (18x slower) | $0.14 (5.6x more) |
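To turn the per-second figures into a budget for a whole clip, multiply by the clip length. The sketch below does that with `awk`; the function name `estimate_run` and the short model keys are hypothetical, and the rates are the approximate values from the table, so treat the result as a rough estimate rather than a quote.

```bash
# Estimate generation time and cost for a clip of N seconds.
# Rates are the approximate per-second figures from the comparison table.
estimate_run() {
  model="$1"
  seconds="$2"
  case "$model" in
    p-video-avatar) speed=1.83; cost=0.025 ;;
    omnihuman-1-5)  speed=28;   cost=0.16 ;;
    fabric-1-0)     speed=34;   cost=0.14 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  awk -v s="$seconds" -v sp="$speed" -v c="$cost" \
    'BEGIN { printf "%.1fs to generate, $%.2f\n", s * sp, s * c }'
}

estimate_run p-video-avatar 60   # prints: 109.8s to generate, $1.50
```

For a 60-second clip, the same call with `omnihuman-1-5` gives 1680.0s and $9.60, which is where the 15x and 6.4x ratios in the table come from.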

Examples

P-Video-Avatar (Recommended)

Generate an avatar from a portrait and a text script with built-in TTS:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "Welcome to our product walkthrough. Today I will show you three key features.",
  "voice": "Puck (Male)",
  "voice_language": "English (US)",
  "resolution": "720p"
}'
```

With custom style control:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "voice_script": "This is exciting news!",
  "voice": "Aoede (Female)",
  "voice_prompt": "Enthusiastic and energetic tone",
  "video_prompt": "The person is presenting on stage with dramatic lighting",
  "resolution": "1080p"
}'
```

With an audio file instead of TTS:

```bash
belt app run pruna/p-video-avatar --input '{
  "image": "https://portrait.jpg",
  "audio": "https://speech.mp3"
}'
```
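The examples above inline the JSON inside single quotes, which gets awkward once a script contains an apostrophe or a double quote. One possible workaround is to build the payload in a helper and pass it with `--input "$(...)"`. This is a minimal sketch: `json_escape` and `build_avatar_input` are hypothetical helpers, and the escaping covers only backslashes and double quotes, not newlines or other control characters.

```bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the --input payload from image URL, script text, and voice name.
build_avatar_input() {
  printf '{"image": "%s", "voice_script": "%s", "voice": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

build_avatar_input "https://portrait.jpg" 'She said "hello"' "Zephyr (Female)"
# prints: {"image": "https://portrait.jpg", "voice_script": "She said \"hello\"", "voice": "Zephyr (Female)"}

# Possible usage (not run here):
# belt app run pruna/p-video-avatar --input "$(build_avatar_input "$image" "$script" "$voice")"
```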

Full Workflow: Generate Portrait + Avatar

Use Pruna P-Image to generate the portrait, then create the avatar:

```bash
# 1. Generate a portrait image
belt app run pruna/p-image --input '{
  "prompt": "professional headshot portrait of a young woman, neutral background, looking at camera, studio lighting, photorealistic",
  "aspect_ratio": "9:16"
}'

# 2. Create avatar video with built-in TTS
belt app run pruna/p-video-avatar --input '{
  "image": "<image-url-from-step-1>",
  "voice_script": "Hi there! Let me walk you through our latest features.",
  "voice": "Zephyr (Female)"
}'
```

OmniHuman 1.5 (Multi-Character)

```bash
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

OmniHuman 1.5 supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

```bash
belt app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

PixVerse Lipsync

```bash
belt app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Full Workflow: TTS + Avatar (Non-TTS Models)

For models without built-in TTS, generate speech first:

```bash
# 1. Generate speech from text
belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
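Step 1 saves its result to `speech.json`, and step 2 needs the audio URL from it. The shape of that JSON is not documented here, so the sketch below avoids guessing a field name and simply pulls the first `http(s)` URL out of the file. Both the helper `first_url` and the sample response are assumptions for illustration; check the real output before relying on this.

```bash
# Print the first http(s) URL found in a file (empty output if none).
# first_url is a hypothetical helper; it does not parse JSON properly.
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Demo with a made-up response shape; the real output may differ.
sample="$(mktemp)"
printf '%s\n' '{"audio": {"url": "https://example.com/speech.mp3"}}' > "$sample"
first_url "$sample"   # prints: https://example.com/speech.mp3
rm -f "$sample"

# Possible usage with the workflow above (not run here):
# audio_url="$(first_url speech.json)"
# belt app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"https://presenter-photo.jpg\", \"audio_url\": \"$audio_url\"}"
```

If the response contains several URLs, the first one may not be the audio file, so a JSON-aware tool is the safer choice once the field name is known.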

Full Workflow: Dub Video in Another Language

```bash
# 1. Transcribe original video
belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
belt app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
belt app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```

Use Cases

  • Marketing: Product demos with AI presenter
  • Education: Course videos, explainers
  • Localization: Dub content in multiple languages
  • Social Media: Consistent virtual influencer
  • Corporate: Training videos, announcements
  • Gaming: Character avatars, NPC dialogue

Tips

  • Use high-quality portrait photos (front-facing, good lighting)
  • Audio should be clear with minimal background noise
  • P-Video-Avatar supports built-in TTS, so no separate speech generation step is needed
  • P-Video-Avatar output aspect ratio matches the input image
  • Generate portraits with `pruna/p-image` using a `9:16` aspect ratio for vertical videos
  • OmniHuman 1.5 supports multiple people in one image
  • LatentSync is best for syncing existing videos to new audio
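Because the output aspect ratio follows the input image, it can help to confirm a portrait really is 9:16 before generating a vertical video. The sketch below checks a width and height pair with integer arithmetic; `is_9_16` is a hypothetical helper, and reading the dimensions from an image file is left to whatever tool you already use.

```bash
# Succeed when width:height is exactly 9:16 (cross-multiply to avoid division).
is_9_16() {
  [ $(( $1 * 16 )) -eq $(( $2 * 9 )) ]
}

if is_9_16 1080 1920; then
  echo "vertical 9:16"
else
  echo "not 9:16"
fi
# prints: vertical 9:16
```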

Related Skills

```bash
# Dedicated P-Video-Avatar skill
npx skills add inference-sh/skills@p-video-avatar

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (generate audio for non-TTS avatar models)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
```

Browse all video apps: `belt app list --category video`

Documentation
