# alibabacloud-avatar-video

Human Avatar — Alibaba Cloud AI Video & Speech


## Capabilities overview


| Capability | Script | Model / API | Region | Summary |
|---|---|---|---|---|
| LivePortrait | `live_portrait.py` | liveportrait | cn-beijing | Portrait + audio/video → talking video, two steps |
| EMO | `portrait_animate.py` | emo-v1 | cn-beijing | Portrait + audio → talking head, detect + generate |
| AA (AnimateAnyone) | `animate_anyone.py` | animate-anyone-gen2 | cn-beijing | Full-body animation: detect → motion template → video |
| T2I | `text_to_image.py` | wan2.x-t2i | Multi-region | Text → image, default wan2.2-t2i-flash |
| I2V | `image_to_video.py` | wan2.x-i2v | Multi-region | Image → video; T2I→I2V pipeline supported; default wan2.7-i2v-flash |
| Qwen TTS | `qwen_tts.py` | qwen3-tts-* | cn-beijing / Singapore | Text → speech; auto model/voice by scene |
| LingMou | `avatar_video.py` | LingMou SDK | cn-beijing | Template-based digital-human broadcast video |


## Quick selection guide


- Talking head (have audio/video already): LivePortrait
- Talking head (no audio; synthesize first): Qwen TTS → LivePortrait
- Full-body dance / motion: AA (AnimateAnyone)
- Text → image: T2I (`text_to_image`)
- Image → video: I2V (`image_to_video`)
- Text → video end-to-end: T2I → I2V (`image_to_video --t2i-prompt`)
- Enterprise digital human / template news: LingMou (`avatar_video`)


## Environment setup


```bash
pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4
```

LingMou additionally:


```bash
pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4
```

```bash
export DASHSCOPE_API_KEY=sk-xxxx               # Beijing-region API key
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx         # OSS upload
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com
```

⚠️ API keys for `cn-beijing` and Singapore are not interchangeable; use the key for the correct region. `OSS_ENDPOINT` may include or omit the `https://` prefix; the scripts normalize it.

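The scripts take local files, upload them to OSS, and hand signed URLs to the model APIs. Below is a minimal sketch of that flow with `oss2`, using the environment variables above; `upload_and_sign` is an illustrative helper, not a function from this repo.

```python
import os
import oss2  # pip install oss2

def upload_and_sign(local_path: str, key: str, expires: int = 3600) -> str:
    """Upload a local file to OSS and return a time-limited signed URL.

    Illustrative sketch only; the repo's scripts have their own upload logic.
    """
    # Normalize the endpoint: OSS_ENDPOINT may include or omit https://.
    endpoint = os.environ["OSS_ENDPOINT"]
    endpoint = endpoint.removeprefix("https://").removeprefix("http://")
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket = oss2.Bucket(auth, f"https://{endpoint}", os.environ["OSS_BUCKET"])
    bucket.put_object_from_file(key, local_path)
    # Signed URL the DashScope / LingMou APIs can fetch without credentials.
    return bucket.sign_url("GET", key, expires)

# Example: url = upload_and_sign("./portrait.jpg", "avatar/portrait.jpg")
```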

## 1. LivePortrait — talking-head video


**When to use**: You have a portrait photo plus speech and want a talking-head video quickly.

**Flow**:
1. `liveportrait-detect` (sync) → `pass=true`
2. `liveportrait` (async) → `video_url`

**Requirements**:
- Image: single person, front-facing portrait, clear face, no occlusion
- Audio: wav/mp3, under 15 MB, 1 s–3 min
- Video input: audio is extracted automatically (requires ffmpeg)
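Both steps follow DashScope's generic async-task pattern: submit a job with the `X-DashScope-Async: enable` header, then poll `GET /api/v1/tasks/{task_id}` until the task settles. Here is a rough `requests` sketch of that pattern; the service path and payload fields below are assumptions, since `live_portrait.py` wraps the real request:

```python
import os
import time
import requests

AUTH = {"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"}

# Submit the async job (service path and payload are assumptions;
# live_portrait.py builds the real request for you).
submit = requests.post(
    "https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis",
    headers={**AUTH, "X-DashScope-Async": "enable"},
    json={
        "model": "liveportrait",
        "input": {"image_url": "https://...", "audio_url": "https://..."},
    },
)
task_id = submit.json()["output"]["task_id"]

# Poll DashScope's generic task endpoint until the job finishes.
while True:
    out = requests.get(
        f"https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}", headers=AUTH
    ).json()["output"]
    if out["task_status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

print(out)  # on success, contains the generated video URL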

Image + audio file


```bash
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --template normal --download
```

Image + video (extract audio)


```bash
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --video ./speech_video.mp4 \
  --template active --download
```

Public URLs


```bash
python scripts/live_portrait.py \
  --image-url "https://..." \
  --audio-url "https://..." \
  --mouth-strength 1.2 --download
```

**Motion templates**:
- `normal` (default, moderate motion)
- `calm` (composed; news / storytelling)
- `active` (lively; singing / hosting)

---

## 2. Qwen TTS — text to speech


**When to use**: Generate speech files from text (for LivePortrait, EMO, etc.).

**Default model**: `qwen3-tts-vd-realtime-2026-01-15`

### Auto model selection by scene


| Scene (`--scene`) | Suggested model | Suggested voice |
|---|---|---|
| `default` / `brand` | qwen3-tts-vd-realtime-2026-01-15 | Cherry |
| `news` / `documentary` / `advertising` | qwen3-tts-instruct-flash-realtime | Serena / Ethan |
| `audiobook` / `drama` | qwen3-tts-instruct-flash-realtime | Cherry / Dylan |
| `customer_service` / `chatbot` / `education` | qwen3-tts-flash-realtime | Anna / Ethan |
| `ecommerce` / `short_video` | qwen3-tts-flash-realtime | Cherry / Chelsie |
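The mapping above is simple enough to restate as a lookup table. Here is a sketch of the selection logic `qwen_tts.py` presumably applies, assuming the first listed voice is the default:

```python
# Scene → (model, default voice), mirroring the table above.
SCENE_PRESETS = {
    "default":          ("qwen3-tts-vd-realtime-2026-01-15",  "Cherry"),
    "brand":            ("qwen3-tts-vd-realtime-2026-01-15",  "Cherry"),
    "news":             ("qwen3-tts-instruct-flash-realtime", "Serena"),
    "documentary":      ("qwen3-tts-instruct-flash-realtime", "Serena"),
    "advertising":      ("qwen3-tts-instruct-flash-realtime", "Serena"),
    "audiobook":        ("qwen3-tts-instruct-flash-realtime", "Cherry"),
    "drama":            ("qwen3-tts-instruct-flash-realtime", "Cherry"),
    "customer_service": ("qwen3-tts-flash-realtime",          "Anna"),
    "chatbot":          ("qwen3-tts-flash-realtime",          "Anna"),
    "education":        ("qwen3-tts-flash-realtime",          "Anna"),
    "ecommerce":        ("qwen3-tts-flash-realtime",          "Cherry"),
    "short_video":      ("qwen3-tts-flash-realtime",          "Cherry"),
}

def pick_tts_preset(scene: str) -> tuple:
    """Return (model, voice) for a scene, falling back to the default preset."""
    return SCENE_PRESETS.get(scene, SCENE_PRESETS["default"])
```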

### Available voices


| Voice | Character |
|---|---|
| Cherry | Bright, sweet female; ads / audiobooks / dubbing |
| Serena | Mature, intellectual female; news / explainers / corporate |
| Ethan | Steady, warm male; education / documentary / training |
| Dylan | Expressive male; radio drama / game VO |
| Anna | Gentle, friendly female; support / assistant / daily |
| Chelsie | Young, fresh female; short video / e-commerce |
| Thomas | Deep, magnetic male; brand / ads |
| Luna | Warm, soft female; meditation / storytelling |

Default (qwen3-tts-vd-realtime + Cherry)


```bash
python scripts/qwen_tts.py --text "Hello, welcome to Qwen TTS." --download
```

Match by scene


```bash
python scripts/qwen_tts.py --text "Today's market..." --scene news --download
python scripts/qwen_tts.py --text "Once upon a time..." --scene audiobook --download
```

Style via instructions


```bash
python scripts/qwen_tts.py \
  --text "Dear students..." \
  --model qwen3-tts-instruct-flash-realtime \
  --instructions "Warm tone, steady pace, suitable for teaching" \
  --download
```

List options


```bash
python scripts/qwen_tts.py --list-voices
python scripts/qwen_tts.py --list-models
```

---

## 3. T2I — Wan 2.x text-to-image


**When to use**: Generate images from text (optionally feed the result into I2V).

Default model (wan2.2-t2i-flash, fast)


```bash
python scripts/text_to_image.py \
  --prompt "A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light" \
  --size 960*1696 --download
```

Higher quality


```bash
python scripts/text_to_image.py \
  --prompt "..." --model wan2.2-t2i-plus --size 1280*1280 --download
```

Latest (Wan 2.6)


```bash
python scripts/text_to_image.py \
  --prompt "..." --model wan2.6-t2i --size 1280*1280 --n 1 --download
```

**Models**:
- `wan2.2-t2i-flash` (default, fast, good for tests)
- `wan2.2-t2i-plus` (higher quality)
- `wan2.6-t2i` (latest; more aspect ratios; sync call)

**Common sizes**: `1280*1280` (1:1) / `960*1696` (9:16) / `1696*960` (16:9)
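`text_to_image.py` wraps the DashScope SDK; if you want to call it directly, the standard `ImageSynthesis` pattern looks roughly like this (a sketch only; the script adds region handling and downloading):

```python
from http import HTTPStatus

from dashscope import ImageSynthesis

# Synchronous call: the SDK polls internally and returns the finished result.
rsp = ImageSynthesis.call(
    model="wan2.2-t2i-flash",  # default model from the list above
    prompt="A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light",
    n=1,
    size="960*1696",           # 9:16 portrait
)
if rsp.status_code == HTTPStatus.OK:
    for result in rsp.output.results:
        print(result.url)      # temporary URL; download it promptly
else:
    print(rsp.code, rsp.message)
```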

---

## 4. I2V — Wan 2.x image-to-video


**When to use**: Turn an image into a motion video; for end-to-end text-to-video, run T2I first (see the pipeline example below).

Local image → video


```bash
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --prompt "She turns slowly and smiles; dress and petals drift gently" \
  --model wan2.7-i2v \
  --resolution 720P --duration 5 --download
```

Pipeline: text → image → video


```bash
python scripts/image_to_video.py \
  --t2i-prompt "A woman in Hanfu in a peach blossom forest" \
  --prompt "She turns slowly; petals fall; poetic mood" \
  --download --output result.mp4
```

With background music


```bash
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --audio-url "https://..." \
  --prompt "..." --download
```

**Models**:
- `wan2.7-i2v` (default; includes sound; 5s/10s)
- `wan2.5-i2v-preview` (high-quality preview)
- `wan2.2-i2v-plus` (no built-in audio; faster)
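Under the hood this is DashScope's async `VideoSynthesis` API. Here is a minimal direct-call sketch, assuming `wan2.7-i2v` accepts the same `img_url` / `prompt` parameters as earlier Wan i2v models; `image_to_video.py` adds OSS upload, the T2I pipeline, and downloading:

```python
from http import HTTPStatus

from dashscope import VideoSynthesis

# async_call returns a task handle immediately; wait() polls until it finishes.
task = VideoSynthesis.async_call(
    model="wan2.7-i2v",
    prompt="She turns slowly and smiles; dress and petals drift gently",
    img_url="https://example.com/portrait.jpg",  # must be a reachable URL
)
rsp = VideoSynthesis.wait(task)
if rsp.status_code == HTTPStatus.OK:
    print(rsp.output.video_url)
else:
    print(rsp.code, rsp.message)
```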

---

## 5. AA (AnimateAnyone) — full-body animation


**When to use**: Full-body photo + reference motion video → dance / motion video.

**Requirements**:
- Image: single person, full body, front-facing, head to toe in frame, aspect ratio 0.5–2.0
- Video: full body in frame from the first frame; mp4/avi/mov; fps ≥ 24; 2–60 s

**Three steps**:
1. `animate-anyone-detect-gen2` (sync) → `check_pass=true`
2. `animate-anyone-template-gen2` (async) → `template_id` (~3–5 min)
3. `animate-anyone-gen2` (async) → `video_url` (~3–5 min)

Local files (auto convert + OSS upload)


```bash
python scripts/animate_anyone.py \
  --image ./portrait_fullbody.jpg \
  --video ./dance.mp4 \
  --download --output result.mp4
```

Use image as background


```bash
python scripts/animate_anyone.py \
  --image ./portrait.jpg --video ./dance.mp4 \
  --use-ref-img-bg --video-ratio 9:16 --download
```

Skip Step 2 (existing template_id)


```bash
python scripts/animate_anyone.py \
  --image ./portrait.jpg \
  --template-id "AACT.xxx.xxx" --download
```

> Auto conversion: video webm/mkv/flv → mp4; image webp/heic → jpg; frame rates under 24 fps are normalized to 24 fps.
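The conversion is roughly equivalent to the ffmpeg invocation below (a sketch of what the script presumably runs; exact flags may differ):

```python
import subprocess

def normalize_video(src: str, dst: str = "normalized.mp4") -> str:
    """Re-encode webm/mkv/flv to mp4 at 24 fps (sketch; not the repo's actual code)."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", src,
            "-r", "24",         # normalize the frame rate to 24 fps
            "-c:v", "libx264",  # H.264 video for mp4 compatibility
            "-c:a", "aac",      # AAC audio
            dst,
        ],
        check=True,  # raise if ffmpeg exits non-zero
    )
    return dst
```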

---

## 6. EMO — talking head (legacy)


**Note**: Prefer LivePortrait; EMO suits cases that need stricter lip-sync.

```bash
python scripts/portrait_animate.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --download
```


## 7. LingMou — enterprise template video


**When to use**: Corporate digital-human news, template-based broadcasts, scripted reads with optional character images.

### New workflow (prefer no `template_id`)


- If the user provides `template_id`: use that template to generate.
- If no `template_id`, apply the fallback below (sketched in code after this list):
  1. List the account's existing broadcast templates.
  2. If any exist, pick one at random for creation.
  3. If none, fetch public templates and copy up to 3 into the account.
  4. Pick one at random from the copy results and continue.
- Caveat: after a public template is copied, the copy may not yet be fully ready to render; some copies are still drafts and may lack clips, assets, or variable bindings. Complete them in LingMou.
- If the user only gives an image and asks to "make a talking video" without a script: confirm the spoken copy before generating.
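The fallback boils down to the sketch below; `list_account_templates` and `copy_public_templates` are hypothetical stand-ins for the LingMou SDK calls the script actually makes:

```python
import random
from typing import Optional

def choose_template(client, template_id: Optional[str] = None) -> str:
    """Pick a broadcast template per the workflow above (helper names hypothetical)."""
    if template_id:
        return template_id  # a user-supplied template always wins
    templates = list_account_templates(client)  # hypothetical SDK wrapper
    if not templates:
        # No local templates: copy up to 3 public ones into the account.
        # Note: fresh copies may still be drafts that need completion in LingMou.
        templates = copy_public_templates(client, limit=3)  # hypothetical
    if not templates:
        raise RuntimeError("No broadcast templates available; create one in LingMou.")
    return random.choice(templates)
```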

### What `scripts/avatar_video.py` supports

- `--list-templates`: list account templates
- `--list-public-templates`: list public templates (SDK 1.7.0+)
- `--copy-public-templates`: copy up to 3 public templates (SDK 1.7.0+)
- Omit `--template-id`: use a random existing template
- When local templates are empty: automatically fall back to copying public templates
- `--show-template-detail`: show template detail and replaceable variables
- Fills the input text into the template's text variables (prefers `text_content` / `test_text`)
- If generation fails right after copying a public template, surfaces a clear error that the template may still need completion (no silent failure)

List templates


```bash
python scripts/avatar_video.py --list-templates
```

Public templates (SDK 1.7.0+)


```bash
python scripts/avatar_video.py --list-public-templates
```

Copy up to 3 public templates (SDK 1.7.0+)


```bash
python scripts/avatar_video.py --copy-public-templates
```

No template_id — random existing template


```bash
python scripts/avatar_video.py \
  --text "Hello, welcome to today's tech news." \
  --download
```

Specific template_id


```bash
python scripts/avatar_video.py \
  --template-id "BS1b2WNnRMu4ouRzT4clY9Jhg" \
  --text "Hello, welcome to today's tech news." \
  --download
```

Detail for randomly chosen template


```bash
python scripts/avatar_video.py \
  --show-template-detail \
  --text "This is a test script for broadcast."
```

### Conversational usage


When the user says things like:
- "Make a talking video from this image"
- "Digital-human broadcast for me"
- "Upload an image and make a news read"

Do this:
1. Check whether they already gave a script ready to read.
2. If not, ask: "What is the exact script to read? You can give bullet points and I can turn them into broadcast-ready copy."
3. With the script in hand, run LingMou: prefer a random existing template; if none exist locally, try copying a public template.
4. If they uploaded a portrait but the template API does not use it, explain that this path is template-driven; for an image-driven talking head, use LivePortrait or EMO.


## API reference links
