alibabacloud-avatar-video

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Human Avatar — Alibaba Cloud AI Video & Speech

数字人分身——阿里云AI视频与语音生成工具

Capabilities overview

功能概览

Capability	Script	Model / API	Region	Summary
LivePortrait	`live_portrait.py`	`liveportrait`	cn-beijing	Portrait + audio/video → talking video, two steps
EMO	`portrait_animate.py`	`emo-v1`	cn-beijing	Portrait + audio → talking head, detect + generate
AA (AnimateAnyone)	`animate_anyone.py`	`animate-anyone-gen2`	cn-beijing	Full-body animation: detect → motion template → video
T2I	`text_to_image.py`	`wan2.x-t2i`	Multi-region	Text → image, default wan2.2-t2i-flash
I2V	`image_to_video.py`	`wan2.x-i2v`	Multi-region	Image → video; T2I→I2V pipeline supported; default wan2.7-i2v-flash
Qwen TTS	`qwen_tts.py`	`qwen3-tts-*`	cn-beijing / Singapore	Text → speech; auto model/voice by scene
LingMou	`avatar_video.py`	LingMou SDK	cn-beijing	Template-based digital-human broadcast video

功能	脚本文件	模型/API	地域	概述
LivePortrait	`live_portrait.py`	`liveportrait`	cn-beijing	肖像+音频/视频→数字人分身视频，两步流程
EMO	`portrait_animate.py`	`emo-v1`	cn-beijing	肖像+音频→数字人分身，检测+生成
AA（AnimateAnyone）	`animate_anyone.py`	`animate-anyone-gen2`	cn-beijing	全身动画：检测→动作模板→视频
T2I	`text_to_image.py`	`wan2.x-t2i`	多地域	文本→图片，默认模型wan2.2-t2i-flash
I2V	`image_to_video.py`	`wan2.x-i2v`	多地域	图片→视频；支持T2I→I2V流水线；默认模型wan2.7-i2v-flash
Qwen TTS	`qwen_tts.py`	`qwen3-tts-*`	cn-beijing / Singapore	文本→语音；根据场景自动选择模型/音色
LingMou	`avatar_video.py`	LingMou SDK	cn-beijing	基于模板的数字人播报视频

Quick selection guide

快速选择指南

Talking head (have audio/video already)     → LivePortrait
Talking head (no audio; synthesize first)   → Qwen TTS → LivePortrait
Full-body dance / motion                    → AA (AnimateAnyone)
Text → image                                → T2I (text_to_image)
Image → video                               → I2V (image_to_video)
Text → video end-to-end                     → T2I → I2V (image_to_video --t2i-prompt)
Enterprise digital human / template news    → LingMou (avatar_video)

已有音频/视频的数字人分身     → LivePortrait
无音频（需先合成）的数字人分身   → Qwen TTS → LivePortrait
全身舞蹈/动作动画                    → AA (AnimateAnyone)
文本→图片                                → T2I (text_to_image)
图片→视频                               → I2V (image_to_video)
端到端文本→视频                     → T2I → I2V (image_to_video --t2i-prompt)
企业数字人/模板化新闻    → LingMou (avatar_video)

Environment setup

环境配置

bash

pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4

bash

pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4

LingMou additionally:

LingMou额外依赖:

pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4


```bash
export DASHSCOPE_API_KEY=sk-xxxx               # Beijing-region API key
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx         # OSS upload
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com

⚠️ API keys for
cn-beijing
and Singapore are not interchangeable; use the key for the correct region.
OSS_ENDPOINT
may include or omit the
https://
prefix; scripts normalize it.

pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4


```bash
export DASHSCOPE_API_KEY=sk-xxxx               # 北京地域API密钥
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx         # OSS上传密钥
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com

⚠️ 北京地域和新加坡地域的API密钥不可通用，请使用对应地域的密钥。
OSS_ENDPOINT
可包含或省略
https://
前缀，脚本会自动标准化处理。

1. LivePortrait — talking-head video

1. LivePortrait——数字人分身视频

When to use: You have a portrait photo + speech and want a talking-head video quickly.

Flow:

Step 1: liveportrait-detect (sync)  → pass=true
  ↓
Step 2: liveportrait        (async)  → video_url

Image: Single person, front-facing portrait, clear face, no occlusion
Audio: wav/mp3, < 15MB, 1s–3min
Video input: Audio extracted automatically (ffmpeg)

bash

undefined

适用场景：你已有肖像照片+语音，想要快速生成数字人分身视频。

流程:

步骤1: liveportrait-detect（同步）  → pass=true
  ↓
步骤2: liveportrait       （异步）  → video_url

图片要求：单人正面肖像，面部清晰无遮挡
音频要求：wav/mp3格式，小于15MB，时长1秒–3分钟
视频输入：自动提取音频（依赖ffmpeg）

bash

undefined

Image + audio file

图片+音频文件

python scripts/live_portrait.py
--image ./portrait.jpg
--audio ./speech.mp3
--template normal --download

Image + video (extract audio)

图片+视频（自动提取音频）

python scripts/live_portrait.py
--image ./portrait.jpg
--video ./speech_video.mp4
--template active --download

Public URLs

公共URL

python scripts/live_portrait.py
--image-url "https://..."
--audio-url "https://..."
--mouth-strength 1.2 --download


**Motion templates**:
- `normal` (default, moderate motion)
- `calm` (calm; news / storytelling)
- `active` (lively; singing / hosting)

---

python scripts/live_portrait.py
--image-url "https://..."
--audio-url "https://..."
--mouth-strength 1.2 --download


**动作模板**:
- `normal`（默认，动作幅度适中）
- `calm`（沉稳风格；适用于新闻/故事讲述）
- `active`（活泼风格；适用于唱歌/主持）

---

2. Qwen TTS — text to speech

2. Qwen TTS——文本转语音

When to use: Generate speech files from text (for LivePortrait, EMO, etc.).

Default model:

qwen3-tts-vd-realtime-2026-01-15

适用场景：将文本转换为语音文件（用于LivePortrait、EMO等场景）。

默认模型：

qwen3-tts-vd-realtime-2026-01-15

Auto model selection by scene

根据场景自动选择模型

Scene `--scene`	Suggested model	Suggested voice
`default` / `brand`	`qwen3-tts-vd-realtime-2026-01-15`	Cherry
`news` / `documentary` / `advertising`	`qwen3-tts-instruct-flash-realtime`	Serena / Ethan
`audiobook` / `drama`	`qwen3-tts-instruct-flash-realtime`	Cherry / Dylan
`customer_service` / `chatbot` / `education`	`qwen3-tts-flash-realtime`	Anna / Ethan
`ecommerce` / `short_video`	`qwen3-tts-flash-realtime`	Cherry / Chelsie

场景参数 `--scene`	推荐模型	推荐音色
`default` / `brand`	`qwen3-tts-vd-realtime-2026-01-15`	Cherry
`news` / `documentary` / `advertising`	`qwen3-tts-instruct-flash-realtime`	Serena / Ethan
`audiobook` / `drama`	`qwen3-tts-instruct-flash-realtime`	Cherry / Dylan
`customer_service` / `chatbot` / `education`	`qwen3-tts-flash-realtime`	Anna / Ethan
`ecommerce` / `short_video`	`qwen3-tts-flash-realtime`	Cherry / Chelsie

Available voices

可用音色

Voice	Character
`Cherry`	Bright, sweet female; ads / audiobooks / dubbing
`Serena`	Mature, intellectual female; news / explainers / corporate
`Ethan`	Steady, warm male; education / documentary / training
`Dylan`	Expressive male; radio drama / game VO
`Anna`	Gentle, friendly female; support / assistant / daily
`Chelsie`	Young, fresh female; short video / e-commerce
`Thomas`	Deep, magnetic male; brand / ads
`Luna`	Warm, soft female; meditation / storytelling

bash

undefined

音色	特点
`Cherry`	明亮甜美女声；适用于广告/有声书/配音
`Serena`	成熟知性女声；适用于新闻/解说/企业宣传
`Ethan`	沉稳温暖男声；适用于教育/纪录片/培训
`Dylan`	富有表现力男声；适用于广播剧/游戏配音
`Anna`	温柔友好女声；适用于客服/助手/日常场景
`Chelsie`	清新年轻女声；适用于短视频/电商
`Thomas`	磁性低沉男声；适用于品牌/广告
`Luna`	温暖柔和女声；适用于冥想/故事讲述

bash

undefined

Default (qwen3-tts-vd-realtime + Cherry)

默认配置（qwen3-tts-vd-realtime + Cherry）

python scripts/qwen_tts.py --text "Hello, welcome to Qwen TTS." --download

Match by scene

按场景匹配

python scripts/qwen_tts.py --text "Today's market..." --scene news --download python scripts/qwen_tts.py --text "Once upon a time..." --scene audiobook --download

Style via instructions

通过指令定制风格

python scripts/qwen_tts.py
--text "Dear students..."
--model qwen3-tts-instruct-flash-realtime
--instructions "Warm tone, steady pace, suitable for teaching"
--download

List options

列出可选选项

python scripts/qwen_tts.py --list-voices python scripts/qwen_tts.py --list-models

---

python scripts/qwen_tts.py --list-voices python scripts/qwen_tts.py --list-models

---

3. T2I — Wan 2.x text-to-image

3. T2I——万相2.x文本生成图片

When to use: Generate images from text (optionally feed into I2V).

bash

undefined

适用场景：根据文本生成图片（可输入至I2V生成视频）。

bash

undefined

Default model (wan2.2-t2i-flash, fast)

默认模型（wan2.2-t2i-flash，生成速度快）

python scripts/text_to_image.py
--prompt "A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light"
--size 960*1696 --download

Higher quality

更高画质

python scripts/text_to_image.py
--prompt "..." --model wan2.2-t2i-plus --size 1280*1280 --download

Latest (Wan 2.6)

Local image → video

本地图片→视频

python scripts/image_to_video.py
--image ./portrait.jpg
--prompt "She turns slowly and smiles; dress and petals drift gently"
--model wan2.7-i2v
--resolution 720P --duration 5 --download

Pipeline: text → image → video

流水线：文本→图片→视频

python scripts/image_to_video.py
--t2i-prompt "A woman in Hanfu in a peach blossom forest"
--prompt "She turns slowly; petals fall; poetic mood"
--download --output result.mp4

With background music

添加背景音乐

python scripts/image_to_video.py
--image ./portrait.jpg
--audio-url "https://..."
--prompt "..." --download


**Models**:
- `wan2.7-i2v` (default; includes sound; 5s/10s)
- `wan2.5-i2v-preview` (high-quality preview)
- `wan2.2-i2v-plus` (no built-in audio; faster)

---

python scripts/image_to_video.py
--image ./portrait.jpg
--audio-url "https://..."
--prompt "..." --download


**模型说明**:
- `wan2.7-i2v`（默认；自带音效；支持5秒/10秒时长）
- `wan2.5-i2v-preview`（高质量预览版）
- `wan2.2-i2v-plus`（无内置音效；生成速度更快）

---

5. AA AnimateAnyone — full-body animation

5. AA AnimateAnyone——全身动画

When to use: Full-body photo + reference motion video → dance / motion video.

Requirements:

Image: Single person, full body front, head to toe, aspect ratio 0.5–2.0
Video: Full body in frame from first frame; mp4/avi/mov; fps ≥ 24; 2–60s

Three steps:

Step 1: animate-anyone-detect-gen2   (sync)  → check_pass=true
  ↓
Step 2: animate-anyone-template-gen2 (async)  → template_id (~3–5 min)
  ↓
Step 3: animate-anyone-gen2          (async)  → video_url (~3–5 min)

bash

undefined

适用场景：全身照片+参考动作视频→舞蹈/动作视频。

要求:

图片：单人正面全身照，从头到脚完整入镜，宽高比0.5–2.0
视频：第一帧开始全身入镜；格式为mp4/avi/mov；帧率≥24；时长2–60秒

三步流程:

步骤1: animate-anyone-detect-gen2  （同步）  → check_pass=true
  ↓
步骤2: animate-anyone-template-gen2（异步）  → template_id（约3–5分钟）
  ↓
步骤3: animate-anyone-gen2         （异步）  → video_url（约3–5分钟）

bash

undefined

Local files (auto convert + OSS upload)

本地文件（自动转换+OSS上传）

python scripts/animate_anyone.py
--image ./portrait_fullbody.jpg
--video ./dance.mp4
--download --output result.mp4

Use image as background

使用图片作为背景

python scripts/animate_anyone.py
--image ./portrait.jpg --video ./dance.mp4
--use-ref-img-bg --video-ratio 9:16 --download

Skip Step 2 (existing template_id)

跳过步骤2（使用已有template_id）

python scripts/animate_anyone.py
--image ./portrait.jpg
--template-id "AACT.xxx.xxx" --download


> Auto conversion: video webm/mkv/flv → mp4; image webp/heic → jpg; if fps is under 24, normalize to 24 fps

---

python scripts/animate_anyone.py
--image ./portrait.jpg
--template-id "AACT.xxx.xxx" --download


> 自动转换：视频格式webm/mkv/flv→mp4；图片格式webp/heic→jpg；若帧率低于24，自动标准化为24帧

---

6. EMO — talking head (legacy)

6. EMO——数字人分身（旧版本）

Note: Prefer LivePortrait; EMO suits cases that need stricter lip-sync.

bash

python scripts/portrait_animate.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --download

注意：优先使用LivePortrait；EMO适用于对唇形同步要求更严格的场景。

bash

python scripts/portrait_animate.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --download

7. LingMou — enterprise template video

7. LingMou——企业模板化视频

When to use: Corporate digital-human news, template-based broadcasts, scripted reads with optional character images.

适用场景：企业数字人新闻、模板化播报、带脚本的朗读视频（可选自定义人物图片）。

New workflow (prefer no

template_id

)

新版流程（优先不指定

template_id

）

If the user provides
template_id
: use that template to generate.
If no
template_id
:
1. List existing broadcast templates for the account.
2. If any exist, pick one at random for creation.
3. If none, fetch public templates and copy up to 3 into the account.
4. Pick one at random from the copy results and continue.
Caveat: After a public template is copied, the copy may not yet be a fully “ready-to-render” template; some copies are still drafts and may lack clips, assets, or variable bindings—complete them in LingMou.
If the user only gives an image and “make a talking video” without a script: confirm the spoken copy before generating.

如果用户提供
template_id
：使用指定模板生成视频。
如果未指定
template_id
:
1. 列出账号下已有的播报模板。
2. 若有模板，随机选择一个进行生成。
3. 若无模板，获取公共模板并最多复制3个到账号下。
4. 从复制的模板中随机选择一个继续生成。
注意：公共模板复制后，副本可能尚未完全准备就绪；部分副本仍为草稿状态，缺少片段、资源或变量绑定，需在LingMou平台中完成配置。
如果用户仅提供图片并要求“制作数字人分身视频”但未提供脚本：需先确认朗读内容再生成。

What

scripts/avatar_video.py

supports

scripts/avatar_video.py

支持的功能

```
--list-templates
```
: list account templates
```
--list-public-templates
```
: list public templates (SDK 1.7.0+)
```
--copy-public-templates
```
: copy up to 3 public templates (SDK 1.7.0+)
Omit
```
--template-id
```
: random existing template
When local templates are empty: auto try public-template copy as fallback
```
--show-template-detail
```
: template detail and replaceable variables
Fills input text into template text variables (prefers
```
text_content
```
/
```
test_text
```
)
If generation fails right after copying a public template, surfaces a clear error that the template may still need completion (no silent failure)

bash

undefined

```
--list-templates
```
: 列出账号下的模板
```
--list-public-templates
```
: 列出公共模板（SDK 1.7.0+）
```
--copy-public-templates
```
: 最多复制3个公共模板（SDK 1.7.0+）
省略
```
--template-id
```
: 使用随机已有模板
本地模板为空时：自动尝试复制公共模板作为备选
```
--show-template-detail
```
: 查看模板详情及可替换变量
将输入文本填充到模板的文本变量中（优先匹配
```
text_content
```
/
```
test_text
```
）
若复制公共模板后立即生成失败，会明确提示模板可能仍需配置（不会静默失败）

bash

undefined

List templates

列出模板

python scripts/avatar_video.py --list-templates

Public templates (SDK 1.7.0+)

列出公共模板（SDK 1.7.0+）

python scripts/avatar_video.py --list-public-templates

Copy up to 3 public templates (SDK 1.7.0+)

复制最多3个公共模板（SDK 1.7.0+）

python scripts/avatar_video.py --copy-public-templates

No template_id — random existing template

不指定template_id——使用随机已有模板

python scripts/avatar_video.py
--text "Hello, welcome to today's tech news."
--download

Specific template_id

指定template_id

python scripts/avatar_video.py
--template-id "BS1b2WNnRMu4ouRzT4clY9Jhg"
--text "Hello, welcome to today's tech news."
--download

Detail for randomly chosen template

查看随机选中模板的详情

python scripts/avatar_video.py
--show-template-detail
--text "This is a test script for broadcast."

undefined

python scripts/avatar_video.py
--show-template-detail
--text "This is a test script for broadcast."

undefined

Conversational usage

对话式使用指南

When the user says things like:

“Make a talking video from this image”
“Digital-human broadcast for me”
“Upload image and make a news read”

Do this:

Check whether they already gave copy/script ready to read.
If not, ask: “What is the exact script to read? You can give bullet points and I can turn them into broadcast-ready copy.”
With script in hand, run LingMou: prefer random existing template; if none locally, try public copy.
If they uploaded a portrait but the template API does not use it, explain: this path is template-driven; for image-driven talking head, use LivePortrait or EMO.

当用户提出以下需求时：

“用这张图片制作数字人分身视频”
“帮我生成数字人播报视频”
“上传图片并制作新闻朗读视频”

请按以下步骤操作：

检查用户是否已提供可直接朗读的文案/脚本。
若未提供，询问：“请提供具体的朗读脚本，你可以给出要点，我会将其转换为适合播报的文案。”
获取脚本后，运行LingMou：优先使用随机已有模板；若本地无模板，尝试复制公共模板。
如果用户上传了肖像但模板API未使用该图片，需说明：此路径为模板驱动；若需基于图片生成数字人分身，请使用LivePortrait或EMO。

API reference links

API参考链接

LivePortrait: https://help.aliyun.com/zh/model-studio/liveportrait-api
EMO (emo-detect + emo-v1): references/emo-api.md
AA (Animate Anyone): references/aa-api.md
T2I (text-to-image v2): https://help.aliyun.com/zh/model-studio/text-to-image-v2-api-reference
I2V (image-to-video): https://help.aliyun.com/zh/model-studio/image-to-video-api-reference/
Qwen TTS: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime
LingMou: references/lingmou-api.md
OSS upload: references/oss-upload.md

LivePortrait: https://help.aliyun.com/zh/model-studio/liveportrait-api
EMO (emo-detect + emo-v1): references/emo-api.md
AA (Animate Anyone): references/aa-api.md
T2I (text-to-image v2): https://help.aliyun.com/zh/model-studio/text-to-image-v2-api-reference
I2V (image-to-video): https://help.aliyun.com/zh/model-studio/image-to-video-api-reference/
Qwen TTS: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime
LingMou: references/lingmou-api.md
OSS upload: references/oss-upload.md

alibabacloud-avatar-video

Original

Translation

Human Avatar — Alibaba Cloud AI Video & Speech

数字人分身——阿里云AI视频与语音生成工具

Capabilities overview

功能概览

Quick selection guide

快速选择指南

Environment setup

环境配置

LingMou additionally:

LingMou额外依赖:

1. LivePortrait — talking-head video

1. LivePortrait——数字人分身视频

Image + audio file

图片+音频文件

Image + video (extract audio)

图片+视频（自动提取音频）

Public URLs

公共URL

2. Qwen TTS — text to speech

2. Qwen TTS——文本转语音

Auto model selection by scene

根据场景自动选择模型

Available voices

可用音色

Default (qwen3-tts-vd-realtime + Cherry)

默认配置（qwen3-tts-vd-realtime + Cherry）

Match by scene

按场景匹配

Style via instructions

通过指令定制风格

List options

列出可选选项

3. T2I — Wan 2.x text-to-image

3. T2I——万相2.x文本生成图片

Default model (wan2.2-t2i-flash, fast)

默认模型（wan2.2-t2i-flash，生成速度快）

Higher quality

更高画质

Latest (Wan 2.6)

最新版本（万相2.6）

4. I2V — Wan 2.x image-to-video

4. I2V——万相2.x图片生成视频

Local image → video

本地图片→视频

Pipeline: text → image → video

流水线：文本→图片→视频

With background music

添加背景音乐

5. AA AnimateAnyone — full-body animation

5. AA AnimateAnyone——全身动画

Local files (auto convert + OSS upload)

本地文件（自动转换+OSS上传）

Use image as background

使用图片作为背景

Skip Step 2 (existing template_id)

跳过步骤2（使用已有template_id）

6. EMO — talking head (legacy)

6. EMO——数字人分身（旧版本）

7. LingMou — enterprise template video

7. LingMou——企业模板化视频

New workflow (prefer no template_id)

新版流程（优先不指定template_id）

What scripts/avatar_video.py supports

scripts/avatar_video.py支持的功能

List templates

列出模板

Public templates (SDK 1.7.0+)

列出公共模板（SDK 1.7.0+）

Copy up to 3 public templates (SDK 1.7.0+)

复制最多3个公共模板（SDK 1.7.0+）

No template_id — random existing template

不指定template_id——使用随机已有模板

Specific template_id

指定template_id

Detail for randomly chosen template

查看随机选中模板的详情

Conversational usage

New workflow (prefer no
`template_id`
)

新版流程（优先不指定
`template_id`
）

What
`scripts/avatar_video.py`
supports

`scripts/avatar_video.py`
支持的功能