alibabacloud-avatar-video
Human Avatar — Alibaba Cloud AI Video & Speech
Capabilities overview
| Capability | Script | Model / API | Region | Summary |
|---|---|---|---|---|
| LivePortrait | | | cn-beijing | Portrait + audio/video → talking video, two steps |
| EMO | | | cn-beijing | Portrait + audio → talking head, detect + generate |
| AA (AnimateAnyone) | | | cn-beijing | Full-body animation: detect → motion template → video |
| T2I | | | Multi-region | Text → image, default wan2.2-t2i-flash |
| I2V | | | Multi-region | Image → video; T2I→I2V pipeline supported; default wan2.7-i2v-flash |
| Qwen TTS | | | cn-beijing / Singapore | Text → speech; auto model/voice by scene |
| LingMou | | LingMou SDK | cn-beijing | Template-based digital-human broadcast video |
Quick selection guide
Talking head (have audio/video already) → LivePortrait
Talking head (no audio; synthesize first) → Qwen TTS → LivePortrait
Full-body dance / motion → AA (AnimateAnyone)
Text → image → T2I (text_to_image)
Image → video → I2V (image_to_video)
Text → video end-to-end → T2I → I2V (image_to_video --t2i-prompt)
Enterprise digital human / template news → LingMou (avatar_video)
Environment setup
```bash
pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4
```

LingMou additionally:

```bash
pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4
```

```bash
export DASHSCOPE_API_KEY=sk-xxxx         # Beijing-region API key
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx   # OSS upload
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com
```

⚠️ API keys for cn-beijing and Singapore are not interchangeable; use the key for the correct region.
`OSS_ENDPOINT` may include or omit the `https://` prefix; scripts normalize it.
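The note above says the scripts normalize `OSS_ENDPOINT`; a minimal sketch of what that normalization might look like (the helper name is hypothetical, not part of the scripts):

```python
def normalize_oss_endpoint(endpoint: str) -> str:
    """Strip an optional scheme and trailing slash from an OSS endpoint.

    Accepts "oss-cn-beijing.aliyuncs.com" or "https://oss-cn-beijing.aliyuncs.com".
    """
    endpoint = endpoint.strip()
    for scheme in ("https://", "http://"):
        if endpoint.startswith(scheme):
            endpoint = endpoint[len(scheme):]
            break
    return endpoint.rstrip("/")
```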
1. LivePortrait — talking-head video
When to use: You have a portrait photo + speech and want a talking-head video quickly.
Flow:
Step 1: liveportrait-detect (sync) → pass=true
    ↓
Step 2: liveportrait (async) → video_url

Requirements:
- Image: single person, front-facing portrait, clear face, no occlusion
- Audio: wav/mp3, < 15MB, 1s–3min
- Video input: audio extracted automatically (ffmpeg)
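The detect-then-generate flow above follows DashScope's usual async-task pattern: submit, receive a task id, then poll until a terminal status. A generic polling sketch, with the status fetch injected as a callable so it is independent of any one endpoint; the status names assume DashScope's task-API conventions:

```python
import time

def poll_task(fetch_status, interval_s: float = 5.0, timeout_s: float = 600.0) -> dict:
    """Poll an async task until it reaches a terminal status.

    fetch_status() should return the task's output dict, e.g.
    {"task_status": "RUNNING"} or {"task_status": "SUCCEEDED", "video_url": "..."}.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        output = fetch_status()
        status = output.get("task_status")
        if status == "SUCCEEDED":
            return output
        if status in ("FAILED", "CANCELED"):
            raise RuntimeError(f"task ended with status {status}: {output}")
        time.sleep(interval_s)
    raise TimeoutError("task did not finish within the timeout")
```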
Image + audio file

```bash
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --template normal --download
```
Image + video (extract audio)

```bash
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --video ./speech_video.mp4 \
  --template active --download
```
Public URLs

```bash
python scripts/live_portrait.py \
  --image-url "https://..." \
  --audio-url "https://..." \
  --mouth-strength 1.2 --download
```

**Motion templates**:
- `normal` (default, moderate motion)
- `calm` (calm; news / storytelling)
- `active` (lively; singing / hosting)
---

2. Qwen TTS — text to speech
When to use: Generate speech files from text (for LivePortrait, EMO, etc.).
Default model: `qwen3-tts-vd-realtime-2026-01-15`

Auto model selection by scene
| Scene | Suggested model | Suggested voice |
|---|---|---|
| | | Cherry |
| | | Serena / Ethan |
| | | Cherry / Dylan |
| | | Anna / Ethan |
| | | Cherry / Chelsie |
Available voices

| Voice | Character |
|---|---|
| | Bright, sweet female; ads / audiobooks / dubbing |
| | Mature, intellectual female; news / explainers / corporate |
| | Steady, warm male; education / documentary / training |
| | Expressive male; radio drama / game VO |
| | Gentle, friendly female; support / assistant / daily |
| | Young, fresh female; short video / e-commerce |
| | Deep, magnetic male; brand / ads |
| | Warm, soft female; meditation / storytelling |
Default (qwen3-tts-vd-realtime + Cherry)

```bash
python scripts/qwen_tts.py --text "Hello, welcome to Qwen TTS." --download
```

Match by scene

```bash
python scripts/qwen_tts.py --text "Today's market..." --scene news --download
python scripts/qwen_tts.py --text "Once upon a time..." --scene audiobook --download
```
Style via instructions

```bash
python scripts/qwen_tts.py \
  --text "Dear students..." \
  --model qwen3-tts-instruct-flash-realtime \
  --instructions "Warm tone, steady pace, suitable for teaching" \
  --download
```
List options

```bash
python scripts/qwen_tts.py --list-voices
python scripts/qwen_tts.py --list-models
```
---

3. T2I — Wan 2.x text-to-image
When to use: Generate images from text (optionally feed into I2V).

Default model (wan2.2-t2i-flash, fast)
```bash
python scripts/text_to_image.py \
  --prompt "A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light" \
  --size 960*1696 --download
```
Higher quality

```bash
python scripts/text_to_image.py \
  --prompt "..." --model wan2.2-t2i-plus --size 1280*1280 --download
```
Latest (Wan 2.6)

```bash
python scripts/text_to_image.py \
  --prompt "..." --model wan2.6-t2i --size 1280*1280 --n 1 --download
```

**Models**:
- `wan2.2-t2i-flash` (default, fast, good for tests)
- `wan2.2-t2i-plus` (higher quality)
- `wan2.6-t2i` (latest; more aspect ratios; sync call)

**Common sizes**: `1280*1280` (1:1) / `960*1696` (9:16) / `1696*960` (16:9)
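The `--size` values use a `width*height` format; a small sketch of parsing and classifying them (helper names are hypothetical, not part of the scripts):

```python
def parse_size(size: str) -> tuple[int, int]:
    """Parse a "width*height" string such as "960*1696"."""
    w, h = size.split("*")
    return int(w), int(h)

def orientation(size: str) -> str:
    """Classify a size as square, portrait (e.g. 9:16), or landscape (e.g. 16:9)."""
    w, h = parse_size(size)
    if w == h:
        return "square"
    return "portrait" if h > w else "landscape"
```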
---
4. I2V — Wan 2.x image-to-video
When to use: Turn an image into motion video; supports text-to-video via T2I first.

Local image → video
```bash
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --prompt "She turns slowly and smiles; dress and petals drift gently" \
  --model wan2.7-i2v \
  --resolution 720P --duration 5 --download
```
Pipeline: text → image → video

```bash
python scripts/image_to_video.py \
  --t2i-prompt "A woman in Hanfu in a peach blossom forest" \
  --prompt "She turns slowly; petals fall; poetic mood" \
  --download --output result.mp4
```
With background music

```bash
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --audio-url "https://..." \
  --prompt "..." --download
```

**Models**:
- `wan2.7-i2v` (default; includes sound; 5s/10s)
- `wan2.5-i2v-preview` (high-quality preview)
- `wan2.2-i2v-plus` (no built-in audio; faster)
---

5. AA (AnimateAnyone) — full-body animation
When to use: Full-body photo + reference motion video → dance / motion video.
Requirements:
- Image: single person, full-body front view, head to toe, aspect ratio 0.5–2.0
- Video: full body in frame from the first frame; mp4/avi/mov; fps ≥ 24; 2–60s
Three steps:
Step 1: animate-anyone-detect-gen2 (sync) → check_pass=true
    ↓
Step 2: animate-anyone-template-gen2 (async) → template_id (~3–5 min)
    ↓
Step 3: animate-anyone-gen2 (async) → video_url (~3–5 min)
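The input requirements above can be pre-checked locally before spending a detect call. A sketch of such a check (the helper is hypothetical; reading actual image dimensions and video metadata is left to Pillow/ffprobe):

```python
def check_aa_inputs(width: int, height: int, video_fps: float, video_seconds: float) -> list[str]:
    """Return a list of AnimateAnyone requirement violations (empty list = OK)."""
    problems = []
    if not (0.5 <= width / height <= 2.0):
        problems.append(f"image aspect ratio {width / height:.2f} outside 0.5-2.0")
    if video_fps < 24:
        problems.append(f"video fps {video_fps} below 24 (scripts normalize to 24)")
    if not (2 <= video_seconds <= 60):
        problems.append(f"video duration {video_seconds}s outside 2-60s")
    return problems
```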
Local files (auto convert + OSS upload)
```bash
python scripts/animate_anyone.py \
  --image ./portrait_fullbody.jpg \
  --video ./dance.mp4 \
  --download --output result.mp4
```
Use image as background

```bash
python scripts/animate_anyone.py \
  --image ./portrait.jpg --video ./dance.mp4 \
  --use-ref-img-bg --video-ratio 9:16 --download
```
Skip Step 2 (existing template_id)

```bash
python scripts/animate_anyone.py \
  --image ./portrait.jpg \
  --template-id "AACT.xxx.xxx" --download
```

> Auto conversion: video webm/mkv/flv → mp4; image webp/heic → jpg; if fps is under 24, normalize to 24 fps
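The auto-conversion note maps to plain ffmpeg invocations. A sketch that only builds the argument lists; the exact codec flags the scripts use are an assumption:

```python
def build_ffmpeg_cmds(video_path: str, fps: float) -> list[list[str]]:
    """Build ffmpeg commands mirroring the auto-conversion rules:
    webm/mkv/flv -> mp4, then normalize sub-24-fps video to 24 fps."""
    cmds = []
    if video_path.rsplit(".", 1)[-1].lower() in ("webm", "mkv", "flv"):
        out = video_path.rsplit(".", 1)[0] + ".mp4"
        cmds.append(["ffmpeg", "-y", "-i", video_path, "-c:v", "libx264", "-c:a", "aac", out])
        video_path = out
    if fps < 24:
        out = video_path.rsplit(".", 1)[0] + "_24fps.mp4"
        cmds.append(["ffmpeg", "-y", "-i", video_path, "-r", "24", out])
    return cmds
```

Each inner list is suitable for `subprocess.run(cmd, check=True)`.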
---
6. EMO — talking head (legacy)
Note: Prefer LivePortrait; EMO suits cases that need stricter lip-sync.

```bash
python scripts/portrait_animate.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --download
```
7. LingMou — enterprise template video
When to use: Corporate digital-human news, template-based broadcasts, scripted reads with optional character images.
New workflow (prefer no template_id)
- If the user provides `template_id`: use that template to generate.
- If no `template_id`:
  - List existing broadcast templates for the account.
  - If any exist, pick one at random for creation.
  - If none, fetch public templates and copy up to 3 into the account.
  - Pick one at random from the copy results and continue.
  - Caveat: after a public template is copied, the copy may not yet be a fully "ready-to-render" template; some copies are still drafts and may lack clips, assets, or variable bindings, so complete them in LingMou.
- If the user only gives an image and "make a talking video" without a script: confirm the spoken copy before generating.
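The fallback order above can be sketched as a small selection function; the list/copy callables below stand in for the actual SDK operations:

```python
import random

def choose_template(list_my_templates, copy_public_templates, template_id=None):
    """Pick a LingMou template id following the documented fallback order:
    explicit id > random existing template > random freshly copied public template."""
    if template_id:
        return template_id
    mine = list_my_templates()
    if mine:
        return random.choice(mine)
    copied = copy_public_templates(limit=3)
    if not copied:
        raise RuntimeError("no templates available: copy and complete public templates in LingMou first")
    return random.choice(copied)
```

Note the caveat above still applies: a freshly copied template may fail at generation time until it is completed in LingMou.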
What scripts/avatar_video.py supports
- `--list-templates`: list account templates
- `--list-public-templates`: list public templates (SDK 1.7.0+)
- `--copy-public-templates`: copy up to 3 public templates (SDK 1.7.0+)
- Omit `--template-id`: random existing template
- When local templates are empty: automatically try public-template copy as a fallback
- `--show-template-detail`: template detail and replaceable variables
- Fills input text into template text variables (prefers `text_content` / `test_text`)
- If generation fails right after copying a public template, surfaces a clear error that the template may still need completion (no silent failure)
List templates

```bash
python scripts/avatar_video.py --list-templates
```
Public templates (SDK 1.7.0+)

```bash
python scripts/avatar_video.py --list-public-templates
```
Copy up to 3 public templates (SDK 1.7.0+)

```bash
python scripts/avatar_video.py --copy-public-templates
```
No template_id — random existing template

```bash
python scripts/avatar_video.py \
  --text "Hello, welcome to today's tech news." \
  --download
```
Specific template_id

```bash
python scripts/avatar_video.py \
  --template-id "BS1b2WNnRMu4ouRzT4clY9Jhg" \
  --text "Hello, welcome to today's tech news." \
  --download
```
Detail for randomly chosen template

```bash
python scripts/avatar_video.py \
  --show-template-detail \
  --text "This is a test script for broadcast."
```

Conversational usage
When the user says things like:
- “Make a talking video from this image”
- “Digital-human broadcast for me”
- “Upload image and make a news read”
Do this:
- Check whether they already gave copy/script ready to read.
- If not, ask: “What is the exact script to read? You can give bullet points and I can turn them into broadcast-ready copy.”
- With script in hand, run LingMou: prefer random existing template; if none locally, try public copy.
- If they uploaded a portrait but the template API does not use it, explain: this path is template-driven; for image-driven talking head, use LivePortrait or EMO.
API reference links
- LivePortrait: https://help.aliyun.com/zh/model-studio/liveportrait-api
- EMO (emo-detect + emo-v1): references/emo-api.md
- AA (Animate Anyone): references/aa-api.md
- T2I (text-to-image v2): https://help.aliyun.com/zh/model-studio/text-to-image-v2-api-reference
- I2V (image-to-video): https://help.aliyun.com/zh/model-studio/image-to-video-api-reference/
- Qwen TTS: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime
- LingMou: references/lingmou-api.md
- OSS upload: references/oss-upload.md