alicloud-ai-multimodal-qwen-vl
Category: provider
Model Studio Qwen VL (Image Understanding)
Use Qwen VL models for image input + text output understanding tasks via DashScope compatible-mode API.
Prerequisites
- Install dependencies (recommended in a venv):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
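The key lookup described above can be sketched as follows. This is a minimal illustration, not the skill's own loader: the function name `resolve_api_key`, the precedence of the environment variable over the file, and the assumption that the credentials file is INI-formatted with a `[default]` profile are all hypothetical.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, profile="default"):
    """Return the DashScope API key from the environment or a credentials file."""
    # Assumption: the environment variable wins when both sources are present.
    key = os.environ.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key

    # Assumption: the credentials file is INI-formatted with a [default] profile.
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    if path.is_file():
        parser = configparser.ConfigParser()
        parser.read(path, encoding="utf-8")
        if parser.has_option(profile, "dashscope_api_key"):
            key = parser.get(profile, "dashscope_api_key").strip()
            if key:
                return key

    raise RuntimeError(
        "No API key found: set DASHSCOPE_API_KEY or add dashscope_api_key "
        "to ~/.alibabacloud/credentials"
    )
```

Failing loudly when neither source yields a key keeps a misconfigured environment from surfacing later as an opaque 401.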
Critical model names
Prefer the Qwen3 VL family:
- `qwen3-vl-plus`
- `qwen3-vl-flash`

When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:
- `qwen3-vl-plus-latest`
- `qwen3-vl-plus-2025-12-19`
- `qwen3-vl-flash-latest`

Legacy names still seen in some workloads:
- `qwen-vl-max-latest`
- `qwen-vl-plus-latest`
Normalized interface (multimodal.chat)
Request
- `prompt` (string, required): user question/instruction about the image.
- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): default `qwen3-vl-plus`.
- `max_tokens` (int, optional): default `512`.
- `temperature` (float, optional): default `0.2`.
- `detail` (string, optional): `auto`/`low`/`high`, default `auto`.
- `json_mode` (bool, optional): return JSON-only response when possible.
- `schema` (object, optional): JSON Schema for structured extraction.
- `max_retries` (int, optional): retry count for `429`/`5xx`, default `2`.
- `retry_backoff_s` (float, optional): exponential backoff base seconds, default `1.5`.
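As a rough sketch of how a normalized request could be mapped onto a compatible-mode chat body, the snippet below applies the documented defaults and turns a local path into a `data:` URL. The helper names `to_image_url` and `build_payload` are hypothetical, and `detail`, `json_mode`, `schema`, and the retry fields are left out because their wire-level mapping is not described here.

```python
import base64
import mimetypes
from pathlib import Path

# Defaults taken from the request field list above.
DEFAULTS = {"model": "qwen3-vl-plus", "max_tokens": 512, "temperature": 0.2}


def to_image_url(image):
    """Pass through HTTPS and data: URLs; inline a local file as a base64 data: URL."""
    if image.startswith(("https://", "data:")):
        return image
    path = Path(image)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(request):
    """Map a normalized multimodal.chat request onto a chat-completions body."""
    for field in ("prompt", "image"):
        if not request.get(field):
            raise ValueError(f"missing required field: {field}")
    content = [
        {"type": "image_url", "image_url": {"url": to_image_url(request["image"])}},
        {"type": "text", "text": request["prompt"]},
    ]
    return {
        "model": request.get("model", DEFAULTS["model"]),
        "messages": [{"role": "user", "content": content}],
        "max_tokens": request.get("max_tokens", DEFAULTS["max_tokens"]),
        "temperature": request.get("temperature", DEFAULTS["temperature"]),
    }
```

For example, `build_payload({"prompt": "Describe the image", "image": "https://example.com/demo.jpg"})` yields a body with the same shape as the cURL example in this document.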
Response
- `text` (string): primary model answer.
- `model` (string): model actually used.
- `usage` (object): token usage if returned by backend.
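The three response fields can be derived from a raw backend reply along these lines. This is a sketch that assumes the reply follows the OpenAI-style chat-completions shape (`choices[0].message.content`, `model`, `usage`); `normalize_response` is a hypothetical name.

```python
def normalize_response(raw):
    """Reduce a raw chat-completions reply to {text, model, usage}."""
    choices = raw.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    content = message.get("content") or ""
    if isinstance(content, list):
        # Assumption: content may arrive as a list of typed parts; join the text parts.
        content = "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return {
        "text": content,
        "model": raw.get("model"),
        # usage stays None when the backend does not report token counts.
        "usage": raw.get("usage"),
    }
```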
Quickstart
```bash
# Prompt: "Please summarize the main content of this image"
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"请概括这张图里的主要内容","image":"https://example.com/demo.jpg"}' \
  --print-response
```

Using local image:

```bash
# Prompt: "Extract the key information from the image"
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取图片中的关键信息","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
  --print-response
```

Structured extraction (JSON mode):

```bash
# Prompt: "Extract fields: title, amount, date"
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取字段: title, amount, date","image":"./samples/invoice.png"}' \
  --json-mode \
  --print-response
```

Structured extraction (JSON Schema):

```bash
# Prompt: "Extract the invoice fields"
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"提取发票字段","image":"./samples/invoice.png"}' \
  --schema skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/references/examples/invoice.schema.json \
  --print-response
```
cURL (compatible mode)
```bash
# Prompt text: "Please describe this image and list actionable steps"
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"qwen3-vl-plus",
    "messages":[
      {
        "role":"user",
        "content":[
          {"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
          {"type":"text","text":"请描述这张图并列出可执行动作"}
        ]
      }
    ],
    "max_tokens":512,
    "temperature":0.2
  }'
```
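For callers working from Python rather than the shell, the same cURL request could look roughly like this. The sketch uses the standard library's `urllib.request` instead of `requests` so it has no third-party dependency; `build_http_request` and `send` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_http_request(payload, api_key):
    """Build (but do not send) the POST request equivalent to the cURL call."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(payload, api_key, timeout=60):
    """Send the request and return the decoded JSON reply."""
    request = build_http_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting construction from sending makes the request easy to inspect or unit-test without touching the network; `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx replies, which is where retry logic would hook in.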
Output location
- If `--output` is set, the JSON response is saved to that file.
- Default output dir convention: `output/ai-multimodal-qwen-vl/`.
Smoke test
```bash
python tests/ai/multimodal/alicloud-ai-multimodal-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
  --image output/ai-image-qwen-image/images/vl_test_cat.png
```
Error handling
| Error | Likely cause | Action |
|---|---|---|
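| 401/403 | Missing or invalid key | Check `DASHSCOPE_API_KEY` in the environment or `dashscope_api_key` in `~/.alibabacloud/credentials`. |
| 400 | Invalid request schema or unsupported image source | Validate the request body and the `image` source (HTTPS URL, local path, or `data:` URL). |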
| 429 | Rate limit | Retry with exponential backoff and lower concurrency. |
| 5xx | Temporary backend issue | Retry with backoff and idempotent request design. |
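The retry rows in the table, together with the `max_retries` and `retry_backoff_s` request fields, suggest a loop like the one below. It is a sketch under assumptions: the delay schedule (base seconds doubled after each failed attempt) is one common reading of "exponential backoff base seconds", and `HTTPStatusError` and `call_with_retries` are hypothetical names rather than part of the skill.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    """Only rate limits (429) and server errors (5xx) are worth retrying."""
    return status == 429 or 500 <= status < 600


def call_with_retries(call, max_retries=2, retry_backoff_s=1.5, sleep=time.sleep):
    """Run call(); on 429/5xx retry up to max_retries times with growing delays."""
    attempt = 0
    while True:
        try:
            return call()
        except HTTPStatusError as err:
            # 400/401/403 will not fix themselves, so fail fast on those.
            if not is_retryable(err.status) or attempt >= max_retries:
                raise
            # Assumed schedule: base, base*2, base*4, ... seconds.
            sleep(retry_backoff_s * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production code, adding random jitter to each delay would further reduce synchronized retries under rate limiting.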
Operational guidance
- For stable production behavior, pin snapshot model IDs instead of pure `-latest` aliases.
- Compress very large images before upload to reduce latency and cost.
- Add explicit extraction constraints in the prompt (fields, JSON shape, language).
- For OCR-like output, ask for confidence notes and unresolved text markers.
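One way to apply the last two points is to assemble the prompt from explicit constraints instead of writing it freehand. The helper below is purely illustrative: `build_extraction_prompt`, the `notes` key, and the `[unreadable]` marker are hypothetical conventions, not something the model or the skill requires.

```python
def build_extraction_prompt(task, fields, language="English"):
    """Compose a prompt that pins down fields, JSON shape, and output language."""
    keys = list(fields) + ["notes"]
    lines = [
        task,
        f"Return a single JSON object with exactly these keys: {', '.join(keys)}.",
        f"Write all values in {language}.",
        # Unresolved-text marker for OCR-like output (hypothetical convention).
        'Use null for a field that is not visible and "[unreadable]" for illegible text.',
        # Confidence notes, kept in a dedicated key so they do not pollute field values.
        'In "notes", list every field you are not confident about and why.',
    ]
    return "\n".join(lines)
```

For instance, `build_extraction_prompt("Extract the invoice fields.", ["title", "amount", "date"])` produces a five-line prompt whose second line names `title, amount, date, notes` as the only allowed keys.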
References
- Source list: `references/sources.md`
- API notes: `references/api_reference.md`