qianwen-vision

Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code), see agent-compatibility.md once per session.

Qwen Vision (Image & Video Understanding)

Analyze images and videos using Qwen VL and QVQ models. This skill is part of QianWen-AI/qianwen-ai.

Skill directory

Use this skill's internal files to execute and learn. Load reference files on demand when the default path fails or you need details.

| Location | Purpose |
|----------|---------|
| `scripts/analyze.py` | Image/video understanding, multi-image, thinking mode |
| `scripts/reason.py` | Visual reasoning (QVQ, chain-of-thought, streaming) |
| `scripts/ocr.py` | OCR text extraction |
| `scripts/vision_lib.py` | Shared helpers (base64, upload, streaming) |
| `references/execution-guide.md` | Fallback: curl, code generation |
| `references/curl-examples.md` | Curl for base64, multi-image, video, OCR |
| `references/visual-reasoning.md` | QVQ and thinking mode details |
| `references/prompt-guide.md` | Query prompt templates by task, thinking mode decision |
| `references/ocr.md` | OCR parameters and examples |
| `references/sources.md` | Official documentation URLs |
| `references/agent-compatibility.md` | Agent self-check: register skills in project config for agents that don't auto-load |

Security

NEVER output any API key or credential in plaintext. Always use variable references (`$DASHSCOPE_API_KEY` in shell, `os.environ["DASHSCOPE_API_KEY"]` in Python). Any check or detection of credentials must be non-plaintext: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of `.env` or config files that may contain secrets.

When the API key is not configured, NEVER ask the user to provide it directly. Instead, help create a `.env` file with a placeholder (`DASHSCOPE_API_KEY=sk-your-key-here`) and instruct the user to replace it with their actual key from the QianWen Console. Only write the actual key value if the user explicitly requests it.
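
A minimal sketch of a non-plaintext credential check in Python. The helper name `key_status` is hypothetical and not part of this skill's scripts; it reports only whether the variable is set and never returns or prints the value.

```python
import os


def key_status(var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return "set" or "not set" for an environment variable, never its value."""
    value = os.environ.get(var_name, "")
    return "set" if value.strip() else "not set"


if __name__ == "__main__":
    # Only the status string is printed; the key itself is never echoed.
    print(f"DASHSCOPE_API_KEY: {key_status()}")
```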

Key Compatibility

Scripts require a standard QianWen API key (`sk-...`). Token Plan Team Edition (团队版) keys (`sk-sp-...`) target a different endpoint (`token-plan.cn-beijing.maas.aliyuncs.com`) and do not include dedicated vision models (qwen3-vl-plus, qvq-max, qwen-vl-ocr, etc.). A standard `sk-` key is required for vision. The scripts detect `sk-sp-` keys at startup and print a warning. If qianwen-ops-auth is installed, see its `references/tokenplan.md` for full details.
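
The prefix rule above can be sketched as a small classifier. This is an illustration only: `classify_key` and its return labels are hypothetical names, and the actual startup check in the scripts may differ. The `sk-sp-` test must come first because those keys also start with `sk-`.

```python
def classify_key(key: str) -> str:
    """Classify a key by prefix only; the key value is never printed or returned."""
    if key.startswith("sk-sp-"):
        return "token-plan"  # Token Plan key: no dedicated vision models
    if key.startswith("sk-"):
        return "standard"    # standard key: usable for vision
    return "unknown"
```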

Model Selection

| Model | Use Case |
|-------|----------|
| `qwen3.6-plus` | Preferred: latest flagship, unified multimodal (text+image+video). Thinking on by default. Best balance of quality, speed, cost. |
| `qwen3.5-plus` | Unified multimodal (text+image+video). Thinking on by default. |
| `qwen3.5-flash` | Fast multimodal: cheaper, faster. Thinking on by default. |
| `qwen3-vl-plus` | High-precision: object localization (2D/3D), document/webpage parsing. |
| `qwen3-vl-flash` | Fast vision: lower latency, 33 languages. |
| `qvq-max` | Visual reasoning: chain-of-thought for math, charts. Streaming only. |
| `qwen-vl-ocr` | OCR: text extraction, table parsing, document scanning. |
| `qwen-vl-max` | Qwen2.5-VL: best-performing in 2.5 series. |
| `qwen-vl-plus` | Qwen2.5-VL: faster, good balance of performance and cost, 11 languages. |

User specified a model → use it directly.
Model choice depends on requirement, scenario, or pricing → consult the qianwen-model-selector skill.
No signal, clear task → `qwen3.6-plus`. Use `qwen3-vl-plus` for precise localization or 3D detection.
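
The default selection rules above can be sketched as follows. The function and parameter names are hypothetical, and the sketch deliberately leaves out the case that is delegated to the qianwen-model-selector skill.

```python
from typing import Optional

DEFAULT_MODEL = "qwen3.6-plus"        # default from the rules above
LOCALIZATION_MODEL = "qwen3-vl-plus"  # precise localization / 3D detection


def choose_model(user_model: Optional[str] = None,
                 needs_localization: bool = False) -> str:
    """Pick a model: an explicit user choice wins, then the task hint, then the default."""
    if user_model:
        return user_model  # used as given, never rewritten
    if needs_localization:
        return LOCALIZATION_MODEL
    return DEFAULT_MODEL
```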

⚠️ Important: The model list above is a point-in-time snapshot and may be outdated. Model availability changes frequently. Always check the official model list for the authoritative, up-to-date catalog before making model decisions.

Model details: For more information about a specific model, direct the user to its detail page: `https://www.qianwenai.com/models/<model-name>` (replace `<model-name>` with the exact model ID, e.g. `qwen3.6-plus` → https://www.qianwenai.com/models/qwen3.6-plus). NEVER modify or guess the model name in the URL.

Dynamic model queries: If the qianwen-model-selector skill or QianWen CLI (`qianwen models info <model>`) is available, use it for real-time model data. The CLI requires authentication; see the qianwen-usage skill for the login flow.

Execution

Prerequisites

API Key: Check that `DASHSCOPE_API_KEY` (or `QIANWEN_API_KEY`) is set using a non-plaintext check only (e.g. in shell: `[ -n "$DASHSCOPE_API_KEY" ]`; report only "set" or "not set", never the key value). If not set: run the qianwen-ops-auth skill if available; otherwise guide the user to obtain a key from QianWen Console and set it via a `.env` file (`echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env` in project root or current directory) or an environment variable. The script searches for `.env` in the current working directory and the project root. Skills may be installed independently; do not assume qianwen-ops-auth is present.
Python 3.9+ (stdlib only, no pip install needed)

Environment Check

Before first execution, verify Python is available:

```bash
python3 --version  # must be 3.9+
```

If `python3` is not found, try `python --version` or `py -3 --version`. If Python is unavailable or below 3.9, skip to Path 2 (curl) in execution-guide.md.
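
When an interpreter is already running, the same 3.9+ requirement can be checked from Python itself. This is a hedged sketch; `python_ok` is a hypothetical helper, not something the skill's scripts are known to expose.

```python
import sys

MIN_VERSION = (3, 9)


def python_ok(version_info=None) -> bool:
    """True when the (major, minor) version meets the 3.9+ requirement."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= MIN_VERSION


if __name__ == "__main__":
    print("Python OK" if python_ok() else "Python 3.9+ required")
```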

Default: Run Script

Script path: Scripts are in the `scripts/` subdirectory of this skill's directory (the directory containing this SKILL.md). You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to execute scripts. Do NOT assume scripts are in the current working directory. Do NOT use `cd` to switch directories before execution. Shared infrastructure lives in `scripts/vision_lib.py`.

Execution note: Run all scripts in the foreground: wait for stdout; do not background.

Discovery: Run `python3 <this-skill-dir>/scripts/analyze.py --help` (or `reason.py` / `ocr.py`) first to see all available arguments.

| Script | Purpose | Default Model |
|--------|---------|---------------|
| `scripts/analyze.py` | Image understanding, multi-image, video, thinking mode, high-res | `qwen3.6-plus` |
| `scripts/reason.py` | Visual reasoning with chain-of-thought, video reasoning (always streaming) | `qvq-max` |
| `scripts/ocr.py` | OCR text extraction from documents, receipts, tables | `qwen-vl-ocr` |

Input type fields (use exactly one in `--request` JSON):

| Field | Use for | Example |
|-------|---------|---------|
| `"image"` | Single image (URL or local path) | `"image": "photo.jpg"` |
| `"images"` | Multi-image comparison (array) | `"images": ["a.jpg", "b.jpg"]` |
| `"video"` | Video file (URL or local path) | `"video": "clip.mp4"` |
| `"video_frames"` | Video as frame array | `"video_frames": ["f1.jpg", "f2.jpg"]` |

⚠️ Common mistake: Do NOT use `"image"` for video files; use `"video"` instead.
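
The "exactly one input field" rule can be enforced before calling a script. The sketch below builds the `--request` JSON string; `build_request` is a hypothetical helper, and the `prompt` key is treated as optional because the OCR example in this document omits it.

```python
import json
from typing import Optional

INPUT_FIELDS = ("image", "images", "video", "video_frames")


def build_request(prompt: Optional[str] = None, **inputs) -> str:
    """Return a --request JSON string containing exactly one input field."""
    unknown = [name for name in inputs if name not in INPUT_FIELDS]
    if unknown:
        raise ValueError(f"unknown input field(s): {unknown}")
    used = [name for name in INPUT_FIELDS if name in inputs]
    if len(used) != 1:
        raise ValueError(f"use exactly one of {INPUT_FIELDS}, got {used}")
    body = {}
    if prompt is not None:
        body["prompt"] = prompt
    body[used[0]] = inputs[used[0]]
    return json.dumps(body, ensure_ascii=False)
```

For example, `build_request("Describe what happens in this video", video="clip.mp4")` yields a body with `"video"`, while passing both `image=` and `video=` raises `ValueError`.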

Image analysis

```bash
python3 <this-skill-dir>/scripts/analyze.py \
  --request '{"prompt":"What is in this image?","image":"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}' \
  --output output/qianwen-vision/result.json --print-response
```

Video analysis (local file — add --upload-files for files >= 7 MB)

```bash
python3 <this-skill-dir>/scripts/analyze.py \
  --request '{"prompt":"Describe what happens in this video","video":"clip.mp4"}' \
  --upload-files --print-response

# Visual reasoning (QVQ)
python3 <this-skill-dir>/scripts/reason.py \
  --request '{"prompt":"Solve this math problem step by step","image":"problem.png"}' \
  --print-response

# OCR text extraction
python3 <this-skill-dir>/scripts/ocr.py \
  --request '{"image":"invoice.jpg"}' \
  --print-response
```


| Argument | Description |
|----------|-------------|
| `--request '{...}'` | JSON request body |
| `--file path.json` | Load request from file |
| `--output path` | Save response JSON to path |
| `--print-response` | Print response to stdout |
| `--stream` | Enable streaming (auto for thinking/QVQ) |
| `--upload-files` | Upload local files to temp storage (for files >= 7 MB) |
| `--schema path.json` | JSON Schema for structured extraction |

Verify Result

Exit code `0` + output contains valid JSON with `choices` field → success
Non-zero exit, HTTP error, empty response, or JSON with `"code"`/`"message"` error → fail
Post-execution check: When `--output` is used, verify the response JSON file exists and contains expected content
MANDATORY stderr signal check: After confirming the result, scan the command's stderr output for `[ACTION_REQUIRED]` or `[UPDATE_AVAILABLE]`. If either signal is present, you MUST follow the instructions in Update Check below before responding to the user.
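
The checks above can be sketched as one function. It assumes stdout holds only the response JSON (as with `--print-response`); that assumption, and the name `verify_result`, are illustrative rather than taken from the scripts.

```python
import json

SIGNALS = ("[ACTION_REQUIRED]", "[UPDATE_AVAILABLE]")


def verify_result(exit_code: int, stdout: str, stderr: str = ""):
    """Return (success, signals_found) for one script run."""
    signals = [s for s in SIGNALS if s in stderr]
    if exit_code != 0 or not stdout.strip():
        return False, signals  # non-zero exit or empty response
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        return False, signals  # output is not valid JSON
    # An error body with "code"/"message" has no "choices", so it fails here.
    success = isinstance(data, dict) and "choices" in data
    return success, signals
```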

On Failure

If scripts fail, match the error output against the diagnostic table below to determine the resolution. If no match, read execution-guide.md for alternative paths: curl commands (Path 2), code generation (Path 3), and autonomous resolution (Path 5).

If Python is not available at all → skip directly to Path 2 (curl) in execution-guide.md.

| Error Pattern | Diagnosis | Resolution |
|---------------|-----------|------------|
| `command not found: python3` | Python not on PATH | Try `python` or `py -3`; install Python 3.9+ if missing |
| `Python 3.9+ required` | Script version check failed | Upgrade Python to 3.9+ |
| `SyntaxError` near type hints | Python < 3.9 | Upgrade Python to 3.9+ |
| `QIANWEN_API_KEY/DASHSCOPE_API_KEY not found` | Missing API key | Obtain key from QianWen Console; add to `.env`: `echo 'DASHSCOPE_API_KEY=sk-...' >> .env`; or run qianwen-ops-auth if available |
| `HTTP 401` | Invalid or mismatched key | Run qianwen-ops-auth (non-plaintext check only); verify key is valid |
| `SSL: CERTIFICATE_VERIFY_FAILED` | SSL cert issue (proxy/corporate) | macOS: run `Install Certificates.command`; else set `SSL_CERT_FILE` env var |
| `URLError` / `ConnectionError` | Network unreachable | Check internet; set `HTTPS_PROXY` if behind proxy |
| `HTTP 429` | Rate limited | Wait and retry with backoff |
| `HTTP 5xx` | Server error | Retry with backoff |
| `PermissionError` | Can't write output | Use `--output` to specify writable directory |

File Input

The API accepts: HTTP/HTTPS URL, Base64 data URI, and `oss://` URL. Local file paths are NOT directly supported by the API; the scripts handle conversion automatically. Pass local paths directly; no manual upload step is needed.

Large file rule: If the local file is >= 7 MB, always add `--upload-files`. Base64 encoding inflates size by ~33%, which can push the request past the 10 MB API limit. Small files (including short video clips < 7 MB) can use the default base64 path.

| Method | When to use | How |
|--------|-------------|-----|
| Online URL | File already hosted | Pass URL directly; preferred for large files |
| Base64 (default) | Local files < 7 MB (images or short video clips) | Script auto-converts to `data:` URI |
| Temp upload | Local files >= 7 MB | Add `--upload-files` flag → uploads to DashScope temp storage (`oss://` URL, 48h TTL) |
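
The arithmetic behind the 7 MB rule can be worked through in a few lines. This sketch assumes 1 MB = 1024 × 1024 bytes and standard padded base64 (4 output bytes per 3 input bytes); the helper names are hypothetical.

```python
UPLOAD_THRESHOLD_BYTES = 7 * 1024 * 1024  # 7 MB rule (assumes binary megabytes)
API_LIMIT_BYTES = 10 * 1024 * 1024        # 10 MB API limit


def base64_size(raw_bytes: int) -> int:
    """Encoded size in bytes for standard padded base64."""
    return 4 * ((raw_bytes + 2) // 3)


def needs_upload(raw_bytes: int) -> bool:
    """True when --upload-files should be added for a local file."""
    return raw_bytes >= UPLOAD_THRESHOLD_BYTES
```

At exactly 7 MB the encoded payload is already about 9.33 MB, which leaves little headroom under the 10 MB limit once the rest of the request body is counted.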

Production: Default temp storage has a 48h TTL and a 100 QPS upload limit, so it is not suitable for production, high-concurrency, or load-testing use. To use your own OSS bucket, set `QWEN_TMP_OSS_BUCKET` and `QWEN_TMP_OSS_REGION` in `.env`, run `pip install oss2`, and provide credentials via `QWEN_TMP_OSS_AK_ID` / `QWEN_TMP_OSS_AK_SECRET` or the standard `OSS_ACCESS_KEY_ID` / `OSS_ACCESS_KEY_SECRET`. Use a RAM user with least privilege (`oss:PutObject` + `oss:GetObject` on the target bucket only). The `--upload-files` flag is still required for vision scripts to trigger upload. If qianwen-ops-auth is installed, see its `references/custom-oss.md` for the full setup guide.

Input from Other Skills

When the input file comes from another skill's output (e.g., image-gen, video-gen):

Pass the URL directly (e.g., `"image": "<image_url from image-gen>"`); do NOT download the URL first
Downloading and re-passing as a local path wastes bandwidth and triggers unnecessary base64 encoding or OSS upload
All URL types are supported: `https://`, `oss://`, `data:`

Thinking Mode

| Model | Thinking Default | Notes |
|-------|------------------|-------|
| `qwen3.6-plus` | On | Latest flagship. Disable with `enable_thinking: false` for simple tasks. |
| `qwen3.5-plus` / `qwen3.5-flash` | On | Disable with `enable_thinking: false` for simple tasks. |
| `qwen3-vl-plus` / `qwen3-vl-flash` | Off | Enable with `enable_thinking: true`. |
| `qvq-max` | Always on | Streaming output required. |

See visual-reasoning.md for details.

OCR (qwen-vl-ocr)

Optimized for text extraction. Supports multi-language, skewed images, tables, formulas. See ocr.md for parameters and examples.

Input Limits

Images: BMP/JPEG/PNG/TIFF/WEBP/HEIC. Min 10px sides, aspect ratio <= 200:1. Max 20 MB (URL, Qwen3.5) / 10 MB (others).

Videos: MP4/AVI/MKV/MOV/FLV/WMV. Duration 2s–2h (Qwen3.5) / 2s–10min (others). Max 2 GB (URL) / 10 MB (base64). fps range [0.1, 10], default 2.0.
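
A few of these limits are easy to pre-check locally before sending a request. The sketch below covers only the image side length, aspect ratio, and fps range stated above; the function names are hypothetical, and obtaining the pixel dimensions is left to the caller.

```python
def check_image_dims(width: int, height: int) -> bool:
    """Sides at least 10 px and aspect ratio at most 200:1."""
    if min(width, height) < 10:
        return False
    return max(width, height) <= 200 * min(width, height)


def check_fps(fps: float = 2.0) -> bool:
    """fps must lie in [0.1, 10]; 2.0 is the documented default."""
    return 0.1 <= fps <= 10
```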

Error Handling

| HTTP | Meaning | Action |
|------|---------|--------|
| 401 | Invalid or missing API key | Run qianwen-ops-auth if available; else prompt user to set key (non-plaintext check only) |
| 400 | Bad request (invalid format) | Verify messages format and image URL/format |
| 429 | Rate limited | Retry with exponential backoff |
| 5xx | Server error | Retry with exponential backoff |
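
Retrying 429 and 5xx with exponential backoff can be sketched as below. The delay values (1 s base, doubling, 30 s cap, 3 retries) are assumptions chosen for illustration, not values documented by the API; `send` is any caller-supplied function returning `(status, body)`.

```python
import time


def should_retry(status: int) -> bool:
    """Only 429 and 5xx are retried; 400 and 401 need a fix, not a retry."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(retries: int = 3, base: float = 1.0, cap: float = 30.0):
    """Delays in seconds: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def call_with_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call send() once, then up to max_retries more times on retryable statuses."""
    status, body = send()
    for delay in backoff_delays(max_retries):
        if not should_retry(status):
            break
        sleep(delay)
        status, body = send()
    return status, body
```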

Usage & billing: Use the qianwen-usage skill to check usage, free tier quota, and billing directly. Alternatively, the user can visit the QianWen console: Usage Analytics | Pay-as-you-go Billing | Token Plan Team Edition (团队版) Subscription

NEVER fabricate, guess, or construct usage/billing/console URLs. Only provide the exact links listed in this skill. If a URL is not listed here, do not invent one.

Output Location

Prefer the current working directory. Default subdirectory: `./output/qianwen-vision/`

Write prohibition: NEVER write output files into this skill's installation directory or any `skills/` hierarchy. All generated content must go to `output/` under the current working directory or a user-specified path.
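
One way to apply the write prohibition is to resolve the default output path and refuse anything that lands inside a skill directory. This is a conservative sketch: `resolve_output` is a hypothetical helper, and rejecting any path containing a `skills` component is a heuristic that may also reject unrelated user directories with that name.

```python
from pathlib import Path

DEFAULT_SUBDIR = Path("output") / "qianwen-vision"


def resolve_output(filename: str, cwd: Path, skill_dir: Path) -> Path:
    """Return cwd/output/qianwen-vision/<filename>, refusing skill directories."""
    target = (cwd / DEFAULT_SUBDIR / filename).resolve()
    skill_root = skill_dir.resolve()
    inside_skill = target == skill_root or skill_root in target.parents
    if inside_skill or "skills" in target.parts:
        raise ValueError("refusing to write inside a skills/ hierarchy")
    return target
```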

Update Check (MANDATORY Post-Execution)

After script execution, check stderr for action signals. The script automatically detects update status and emits structured signals. You MUST act on them. This section also applies when the user explicitly asks to check for updates.

Signal: `[ACTION_REQUIRED] qianwen-update-check skill is not installed`

When stderr contains this signal, you MUST present the following options to the user and wait for their response before proceeding. Do NOT silently skip this.

The update-check skill is not installed. It keeps you informed about new QianWen-AI/qianwen-ai features and bug fixes.

1. Install now: I'll run the install command for you
2. Skip for now: do nothing this time
3. Never remind me: permanently suppress this prompt

After the user responds:

User chooses 1 → run: `npx skills add QianWen-AI/qianwen-ai --skill qianwen-update-check -y`, then run: `python3 <this-skill-scripts-dir>/gossamer.py --dismiss`
User chooses 2 → run: `python3 <this-skill-scripts-dir>/gossamer.py --dismiss`, then continue normally
User chooses 3 → run: `python3 <this-skill-scripts-dir>/gossamer.py --never-install`, then continue normally

Signal: `[UPDATE_AVAILABLE]`

When stderr contains this signal, you MUST append a brief update notice to your response, including the version info and the update command shown in the stderr output.

No signal in stderr

If stderr contains neither `[ACTION_REQUIRED]` nor `[UPDATE_AVAILABLE]`, no action is needed: the skill is installed and up to date (or cached within 24h).
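
The three stderr cases can be sketched as a small dispatcher. The action labels are hypothetical, and giving `[ACTION_REQUIRED]` precedence when both signals appear is an assumption. The matching lines are returned so the version info and update command printed by the script can be quoted verbatim.

```python
def update_action(stderr: str):
    """Return (action, matching stderr lines) for the update-check signals."""
    checks = (
        ("[ACTION_REQUIRED]", "prompt_install"),   # present the three options
        ("[UPDATE_AVAILABLE]", "append_notice"),   # append a brief update notice
    )
    for signal, action in checks:
        lines = [ln.strip() for ln in stderr.splitlines() if signal in ln]
        if lines:
            return action, lines
    return "none", []
```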

Explicit user request

When the user explicitly asks to check for updates (e.g. "check for updates", "check version"):

Look for `qianwen-update-check/SKILL.md` in sibling skill directories.
If found, run `python3 <qianwen-update-check-dir>/scripts/check_update.py --print-response` and report the result.
If not found, present the install options above.

References

execution-guide.md — Fallback paths (curl, code generation, autonomous)
curl-examples.md — Curl templates (base64, multi-image, video, OCR)
api-guide.md — API supplementary guide
visual-reasoning.md — QVQ visual reasoning guide
ocr.md — Qwen-VL-OCR text extraction guide
sources.md — Official documentation URLs
