# Open-AutoGLM Phone Agent
Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."
## Architecture Overview
User natural language → AutoGLM VLM → screen perception → ADB/HDC/WebDriverAgent → device actions

- Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
- Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
- Model serving: vLLM or SGLang (self-hosted), or the BigModel/ModelScope API
- Input: screenshot + task description → Output: structured action commands
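The loop implied by this pipeline can be sketched as follows. This is a conceptual illustration only; `capture_screen`, `query_model`, and `execute` are hypothetical placeholders, not the actual Open-AutoGLM API:

```python
def run_task(task, capture_screen, query_model, execute, max_steps=20):
    """Perceive-act loop: screenshot -> VLM -> structured action -> device."""
    for _ in range(max_steps):
        screenshot = capture_screen()           # via ADB/HDC/WebDriverAgent
        action = query_model(task, screenshot)  # AutoGLM emits a do(...) action
        if action["action"] == "finish":        # model signals task completion
            return "done"
        execute(action)                         # tap / type / swipe / launch ...
    return "step limit reached"
```

The step cap is a safety net so a confused model cannot loop on a device forever.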
## Installation
### Prerequisites
- Python 3.10+
- ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS)
- Android device with Developer Mode + USB Debugging enabled
- ADB Keyboard APK installed on Android device (for text input)
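A quick sanity check of these prerequisites can be scripted. This is a hypothetical helper, not part of the framework:

```python
import shutil
import sys


def check_prerequisites(tools=("adb",), min_python=(3, 10)):
    """Return a list of problems; an empty list means the environment looks OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for tool in tools:
        if shutil.which(tool) is None:  # the tool must be on PATH
            problems.append(f"'{tool}' not found in PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```

Swap `tools=("hdc",)` for HarmonyOS setups.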
### Install the framework
```bash
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
```

### Verify device connection
Android:

```bash
adb devices
# Expected: emulator-5554    device
```

HarmonyOS NEXT:

```bash
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
```
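If you script this check, the `adb devices` output can be parsed programmatically. A minimal sketch, not part of the framework:

```python
def connected_devices(output: str) -> list:
    """Return serials in the 'device' state from `adb devices` output."""
    serials = []
    for line in output.strip().splitlines()[1:]:  # skip "List of devices attached"
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":  # skips offline/unauthorized
            serials.append(parts[0])
    return serials


# Live usage (requires adb on PATH):
# import subprocess
# out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
# print(connected_devices(out))
```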
## Model Deployment Options
### Option A: Third-party API (recommended for quick start)
BigModel (ZhipuAI):

```bash
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model "autoglm-phone" \
  --apikey $BIGMODEL_API_KEY \
  "打开美团搜索附近的火锅店"
```

ModelScope:

```bash
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model "ZhipuAI/AutoGLM-Phone-9B" \
  --apikey $MODELSCOPE_API_KEY \
  "open Meituan and find nearby hotpot"
```

### Option B: Self-hosted with vLLM
Install vLLM (or use the official Docker image: `docker pull vllm/vllm-openai:v0.12.0`):

```bash
pip install vllm
```

Start the model server (follow these parameters strictly):

```bash
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs '{"max_pixels":5000000}' \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt '{"image":10}' \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000
```

### Option C: Self-hosted with SGLang
Install SGLang (or use the Docker image: `docker pull lmsysorg/sglang:v0.5.6.post1`; inside the container, run `pip install nvidia-cudnn-cu12==9.16.0.29`):

```bash
python3 -m sglang.launch_server \
  --model-path zai-org/AutoGLM-Phone-9B \
  --served-model-name autoglm-phone-9b \
  --context-length 25480 \
  --mm-enable-dp-encoder \
  --mm-process-config '{"image":{"max_pixels":5000000}}' \
  --port 8000
```

### Verify deployment
bash
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9bExpected output includes a block followed by . If the chain-of-thought is very short or garbled, the model deployment has failed.
<think>...</think><answer>do(action="Launch", app="...")bash
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b预期输出应包含块,其后跟随。若思考过程极短或内容混乱,则模型部署失败。
<think>...</think><answer>do(action="Launch", app="...")Running the Agent
运行Agent
### Basic CLI usage
```bash
# Android device (default)
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  "打开设置查看WiFi"

# HarmonyOS device
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  --device-type hdc \
  "打开设置查看WiFi"

# Multilingual model for English apps
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  "Open Instagram and search for travel photos"
```

### Key CLI parameters
| Parameter | Description | Default |
|---|---|---|
| `--base-url` | Model service endpoint | Required |
| `--model` | Model name on the server | Required |
| `--apikey` | API key for third-party services | None |
| `--device-type` | Device type: `adb`, `hdc`, or `ios` | `adb` |
| `--device-id` | Specific device serial number | Auto-detect |
## Python API Usage
### Basic agent invocation
```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url="http://localhost:8000/v1",
    model="autoglm-phone-9b",
    device_type="adb",  # or "hdc" for HarmonyOS
)
agent = PhoneAgent(config)

# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
```
### Custom task with device selection
```python
import os

from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url=os.environ["MODEL_BASE_URL"],
    model=os.environ["MODEL_NAME"],
    apikey=os.environ.get("MODEL_API_KEY"),
    device_type="adb",
    device_id="emulator-5554",  # target a specific device
)
agent = PhoneAgent(config)
```

### Task with sensitive operation confirmation

```python
result = agent.run(
    "在京东购买最便宜的蓝牙耳机",
    confirm_sensitive=True,  # prompt the user before purchase actions
)
```
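If you build your own wrappers, the same gate is easy to emulate. A minimal sketch of such a confirmation prompt (hypothetical; the framework's built-in prompt may differ):

```python
def confirm(prompt: str, reader=input) -> bool:
    """Return True only if the user explicitly answers 'y' or 'yes'."""
    answer = reader(f"{prompt} [y/N] ").strip().lower()
    return answer in ("y", "yes")
```

Defaulting to "no" means an accidental Enter press never authorizes a purchase.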
### Direct model API call (for testing/integration)
```python
import base64
import os

import openai

client = openai.OpenAI(
    base_url=os.environ["MODEL_BASE_URL"],
    api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)

# Load a screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="autoglm-phone-9b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Output format: `<think>...</think>\n<answer>do(action="...", ...)`
### Parsing model action output
```python
import re


def parse_action(model_output: str) -> dict:
    """Parse AutoGLM model output into a structured action."""
    # Extract the answer block
    answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
    if not answer_match:
        return {"action": "unknown"}
    answer = answer_match.group(1).strip()

    # Parse the do() call
    # Format: do(action="ActionName", param1="value1", param2="value2")
    action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
    if not action_match:
        return {"action": "unknown", "raw": answer}
    action_name = action_match.group(1)
    params_str = action_match.group(2)

    # Parse parameters
    params = {}
    for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
        params[param_match.group(1)] = param_match.group(2)
    return {"action": action_name, **params}


# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
```
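A parsed action dict can then be routed to device-control handlers. The dispatcher below is an illustrative sketch; the handler registry is hypothetical, not the framework's internal API:

```python
def dispatch(action: dict, handlers: dict) -> bool:
    """Invoke the handler registered for action['action']; False if unknown."""
    handler = handlers.get(action.get("action"))
    if handler is None:
        return False
    params = {k: v for k, v in action.items() if k != "action"}
    handler(**params)  # remaining keys become keyword arguments
    return True


# Example: record launched apps instead of touching a real device
launched = []
handlers = {"Launch": lambda app: launched.append(app)}
dispatch({"action": "Launch", "app": "京东"}, handlers)  # launched == ["京东"]
```

Returning `False` for unknown actions lets the caller decide whether to retry, skip, or abort the task.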
## ADB Device Control Patterns
Common ADB operations used by the agent:
```python
import subprocess
from typing import Optional


def take_screenshot(device_id: Optional[str] = None) -> bytes:
    """Capture the current device screen as PNG bytes."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["exec-out", "screencap", "-p"])
    result = subprocess.run(cmd, capture_output=True)
    return result.stdout


def send_tap(x: int, y: int, device_id: Optional[str] = None):
    """Tap at screen coordinates."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "tap", str(x), str(y)])
    subprocess.run(cmd)


def send_text_adb_keyboard(text: str, device_id: Optional[str] = None):
    """Send text via ADB Keyboard (must be installed and enabled)."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    # Enable the ADB keyboard first
    cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
    subprocess.run(cmd_enable)
    # Send the text
    cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                      "--es", "msg", text]
    subprocess.run(cmd_text)


def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300,
          device_id: Optional[str] = None):
    """Swipe gesture on screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
    subprocess.run(cmd)


def press_back(device_id: Optional[str] = None):
    """Press the Android back button."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
    subprocess.run(cmd)


def launch_app(package_name: str, device_id: Optional[str] = None):
    """Launch an app by package name."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "monkey", "-p", package_name, "-c",
                "android.intent.category.LAUNCHER", "1"])
    subprocess.run(cmd)
```

## Midscene.js Integration
For JavaScript/TypeScript automation using AutoGLM:

```javascript
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key
import { AndroidAgent } from "@midscene/android";

const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么?");
```

## Remote ADB (WiFi Debugging)
```bash
# Connect the device via USB first, then enable TCP/IP mode
adb tcpip 5555

# Get the device IP address
adb shell ip addr show wlan0

# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555

# Verify the connection
adb devices
# Expected: 192.168.1.100:5555    device
```

Use with the agent:

```bash
python main.py \
  --base-url http://model-server:8000/v1 \
  --model autoglm-phone-9b \
  --device-id "192.168.1.100:5555" \
  "打开支付宝查看余额"
```

## Common Action Types
The AutoGLM model outputs structured actions as `do(action="...", ...)` calls, covering:

- Opening an app (e.g. `do(action="Launch", app="京东")`)
- Tapping a screen element
- Inputting text
- Scrolling/swiping
- Pressing the back button
- Going to the home screen
- Signaling task completion
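For round-tripping in tests, an action dict can be rendered back into the `do(...)` syntax the model emits. This is a hypothetical helper; the real action vocabulary is defined by the AutoGLM model:

```python
def format_action(action: str, **params) -> str:
    """Render an action and its parameters in do(...) command syntax."""
    rendered = ", ".join(f'{k}="{v}"' for k, v in params.items())
    if rendered:
        return f'do(action="{action}", {rendered})'
    return f'do(action="{action}")'


print(format_action("Launch", app="京东"))  # do(action="Launch", app="京东")
```

Pairing this with a parser makes it easy to assert that parse(format(x)) == x in unit tests.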
## Model Selection Guide
| Model | Use Case | Languages |
|---|---|---|
| `AutoGLM-Phone-9B` | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
| `AutoGLM-Phone-9B-Multilingual` | International apps, mixed content | Chinese + English + others |

- HuggingFace: `zai-org/AutoGLM-Phone-9B` / `zai-org/AutoGLM-Phone-9B-Multilingual`
- ModelScope: `ZhipuAI/AutoGLM-Phone-9B` / `ZhipuAI/AutoGLM-Phone-9B-Multilingual`
## Environment Variables Reference
```bash
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY=""  # Required for the BigModel/ModelScope APIs

# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"

# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"

# Device configuration
export ADB_DEVICE_ID=""  # Leave empty for auto-detect
export HDC_DEVICE_ID=""  # HarmonyOS device ID
```
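These variables can be collected in one place with fallbacks matching the local-deployment defaults above. An illustrative snippet, not part of the framework:

```python
import os


def model_service_config(env=None) -> dict:
    """Read model-service settings, falling back to local-deployment defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("MODEL_BASE_URL", "http://localhost:8000/v1"),
        "model": env.get("MODEL_NAME", "autoglm-phone-9b"),
        "apikey": env.get("MODEL_API_KEY") or None,  # treat "" as unset
    }
```

Normalizing an empty `MODEL_API_KEY` to `None` avoids sending a blank key to self-hosted servers that do not expect one.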
## Troubleshooting
### Model output is garbled or the chain-of-thought is very short

Cause: incorrect vLLM/SGLang startup parameters.
Fix: ensure `--chat-template-content-format string` (vLLM) and `--mm-process-config` with `max_pixels: 5000000` are set. Check transformers version compatibility.

### `adb devices` shows no devices

Fix:
- Verify the USB cable supports data transfer (not charge-only)
- Accept the "Allow USB debugging" dialog on the phone
- Try `adb kill-server && adb start-server`
- Some devices require a reboot after enabling developer options
### Text input not working on Android

Fix: ADB Keyboard must be installed AND enabled:

```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```

### Agent stuck in a loop
Cause: the model cannot identify a path to complete the task.
Fix: the framework includes sensitive-operation confirmation; ensure `confirm_sensitive=True` for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.

### vLLM CUDA out of memory
Fix: AutoGLM-Phone-9B requires ~20 GB of VRAM. Use `--tensor-parallel-size 2` for multi-GPU, or use the API service instead.

### Connection refused to model server
Fix: check firewall rules. For a remote server:

```bash
# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return the model list as JSON
```

### HDC device not recognized (HarmonyOS)
Fix: HarmonyOS NEXT (not earlier versions) is required. Enable developer mode in Settings → About → Version Number (tap 10 times rapidly).
## iOS Setup
For iPhone automation, see the dedicated setup guide in `docs/ios_setup/ios_setup.md`. After configuring WebDriverAgent:

```bash
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  --device-type ios \
  "Open Maps and navigate to Central Park"
```