# Open-AutoGLM Phone Agent
Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."
## Architecture Overview
User natural language → AutoGLM VLM → screen perception → ADB/HDC/WebDriverAgent → device actions

- Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
- Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
- Model serving: vLLM or SGLang (self-hosted), or the BigModel/ModelScope API
- Input: screenshot + task description → Output: structured action commands
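The loop implied by this pipeline can be sketched as follows. This is a conceptual illustration only; `capture_screen`, `query_model`, and `execute` are hypothetical placeholders, not the actual Open-AutoGLM API:

```python
def run_task(task, capture_screen, query_model, execute, max_steps=20):
    """Perceive-act loop: screenshot -> VLM -> structured action -> device."""
    for _ in range(max_steps):
        screenshot = capture_screen()           # via ADB/HDC/WebDriverAgent
        action = query_model(task, screenshot)  # AutoGLM emits a do(...) action
        if action["action"] == "finish":        # model signals task completion
            return "done"
        execute(action)                         # tap / type / swipe / launch ...
    return "step limit reached"
```

The step cap is a safety net so a confused model cannot loop on a device forever.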
## Installation
### Prerequisites
- Python 3.10+
- ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS)
- Android device with Developer Mode + USB Debugging enabled
- ADB Keyboard APK installed on Android device (for text input)
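A quick sanity check of these prerequisites can be scripted. This is a hypothetical helper, not part of the framework:

```python
import shutil
import sys


def check_prerequisites(tools=("adb",), min_python=(3, 10)):
    """Return a list of problems; an empty list means the environment looks OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for tool in tools:
        if shutil.which(tool) is None:  # the tool must be on PATH
            problems.append(f"'{tool}' not found in PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```

Swap `tools=("hdc",)` for HarmonyOS setups.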
### Install the framework
```bash
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
```

### Verify device connection
Android:

```bash
adb devices
# Expected: emulator-5554    device
```

HarmonyOS NEXT:

```bash
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
```
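If you script this check, the `adb devices` output can be parsed programmatically. A minimal sketch, not part of the framework:

```python
def connected_devices(output: str) -> list:
    """Return serials in the 'device' state from `adb devices` output."""
    serials = []
    for line in output.strip().splitlines()[1:]:  # skip "List of devices attached"
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":  # skips offline/unauthorized
            serials.append(parts[0])
    return serials


# Live usage (requires adb on PATH):
# import subprocess
# out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
# print(connected_devices(out))
```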
## Model Deployment Options
### Option A: Third-party API (recommended for quick start)
BigModel (ZhipuAI):

```bash
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model "autoglm-phone" \
  --apikey $BIGMODEL_API_KEY \
  "打开美团搜索附近的火锅店"
```

ModelScope:

```bash
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model "ZhipuAI/AutoGLM-Phone-9B" \
  --apikey $MODELSCOPE_API_KEY \
  "open Meituan and find nearby hotpot"
```

### Option B: Self-hosted with vLLM
Install vLLM (or use the official Docker image: `docker pull vllm/vllm-openai:v0.12.0`):

```bash
pip install vllm
```

Start the model server (follow these parameters strictly):

```bash
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs '{"max_pixels":5000000}' \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt '{"image":10}' \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000
```

### Option C: Self-hosted with SGLang
Install SGLang (or use the Docker image: `docker pull lmsysorg/sglang:v0.5.6.post1`; inside the container, run `pip install nvidia-cudnn-cu12==9.16.0.29`):

```bash
python3 -m sglang.launch_server \
  --model-path zai-org/AutoGLM-Phone-9B \
  --served-model-name autoglm-phone-9b \
  --context-length 25480 \
  --mm-enable-dp-encoder \
  --mm-process-config '{"image":{"max_pixels":5000000}}' \
  --port 8000
```

### Verify deployment
bash
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9bExpected output includes a block followed by . If the chain-of-thought is very short or garbled, the model deployment has failed.
<think>...</think><answer>do(action="Launch", app="...")bash
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b预期输出应包含块,其后跟随。若思考过程极短或内容混乱,则模型部署失败。
<think>...</think><answer>do(action="Launch", app="...")Running the Agent
运行Agent
### Basic CLI usage
```bash
# Android device (default)
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  "打开设置查看WiFi"

# HarmonyOS device
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  --device-type hdc \
  "打开设置查看WiFi"

# Multilingual model for English apps
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  "Open Instagram and search for travel photos"
```

### Key CLI parameters
| Parameter | Description | Default |
|---|---|---|
| `--base-url` | Model service endpoint | Required |
| `--model` | Model name on the server | Required |
| `--apikey` | API key for third-party services | None |
| `--device-type` | Device type: `adb`, `hdc`, or `ios` | `adb` |
| `--device-id` | Specific device serial number | Auto-detect |
## Python API Usage
### Basic agent invocation
```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url="http://localhost:8000/v1",
    model="autoglm-phone-9b",
    device_type="adb",  # or "hdc" for HarmonyOS
)
agent = PhoneAgent(config)

# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
```
### Custom task with device selection
```python
import os

from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url=os.environ["MODEL_BASE_URL"],
    model=os.environ["MODEL_NAME"],
    apikey=os.environ.get("MODEL_API_KEY"),
    device_type="adb",
    device_id="emulator-5554",  # target a specific device
)
agent = PhoneAgent(config)
```

### Task with sensitive operation confirmation

```python
result = agent.run(
    "在京东购买最便宜的蓝牙耳机",
    confirm_sensitive=True,  # prompt the user before purchase actions
)
```
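If you build your own wrappers, the same gate is easy to emulate. A minimal sketch of such a confirmation prompt (hypothetical; the framework's built-in prompt may differ):

```python
def confirm(prompt: str, reader=input) -> bool:
    """Return True only if the user explicitly answers 'y' or 'yes'."""
    answer = reader(f"{prompt} [y/N] ").strip().lower()
    return answer in ("y", "yes")
```

Defaulting to "no" means an accidental Enter press never authorizes a purchase.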
### Direct model API call (for testing/integration)
```python
import base64
import os

import openai

client = openai.OpenAI(
    base_url=os.environ["MODEL_BASE_URL"],
    api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)

# Load a screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="autoglm-phone-9b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Output format: `<think>...</think>\n<answer>do(action="...", ...)`
### Parsing model action output
```python
import re


def parse_action(model_output: str) -> dict:
    """Parse AutoGLM model output into a structured action."""
    # Extract the answer block
    answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
    if not answer_match:
        return {"action": "unknown"}
    answer = answer_match.group(1).strip()

    # Parse the do() call
    # Format: do(action="ActionName", param1="value1", param2="value2")
    action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
    if not action_match:
        return {"action": "unknown", "raw": answer}
    action_name = action_match.group(1)
    params_str = action_match.group(2)

    # Parse parameters
    params = {}
    for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
        params[param_match.group(1)] = param_match.group(2)
    return {"action": action_name, **params}


# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
```
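A parsed action dict can then be routed to device-control handlers. The dispatcher below is an illustrative sketch; the handler registry is hypothetical, not the framework's internal API:

```python
def dispatch(action: dict, handlers: dict) -> bool:
    """Invoke the handler registered for action['action']; False if unknown."""
    handler = handlers.get(action.get("action"))
    if handler is None:
        return False
    params = {k: v for k, v in action.items() if k != "action"}
    handler(**params)  # remaining keys become keyword arguments
    return True


# Example: record launched apps instead of touching a real device
launched = []
handlers = {"Launch": lambda app: launched.append(app)}
dispatch({"action": "Launch", "app": "京东"}, handlers)  # launched == ["京东"]
```

Returning `False` for unknown actions lets the caller decide whether to retry, skip, or abort the task.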
## ADB Device Control Patterns
Common ADB operations used by the agent:
```python
import subprocess
from typing import Optional


def take_screenshot(device_id: Optional[str] = None) -> bytes:
    """Capture the current device screen as PNG bytes."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["exec-out", "screencap", "-p"])
    result = subprocess.run(cmd, capture_output=True)
    return result.stdout


def send_tap(x: int, y: int, device_id: Optional[str] = None):
    """Tap at screen coordinates."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "tap", str(x), str(y)])
    subprocess.run(cmd)


def send_text_adb_keyboard(text: str, device_id: Optional[str] = None):
    """Send text via ADB Keyboard (must be installed and enabled)."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    # Enable the ADB keyboard first
    cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
    subprocess.run(cmd_enable)
    # Send the text
    cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                      "--es", "msg", text]
    subprocess.run(cmd_text)


def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300,
          device_id: Optional[str] = None):
    """Swipe gesture on screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
    subprocess.run(cmd)


def press_back(device_id: Optional[str] = None):
    """Press the Android back button."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
    subprocess.run(cmd)


def launch_app(package_name: str, device_id: Optional[str] = None):
    """Launch an app by package name."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "monkey", "-p", package_name, "-c",
                "android.intent.category.LAUNCHER", "1"])
    subprocess.run(cmd)
```

## Midscene.js Integration
For JavaScript/TypeScript automation using AutoGLM:

```javascript
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key
import { AndroidAgent } from "@midscene/android";

const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么?");
```

## Remote ADB (WiFi Debugging)
```bash
# Connect the device via USB first, then enable TCP/IP mode
adb tcpip 5555

# Get the device IP address
adb shell ip addr show wlan0

# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555

# Verify the connection
adb devices
# Expected: 192.168.1.100:5555    device
```

Use with the agent:

```bash
python main.py \
  --base-url http://model-server:8000/v1 \
  --model autoglm-phone-9b \
  --device-id "192.168.1.100:5555" \
  "打开支付宝查看余额"
```

## Common Action Types
The AutoGLM model outputs structured actions as `do(action="...", ...)` calls, covering:

- Opening an app (e.g. `do(action="Launch", app="京东")`)
- Tapping a screen element
- Inputting text
- Scrolling/swiping
- Pressing the back button
- Going to the home screen
- Signaling task completion
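For round-tripping in tests, an action dict can be rendered back into the `do(...)` syntax the model emits. This is a hypothetical helper; the real action vocabulary is defined by the AutoGLM model:

```python
def format_action(action: str, **params) -> str:
    """Render an action and its parameters in do(...) command syntax."""
    rendered = ", ".join(f'{k}="{v}"' for k, v in params.items())
    if rendered:
        return f'do(action="{action}", {rendered})'
    return f'do(action="{action}")'


print(format_action("Launch", app="京东"))  # do(action="Launch", app="京东")
```

Pairing this with a parser makes it easy to assert that parse(format(x)) == x in unit tests.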
## Model Selection Guide
| Model | Use Case | Languages |
|---|---|---|
| `AutoGLM-Phone-9B` | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
| `AutoGLM-Phone-9B-Multilingual` | International apps, mixed content | Chinese + English + others |

- HuggingFace: `zai-org/AutoGLM-Phone-9B` / `zai-org/AutoGLM-Phone-9B-Multilingual`
- ModelScope: `ZhipuAI/AutoGLM-Phone-9B` / `ZhipuAI/AutoGLM-Phone-9B-Multilingual`
## Environment Variables Reference
```bash
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY=""  # Required for the BigModel/ModelScope APIs

# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"

# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"

# Device configuration
export ADB_DEVICE_ID=""  # Leave empty for auto-detect
export HDC_DEVICE_ID=""  # HarmonyOS device ID
```
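These variables can be collected in one place with fallbacks matching the local-deployment defaults above. An illustrative snippet, not part of the framework:

```python
import os


def model_service_config(env=None) -> dict:
    """Read model-service settings, falling back to local-deployment defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("MODEL_BASE_URL", "http://localhost:8000/v1"),
        "model": env.get("MODEL_NAME", "autoglm-phone-9b"),
        "apikey": env.get("MODEL_API_KEY") or None,  # treat "" as unset
    }
```

Normalizing an empty `MODEL_API_KEY` to `None` avoids sending a blank key to self-hosted servers that do not expect one.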
## Troubleshooting
### Model output is garbled or the chain-of-thought is very short

Cause: incorrect vLLM/SGLang startup parameters.
Fix: ensure `--chat-template-content-format string` (vLLM) and `--mm-process-config` with `max_pixels: 5000000` are set. Check transformers version compatibility.

### `adb devices` shows no devices

Fix:
- Verify the USB cable supports data transfer (not charge-only)
- Accept the "Allow USB debugging" dialog on the phone
- Try `adb kill-server && adb start-server`
- Some devices require a reboot after enabling developer options
### Text input not working on Android

Fix: ADB Keyboard must be installed AND enabled:

```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```

### Agent stuck in a loop
Cause: the model cannot identify a path to complete the task.
Fix: the framework includes sensitive-operation confirmation; ensure `confirm_sensitive=True` for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.

### vLLM CUDA out of memory
Fix: AutoGLM-Phone-9B requires ~20 GB of VRAM. Use `--tensor-parallel-size 2` for multi-GPU, or use the API service instead.

### Connection refused to model server
Fix: check firewall rules. For a remote server:

```bash
# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return the model list as JSON
```

### HDC device not recognized (HarmonyOS)
Fix: HarmonyOS NEXT (not earlier versions) is required. Enable developer mode in Settings → About → Version Number (tap 10 times rapidly).
## iOS Setup
For iPhone automation, see the dedicated setup guide in `docs/ios_setup/ios_setup.md`. After configuring WebDriverAgent:

```bash
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  --device-type ios \
  "Open Maps and navigate to Central Park"
```