
Open-AutoGLM Phone Agent


Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."

Architecture Overview


User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
  • Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
  • Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
  • Model serving: vLLM or SGLang (self-hosted) or BigModel/ModelScope API
  • Input: Screenshot + task description → Output: structured action commands
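The pipeline above amounts to a simple perception-decision-action loop. The following is a minimal sketch, not part of the framework: `perceive`, `decide`, and `act` are hypothetical callables standing in for the screenshot, model, and device-control layers.

```python
from typing import Callable

def run_task(
    task: str,
    perceive: Callable[[], bytes],          # e.g. an ADB screencap
    decide: Callable[[bytes, str], dict],   # VLM call -> parsed action dict
    act: Callable[[dict], None],            # execute via ADB/HDC/WebDriverAgent
    max_steps: int = 20,
) -> str:
    """Generic perception-decision-action loop with a step budget."""
    for _ in range(max_steps):
        screen = perceive()
        action = decide(screen, task)
        if action.get("action") == "Finish":
            return action.get("result", "")
        act(action)
    return "max steps reached"
```

Injecting the three layers as callables keeps the loop testable without a device attached.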

Installation


Prerequisites


  • Python 3.10+
  • ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS)
  • Android device with Developer Mode + USB Debugging enabled
  • ADB Keyboard APK installed on Android device (for text input)
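The basic prerequisites can be checked programmatically. A minimal sketch using only the standard library; `check_prereqs` is a hypothetical helper, not part of Open-AutoGLM:

```python
import shutil
import sys

def check_prereqs(device_tool: str = "adb") -> list:
    """Return a list of problems; an empty list means the basics look OK."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ required")
    # device_tool is "adb", "hdc", etc.; it must be resolvable on PATH
    if shutil.which(device_tool) is None:
        problems.append(f"{device_tool} not found in PATH")
    return problems

if __name__ == "__main__":
    for p in check_prereqs():
        print("MISSING:", p)
```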

Install the framework


```bash
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
```

Verify device connection

```bash
# Android
adb devices
# Expected: emulator-5554 device

# HarmonyOS NEXT
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
```

Model Deployment Options


Option A: Third-party API (Recommended for quick start)


BigModel (ZhipuAI)

```bash
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model "autoglm-phone" \
  --apikey $BIGMODEL_API_KEY \
  "打开美团搜索附近的火锅店"
```

ModelScope

```bash
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model "ZhipuAI/AutoGLM-Phone-9B" \
  --apikey $MODELSCOPE_API_KEY \
  "open Meituan and find nearby hotpot"
```

Option B: Self-hosted with vLLM


```bash
# Install vLLM (or use the official Docker image: docker pull vllm/vllm-openai:v0.12.0)
pip install vllm

# Start the model server (strictly follow these parameters)
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs '{"max_pixels":5000000}' \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt '{"image":10}' \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000
```

Option C: Self-hosted with SGLang


```bash
# Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1
# Inside the container: pip install nvidia-cudnn-cu12==9.16.0.29
python3 -m sglang.launch_server \
  --model-path zai-org/AutoGLM-Phone-9B \
  --served-model-name autoglm-phone-9b \
  --context-length 25480 \
  --mm-enable-dp-encoder \
  --mm-process-config '{"image":{"max_pixels":5000000}}' \
  --port 8000
```

Verify deployment


```bash
python scripts/check_deployment_cn.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b
```

Expected output includes a `<think>...</think>` block followed by `<answer>do(action="Launch", app="...")`. If the chain-of-thought is very short or garbled, the model deployment has failed.

Running the Agent


Basic CLI usage


```bash
# Android device (default)
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  "打开小红书搜索美食"

# HarmonyOS device
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  --device-type hdc \
  "打开设置查看WiFi"

# Multilingual model for English apps
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  "Open Instagram and search for travel photos"
```

Key CLI parameters


| Parameter | Description | Default |
| --- | --- | --- |
| `--base-url` | Model service endpoint | Required |
| `--model` | Model name on server | Required |
| `--apikey` | API key for third-party services | None |
| `--device-type` | `adb` (Android) or `hdc` (HarmonyOS) | `adb` |
| `--device-id` | Specific device serial number | Auto-detect |

Python API Usage


Basic agent invocation


```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url="http://localhost:8000/v1",
    model="autoglm-phone-9b",
    device_type="adb",  # or "hdc" for HarmonyOS
)

agent = PhoneAgent(config)
```

Run a task


```python
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
```

Custom task with device selection


```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
import os

config = AgentConfig(
    base_url=os.environ["MODEL_BASE_URL"],
    model=os.environ["MODEL_NAME"],
    apikey=os.environ.get("MODEL_API_KEY"),
    device_type="adb",
    device_id="emulator-5554",  # specific device
)

agent = PhoneAgent(config)
```

Task with sensitive operation confirmation


```python
result = agent.run(
    "在京东购买最便宜的蓝牙耳机",
    confirm_sensitive=True,  # prompt user before purchase actions
)
```

Direct model API call (for testing/integration)


```python
import openai
import base64
import os

client = openai.OpenAI(
    base_url=os.environ["MODEL_BASE_URL"],
    api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)

# Load screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="autoglm-phone-9b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Output format: `<think>...</think>\n<answer>do(action="...", ...)`

Parsing model action output


```python
import re

def parse_action(model_output: str) -> dict:
    """Parse AutoGLM model output into a structured action."""
    # Extract answer block
    answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
    if not answer_match:
        return {"action": "unknown"}

    answer = answer_match.group(1).strip()

    # Parse do() call
    # Format: do(action="ActionName", param1="value1", param2="value2")
    action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
    if not action_match:
        return {"action": "unknown", "raw": answer}

    action_name = action_match.group(1)
    params_str = action_match.group(2)

    # Parse parameters
    params = {}
    for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
        params[param_match.group(1)] = param_match.group(2)

    return {"action": action_name, **params}
```

Example usage


```python
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
```

ADB Device Control Patterns


Common ADB operations used by the agent


```python
import subprocess
from typing import Optional

def take_screenshot(device_id: Optional[str] = None) -> bytes:
    """Capture current device screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["exec-out", "screencap", "-p"])
    result = subprocess.run(cmd, capture_output=True)
    return result.stdout

def send_tap(x: int, y: int, device_id: Optional[str] = None):
    """Tap at screen coordinates."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "tap", str(x), str(y)])
    subprocess.run(cmd)

def send_text_adb_keyboard(text: str, device_id: Optional[str] = None):
    """Send text via ADB Keyboard (must be installed and enabled)."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    # Enable ADB keyboard first
    cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
    subprocess.run(cmd_enable)
    # Send text
    cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                      "--es", "msg", text]
    subprocess.run(cmd_text)

def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300,
          device_id: Optional[str] = None):
    """Swipe gesture on screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
    subprocess.run(cmd)

def press_back(device_id: Optional[str] = None):
    """Press Android back button."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
    subprocess.run(cmd)

def launch_app(package_name: str, device_id: Optional[str] = None):
    """Launch app by package name."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "monkey", "-p", package_name, "-c",
                "android.intent.category.LAUNCHER", "1"])
    subprocess.run(cmd)
```

Midscene.js Integration


For JavaScript/TypeScript automation using AutoGLM:

```javascript
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key

import { AndroidAgent } from "@midscene/android";

const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么?");
```

Remote ADB (WiFi Debugging)


```bash
# Connect device via USB first, then enable TCP/IP mode
adb tcpip 5555

# Get device IP address
adb shell ip addr show wlan0

# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555

# Verify connection
adb devices
# Expected: 192.168.1.100:5555 device

# Use with agent
python main.py \
  --base-url http://model-server:8000/v1 \
  --model autoglm-phone-9b \
  --device-id "192.168.1.100:5555" \
  "打开支付宝查看余额"
```

Common Action Types


The AutoGLM model outputs structured actions:

| Action | Description | Example |
| --- | --- | --- |
| `Launch` | Open an app | `do(action="Launch", app="微信")` |
| `Tap` | Tap screen element | `do(action="Tap", element="搜索框")` |
| `Type` | Input text | `do(action="Type", text="火锅")` |
| `Swipe` | Scroll/swipe | `do(action="Swipe", direction="up")` |
| `Back` | Press back button | `do(action="Back")` |
| `Home` | Go to home screen | `do(action="Home")` |
| `Finish` | Task complete | `do(action="Finish", result="已完成搜索")` |
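A parsed action dict of this shape can be routed to the matching device-control function with a small dispatcher. This is a hypothetical sketch, not the framework's own executor; `dispatch` and the handler mapping are illustrative names:

```python
def dispatch(action: dict, handlers: dict) -> None:
    """Route a parsed AutoGLM action to a device-control handler.

    handlers maps action names ("Launch", "Tap", ...) to callables that
    accept the remaining action fields as keyword arguments.
    """
    name = action.get("action", "unknown")
    handler = handlers.get(name)
    if handler is None:
        raise ValueError(f"unsupported action: {name}")
    # Strip the action name; everything else becomes keyword arguments
    params = {k: v for k, v in action.items() if k != "action"}
    handler(**params)
```

For example, `dispatch({"action": "Launch", "app": "微信"}, {"Launch": launch_fn})` calls `launch_fn(app="微信")`; unknown action names fail loudly instead of being silently ignored.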

Model Selection Guide


| Model | Use Case | Languages |
| --- | --- | --- |
| `AutoGLM-Phone-9B` | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
| `AutoGLM-Phone-9B-Multilingual` | International apps, mixed content | Chinese + English + others |

  • HuggingFace: `zai-org/AutoGLM-Phone-9B` / `zai-org/AutoGLM-Phone-9B-Multilingual`
  • ModelScope: `ZhipuAI/AutoGLM-Phone-9B` / `ZhipuAI/AutoGLM-Phone-9B-Multilingual`

Environment Variables Reference


```bash
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY=""        # Required for BigModel/ModelScope APIs

# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"

# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"

# Device configuration
export ADB_DEVICE_ID=""        # Leave empty for auto-detect
export HDC_DEVICE_ID=""        # HarmonyOS device ID
```

Troubleshooting


Model output is garbled or very short chain-of-thought


Cause: incorrect vLLM/SGLang startup parameters.
Fix: ensure `--chat-template-content-format string` (vLLM) and `--mm-process-config` with `max_pixels:5000000` (SGLang) are set. Check transformers version compatibility.

`adb devices` shows no devices

Fix:
  1. Verify the USB cable supports data transfer (not charge-only)
  2. Accept the "Allow USB debugging" dialog on the phone
  3. Try `adb kill-server && adb start-server`
  4. Some devices require a reboot after enabling developer options

Text input not working on Android


Fix: ADB Keyboard must be installed AND enabled:

```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```

Agent stuck in a loop


Cause: the model cannot identify a path to complete the task.
Fix: the framework includes sensitive-operation confirmation; set `confirm_sensitive=True` for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.

vLLM CUDA out of memory


Fix: AutoGLM-Phone-9B requires ~20GB of VRAM. Use `--tensor-parallel-size 2` for multi-GPU serving, or use the API service instead.

Connection refused to model server


Fix: check firewall rules. For a remote server, test connectivity:

```bash
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return model list JSON
```

HDC device not recognized (HarmonyOS)


Fix: HarmonyOS NEXT (not earlier versions) is required. Enable developer mode in Settings → About → Version Number (tap 10 times rapidly).

iOS Setup


For iPhone automation, see the dedicated setup guide:

```bash
# After configuring WebDriverAgent per docs/ios_setup/ios_setup.md
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  --device-type ios \
  "Open Maps and navigate to Central Park"
```