mobilerun

Mobilerun

Mobilerun turns your Android phone into a tool that AI can control. Instead of manually tapping through apps, you connect your phone and let an AI agent do it for you -- navigate apps, fill out forms, extract information, automate repetitive tasks, or anything else you'd normally do by hand. It works with your own personal device through a simple app called Droidrun Portal, and everything happens through a straightforward API: take screenshots to see the screen, read the UI tree to understand what's on it, then tap, swipe, and type to interact. No rooting, no emulators, just your real phone controlled remotely.
Base URL: https://api.mobilerun.ai/v1
Auth: Authorization: Bearer <MOBILERUN_API_KEY>
Important: The base domain (https://api.mobilerun.ai/) returns 404. You must always include /v1 in the path. All API calls should be made via curl. Example:
bash
curl -s https://api.mobilerun.ai/v1/devices \
  -H "Authorization: Bearer $MOBILERUN_API_KEY"

Before You Start

The API key (MOBILERUN_API_KEY) is already available -- OpenClaw handles credential setup before this skill loads. Do NOT ask the user for an API key. Just use it.
  1. Check for devices:
    bash
    curl -s https://api.mobilerun.ai/v1/devices \
      -H "Authorization: Bearer $MOBILERUN_API_KEY"
    • 200 with a device in state: "ready" = good to go, skip all setup, just do what the user asked
    • 200 but no devices or all state: "disconnected" = device issue (see step 2)
    • 401 = key is invalid, expired, or revoked -- ask the user to check https://cloud.mobilerun.ai/api-keys
  2. Only if no ready device: tell the user the device status and suggest a fix:
    • No devices at all = user hasn't connected a phone yet, guide them to Portal APK (see reference.md)
    • Device with state: "disconnected" = Portal app lost connection, ask user to reopen it
  3. Confirm device is responsive (optional, only if first action fails):
    bash
    curl -s https://api.mobilerun.ai/v1/devices/{deviceId}/screenshot \
      -H "Authorization: Bearer $MOBILERUN_API_KEY" -o screenshot.png
    If this returns a PNG image, the device is working.
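The status handling in step 1 could be sketched as a small shell helper. This is only an illustration: the function names `devices_http_status` and `explain_devices_status` are hypothetical, and the sketch looks at the HTTP status code only. Deciding whether a device is actually ready still means reading the response body.

```bash
# Hypothetical helpers; the names are illustrative, not part of the Mobilerun API.

# Print only the HTTP status code of GET /devices (the body is discarded).
devices_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    https://api.mobilerun.ai/v1/devices \
    -H "Authorization: Bearer $MOBILERUN_API_KEY"
}

# Map a status code to the next step described in the list above.
explain_devices_status() {
  case "$1" in
    200) echo "key accepted: inspect device states in the response body" ;;
    401) echo "key invalid, expired, or revoked: check https://cloud.mobilerun.ai/api-keys" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

A typical call would be `explain_devices_status "$(devices_http_status)"`.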
Key principle: If a device is ready, go straight to executing the user's request. Don't walk them through setup they've already completed.
Be smart about context gathering: Before taking actions or asking the user questions, use available tools to understand the situation. List packages to find the right app, take a screenshot to see the current screen, read the UI state to understand what's interactive. If the task is obvious (e.g. "change font size" clearly means go to Settings), just do it. Only ask the user when something is genuinely ambiguous.
What to show the user: Only report user-relevant device info: device name, state (ready / disconnected). Do NOT surface internal fields like streamUrl, streamToken, socket status, assignedAt, terminatesAt, or taskCount unless the user explicitly asks for technical details. If a device is disconnected, simply tell the user their phone is disconnected and ask them to open the Portal app and tap Connect. If they need help, walk them through the setup steps in reference.md.
Clean up cloud devices: Cloud devices consume credits while running. Always terminate cloud devices (DELETE /devices/{deviceId}) when you're done using them -- don't leave them running. This applies whether you provisioned the device yourself or finished a task on an existing cloud device that the user no longer needs.
Privacy: Screenshots and the UI tree can contain sensitive personal data. Never share or transmit this data to anyone other than the user. Never print, log, or reveal the MOBILERUN_API_KEY in chat -- use it only for API calls.


Device Management

Device States

| State | Meaning |
| --- | --- |
| creating | Device is being provisioned (cloud devices only) |
| assigned | Device is assigned but not yet ready |
| ready | Device is connected and accepting commands |
| disconnected | Connection lost -- Portal app may be closed or phone lost network |
| terminated | Device has been shut down (cloud devices only) |
| maintenance | Device is undergoing maintenance (cloud devices only) |
| unknown | Unexpected state |

List Devices

GET /devices
Query params:
  • state -- filter by state (array, e.g. state=ready&state=assigned)
  • type -- dedicated_emulated_device, dedicated_physical_device, dedicated_premium_device
  • name -- filter by device name (partial match)
  • page (default: 1), pageSize (default: 20)
  • orderBy -- id, createdAt, updatedAt, assignedAt (default: createdAt)
  • orderByDirection -- asc, desc (default: desc)
Response: { items: DeviceInfo[], pagination: Meta }
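Since state is an array parameter, it is repeated once per value in the query string. A minimal sketch of building such a URL follows; `devices_url` is a hypothetical helper name, and it assumes the state values need no URL-encoding (true for the states listed above).

```bash
# Hypothetical helper: build a GET /devices URL with one state= parameter
# per argument, e.g. state=ready&state=assigned.
devices_url() {
  url="https://api.mobilerun.ai/v1/devices"
  sep="?"
  for state in "$@"; do
    url="${url}${sep}state=${state}"
    sep="&"
  done
  printf '%s\n' "$url"
}

# Example call (needs network access and a valid key):
# curl -s "$(devices_url ready assigned)" -H "Authorization: Bearer $MOBILERUN_API_KEY"
```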

Get Device Info

GET /devices/{deviceId}
Returns device details including state, stateMessage, type, and more.

Get Device Count

GET /devices/count
Returns a map of device types to counts.

Provision a Cloud Device

Cloud devices require an active subscription. If the user's plan doesn't support it, the API will return a 403 error -- inform the user they need to terminate an existing device or upgrade at https://cloud.mobilerun.ai/billing. See reference.md for plan details.
POST /devices
Content-Type: application/json

{
  "name": "my-device",
  "apps": ["com.example.app"]
}
Query param:
  • deviceType -- dedicated_emulated_device, dedicated_physical_device, dedicated_premium_device
After provisioning, wait for it to become ready:
GET /devices/{deviceId}/wait
This blocks until the device state transitions to ready.
Cloud device workflow:
  1. POST /devices?deviceType=dedicated_emulated_device -- provision, returns device in creating state
  2. GET /devices/{deviceId}/wait -- blocks until ready
  3. Use the deviceId for phone control or tasks
Temporary device for a task: When the user wants to run a task but has no ready device, provision a temporary cloud device, run the task on it, then clean up:
  1. POST /devices?deviceType=dedicated_emulated_device with {"name": "temp-task-device", "apps": [...]} -- include any apps the task needs
  2. GET /devices/{deviceId}/wait -- wait until ready
  3. POST /tasks with the new deviceId -- run the task
  4. Monitor via GET /tasks/{taskId}/status until the task finishes
  5. DELETE /devices/{deviceId} -- terminate the device after the task completes (or fails)
Always terminate temporary devices after use -- they consume credits while running.
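The provision, wait, use, terminate sequence above might be wrapped so that cleanup always runs. This is a rough sketch under stated assumptions: `first_id` and `with_temp_device` are hypothetical names, the POST /devices response is assumed to carry the new device's identifier in an "id" field, the apps list is left empty, and the JSON is read with a crude text match rather than a real parser.

```bash
API="https://api.mobilerun.ai/v1"

# Extract the first "id" string value from JSON on stdin (crude text match).
first_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Provision a temporary device, run the given command with the device id
# appended as its last argument, then terminate the device in every case.
with_temp_device() {
  device_id=$(curl -s -X POST "$API/devices?deviceType=dedicated_emulated_device" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "temp-task-device", "apps": []}' | first_id)
  [ -n "$device_id" ] || { echo "provisioning failed" >&2; return 1; }

  # Blocks until the device reports ready.
  curl -s "$API/devices/$device_id/wait" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY" >/dev/null

  "$@" "$device_id"
  status=$?

  # Terminate whether the command succeeded or failed.
  curl -s -X DELETE "$API/devices/$device_id" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY" \
    -H "Content-Type: application/json" -d '{}' >/dev/null
  return $status
}
```

For example, `with_temp_device my_task_runner` would call `my_task_runner <deviceId>` once the device is ready, where `my_task_runner` is whatever function submits and monitors the task.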

Terminate a Cloud Device

DELETE /devices/{deviceId}
Content-Type: application/json

{}
Personal devices cannot be terminated via the API. They disconnect when the Portal app is closed.

Get Device Time

GET /devices/{deviceId}/time
Returns the current time on the device as a string.


Screen Observation

Take Screenshot

GET /devices/{deviceId}/screenshot
Query param: hideOverlay (default: false)
Returns a PNG image as binary data. Use this to see what's currently displayed on screen.

Get UI State (Accessibility Tree)

GET /devices/{deviceId}/ui-state
Query param: filter (default: false) -- set to true to filter out non-interactive elements.
Returns an AndroidState object with three sections:

phone_state

json
{
  "keyboardVisible": false,
  "packageName": "app.lawnchair",
  "currentApp": "Lawnchair",
  "isEditable": false,
  "focusedElement": {
    "className": "string",
    "resourceId": "string",
    "text": "string"
  }
}
  • currentApp -- human-readable name of the foreground app
  • packageName -- Android package name of the foreground app
  • keyboardVisible -- whether the soft keyboard is showing
  • isEditable -- whether the currently focused element accepts text input
  • focusedElement -- details about the focused UI element (if any)

device_context

json
{
  "screen_bounds": { "width": 720, "height": 1616 },
  "display_metrics": {
    "density": 1.75,
    "densityDpi": 280,
    "scaledDensity": 1.75,
    "widthPixels": 720,
    "heightPixels": 1616
  },
  "filtering_params": {
    "min_element_size": 5,
    "overlay_offset": 0
  }
}
  • screen_bounds -- the actual screen resolution in pixels. All tap/swipe coordinates use this coordinate space.
  • display_metrics -- physical display properties (density, DPI)

a11y_tree (Accessibility Tree)

A recursive tree of UI elements. Each node has:
json
{
  "className": "android.widget.TextView",
  "packageName": "app.lawnchair",
  "resourceId": "app.lawnchair:id/search_container",
  "text": "Search",
  "contentDescription": "",
  "boundsInScreen": { "left": 48, "top": 1420, "right": 671, "bottom": 1532 },
  "isClickable": true,
  "isLongClickable": false,
  "isEditable": false,
  "isScrollable": false,
  "isEnabled": true,
  "isVisibleToUser": true,
  "isCheckable": false,
  "isChecked": false,
  "isFocusable": false,
  "isFocused": false,
  "isSelected": false,
  "isPassword": false,
  "hint": "",
  "childCount": 0,
  "children": []
}
Key node fields:
  • text -- the visible text on the element
  • contentDescription -- accessibility label (useful when text is empty, e.g. icon buttons)
  • resourceId -- Android resource ID (e.g. com.app:id/button_ok) -- useful for identifying elements
  • boundsInScreen -- pixel coordinates as {left, top, right, bottom}. To tap an element, calculate its center: x = (left + right) / 2, y = (top + bottom) / 2
  • isClickable -- whether the element responds to taps
  • isEditable -- whether the element is a text input field
  • isScrollable -- whether the element supports scrolling (swipe gestures)
  • children -- nested child elements (the tree is recursive)
Example: reading a home screen
FrameLayout (0,0,720,1616)
  ScrollView (0,0,720,1616) [scrollable]
    FrameLayout (14,113,706,326)
      LinearLayout (42,128,706,310) [clickable]
        TextView (42,156,706,198) "Tap to set up"
  View (0,94,720,1574) "Home"
  TextView (14,1222,187,1422) "Phone" [clickable]
  TextView (187,1222,360,1422) "Contacts" [clickable]
  TextView (360,1222,533,1422) "Files" [clickable]
  TextView (533,1222,706,1422) "Chrome" [clickable]
  FrameLayout (48,1420,671,1532) "Search" [clickable]
To tap "Chrome": bounds are (533,1222,706,1422), so tap at x=(533+706)/2=619 (619.5 rounded down), y=(1222+1422)/2=1322.
Use filter=true for a cleaner tree focused on actionable elements (filters out non-interactive containers).
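The center calculation can be done with shell integer arithmetic. A minimal sketch follows; `bounds_center` and `tap_payload` are hypothetical helper names, and integer division truncates in the same way as the worked example above.

```bash
# Print the center of a boundsInScreen rectangle as "x y".
# Arguments: left top right bottom. Integer division truncates.
bounds_center() {
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}

# Print the JSON body for POST /devices/{deviceId}/tap for the same rectangle.
tap_payload() {
  # Re-use the positional parameters to hold the computed "x y" pair.
  set -- $(bounds_center "$@")
  printf '{ "x": %d, "y": %d }\n' "$1" "$2"
}
```

With the Chrome bounds from the example, `tap_payload 533 1222 706 1422` prints `{ "x": 619, "y": 1322 }`.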


Device Actions

All action endpoints take a deviceId path parameter.

Tap

POST /devices/{deviceId}/tap
Content-Type: application/json

{ "x": 540, "y": 960 }
Taps at pixel coordinates. Use the screen_bounds from UI state and element bounds from the a11y tree to calculate where to tap.

Swipe

POST /devices/{deviceId}/swipe
Content-Type: application/json

{
  "startX": 540,
  "startY": 1200,
  "endX": 540,
  "endY": 400,
  "duration": 300
}
duration is in milliseconds (minimum: 10). Common patterns:
  • Scroll down: swipe from bottom to top (high startY -> low endY)
  • Scroll up: swipe from top to bottom
  • Swipe left/right: adjust X coordinates, keep Y similar
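A scroll-down swipe can be derived from screen_bounds rather than hard-coded. The sketch below is illustrative only: `scroll_down_payload` is a hypothetical helper, and the choice of swiping from 3/4 to 1/4 of the screen height along the vertical center line is an arbitrary assumption.

```bash
# Print a swipe body that scrolls down (finger moves from low on the screen to high).
# Arguments: screen width, screen height, optional duration in ms (default 300).
scroll_down_payload() {
  printf '{ "startX": %d, "startY": %d, "endX": %d, "endY": %d, "duration": %d }\n' \
    "$(( $1 / 2 ))" "$(( $2 * 3 / 4 ))" "$(( $1 / 2 ))" "$(( $2 / 4 ))" "${3:-300}"
}
```

For the 720x1616 screen used in the device_context example, `scroll_down_payload 720 1616` yields startY 1212 and endY 404 at x 360.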

Global Actions

POST /devices/{deviceId}/global
Content-Type: application/json

{ "action": 2 }
| Action code | Button |
| --- | --- |
| 1 | BACK |
| 2 | HOME |
| 3 | RECENT |

Type Text

POST /devices/{deviceId}/keyboard
Content-Type: application/json

{ "text": "Hello world", "clear": false }
Types text into the currently focused input field.
  • clear: true -- clears the field before typing
  • Make sure an input field is focused first (check phone_state.isEditable)
  • If the keyboard isn't visible, you may need to tap on an input field first

Press Key

PUT /devices/{deviceId}/keyboard
Content-Type: application/json

{ "key": 66 }
Sends an Android keycode. Only text-input-related keycodes are supported.
| Keycode | Key |
| --- | --- |
| 4 | BACK |
| 61 | TAB |
| 66 | ENTER |
| 67 | DEL (backspace) |
| 112 | FORWARD_DEL (delete) |

For system navigation (home, back, recent), use POST /devices/{deviceId}/global instead.
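The keycode table can be turned into a lookup so callers use key names instead of numbers. A minimal sketch follows; `keycode_for` and `press_key` are hypothetical helper names, and only the keycodes listed above are included.

```bash
# Map a key name from the table above to its Android keycode.
keycode_for() {
  case "$1" in
    BACK) echo 4 ;;
    TAB) echo 61 ;;
    ENTER) echo 66 ;;
    DEL) echo 67 ;;
    FORWARD_DEL) echo 112 ;;
    *) echo "unsupported key: $1" >&2; return 1 ;;
  esac
}

# Send a key press. Arguments: deviceId, key name.
press_key() {
  code=$(keycode_for "$2") || return 1
  curl -s -X PUT "https://api.mobilerun.ai/v1/devices/$1/keyboard" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"key\": $code }"
}
```

Rejecting unknown names locally avoids sending a keycode the endpoint does not support.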

Clear Input

DELETE /devices/{deviceId}/keyboard
Clears the currently focused input field.


App Management

List Installed Apps

GET /devices/{deviceId}/apps
Query param: includeSystemApps (default: false)
Returns an array of AppInfo:
json
{
  "packageName": "com.example.app",
  "label": "Example App",
  "versionName": "1.2.3",
  "versionCode": 123,
  "isSystemApp": false
}

List Package Names

GET /devices/{deviceId}/packages
Query param: includeSystemPackages (default: false)
Returns a string array of package names. Lighter than the full app list.

Install App

POST /devices/{deviceId}/apps
Content-Type: application/json

{ "packageName": "com.example.app" }
Installs an app from the Mobilerun app library (not the Play Store directly). Takes a couple of minutes and there's no status endpoint -- you'd have to poll GET /devices/{deviceId}/apps to confirm.
Prefer manually installing via Play Store instead. Open the Play Store app on the device, search for the app, and tap install -- this is faster and more reliable. Only use this API endpoint if the user explicitly asks for it.
On personal devices, this endpoint may fail because Android blocks app installations from unknown sources by default.
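Because there is no install status endpoint, confirming an install means polling the app list. A rough sketch of such a loop follows; `wait_for_app` is a hypothetical helper, the default attempt count and delay are arbitrary, and the package name is found with a plain text match on the response rather than a JSON parser.

```bash
# Poll GET /devices/{deviceId}/apps until the package name appears.
# Arguments: deviceId, packageName, max attempts (default 20),
# seconds between attempts (default 15).
# Returns 0 once the package is listed, 1 if attempts run out.
wait_for_app() {
  attempts=${3:-20}
  while [ "$attempts" -gt 0 ]; do
    if curl -s "https://api.mobilerun.ai/v1/devices/$1/apps" \
         -H "Authorization: Bearer $MOBILERUN_API_KEY" | grep -qF "\"$2\""; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${4:-15}"
  done
  return 1
}
```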

Start App

PUT /devices/{deviceId}/apps/{packageName}
Content-Type: application/json

{}
Optional body: { "activity": "com.example.app.MainActivity" } -- to launch a specific activity. Usually omitting activity is fine; it launches the default/main activity.

Stop App

PATCH /devices/{deviceId}/apps/{packageName}
Content-Type: application/json

{}

Uninstall App

DELETE /devices/{deviceId}/apps/{packageName}
Content-Type: application/json

{}


App Library (Upload & Manage APKs)

The app library stores APKs that can be pre-installed on cloud devices. Only one app per package name is allowed -- to update an app, delete the existing one first, then re-upload.

List Apps in Library

GET /apps
Query params:
  • page (default: 1), pageSize (default: 10)
  • source -- all, uploaded, store, queued (default: all)
  • query -- search by name
  • sortBy -- createdAt, name (default: createdAt)
  • order -- asc, desc (default: desc)

Get App by ID

GET /apps/{id}

Upload an APK

Uploading is a 3-step process:
Step 1: Create signed upload URL
POST /apps/create-signed-upload-url
Content-Type: application/json

{
  "displayName": "My App",
  "packageName": "com.example.myapp",
  "versionName": "1.0.0",
  "versionCode": 1,
  "targetSdk": 34,
  "sizeBytes": 5242880,
  "files": [
    { "fileName": "base.apk", "contentType": "application/vnd.android.package-archive" }
  ],
  "country": "US"
}
Required: displayName, packageName, versionName, versionCode, targetSdk, sizeBytes, files
Optional: description, iconURL, developerName, categoryName, ratingScore, ratingCount
Returns the app id and pre-signed R2 upload URLs for each file.
Step 2: Upload the APK file(s)
Upload each file directly to its pre-signed R2 URL using a PUT request.
Step 3: Confirm the upload
POST /apps/{id}/confirm-upload
Verifies the file exists in R2 and sets the app status to available.
If the upload failed, mark it:
POST /apps/{id}/mark-failed

Delete an App

DELETE /apps/{id}
Removes the app from R2 storage and the database. Use this before re-uploading an app with the same package name.

Re-uploading an App

Only one app per package name is allowed. To update:
  1. Find the existing app: GET /apps?query=com.example.myapp
  2. Delete it: DELETE /apps/{id}
  3. Upload the new version using the 3-step upload flow above


Tasks (AI Agent)

Instead of controlling a phone step-by-step, you can submit a natural language goal and let Mobilerun's AI agent execute it autonomously on the device with its own screen analysis, observe-act loop, and error recovery.
Tasks require a paid subscription with credits. If the user doesn't have an active plan, the API will return an error -- let them know they need a subscription at https://cloud.mobilerun.ai/billing. See reference.md for plan and credit details.

Run a Task

POST /tasks
Content-Type: application/json

{
  "task": "Open Chrome and search for weather",
  "deviceId": "uuid-of-device",
  "llmModel": "google/gemini-3.1-flash-lite-preview"
}
Required fields:
  • task -- natural language description of what to do (min 1 char)
  • deviceId -- UUID of the device to run on. Must be a device in ready state.
Optional fields:
  • llmModel -- which model to use (default: google/gemini-3.1-flash-lite-preview, see GET /models for available models)
  • apps -- list of app package names to pre-install
  • credentials -- list of { packageName, credentialNames[] } for app logins
  • maxSteps -- max agent steps (default: 100)
  • reasoning -- enable reasoning/thinking (default: true). Always set to false unless the user explicitly requests it.
  • vision -- enable vision/screenshot analysis (default: false)
  • temperature -- LLM temperature (default: 0.5)
  • executionTimeout -- timeout in seconds (default: 1000)
  • outputSchema -- JSON schema for structured output (nullable). Only use when the user explicitly asks for structured/formatted data. When set, the agent returns its result as a JSON object matching the schema in the task's output field.
  • vpnCountry -- route through VPN in a specific country: US, BR, FR, DE, IN, JP, KR, ZA. Only use if the task specifically requires a certain region. VPN adds latency -- avoid unless needed.
Returns:
json
{
  "id": "uuid",
  "streamUrl": "string"
}
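Task text often contains quotes, so it needs escaping before it is placed in a JSON body. A minimal sketch of building a POST /tasks body with reasoning disabled follows; `json_escape` and `task_payload` are hypothetical helper names, and the escaping covers only backslashes and double quotes (newlines and other control characters are not handled).

```bash
# Escape backslashes and double quotes so a string can sit inside JSON quotes.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print a POST /tasks body. Arguments: task text, deviceId.
# reasoning is set to false, following the guidance in the field list.
task_payload() {
  printf '{ "task": "%s", "deviceId": "%s", "reasoning": false }\n' \
    "$(json_escape "$1")" "$2"
}

# Example call (needs network access, a valid key, and a ready device):
# curl -s -X POST https://api.mobilerun.ai/v1/tasks \
#   -H "Authorization: Bearer $MOBILERUN_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$(task_payload 'Open Chrome and search for weather' "$DEVICE_ID")"
```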

Writing Task Prompts

You don't see the phone screen -- the agent on the device does. Write prompts that describe what to achieve, not how to navigate the UI. The on-device agent will figure out the taps, swipes, and navigation itself.
Don't assume the UI -- describe the goal:
  • Bad: "Tap the three dots menu in the top right, then tap Settings, scroll down and tap the Dark Mode toggle"
  • Good: "Open Settings in the Chrome app and enable Dark Mode"
  • You don't know what the screen looks like. The on-device agent can see it -- let it handle the navigation.
Be specific about the important details:
  • Name the exact app (not "the browser" -- say "Chrome")
  • Specify exact text to type or send
  • Say what counts as success
  • Name the person, contact, or item to find
Examples by task type:
Simple action:
"Open the Settings app, go to Display, and enable Dark Mode"
Multi-step with messaging:
"Open WhatsApp, find the conversation with John Smith, and send: Running 10 minutes late, sorry!"
Information extraction:
"Open Chrome, go to amazon.com, search for 'wireless headphones', and report back the name and price of the top 3 results"
Form filling:
"Open Chrome, go to docs.google.com/forms/d/abc123, and fill in the form with: Name = Sarah Connor, Email = sarah@example.com, Department = Engineering. Then submit the form."
App configuration:
"Open Spotify, go to Settings, turn off Autoplay, set Audio Quality to Very High, and disable Canvas"
Verification / checking:
"Open Gmail, check if there are any unread emails from support@stripe.com in the last 24 hours, and tell me the subject lines"
Multi-app workflow:
"Open Google Maps, search for 'Italian restaurants near me', find the highest rated one that's currently open, then open Chrome and search for that restaurant's menu"
Break down complex goals -- tell the agent what you want, not the steps:
  • Bad: "Order me an Uber to work"
  • Good: "Open the Uber app, set the destination to 123 Main Street, select UberX, and stop before confirming the ride so I can review the price"
Include safety conditions when appropriate:
  • "If the app asks for login, stop and tell me"
  • "If the price is over $50, don't purchase -- just report the price"

Check Task Status

GET /tasks/{task_id}/status
Use this to monitor task progress:
json
{
  "status": "running",
  "succeeded": null,
  "message": null,
  "output": null,
  "steps": 5,
  "lastResponse": { "event": "ManagerPlanEvent", "data": { ... } }
}
  • While running: lastResponse contains the agent's latest thinking, plan, and actions. Check this to understand what the agent is doing and where it's up to.
  • When finished: status is completed or failed, message has the final answer or failure reason, succeeded is true / false, lastResponse is null.
  • Statuses: created, running, paused, completed, failed, cancelled
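A single status check can be sketched with curl as follows. This is a minimal sketch, not a definitive client: the helper names (status_url, is_terminal, get_status) are illustrative and not part of the API, and treating cancelled as a finished state is an assumption, since the text above only names completed and failed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the Mobilerun API.
API_BASE="https://api.mobilerun.ai/v1"

# Build the status URL for a task ID.
status_url() {
  printf '%s/tasks/%s/status\n' "$API_BASE" "$1"
}

# Return 0 if a status value means the task is no longer running.
# Assumption: "cancelled" is treated as finished alongside completed/failed.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch the raw status JSON for a task (needs network and MOBILERUN_API_KEY).
get_status() {
  curl -s "$(status_url "$1")" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY"
}
```

Calling get_status with a task ID prints the JSON shown above; is_terminal can then decide whether another check is needed.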

Monitoring a Running Task

After creating a task, follow this pattern:
  1. Immediately tell the user the task is running (task ID, what it's doing).
  2. After 5 seconds -- do the first status check. This catches quick tasks and confirms the agent started.
  3. After 30 seconds -- check again if still running.
  4. Subsequent checks -- use your judgement on the interval based on:
    • Task complexity -- a simple "open Chrome" task finishes fast; a multi-app workflow takes longer, so space out checks accordingly.
    • Progress -- if steps are increasing and lastResponse is changing, the agent is working well; you can wait longer between checks. If the step count and lastResponse haven't changed, the agent may be stuck; check sooner and consider warning the user.
    • Time elapsed -- the longer a task has been running successfully, the more you can trust it and wait between checks.
At each check:
  • Report to the user what the agent is doing (from lastResponse -- its current plan, thinking, what step it's on).
  • Optionally take a screenshot (GET /devices/{id}/screenshot) to show the user what's on screen.
  • Optionally read the UI state (GET /devices/{id}/ui-state) for more context.
  • Give the user a meaningful update, not just "still running" -- e.g. "The agent is on step 8, currently in the Settings app looking for display options."
When the task finishes:
  • Report the result (message, succeeded, output).
  • If the task failed unexpectedly, auto-submit feedback (see Feedback section).
If the agent seems stuck:
  • Send a message via POST /tasks/{id}/message to nudge it in the right direction.
  • Let the user know and ask if they want to steer it or cancel.
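The check schedule above can be sketched as a small helper that picks the wait before each status check. The 5 and 30 second delays come from the pattern above (read here as the wait before the first and second check); the 60 and 15 second follow-up values are illustrative assumptions, and the helper only compares step counts, whereas a fuller version would also compare lastResponse.

```shell
#!/usr/bin/env bash
# Sketch of the check schedule. next_delay is a hypothetical helper name.
# Arguments: check number (1-based), previous step count, current step count.
next_delay() {
  local check_no="$1" prev_steps="$2" cur_steps="$3"
  if [ "$check_no" -eq 1 ]; then
    echo 5    # first check: catches quick tasks, confirms the agent started
  elif [ "$check_no" -eq 2 ]; then
    echo 30   # second check, if the task is still running
  elif [ "$cur_steps" -gt "$prev_steps" ]; then
    echo 60   # assumption: steps are increasing, so wait longer
  else
    echo 15   # assumption: no new steps, agent may be stuck, check sooner
  fi
}

# Usage sketch (not executed here):
#   sleep "$(next_delay "$n" "$prev_steps" "$cur_steps")"
```

Keeping the interval logic in one function makes it easy to tune the follow-up values per task complexity without touching the polling loop.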

Send Message to Task

POST /tasks/{task_id}/message
Content-Type: application/json

{ "message": "Actually, search for 'weather in London' instead" }
Send instructions to steer a running agent task. Use this to correct the agent, provide additional context, or change direction mid-task. The message is queued and delivered to the agent at the next step.
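As a curl call, the request above might look like the sketch below. The helper names message_body and send_message are illustrative; the body builder escapes only backslashes and double quotes, so it assumes the message contains no newlines or control characters.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the Mobilerun API.
API_BASE="https://api.mobilerun.ai/v1"

# Build the JSON body. Escapes backslashes and double quotes only
# (assumes no newlines or control characters in the message).
message_body() {
  local msg="$1"
  msg=${msg//\\/\\\\}
  msg=${msg//\"/\\\"}
  printf '{ "message": "%s" }\n' "$msg"
}

# Send a steering message to a running task (needs network and MOBILERUN_API_KEY).
# Arguments: task ID, message text.
send_message() {
  curl -s -X POST "$API_BASE/tasks/$1/message" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(message_body "$2")"
}
```

For example, send_message "$TASK_ID" "Actually, search for 'weather in London' instead" would queue that instruction for the agent's next step.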

Cancel Task

POST /tasks/{task_id}/cancel

Get Task Details

GET /tasks/{task_id}
Returns the full task object including configuration, status, and trajectory.

List Tasks

GET /tasks
Query params:
  • status -- created, running, paused, completed, failed, cancelled
  • orderBy -- id, createdAt, finishedAt, status (default: createdAt)
  • orderByDirection -- asc, desc (default: desc)
  • query -- search in task description (max 128 chars)
  • page (default: 1), pageSize (default: 20, max: 100)
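A hedged sketch of building a list request from these parameters. The helper name list_tasks_url is illustrative; values are inserted as-is, so they must already be URL-safe, and the clamp to 100 simply mirrors the documented pageSize maximum.

```shell
#!/usr/bin/env bash
# Sketch only: list_tasks_url is a hypothetical helper, not part of the API.
API_BASE="https://api.mobilerun.ai/v1"

# Arguments: status (may be empty), page (default 1), pageSize (default 20).
# Values are used as-is, so they must not contain spaces or "&".
list_tasks_url() {
  local status="$1" page="${2:-1}" page_size="${3:-20}"
  if [ "$page_size" -gt 100 ]; then
    page_size=100   # documented maximum
  fi
  local url="$API_BASE/tasks?page=$page&pageSize=$page_size&orderBy=createdAt&orderByDirection=desc"
  if [ -n "$status" ]; then
    url="$url&status=$status"
  fi
  printf '%s\n' "$url"
}

# Usage sketch (needs network and MOBILERUN_API_KEY):
#   curl -s "$(list_tasks_url failed)" -H "Authorization: Bearer $MOBILERUN_API_KEY"
```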

Task Screenshots & UI States

GET /tasks/{task_id}/screenshots         -- list all screenshot URLs
GET /tasks/{task_id}/screenshots/{index}  -- get screenshot at index
GET /tasks/{task_id}/ui_states            -- list all UI state URLs
GET /tasks/{task_id}/ui_states/{index}    -- get UI state at index

Get Task Trajectory

GET /tasks/{task_id}/trajectory
Returns the full history of events from the task execution.

Available LLM Models

GET /models
Returns the list of models available for tasks. Default: google/gemini-3.1-flash-lite-preview.

List Tasks for a Device

GET /devices/{deviceId}/tasks
Query params: page, pageSize, orderBy, orderByDirection

Feedback

Submit feedback to help improve the Mobilerun platform. This is important for identifying bugs and improving agent performance.
When to auto-submit feedback:
  • When a task fails unexpectedly
  • When the agent behaves incorrectly or produces wrong results
  • When API errors occur that seem like platform bugs
  • Include the taskId, error details, and what happened
When the user asks to submit feedback:
  • Ask for a few details (what happened, what they expected) but don't push hard
  • If they don't want to elaborate, just submit with whatever details you have
POST /feedback
Content-Type: application/json

{
  "title": "Task failed unexpectedly",
  "feedback": "The agent got stuck on the login screen and timed out after 50 steps.",
  "rating": 2,
  "taskId": "uuid-of-related-task"
}
Required fields:
  • title -- short summary (3-100 chars)
  • feedback -- detailed description (10-4000 chars)
  • rating -- 1 to 5
Optional fields:
  • taskId -- UUID of a related task
| Status | Meaning |
| --- | --- |
| 201 | Feedback submitted |
| 400 | Validation error |
| 401 | Invalid or missing API key |
| 429 | Rate limited -- 15/day cap reached |
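Because feedback is capped at 15 per day, it can help to check the field limits locally before calling POST /feedback, so a 400 does not waste a request. The sketch below assumes a hypothetical helper named validate_feedback; it only mirrors the limits listed above and is not part of the API.

```shell
#!/usr/bin/env bash
# Sketch only: validate_feedback is a hypothetical local check.
# Arguments: title, feedback, rating. Prints "ok" or the first problem found.
# ${#var} counts characters in the current locale.
validate_feedback() {
  local title="$1" feedback="$2" rating="$3"
  if [ "${#title}" -lt 3 ] || [ "${#title}" -gt 100 ]; then
    echo "title must be 3-100 chars"
    return 1
  fi
  if [ "${#feedback}" -lt 10 ] || [ "${#feedback}" -gt 4000 ]; then
    echo "feedback must be 10-4000 chars"
    return 1
  fi
  case "$rating" in
    1|2|3|4|5) ;;
    *) echo "rating must be 1 to 5"; return 1 ;;
  esac
  echo "ok"
}
```

If the check prints ok, the same three values can go into the JSON body shown above, with taskId added when a related task exists.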

Common Patterns

Observe-Act Loop: Most phone control tasks follow this cycle:
  1. Take a screenshot and/or read the UI state
  2. Decide what action to perform
  3. Execute the action (tap, type, swipe, etc.)
  4. Observe again to verify the result
  5. Repeat
Finding tap coordinates: Use GET /devices/{id}/ui-state?filter=true to get the accessibility tree with element bounds, then calculate the center of the target element: x = (left + right) / 2, y = (top + bottom) / 2.
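The center calculation can be sketched in shell arithmetic. The helper name center_of and the argument order (left, top, right, bottom) are assumptions about how you extract the bounds from the UI state; integer division truncates, which is normally fine for pixel coordinates.

```shell
#!/usr/bin/env bash
# Sketch only: center_of is a hypothetical helper.
# Arguments: left top right bottom (integers, in pixels).
# Prints "x y" for the center of the bounding box.
center_of() {
  local left="$1" top="$2" right="$3" bottom="$4"
  echo "$(( (left + right) / 2 )) $(( (top + bottom) / 2 ))"
}

# Example: an element with bounds left=100, top=200, right=300, bottom=260
center_of 100 200 300 260   # prints "200 230"
```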
When an action doesn't work:
  • Take a screenshot and re-read the UI state -- the screen may have changed or your tap coordinates may have been off.
  • If an element isn't visible, try scrolling (swipe up/down) to reveal it.
  • If a tap didn't register, recalculate coordinates from the latest UI state and try again.
  • If the app is unresponsive, try pressing HOME and reopening the app.
  • If you're stuck after 2-3 attempts, tell the user what's happening and ask how to proceed.
Typing into a field:
  1. Check phone_state.isEditable -- if false, tap the input field first
  2. Optionally clear existing text with clear: true
  3. Send the text via POST /devices/{id}/keyboard

Two Ways to Control a Device

You have two approaches -- choose based on the task:
  1. Direct control -- You drive the device step-by-step: screenshot, tap, swipe, type. Best for simple, quick actions on a single device.
  2. Mobilerun Agent -- Submit a natural language goal via POST /tasks and the agent executes it autonomously. Best for complex or multi-step tasks. Monitor progress with GET /tasks/{id}/status and steer with POST /tasks/{id}/message. Requires credits (paid plan).
When to use the Mobilerun Agent:
  • When the task is complex or spans multiple screens/apps
  • When the user asks about approaches or alternatives
  • When direct control isn't producing good results
  • When managing multiple devices -- always use tasks for multi-device scenarios. Direct control is sequential (one action at a time on one device), so controlling multiple devices by hand is too slow. Submit a task to each device and monitor them in parallel.
Breaking big goals into sub-tasks: If a goal is too complex for a single task (many steps, multiple apps, high chance of failure), break it into smaller sequential sub-tasks:
  1. Split the goal into clear, self-contained sub-goals
  2. Submit the first sub-task via POST /tasks
  3. Wait for it to complete, check the result
  4. If it succeeded, submit the next sub-task (the device is already in the right state from the previous task)
  5. Repeat until done
Example: "Order groceries from the Instacart app" could be:
  1. "Open Instacart and search for 'organic bananas', add the first result to cart"
  2. "Search for 'whole milk', add the first result to cart"
  3. "Go to cart and report back the total price -- do not checkout"
This gives you checkpoints between steps, lets you steer or abort early, and keeps each task focused so the agent is less likely to get lost.
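The sequential sub-task pattern above can be sketched as a loop that stops at the first failure. Both function names are hypothetical, and run_subtask is a placeholder: a real version would submit the goal via POST /tasks, poll GET /tasks/{id}/status until the task finishes, and return 0 only when succeeded is true.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers.

# Placeholder for "submit one sub-task and wait for its result".
# Replace the body with real POST /tasks and status-polling calls.
run_subtask() {
  echo "would run: $1"
  return 0
}

# Run each sub-goal in order; stop and report at the first failure so the
# user can steer or abort before later sub-tasks run.
run_in_sequence() {
  local goal
  for goal in "$@"; do
    if ! run_subtask "$goal"; then
      echo "stopped at: $goal"
      return 1
    fi
  done
  echo "all sub-tasks succeeded"
}

# Usage sketch with the Instacart example:
#   run_in_sequence \
#     "Open Instacart and search for 'organic bananas', add the first result to cart" \
#     "Search for 'whole milk', add the first result to cart" \
#     "Go to cart and report back the total price -- do not checkout"
```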
Combining both approaches: You can mix direct control and tasks in the same workflow:
  • Use direct control to quickly set something up (open the right app, navigate to a screen), then launch a task for the complex part.
  • Let a task do the heavy lifting, then use direct control for a precise final action (e.g. verify a specific element on screen).
  • Use direct control for a quick check (screenshot to see what's on screen), then decide whether to handle it manually or submit a task.
Only suggest tools and approaches available through this skill -- do not recommend external tools like ADB, scrcpy, Appium, Tasker, etc.

Error Handling

All API errors follow this format:
json
{
  "title": "Unauthorized",
  "status": 401,
  "detail": "Invalid API key.",
  "errors": []
}
| Error | Likely cause | What to do |
| --- | --- | --- |
| 401 | Invalid or expired API key | Ask user to verify key at https://cloud.mobilerun.ai/api-keys |
| 402 on POST /tasks | Insufficient credits | User needs to add credits or upgrade plan |
| 403 with "limit reached" | Plan limit hit (max concurrent devices) | User needs to terminate a device or upgrade |
| 404 / 500 on device action | Device not found or invalid ID | Verify device ID, re-list devices |
| Empty device list | No device connected | Guide user to connect via Portal APK (see reference.md) |
| Device disconnected | Portal app closed or phone lost network | Ask user to check phone and reopen Portal |
| Billing/plan error on POST /devices | Free plan, cloud devices need subscription | Tell user to check plans at https://cloud.mobilerun.ai/billing |
| Action fails on valid device | Device may be busy, locked, or unresponsive | Try taking a screenshot first to check state |
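A rough sketch of turning an HTTP status code into the next step from the table above. The helper names are illustrative, and the mapping is a simplification: it ignores the request context (for example, 403 only means a plan limit when the detail mentions "limit reached"), so treat the output as a hint rather than a diagnosis.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the Mobilerun API.

# Print only the HTTP status code of a GET request
# (needs network and MOBILERUN_API_KEY).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" \
    -H "Authorization: Bearer $MOBILERUN_API_KEY"
}

# Map a status code to the suggested next step (simplified from the table).
advice_for_status() {
  case "$1" in
    401) echo "ask the user to verify the API key" ;;
    402) echo "user needs to add credits or upgrade plan" ;;
    403) echo "user needs to terminate a device or upgrade" ;;
    404|500) echo "verify device ID, re-list devices" ;;
    *) echo "take a screenshot to check device state" ;;
  esac
}

# Usage sketch:
#   advice_for_status "$(http_status https://api.mobilerun.ai/v1/devices)"
```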