native-devtools-mcp-automation

Native Devtools MCP Automation

Skill by ara.so — Devtools Skills collection.
native-devtools-mcp is an MCP server that gives AI agents direct control over native desktop apps, Chrome/Electron browsers, and Android devices. It provides screenshots, OCR, accessibility-first element lookup, input simulation, window management, Chrome DevTools Protocol (CDP), and ADB, all in one local server.
Works with Claude Desktop, Claude Code, Cursor, and any MCP-compatible client.

Platform Support

  • macOS: Full support with Accessibility tree dispatch (preferred), screenshots, OCR (Vision), input simulation
  • Windows: UI Automation, screenshots, OCR (Windows Media OCR), input simulation
  • Android: ADB-based screenshots, uiautomator text lookup, input, app management
  • Chrome/Electron: CDP-based DOM automation for web content and Electron apps

Installation

Quick Start (no install)

bash
npx -y native-devtools-mcp

Global Install

bash
npm install -g native-devtools-mcp

Build from Source (Rust)

bash
git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release

Binary: ./target/release/native-devtools-mcp

Setup Wizard

Run the setup wizard to configure permissions and MCP clients:
bash
npx native-devtools-mcp setup
This will:
  1. Check permissions (Accessibility and Screen Recording on macOS)
  2. Detect MCP clients (Claude Desktop, Claude Code, Cursor)
  3. Write the correct configuration

MCP Client Configuration

Claude Desktop (macOS)

Config file:
~/Library/Application Support/Claude/claude_desktop_config.json
json
{
  "mcpServers": {
    "native-devtools": {
      "command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
    }
  }
}

Claude Desktop (Windows)

Config file:
%APPDATA%\Claude\claude_desktop_config.json
json
{
  "mcpServers": {
    "native-devtools": {
      "command": "C:\\path\\to\\native-devtools-mcp.exe"
    }
  }
}

Claude Code / Cursor / Other MCP Clients

json
{
  "mcpServers": {
    "native-devtools": {
      "command": "npx",
      "args": ["-y", "native-devtools-mcp"]
    }
  }
}
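
The configurations above differ only in the command and its arguments. As a minimal sketch, the helper below builds the same JSON shape for either a local binary path or the npx launcher. The names buildConfig, ServerEntry, and McpConfig are hypothetical and are not part of native-devtools-mcp; only the JSON layout is taken from the configs shown here.

```typescript
// Hypothetical helper types; only the JSON shape comes from the configs above.
interface ServerEntry {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Launch through npx by default, or through a local binary when a path is given.
function buildConfig(binaryPath?: string): McpConfig {
  const entry: ServerEntry = binaryPath
    ? { command: binaryPath }
    : { command: "npx", args: ["-y", "native-devtools-mcp"] };
  return { mcpServers: { "native-devtools": entry } };
}

// Serialize for pasting into the client's config file.
const npxConfigJson: string = JSON.stringify(buildConfig(), null, 2);
```

Passing a path such as the Windows .exe location yields the binary-style entry with no args field.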

Claude Code Auto-Approval

To avoid approving every tool call, add to .claude/settings.local.json:
json
{
  "permissions": {
    "allow": ["mcp__native-devtools__*"]
  }
}

Three Approaches to Interaction

1. Visual (Universal)

Works with any app, including games, Qt apps, custom renderers, and anything else without an accessibility tree.
Key Tools: take_screenshot, find_text, click, type_text, find_image

2. AX Dispatch (macOS - Preferred for Native Apps)

Element-precise automation for AppKit/SwiftUI apps. Doesn't move the cursor or steal focus.
Key Tools: take_ax_snapshot, ax_click, ax_set_value, ax_select

3. CDP (Chrome/Electron)

DOM-level automation for web content and Electron apps.
Key Tools: cdp_connect, cdp_find_elements, cdp_click, cdp_fill, cdp_navigate
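
The choice between the three approaches can be written down as a small decision function. This is only a sketch of the guidance in this section: the DesktopTarget type and its fields are hypothetical and describe the app being automated, not anything the server exposes.

```typescript
type Approach = "ax" | "cdp" | "visual";

// Hypothetical description of the app being automated.
interface DesktopTarget {
  platform: "macos" | "windows";
  isWebOrElectron: boolean;          // content reachable over a CDP debugging port
  exposesAccessibilityTree: boolean; // e.g. AppKit/SwiftUI apps on macOS
}

// Web/Electron content goes to CDP, native macOS apps to AX Dispatch,
// and everything else falls back to the visual approach.
function chooseApproach(target: DesktopTarget): Approach {
  if (target.isWebOrElectron) return "cdp";
  if (target.platform === "macos" && target.exposesAccessibilityTree) return "ax";
  return "visual";
}
```

The visual approach is the fallback because it needs nothing from the target app beyond pixels on screen.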

Core Tools Reference

Screenshot & OCR

typescript
// Take a full screen screenshot
take_screenshot()

// Take a window screenshot
take_screenshot(window_id: number)

// Take a region screenshot
take_screenshot(
  x: number,
  y: number,
  width: number,
  height: number
)

// Find text with OCR
find_text(
  text: string,
  window_id?: number,
  x?: number,
  y?: number,
  width?: number,
  height?: number
)

Input Simulation

typescript
// Click at global coordinates
click(x: number, y: number)

// Click relative to window
click(x: number, y: number, window_id: number)

// Click relative to screenshot
click(x: number, y: number, screenshot_id: string)

// Double-click
click(x: number, y: number, double_click: true)

// Right-click
click(x: number, y: number, right_click: true)

// Drag
drag(
  from_x: number,
  from_y: number,
  to_x: number,
  to_y: number,
  window_id?: number
)

// Type text
type_text(text: string)

// Press key
press_key(key: string, modifiers?: string[])
// Examples: "Return", "Tab", "Escape"
// Modifiers: ["command"], ["control", "shift"]

// Scroll
scroll(delta_x: number, delta_y: number)
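
To make "relative to window" concrete, the sketch below shows the usual arithmetic for turning a window-relative point into global coordinates, assuming a top-left origin. The WindowBounds shape is hypothetical: this document does not list the fields returned by list_windows, and the server performs this conversion itself when window_id is passed to click.

```typescript
// Hypothetical shapes; the real window metadata fields may differ.
interface WindowBounds { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

// Offset a window-relative point by the window's top-left corner.
// Points outside the window are rejected rather than silently clicked elsewhere.
function windowToGlobal(bounds: WindowBounds, local: Point): Point {
  const outside =
    local.x < 0 || local.y < 0 || local.x >= bounds.width || local.y >= bounds.height;
  if (outside) {
    throw new RangeError(`Point (${local.x}, ${local.y}) lies outside the window`);
  }
  return { x: bounds.x + local.x, y: bounds.y + local.y };
}
```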

Window Management

typescript
// List all windows
list_windows()

// Focus a window
focus_window(window_id: number)

// Launch an app
launch_app(app_name: string, args?: string[])

// Quit an app
quit_app(app_name: string)

// Record window frames
record_window(
  window_id: number,
  duration_ms: number,
  interval_ms?: number
)

macOS Accessibility Tree (AX Dispatch)

typescript
// Take AX snapshot
take_ax_snapshot(
  window_id?: number,
  include_descriptions?: boolean
)
// Returns tree with element UIDs (a1, a2, a3...)

// Click an AX element
ax_click(uid: string)

// Set value (text fields, sliders)
ax_set_value(uid: string, value: string)

// Select item (menus, lists)
ax_select(uid: string)

// Inspect element details
ax_inspect(uid: string)
AX Dispatch Flow Example:
typescript
// 1. Take AX snapshot of System Settings
const windows = await list_windows();
const settingsWindow = windows.find(w => w.app_name === "System Settings");
if (!settingsWindow) throw new Error("System Settings window not found");
const snapshot = await take_ax_snapshot(settingsWindow.id);

// 2. Find the element you want (e.g., "a12" is the "Privacy & Security" button)
// The LLM reads the tree structure from the snapshot

// 3. Click it without moving the cursor
await ax_click("a12");

// 4. Take another snapshot to verify
const updatedSnapshot = await take_ax_snapshot(settingsWindow.id);
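
In the flow above the LLM reads the snapshot and picks the UID. When a script needs to do the same lookup, a depth-first search over the tree is enough. The sketch below assumes a node shape (uid, role, label, children) that is hypothetical; the actual snapshot format returned by take_ax_snapshot may differ.

```typescript
// Hypothetical node shape; the real snapshot format may differ.
interface AxNode {
  uid: string;
  role: string;
  label?: string;
  children?: AxNode[];
}

// Depth-first search for the first node with the given role and label.
function findAxNode(root: AxNode, role: string, label: string): AxNode | undefined {
  if (root.role === role && root.label === label) return root;
  for (const child of root.children ?? []) {
    const hit = findAxNode(child, role, label);
    if (hit) return hit;
  }
  return undefined;
}
```

The returned uid is what would then be passed to ax_click or ax_set_value.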

Template Matching

typescript
// Load an image template
load_image(
  path: string,
  name: string
)

// Find the template in a screenshot
find_image(
  template_name: string,
  screenshot_id?: string,
  window_id?: number,
  threshold?: number  // 0.0-1.0, default 0.8
)
Template Matching Flow:
typescript
// 1. Save a reference image of a button/icon
// (manually crop from a screenshot)

// 2. Load it
await load_image("/path/to/button.png", "submit_button");

// 3. Take a screenshot
const screenshot = await take_screenshot();

// 4. Find the template
const matches = await find_image("submit_button", screenshot.id);

// 5. Click the first match
if (matches.length > 0) {
  await click(matches[0].x, matches[0].y);
}
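
The flow above clicks the first match. If matches carry a confidence value, picking the strongest one above the threshold is a safer default. The sketch below assumes each match has a score field in the 0.0-1.0 range; that field name is an assumption and is not confirmed by this document.

```typescript
// Hypothetical match shape: x/y appear in the flow above, `score` is an assumed field.
interface ImageMatch { x: number; y: number; score: number; }

// Return the highest-scoring match at or above the threshold, if any.
function bestMatch(matches: ImageMatch[], threshold = 0.8): ImageMatch | undefined {
  let best: ImageMatch | undefined;
  for (const m of matches) {
    if (m.score >= threshold && (best === undefined || m.score > best.score)) {
      best = m;
    }
  }
  return best;
}
```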

Chrome DevTools Protocol (CDP)

typescript
// Connect to Chrome/Electron
cdp_connect(
  port: number,
  host?: string  // default "localhost"
)

// Navigate
cdp_navigate(url: string)

// Find elements (returns UIDs: d1, d2, d3...)
cdp_find_elements(
  query: string,
  limit?: number
)

// Take DOM snapshot
cdp_take_dom_snapshot()

// Click element
cdp_click(uid: string)

// Hover over element
cdp_hover(uid: string)

// Fill input field
cdp_fill(uid: string, value: string)

// Type into element
cdp_type(uid: string, text: string)

// Press key
cdp_press_key(key: string)
// Examples: "Enter", "Tab", "Escape"

// Wait for condition
cdp_wait_for(
  text?: string[],
  selector?: string,
  timeout_ms?: number
)

// Evaluate JavaScript
cdp_eval(expression: string)

// Handle alert/confirm/prompt
cdp_handle_dialog(accept: boolean, prompt_text?: string)

// Manage tabs
cdp_new_tab(url?: string)
cdp_close_tab(tab_id: string)
cdp_list_tabs()
cdp_switch_tab(tab_id: string)

// Inspect element
cdp_inspect_element(uid: string)

// Get element attributes
cdp_get_attributes(uid: string)

// Screenshot element
cdp_screenshot_element(uid: string)
CDP Flow Example:
typescript
// 1. Launch Chrome with remote debugging
await launch_app(
  "Google Chrome",
  ["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile"]
);

// 2. Connect
await cdp_connect(9222);

// 3. Navigate
await cdp_navigate("https://example.com");

// 4. Find elements
const elements = await cdp_find_elements("search");
// Returns: [{ uid: "d1", tag: "input", text: "", ... }, ...]

// 5. Fill and submit
await cdp_fill("d1", "search query");
await cdp_press_key("Enter");

// 6. Wait for results
await cdp_wait_for(["Results"], null, 5000);

// 7. Take DOM snapshot
const dom = await cdp_take_dom_snapshot();

Android (ADB)

typescript
// List connected devices
adb_devices()

// Take screenshot
adb_screenshot(device_id?: string)

// Find text (uiautomator)
adb_find_text(
  text: string,
  device_id?: string
)

// Tap coordinates
adb_tap(
  x: number,
  y: number,
  device_id?: string
)

// Type text
adb_type(text: string, device_id?: string)

// Press key
adb_press_key(
  key: string,
  device_id?: string
)
// Examples: "KEYCODE_HOME", "KEYCODE_BACK"

// Swipe
adb_swipe(
  from_x: number,
  from_y: number,
  to_x: number,
  to_y: number,
  duration_ms?: number,
  device_id?: string
)

// Launch app
adb_launch_app(
  package: string,
  device_id?: string
)

// Stop app
adb_stop_app(
  package: string,
  device_id?: string
)
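
For orientation, the tap and swipe tools above correspond to standard adb commands (adb shell input tap and adb shell input swipe, with -s selecting a device). The sketch below builds those argument lists. Whether the server shells out to exactly these commands is an assumption, and the helper names are hypothetical.

```typescript
// Prefix shell arguments with "-s <deviceId>" when a specific device is targeted.
function adbShellArgs(shellArgs: string[], deviceId?: string): string[] {
  const prefix = deviceId ? ["-s", deviceId] : [];
  return [...prefix, "shell", ...shellArgs];
}

// adb shell input tap <x> <y>
function tapArgs(x: number, y: number, deviceId?: string): string[] {
  return adbShellArgs(["input", "tap", String(x), String(y)], deviceId);
}

// adb shell input swipe <x1> <y1> <x2> <y2> [duration_ms]
function swipeArgs(
  fromX: number, fromY: number, toX: number, toY: number,
  durationMs?: number, deviceId?: string
): string[] {
  const args = ["input", "swipe", String(fromX), String(fromY), String(toX), String(toY)];
  if (durationMs !== undefined) args.push(String(durationMs));
  return adbShellArgs(args, deviceId);
}
```

Running the same arguments by hand with the adb CLI is a quick way to check whether a failure comes from the device or from the automation layer.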

Common Patterns

Pattern 1: Visual Navigation with OCR

typescript
// 1. Take a screenshot
const screenshot = await take_screenshot();

// 2. Find the target text
const matches = await find_text("Submit");

// 3. Click the first match
if (matches.length > 0) {
  await click(matches[0].center_x, matches[0].center_y, screenshot.id);
}

Pattern 2: AX Dispatch on macOS (Preferred)

typescript
// 1. List windows
const windows = await list_windows();
const targetWindow = windows.find(w => w.title.includes("Notes"));
if (!targetWindow) throw new Error("Notes window not found");

// 2. Take AX snapshot
const snapshot = await take_ax_snapshot(targetWindow.id, true);

// 3. Find element by role/label in the tree
// (LLM identifies "a5" is the "New Note" button from the snapshot)

// 4. Click without moving cursor
await ax_click("a5");

// 5. Set value in text field (e.g., "a8")
await ax_set_value("a8", "Meeting notes for 2026-05-18");

Pattern 3: Web Automation with CDP

typescript
// 1. Launch Chrome with debugging
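await launch_app(
  "Google Chrome",
  // A separate --user-data-dir is needed: Chrome 136 and later ignore the debugging port on the default profile
  ["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile", "--new-window", "https://github.com/login"]
);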

// 2. Connect
await cdp_connect(9222);

// 3. Find login fields
const elements = await cdp_find_elements("login");
// Returns: [{ uid: "d1", tag: "input", ... }, { uid: "d2", tag: "input", type: "password", ... }]

// 4. Fill credentials from env
await cdp_fill("d1", process.env.GITHUB_USERNAME);
await cdp_fill("d2", process.env.GITHUB_PASSWORD);

// 5. Click submit button
const submitElements = await cdp_find_elements("Sign in");
await cdp_click(submitElements[0].uid);

// 6. Wait for redirect
await cdp_wait_for(null, "header", 10000);
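
The pattern above assumes that d1 is the username field and d2 is the password field. A less fragile variant selects fields by their attributes. The sketch below works on the element shape suggested by the sample results in the comments above; the CdpElement type and both helper names are hypothetical.

```typescript
// Hypothetical element shape, based on the sample results shown in the comments above.
interface CdpElement {
  uid: string;
  tag: string;
  type?: string;
  placeholder?: string;
}

// First <input> whose type is "password".
function findPasswordField(elements: CdpElement[]): CdpElement | undefined {
  return elements.find(e => e.tag === "input" && e.type === "password");
}

// First <input> that is not a password field, used here as the username field.
function findUsernameField(elements: CdpElement[]): CdpElement | undefined {
  return elements.find(e => e.tag === "input" && e.type !== "password");
}
```

The uid of each result would then be passed to cdp_fill in place of the hard-coded "d1" and "d2".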

Pattern 4: Electron App Automation

typescript
// 1. Launch Electron app with debugging
await launch_app(
  "Signal",
  ["--remote-debugging-port=9223"]
);

// 2. Connect CDP
await cdp_connect(9223);

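// 3. Automate like a web app, using the UID returned by the lookup
const compose = await cdp_find_elements("compose");
if (compose.length > 0) {
  await cdp_click(compose[0].uid);
  await cdp_type(compose[0].uid, "Hello from automation!");
  await cdp_press_key("Enter");
}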

Pattern 5: Template Matching for Custom UI

typescript
// 1. Load icon template
await load_image("/path/to/gear-icon.png", "settings_icon");

// 2. Take screenshot
const screenshot = await take_screenshot();

// 3. Find icon
const matches = await find_image("settings_icon", screenshot.id, null, 0.85);

// 4. Click if found
if (matches.length > 0) {
  await click(matches[0].x, matches[0].y, screenshot.id);
}

Pattern 6: Android UI Automation

typescript
// 1. List devices
const devices = await adb_devices();
const deviceId = devices[0].id;

// 2. Take screenshot
const screenshot = await adb_screenshot(deviceId);

// 3. Find text
const matches = await adb_find_text("Settings", deviceId);

// 4. Tap
if (matches.length > 0) {
  await adb_tap(matches[0].center_x, matches[0].center_y, deviceId);
}

// 5. Type in a field
await adb_tap(500, 300, deviceId);  // Focus input
await adb_type("Hello Android", deviceId);
await adb_press_key("KEYCODE_ENTER", deviceId);

Operational Safety

  • Hands off: When the agent is clicking/typing, don't move your mouse or type. Real hardware inputs conflict with simulated ones.
  • Focus matters: Ensure the target window is visible. If a popup steals focus, clicks may land in the wrong window.
  • Prefer AX Dispatch on macOS: For native apps, use take_ax_snapshot + ax_click / ax_set_value to avoid moving the cursor and stealing focus.

Permissions (macOS)

The server requires:
  • Accessibility: For input simulation and AX tree access
  • Screen Recording: For screenshots
Grant both in System Settings → Privacy & Security → Accessibility and Screen Recording.
Without these, clicks silently fail and screenshots return black rectangles.

Troubleshooting

macOS: Clicks don't work

  • Cause: Missing Accessibility permission
  • Fix: System Settings → Privacy & Security → Accessibility → enable the app

macOS: Screenshots are black

  • Cause: Missing Screen Recording permission
  • Fix: System Settings → Privacy & Security → Screen Recording → enable the app

CDP: Can't connect

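  • Cause: App not launched with --remote-debugging-port, or Chrome launched on its default profile (Chrome 136 and later ignore the flag unless --user-data-dir points to a non-default directory)
  • Fix: Use launch_app with the correct args:
    typescript
    await launch_app("Google Chrome", ["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile"]);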
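
A quick way to tell whether the debugging port is actually open is Chrome's HTTP endpoint /json/version, which responds with browser metadata (including a webSocketDebuggerUrl) when remote debugging is active. The sketch below only builds that URL; the function name is hypothetical, and the request itself can be made from a browser tab or any HTTP client.

```typescript
// Build the URL of the DevTools HTTP endpoint that reports browser version info.
function cdpVersionUrl(port: number, host = "localhost"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid port: ${port}`);
  }
  return `http://${host}:${port}/json/version`;
}
```

If the request to this URL is refused, the app was not started with a usable debugging port and cdp_connect will fail for the same reason.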

ADB: No devices found

  • Cause: USB debugging not enabled or device not connected
  • Fix:
    1. Enable USB debugging on the Android device (Settings → Developer options)
    2. Connect via USB or Wi-Fi (adb tcpip 5555, then adb connect <ip>:5555)
    3. Run adb devices to verify
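
When checking step 3 from a script, the output of adb devices can be parsed directly. The sketch below handles the standard output format (a "List of devices attached" header followed by one serial and state per line). The function and type names are hypothetical and are not part of the server.

```typescript
interface AdbDevice { id: string; state: string; }

// Parse the text printed by `adb devices` into a list of devices.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const raw of output.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blanks, the header, and daemon status lines that start with "*".
    if (line === "" || line.startsWith("List of devices") || line.startsWith("*")) continue;
    const parts = line.split(/\s+/);
    devices.push({ id: parts[0] ?? "", state: parts[1] ?? "unknown" });
  }
  return devices;
}

// Only devices in the "device" state accept commands;
// "unauthorized" and "offline" entries need attention on the device itself.
function readyDevices(output: string): AdbDevice[] {
  return parseAdbDevices(output).filter(d => d.state === "device");
}
```

An "unauthorized" entry usually means the USB debugging prompt on the device has not been accepted yet.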

OCR finds nothing

  • Cause: Text too small, low contrast, or obscured
  • Workarounds:
    1. Use template matching instead (load_image + find_image)
    2. Use AX Dispatch on macOS (take_ax_snapshot)
    3. Use CDP for web content (cdp_find_elements)

Windows: UI Automation elements missing

  • Cause: Some Qt/Electron apps don't expose UI Automation
  • Workaround: Use visual approach (screenshots + OCR) or CDP for Electron

Real-World Examples

Example 1: Automate System Settings on macOS

typescript
// 1. Launch System Settings
await launch_app("System Settings");

// 2. Find the window (it can take a moment to appear after launch, so call list_windows again if it is missing)
const windows = await list_windows();
const settingsWindow = windows.find(w => w.app_name === "System Settings");

// 3. Take AX snapshot
const snapshot = await take_ax_snapshot(settingsWindow.id, true);

// 4. Find "Privacy & Security" in the sidebar (e.g., a12)
await ax_click("a12");

// 5. Take another snapshot to find the next element
const privacySnapshot = await take_ax_snapshot(settingsWindow.id, true);

// 6. Click "Screen Recording" (e.g., a25)
await ax_click("a25");

Example 2: Fill a Web Form with CDP

typescript
// 1. Launch Chrome
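await launch_app("Google Chrome", [
  "--remote-debugging-port=9222",
  "--user-data-dir=/tmp/chrome-profile",  // Chrome 136 and later ignore the debugging port on the default profile
  "--new-window",
  "https://example.com/contact"
]);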

// 2. Connect CDP
await cdp_connect(9222);

// 3. Find form fields
const fields = await cdp_find_elements("contact form");
// Returns: [
//   { uid: "d1", tag: "input", placeholder: "Name", ... },
//   { uid: "d2", tag: "input", placeholder: "Email", ... },
//   { uid: "d3", tag: "textarea", placeholder: "Message", ... }
// ]

// 4. Fill the form
await cdp_fill("d1", "John Doe");
await cdp_fill("d2", "john@example.com");
await cdp_fill("d3", "This is a test message.");

// 5. Submit
const submitBtn = await cdp_find_elements("Submit");
await cdp_click(submitBtn[0].uid);

// 6. Wait for confirmation
await cdp_wait_for(["Thank you"], null, 5000);

Example 3: Android App Testing

typescript
// 1. List devices
const devices = await adb_devices();
const device = devices[0].id;

// 2. Launch app
await adb_launch_app("com.example.app", device);

// 3. Wait and screenshot
await new Promise(resolve => setTimeout(resolve, 2000));
const screenshot = await adb_screenshot(device);

// 4. Find and tap "Sign In" button
const signInMatches = await adb_find_text("Sign In", device);
if (signInMatches.length > 0) {
  await adb_tap(signInMatches[0].center_x, signInMatches[0].center_y, device);
}

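// 5. Fill username (fail early if the field label is not on screen)
const usernameMatches = await adb_find_text("Username", device);
if (usernameMatches.length === 0) {
  throw new Error('"Username" field not found on screen');
}
await adb_tap(usernameMatches[0].center_x, usernameMatches[0].center_y, device);
await adb_type(process.env.TEST_USERNAME, device);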

// 6. Fill password
await adb_press_key("KEYCODE_TAB", device);
await adb_type(process.env.TEST_PASSWORD, device);

// 7. Submit
await adb_press_key("KEYCODE_ENTER", device);

Example 4: Automate VS Code with CDP

typescript
// 1. Launch VS Code with debugging
await launch_app("Visual Studio Code", ["--remote-debugging-port=9224"]);

// 2. Connect
await cdp_connect(9224);

// 3. Open command palette
await cdp_press_key("Meta+Shift+P");  // Cmd+Shift+P on macOS; on Windows/Linux the shortcut is Ctrl+Shift+P

// 4. Wait for palette
await cdp_wait_for(null, ".quick-input-widget", 2000);

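// 5. Type command (cdp_type takes an element UID, so look the input up first;
//    the query string here is illustrative)
const paletteInputs = await cdp_find_elements("quick input");
await cdp_type(paletteInputs[0].uid, "File: Open File");
await cdp_press_key("Enter");

// 6. Fill the file picker with JS
await cdp_eval(`
  (() => {
    const input = document.querySelector('.monaco-inputbox input');
    input.value = '/path/to/file.js';
    input.dispatchEvent(new Event('input', { bubbles: true }));
  })()
`);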
await cdp_press_key("Enter");

Best Practices

  1. Always verify state: Take a screenshot or snapshot after an action to confirm it succeeded.
  2. Use the right tool for the job:
    • Native macOS apps → AX Dispatch
    • Web/Electron → CDP
    • Custom/legacy UI → Visual (screenshots + OCR or template matching)
  3. Handle timing: Add cdp_wait_for or manual delays after navigation/clicks before the next action.
  4. Reference env vars for secrets: Never hardcode credentials in automation scripts.
  5. Use record_window for debugging: Record a window's state over time to understand UI behavior.
  6. Test permissions early: Run npx native-devtools-mcp setup before writing automation scripts.
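
For practice 4, reading a secret from the environment is safer when a missing value fails loudly instead of typing the string "undefined" into a login form. The sketch below is one way to do that; requireEnv is a hypothetical helper, and the environment is passed in as a plain object so the function is easy to test.

```typescript
// Return the named variable, or throw if it is missing or empty.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Typical use in a Node.js script (shown as a comment so the sketch has no runtime dependencies):
//   const username = requireEnv("GITHUB_USERNAME", process.env);
```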

Additional Resources
