native-devtools-mcp-automation
Native Devtools MCP Automation
Skill by ara.so — Devtools Skills collection.
native-devtools-mcp works with Claude Desktop, Claude Code, Cursor, and any MCP-compatible client.
Platform Support
- macOS: Full support with Accessibility tree dispatch (preferred), screenshots, OCR (Vision), input simulation
- Windows: UI Automation, screenshots, OCR (Windows Media OCR), input simulation
- Android: ADB-based screenshots, uiautomator text lookup, input, app management
- Chrome/Electron: CDP-based DOM automation for web content and Electron apps
Installation
Quick Start (no install)
bash
npx -y native-devtools-mcp
Global Install
bash
npm install -g native-devtools-mcp
Build from Source (Rust)
bash
git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release
Binary: ./target/release/native-devtools-mcp
Setup Wizard
Run the setup wizard to configure permissions and MCP clients:
bash
npx native-devtools-mcp setup
This will:
- Check permissions (Accessibility and Screen Recording on macOS)
- Detect MCP clients (Claude Desktop, Claude Code, Cursor)
- Write the correct configuration
MCP Client Configuration
Claude Desktop (macOS)
Config file: ~/Library/Application Support/Claude/claude_desktop_config.json
json
{
"mcpServers": {
"native-devtools": {
"command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
}
}
}
Claude Desktop (Windows)
Config file: %APPDATA%\Claude\claude_desktop_config.json
json
{
"mcpServers": {
"native-devtools": {
"command": "C:\\path\\to\\native-devtools-mcp.exe"
}
}
}
Claude Code / Cursor / Other MCP Clients
json
{
"mcpServers": {
"native-devtools": {
"command": "npx",
"args": ["-y", "native-devtools-mcp"]
}
}
}
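All three client configurations share one shape: an entry under mcpServers keyed by the server name. When a config file already lists other servers, the new entry should be merged in rather than replacing the file. The following is a minimal sketch of that merge, assuming the config has already been parsed into an object. The withServer helper and the interface names are illustrative and not part of native-devtools-mcp.

```typescript
// Shape of one entry under "mcpServers", as used in the configs above.
interface McpServerEntry {
  command: string;
  args?: string[];
}

// Hypothetical type for a parsed client config; other top-level keys are preserved.
interface McpClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a copy of `config` with `entry` registered under `name`,
// leaving any other servers and top-level keys untouched.
function withServer(
  config: McpClientConfig,
  name: string,
  entry: McpServerEntry
): McpClientConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

const existing: McpClientConfig = {
  mcpServers: { other: { command: "other-server" } },
};
const merged = withServer(existing, "native-devtools", {
  command: "npx",
  args: ["-y", "native-devtools-mcp"],
});
// Serialized form, ready to be written back to the config file.
const mergedJson = JSON.stringify(merged, null, 2);
```

Returning a new object instead of mutating the input keeps the original config available if the write fails.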
Claude Code Auto-Approval
To avoid approving every tool call, add the following to .claude/settings.local.json:
json
{
"permissions": {
"allow": ["mcp__native-devtools__*"]
}
}
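The allow entry works because each tool is addressed by a name of the form mcp__<server>__<tool>, so a single wildcard covers every tool of one server. The sketch below illustrates that idea only. It assumes a trailing * matches any suffix, and the function names are hypothetical; the client's real matching rules may differ.

```typescript
// Assumption: a trailing "*" matches any suffix; everything else matches literally.
function matchesAllowPattern(pattern: string, toolName: string): boolean {
  if (pattern.endsWith("*")) {
    return toolName.startsWith(pattern.slice(0, -1));
  }
  return pattern === toolName;
}

// Build the qualified name used for an MCP tool: mcp__<server>__<tool>.
function qualifiedToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

const allow = ["mcp__native-devtools__*"];

// True when any allow pattern covers the given qualified tool name.
const isAllowed = (tool: string): boolean =>
  allow.some((pattern) => matchesAllowPattern(pattern, tool));
```

To approve only selected tools, the wildcard can be replaced by explicit names such as mcp__native-devtools__take_screenshot.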
Three Approaches to Interaction
1. Visual (Universal)
Works with any app — games, Qt apps, custom renderers, anything without an accessible tree.
Key Tools: take_screenshot, find_text, click, type_text, find_image
2. AX Dispatch (macOS - Preferred for Native Apps)
Element-precise automation for AppKit/SwiftUI apps. Doesn't move the cursor or steal focus.
Key Tools: take_ax_snapshot, ax_click, ax_set_value, ax_select
3. CDP (Chrome/Electron)
DOM-level automation for web content and Electron apps.
Key Tools: cdp_connect, cdp_find_elements, cdp_click, cdp_fill, cdp_navigate
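The choice among the three approaches can be written down as a small decision rule. This sketch covers only the approaches listed in this section. The type names and the chooseApproach function are illustrative, and real apps may need a fallback when the preferred approach exposes nothing useful.

```typescript
type Platform = "macos" | "windows";
type AppKind = "native" | "web" | "electron" | "custom";
type Approach = "ax" | "cdp" | "visual";

// Pick the most precise approach available, falling back to visual.
function chooseApproach(platform: Platform, kind: AppKind): Approach {
  // Web content and Electron apps expose a DOM, so CDP applies on any platform.
  if (kind === "web" || kind === "electron") return "cdp";
  // AX Dispatch is macOS-only and aimed at AppKit/SwiftUI apps.
  if (platform === "macos" && kind === "native") return "ax";
  // Games, Qt apps, and custom renderers: screenshots plus OCR or templates.
  return "visual";
}
```

The visual approach is the default branch because it is the only one that works without an accessibility tree or a debugging port.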
Core Tools Reference
Screenshot & OCR
typescript
// Take a full screen screenshot
take_screenshot()
// Take a window screenshot
take_screenshot(window_id: number)
// Take a region screenshot
take_screenshot(
x: number,
y: number,
width: number,
height: number
)
// Find text with OCR
find_text(
text: string,
window_id?: number,
x?: number,
y?: number,
width?: number,
height?: number
)
Input Simulation
typescript
// Click at global coordinates
click(x: number, y: number)
// Click relative to window
click(x: number, y: number, window_id: number)
// Click relative to screenshot
click(x: number, y: number, screenshot_id: string)
// Double-click
click(x: number, y: number, double_click: true)
// Right-click
click(x: number, y: number, right_click: true)
// Drag
drag(
from_x: number,
from_y: number,
to_x: number,
to_y: number,
window_id?: number
)
// Type text
type_text(text: string)
// Press key
press_key(key: string, modifiers?: string[])
// Examples: "Return", "Tab", "Escape"
// Modifiers: ["command"], ["control", "shift"]
// Scroll
scroll(delta_x: number, delta_y: number)
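The three click variants differ only in the origin their coordinates are measured from. Passing window_id or screenshot_id lets the server handle the conversion. The sketch below shows the kind of arithmetic involved, under stated assumptions: the metadata field names are hypothetical, and the scale factor models a high-DPI display where a screenshot has more pixels than the screen has points.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical metadata; real tool responses may use different field names.
interface WindowBounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ScreenshotMeta {
  originX: number; // global position of the screenshot's top-left corner
  originY: number;
  scale: number; // screenshot pixels per screen point (e.g. 2 on a 2x display)
}

// Window-relative point -> global point: offset by the window's origin.
function windowToGlobal(p: Point, win: WindowBounds): Point {
  return { x: win.x + p.x, y: win.y + p.y };
}

// Screenshot pixel -> global point: undo the scale, then offset by the origin.
function screenshotToGlobal(p: Point, shot: ScreenshotMeta): Point {
  return {
    x: shot.originX + p.x / shot.scale,
    y: shot.originY + p.y / shot.scale,
  };
}
```

Mixing these spaces is a common cause of clicks landing in the wrong place, so it helps to keep track of which origin each coordinate pair uses.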
Window Management
typescript
// List all windows
list_windows()
// Focus a window
focus_window(window_id: number)
// Launch an app
launch_app(app_name: string, args?: string[])
// Quit an app
quit_app(app_name: string)
// Record window frames
record_window(
window_id: number,
duration_ms: number,
interval_ms?: number
)
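Most flows in this document start by picking one window out of the list_windows result. A lookup that fails loudly is easier to debug than an undefined window reaching a later call. A minimal sketch follows; the WindowInfo type lists only the fields used in this document's examples, and requireWindow is a hypothetical helper.

```typescript
// Subset of window fields used in this document's examples; other fields may exist.
interface WindowInfo {
  id: number;
  app_name: string;
  title: string;
}

// Return the first window of the given app, or throw with the list of open apps.
function requireWindow(windows: WindowInfo[], appName: string): WindowInfo {
  const found = windows.find((w) => w.app_name === appName);
  if (!found) {
    const known = windows.map((w) => w.app_name).join(", ");
    throw new Error(
      `No window for "${appName}". Open windows: ${known || "none"}`
    );
  }
  return found;
}

const sampleWindows: WindowInfo[] = [
  { id: 7, app_name: "Finder", title: "Downloads" },
  { id: 9, app_name: "System Settings", title: "General" },
];
const settings = requireWindow(sampleWindows, "System Settings");
```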
macOS Accessibility Tree (AX Dispatch)
typescript
// Take AX snapshot
take_ax_snapshot(
window_id?: number,
include_descriptions?: boolean
)
// Returns tree with element UIDs (a1, a2, a3...)
// Click an AX element
ax_click(uid: string)
// Set value (text fields, sliders)
ax_set_value(uid: string, value: string)
// Select item (menus, lists)
ax_select(uid: string)
// Inspect element details
ax_inspect(uid: string)
AX Dispatch Flow Example:
typescript
// 1. Take AX snapshot of System Settings
const windows = await list_windows();
const settingsWindow = windows.find(w => w.app_name === "System Settings");
const snapshot = await take_ax_snapshot(settingsWindow.id);
// 2. Find the element you want (e.g., "a12" is the "Privacy & Security" button)
// The LLM reads the tree structure from the snapshot
// 3. Click it without moving the cursor
await ax_click("a12");
// 4. Take another snapshot to verify
const updatedSnapshot = await take_ax_snapshot(settingsWindow.id);
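In the flow above the LLM reads the snapshot and picks the UID. The same lookup can be done in code when the target is known by role and label. This sketch assumes the snapshot has been parsed into nested nodes. The AxNode shape and the role strings are assumptions, since the actual snapshot format is not specified here.

```typescript
// Hypothetical node shape; the actual snapshot format may differ.
interface AxNode {
  uid: string;
  role: string;
  label?: string;
  children?: AxNode[];
}

// Depth-first search returning every node whose role and label both match.
function findAxNodes(root: AxNode, role: string, label: string): AxNode[] {
  const hits: AxNode[] = [];
  const visit = (node: AxNode): void => {
    if (node.role === role && node.label === label) hits.push(node);
    (node.children ?? []).forEach((child) => visit(child));
  };
  visit(root);
  return hits;
}

const sampleTree: AxNode = {
  uid: "a1",
  role: "AXWindow",
  label: "System Settings",
  children: [
    { uid: "a3", role: "AXButton", label: "General" },
    {
      uid: "a4",
      role: "AXGroup",
      children: [{ uid: "a12", role: "AXButton", label: "Privacy & Security" }],
    },
  ],
};

const privacyUids = findAxNodes(sampleTree, "AXButton", "Privacy & Security").map(
  (node) => node.uid
);
```

UIDs come from a specific snapshot, so a lookup like this should run against a fresh snapshot after each action that changes the UI.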
Template Matching
typescript
// Load an image template
load_image(
path: string,
name: string
)
// Find the template in a screenshot
find_image(
template_name: string,
screenshot_id?: string,
window_id?: number,
threshold?: number // 0.0-1.0, default 0.8
)
Template Matching Flow:
typescript
// 1. Save a reference image of a button/icon
// (manually crop from a screenshot)
// 2. Load it
await load_image("/path/to/button.png", "submit_button");
// 3. Take a screenshot
const screenshot = await take_screenshot();
// 4. Find the template
const matches = await find_image("submit_button", screenshot.id);
// 5. Click the first match
if (matches.length > 0) {
await click(matches[0].x, matches[0].y);
}
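To give a feel for what the threshold parameter means, here is a small sketch of normalized cross-correlation, a common template-matching score that ranges from -1 to 1. This is an illustration of the general technique on tiny grayscale arrays, not a description of how find_image is implemented. All names below are hypothetical.

```typescript
type Gray = number[][]; // rows of grayscale pixel values

interface TemplateMatch {
  x: number;
  y: number;
  score: number;
}

// Flatten the w-by-h patch of `img` whose top-left corner is (x, y).
function patch(img: Gray, x: number, y: number, w: number, h: number): number[] {
  const out: number[] = [];
  for (let r = 0; r < h; r++) {
    for (let c = 0; c < w; c++) out.push(img[y + r][x + c]);
  }
  return out;
}

// Normalized cross-correlation of two equally sized pixel lists.
function ncc(a: number[], b: number[]): number {
  const mean = (v: number[]): number => v.reduce((s, n) => s + n, 0) / v.length;
  const ma = mean(a);
  const mb = mean(b);
  let num = 0;
  let da = 0;
  let db = 0;
  for (let i = 0; i < a.length; i++) {
    num += (a[i] - ma) * (b[i] - mb);
    da += (a[i] - ma) * (a[i] - ma);
    db += (b[i] - mb) * (b[i] - mb);
  }
  const denom = Math.sqrt(da * db);
  return denom === 0 ? 0 : num / denom; // flat patches carry no signal
}

// Slide the template over the image and keep positions scoring >= threshold.
function findTemplate(img: Gray, tpl: Gray, threshold = 0.8): TemplateMatch[] {
  const h = tpl.length;
  const w = tpl[0].length;
  const flatTpl = patch(tpl, 0, 0, w, h);
  const matches: TemplateMatch[] = [];
  for (let y = 0; y + h <= img.length; y++) {
    for (let x = 0; x + w <= img[0].length; x++) {
      const score = ncc(patch(img, x, y, w, h), flatTpl);
      if (score >= threshold) matches.push({ x, y, score });
    }
  }
  return matches.sort((p, q) => q.score - p.score); // best match first
}

const image: Gray = [
  [0, 0, 0, 0],
  [0, 9, 1, 0],
  [0, 2, 8, 0],
  [0, 0, 0, 0],
];
const template: Gray = [
  [9, 1],
  [2, 8],
];
const found = findTemplate(image, template);
```

Raising the threshold reduces false positives at the cost of missing matches that differ slightly in rendering, which is why Pattern 5 passes 0.85 for a distinctive icon.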
Chrome DevTools Protocol (CDP)
typescript
// Connect to Chrome/Electron
cdp_connect(
port: number,
host?: string // default "localhost"
)
// Navigate
cdp_navigate(url: string)
// Find elements (returns UIDs: d1, d2, d3...)
cdp_find_elements(
query: string,
limit?: number
)
// Take DOM snapshot
cdp_take_dom_snapshot()
// Click element
cdp_click(uid: string)
// Hover over element
cdp_hover(uid: string)
// Fill input field
cdp_fill(uid: string, value: string)
// Type into element
cdp_type(uid: string, text: string)
// Press key
cdp_press_key(key: string)
// Examples: "Enter", "Tab", "Escape"
// Wait for condition
cdp_wait_for(
text?: string[],
selector?: string,
timeout_ms?: number
)
// Evaluate JavaScript
cdp_eval(expression: string)
// Handle alert/confirm/prompt
cdp_handle_dialog(accept: boolean, prompt_text?: string)
// Manage tabs
cdp_new_tab(url?: string)
cdp_close_tab(tab_id: string)
cdp_list_tabs()
cdp_switch_tab(tab_id: string)
// Inspect element
cdp_inspect_element(uid: string)
// Get element attributes
cdp_get_attributes(uid: string)
// Screenshot element
cdp_screenshot_element(uid: string)
CDP Flow Example:
typescript
// 1. Launch Chrome with remote debugging
await launch_app(
"Google Chrome",
["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile"]
);
// 2. Connect
await cdp_connect(9222);
// 3. Navigate
await cdp_navigate("https://example.com");
// 4. Find elements
const elements = await cdp_find_elements("search");
// Returns: [{ uid: "d1", tag: "input", text: "", ... }, ...]
// 5. Fill and submit
await cdp_fill("d1", "search query");
await cdp_press_key("Enter");
// 6. Wait for results
await cdp_wait_for(["Results"], null, 5000);
// 7. Take DOM snapshot
const dom = await cdp_take_dom_snapshot();
Android (ADB)
typescript
// List connected devices
adb_devices()
// Take screenshot
adb_screenshot(device_id?: string)
// Find text (uiautomator)
adb_find_text(
text: string,
device_id?: string
)
// Tap coordinates
adb_tap(
x: number,
y: number,
device_id?: string
)
// Type text
adb_type(text: string, device_id?: string)
// Press key
adb_press_key(
key: string,
device_id?: string
)
// Examples: "KEYCODE_HOME", "KEYCODE_BACK"
// Swipe
adb_swipe(
from_x: number,
from_y: number,
to_x: number,
to_y: number,
duration_ms?: number,
device_id?: string
)
// Launch app
adb_launch_app(
package: string,
device_id?: string
)
// Stop app
adb_stop_app(
package: string,
device_id?: string
)
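For debugging, it can help to reproduce a tool call by hand with the adb CLI. The sketch below builds argument lists for the standard adb shell input subcommands (tap, swipe, text, keyevent). Whether the adb_* tools invoke exactly these commands is an assumption, and the helper names are hypothetical.

```typescript
// Prefix shell arguments with "-s <device>" when a device ID is given.
function adbArgs(shellArgs: string[], deviceId?: string): string[] {
  const target = deviceId ? ["-s", deviceId] : [];
  return [...target, "shell", ...shellArgs];
}

// adb shell input tap <x> <y>
const tapArgs = (x: number, y: number, deviceId?: string): string[] =>
  adbArgs(["input", "tap", String(x), String(y)], deviceId);

// adb shell input swipe <x1> <y1> <x2> <y2> <duration_ms>
const swipeArgs = (
  x1: number,
  y1: number,
  x2: number,
  y2: number,
  durationMs = 300,
  deviceId?: string
): string[] =>
  adbArgs(
    ["input", "swipe", String(x1), String(y1), String(x2), String(y2), String(durationMs)],
    deviceId
  );

// adb shell input keyevent <KEYCODE_NAME>
const keyArgs = (key: string, deviceId?: string): string[] =>
  adbArgs(["input", "keyevent", key], deviceId);

// adb shell input text <text>; "%s" stands for a space in this subcommand.
const typeArgs = (text: string, deviceId?: string): string[] =>
  adbArgs(["input", "text", text.replace(/ /g, "%s")], deviceId);
```

For example, tapArgs(500, 300, "emulator-5554") corresponds to running adb -s emulator-5554 shell input tap 500 300 in a terminal.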
Common Patterns
Pattern 1: Visual Navigation with OCR
typescript
// 1. Take a screenshot
const screenshot = await take_screenshot();
// 2. Find the target text
const matches = await find_text("Submit");
// 3. Click the first match
if (matches.length > 0) {
await click(matches[0].center_x, matches[0].center_y, screenshot.id);
}
Pattern 2: AX Dispatch on macOS (Preferred)
typescript
// 1. List windows
const windows = await list_windows();
const targetWindow = windows.find(w => w.title.includes("Notes"));
// 2. Take AX snapshot
const snapshot = await take_ax_snapshot(targetWindow.id, true);
// 3. Find element by role/label in the tree
// (LLM identifies "a5" is the "New Note" button from the snapshot)
// 4. Click without moving cursor
await ax_click("a5");
// 5. Set value in text field (e.g., "a8")
await ax_set_value("a8", "Meeting notes for 2026-05-18");
Pattern 3: Web Automation with CDP
typescript
// 1. Launch Chrome with debugging
await launch_app(
"Google Chrome",
["--remote-debugging-port=9222", "--new-window", "https://github.com/login"]
);
// 2. Connect
await cdp_connect(9222);
// 3. Find login fields
const elements = await cdp_find_elements("login");
// Returns: [{ uid: "d1", tag: "input", ... }, { uid: "d2", tag: "input", type: "password", ... }]
// 4. Fill credentials from env
await cdp_fill("d1", process.env.GITHUB_USERNAME);
await cdp_fill("d2", process.env.GITHUB_PASSWORD);
// 5. Click submit button
const submitElements = await cdp_find_elements("Sign in");
await cdp_click(submitElements[0].uid);
// 6. Wait for redirect
await cdp_wait_for(null, "header", 10000);
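Reading credentials straight from process.env, as in step 4, passes undefined to cdp_fill when a variable is missing. A small guard makes that failure explicit before any automation runs. This is a sketch with a hypothetical requireEnv helper. The environment is passed in as a plain object so the helper does not depend on a particular runtime.

```typescript
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw if it is unset or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an in-memory environment; in Node.js, pass process.env instead.
const demoEnv: Env = { GITHUB_USERNAME: "octocat" };
const username = requireEnv(demoEnv, "GITHUB_USERNAME");
```

Calling the guard for every secret at the top of a script means a missing variable stops the run before the browser is launched.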
Pattern 4: Electron App Automation
typescript
// 1. Launch Electron app with debugging
await launch_app(
"Signal",
["--remote-debugging-port=9223"]
);
// 2. Connect CDP
await cdp_connect(9223);
// 3. Automate like a web app
await cdp_find_elements("compose");
await cdp_click("d1");
await cdp_type("d1", "Hello from automation!");
await cdp_press_key("Enter");
Pattern 5: Template Matching for Custom UI
typescript
// 1. Load icon template
await load_image("/path/to/gear-icon.png", "settings_icon");
// 2. Take screenshot
const screenshot = await take_screenshot();
// 3. Find icon
const matches = await find_image("settings_icon", screenshot.id, null, 0.85);
// 4. Click if found
if (matches.length > 0) {
await click(matches[0].x, matches[0].y, screenshot.id);
}
Pattern 6: Android UI Automation
typescript
// 1. List devices
const devices = await adb_devices();
const deviceId = devices[0].id;
// 2. Take screenshot
const screenshot = await adb_screenshot(deviceId);
// 3. Find text
const matches = await adb_find_text("Settings", deviceId);
// 4. Tap
if (matches.length > 0) {
await adb_tap(matches[0].center_x, matches[0].center_y, deviceId);
}
// 5. Type in a field
await adb_tap(500, 300, deviceId); // Focus input
await adb_type("Hello Android", deviceId);
await adb_press_key("KEYCODE_ENTER", deviceId);
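The center_x and center_y values used in step 4 can be derived from element bounds. In a uiautomator XML dump, each node carries a bounds attribute of the form [left,top][right,bottom]. The sketch below parses that string and computes the center. Whether adb_find_text computes its centers this way is an assumption, and the helper names are hypothetical.

```typescript
interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Parse a uiautomator bounds string such as "[0,100][200,300]".
function parseBounds(raw: string): Bounds | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(raw);
  if (!m) return null;
  return {
    left: Number(m[1]),
    top: Number(m[2]),
    right: Number(m[3]),
    bottom: Number(m[4]),
  };
}

// Center of the rectangle, rounded to whole pixels for use as a tap target.
function boundsCenter(b: Bounds): { x: number; y: number } {
  return {
    x: Math.round((b.left + b.right) / 2),
    y: Math.round((b.top + b.bottom) / 2),
  };
}
```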
Operational Safety
- Hands off: When the agent is clicking/typing, don't move your mouse or type. Real hardware inputs conflict with simulated ones.
- Focus matters: Ensure the target window is visible. If a popup steals focus, clicks may land in the wrong window.
- Prefer AX Dispatch on macOS: For native apps, use take_ax_snapshot + ax_click/ax_set_value to avoid moving the cursor and stealing focus.
Permissions (macOS)
The server requires:
- Accessibility: For input simulation and AX tree access
- Screen Recording: For screenshots
Grant both in System Settings → Privacy & Security → Accessibility and Screen Recording.
Without these, clicks silently fail and screenshots return black rectangles.
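Because a missing Screen Recording permission produces a black image rather than an error, a script can check for that case explicitly. The sketch below assumes the screenshot has already been decoded into raw RGBA bytes; decoding is out of scope here. The looksBlack helper and its tolerance default are illustrative.

```typescript
// `rgba` holds 4 bytes per pixel: red, green, blue, alpha.
// Returns true when every pixel's color channels are at or below `tolerance`.
function looksBlack(rgba: Uint8Array, tolerance = 8): boolean {
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    if (rgba[i] > tolerance || rgba[i + 1] > tolerance || rgba[i + 2] > tolerance) {
      return false; // found a visibly non-black pixel
    }
  }
  return true;
}
```

A legitimately dark window can also trigger this check, so a positive result is a hint to verify permissions rather than proof that they are missing.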
Troubleshooting
macOS: Clicks don't work
- Cause: Missing Accessibility permission
- Fix: System Settings → Privacy & Security → Accessibility → enable the app
macOS: Screenshots are black
- Cause: Missing Screen Recording permission
- Fix: System Settings → Privacy & Security → Screen Recording → enable the app
CDP: Can't connect
- Cause: App not launched with --remote-debugging-port
- Fix: Use launch_app with the correct args:
typescript
await launch_app("Google Chrome", ["--remote-debugging-port=9222"]);
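To confirm the debugging port is actually open before calling cdp_connect, the browser's HTTP discovery endpoints can be queried directly. Chromium-based browsers serve /json/version and /json/list on the debugging port. The helper below only builds those URLs and its name is hypothetical.

```typescript
// Build the HTTP discovery URLs for a Chromium remote-debugging port.
function cdpHttpEndpoints(
  port: number,
  host = "localhost"
): { version: string; targets: string } {
  const base = `http://${host}:${port}`;
  return {
    version: `${base}/json/version`, // browser info and WebSocket debugger URL
    targets: `${base}/json/list`, // open tabs and other debuggable targets
  };
}

const endpoints = cdpHttpEndpoints(9222);
```

Opening endpoints.version in a browser or with curl should return JSON. A connection error there suggests the app was started without the flag or on a different port.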
ADB: No devices found
- Cause: USB debugging not enabled or device not connected
- Fix:
  - Enable USB debugging on the Android device (Settings → Developer options)
  - Connect via USB or Wi-Fi (adb tcpip 5555, then adb connect <ip>:5555)
  - Run adb devices to verify
OCR finds nothing
- Cause: Text too small, low contrast, or obscured
- Workarounds:
  - Use template matching instead (load_image + find_image)
  - Use AX Dispatch on macOS (take_ax_snapshot)
  - Use CDP for web content (cdp_find_elements)
Windows: UI Automation elements missing
- Cause: Some Qt/Electron apps don't expose UI Automation
- Workaround: Use visual approach (screenshots + OCR) or CDP for Electron
Real-World Examples
Example 1: Automate System Settings on macOS
typescript
// 1. Launch System Settings
await launch_app("System Settings");
// 2. Wait for window
const windows = await list_windows();
const settingsWindow = windows.find(w => w.app_name === "System Settings");
// 3. Take AX snapshot
const snapshot = await take_ax_snapshot(settingsWindow.id, true);
// 4. Find "Privacy & Security" in the sidebar (e.g., a12)
await ax_click("a12");
// 5. Take another snapshot to find the next element
const privacySnapshot = await take_ax_snapshot(settingsWindow.id, true);
// 6. Click "Screen Recording" (e.g., a25)
await ax_click("a25");
Example 2: Fill a Web Form with CDP
typescript
// 1. Launch Chrome
await launch_app("Google Chrome", [
"--remote-debugging-port=9222",
"--new-window",
"https://example.com/contact"
]);
// 2. Connect CDP
await cdp_connect(9222);
// 3. Find form fields
const fields = await cdp_find_elements("contact form");
// Returns: [
// { uid: "d1", tag: "input", placeholder: "Name", ... },
// { uid: "d2", tag: "input", placeholder: "Email", ... },
// { uid: "d3", tag: "textarea", placeholder: "Message", ... }
// ]
// 4. Fill the form
await cdp_fill("d1", "John Doe");
await cdp_fill("d2", "john@example.com");
await cdp_fill("d3", "This is a test message.");
// 5. Submit
const submitBtn = await cdp_find_elements("Submit");
await cdp_click(submitBtn[0].uid);
// 6. Wait for confirmation
await cdp_wait_for(["Thank you"], null, 5000);
Example 3: Android App Testing
typescript
// 1. List devices
const devices = await adb_devices();
const device = devices[0].id;
// 2. Launch app
await adb_launch_app("com.example.app", device);
// 3. Wait and screenshot
await new Promise(resolve => setTimeout(resolve, 2000));
const screenshot = await adb_screenshot(device);
// 4. Find and tap "Sign In" button
const signInMatches = await adb_find_text("Sign In", device);
if (signInMatches.length > 0) {
await adb_tap(signInMatches[0].center_x, signInMatches[0].center_y, device);
}
// 5. Fill username
const usernameMatches = await adb_find_text("Username", device);
await adb_tap(usernameMatches[0].center_x, usernameMatches[0].center_y, device);
await adb_type(process.env.TEST_USERNAME, device);
// 6. Fill password
await adb_press_key("KEYCODE_TAB", device);
await adb_type(process.env.TEST_PASSWORD, device);
// 7. Submit
await adb_press_key("KEYCODE_ENTER", device);
Example 4: Automate VS Code with CDP
typescript
// 1. Launch VS Code with debugging
await launch_app("Visual Studio Code", ["--remote-debugging-port=9224"]);
// 2. Connect
await cdp_connect(9224);
// 3. Open command palette
await cdp_press_key("Meta+Shift+P"); // Meta = Cmd on macOS; on Windows/Linux, VS Code binds the palette to Ctrl+Shift+P
// 4. Wait for palette
await cdp_wait_for(null, ".quick-input-widget", 2000);
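// 5. Type command (cdp_type expects an element UID, so look up the palette input first;
//    the query string here is illustrative)
const paletteInput = await cdp_find_elements("quick input");
await cdp_type(paletteInput[0].uid, "File: Open File");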
await cdp_press_key("Enter");
// 6. Navigate file picker with JS
await cdp_eval(`
document.querySelector('.monaco-inputbox input').value = '/path/to/file.js';
document.querySelector('.monaco-inputbox input').dispatchEvent(new Event('input'));
`);
await cdp_press_key("Enter");
Best Practices
- Always verify state: Take a screenshot or snapshot after an action to confirm it succeeded.
- Use the right tool for the job:
  - Native macOS apps → AX Dispatch
  - Web/Electron → CDP
  - Custom/legacy UI → Visual (screenshots + OCR or template matching)
- Handle timing: Add cdp_wait_for or manual delays after navigation/clicks before the next action.
- Reference env vars for secrets: Never hardcode credentials in automation scripts.
- Use record_window for debugging: Record a window's state over time to understand UI behavior.
- Test permissions early: Run npx native-devtools-mcp setup before writing automation scripts.
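Outside CDP there is no built-in wait tool in this reference, so timing is usually handled by polling: repeat a check (for example, a find_text call) until it succeeds or a deadline passes. A minimal sketch of such a loop follows, with growing delays between attempts. Both helpers are hypothetical, and the check callback stands in for whatever tool call verifies the state.

```typescript
// Delay before the given retry attempt (0-based): doubles each time, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 100, maxMs = 2000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Poll `check` until it resolves to true or `timeoutMs` elapses.
// Returns true on success, false on timeout.
async function waitUntil(
  check: () => Promise<boolean>,
  timeoutMs = 5000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (let attempt = 0; Date.now() < deadline; attempt++) {
    if (await check()) return true;
    const delay = backoffDelay(attempt);
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
  return false;
}
```

Polling with a deadline tends to be more reliable than a fixed sleep such as the 2000 ms wait in Example 3, because it returns as soon as the UI is ready and fails clearly when it never becomes ready.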