open-computer-use-automation

Skill by ara.so — Codex Skills collection.

Overview

`open-computer-use` is an open-source Computer Use service wrapped as an MCP (Model Context Protocol) server. It lets AI agents interact with desktop applications through the accessibility APIs on macOS, Linux, and Windows. Inspired by OpenAI's Codex Computer Use, it provides non-intrusive automation using native accessibility features.

The project exposes tools to:

List running applications
Get application UI state and elements
Click elements, buttons, and UI components
Type text and press keys
Take screenshots
Manage app focus and windows

Installation

Global Installation

bash

npm i -g open-computer-use

macOS Permissions

On first run, grant Accessibility and Screen Recording permissions:

bash

open-computer-use

Follow the system prompts to enable permissions in System Settings.

MCP Integration

Install into your AI agent:

bash

# Codex
open-computer-use install-codex-mcp

# Claude Code
open-computer-use install-claude-mcp

# Gemini CLI (project scope)
open-computer-use install-gemini-mcp

# Gemini CLI (user scope)
open-computer-use install-gemini-mcp --scope user

# opencode
open-computer-use install-opencode-mcp

Manual MCP Configuration

Add to your MCP client config (e.g., `~/.codex/config.toml`, `~/.claude.json`):

json

{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"]
    }
  }
}
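
When the config is maintained by a script rather than by hand, the entry can be merged into an existing JSON config without disturbing other servers. The sketch below assumes a JSON config with a top-level `mcpServers` object, as in the example above; `McpConfig` and `withOpenComputerUse` are illustrative names and not part of open-computer-use.

```typescript
// Shape of one MCP server entry, mirroring the JSON above.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Assumed shape of a JSON-based MCP client config (hypothetical type).
interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a copy of `config` with the open-computer-use entry added,
// leaving other servers and unrelated settings untouched.
function withOpenComputerUse(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "open-computer-use": { command: "open-computer-use", args: ["mcp"] },
    },
  };
}

const merged = withOpenComputerUse({
  mcpServers: { other: { command: "other-server", args: [] } },
});
```

Returning a new object instead of mutating the input makes it easy to compare the before and after states prior to writing the file back.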

Skill Installation

bash

# Install for Codex
npx skills add iFurySt/open-codex-computer-use -g -a codex --skill open-computer-use -y

# Install for Claude Code
npx skills add iFurySt/open-codex-computer-use -g -a claude-code --skill open-computer-use -y

# Update existing skill
npx skills update open-computer-use -g -y

# List installed skills
npx skills ls -g -a codex | rg 'open-computer-use'

Core Commands

CLI Usage

bash

# Check permissions and system readiness
open-computer-use doctor

# Call a single tool (returns MCP JSON)
open-computer-use call list_apps

# Call with arguments
open-computer-use call get_app_state --args '{"app":"TextEdit"}'

# Run a sequence of operations (maintains element_index state)
open-computer-use call --calls '[
  {"tool":"get_app_state","args":{"app":"TextEdit"}},
  {"tool":"press_key","args":{"app":"TextEdit","key":"Return"}}
]'

# Run a sequence from a file with a custom sleep between operations
open-computer-use call --calls-file sequence.json --sleep 0.5

# Show help
open-computer-use -h
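
When these commands are launched from a program instead of a terminal, building the argument list as an array avoids hand-quoting JSON for the shell. The following is a minimal sketch that uses only the `call`, `--args`, and `--calls` forms shown above; `ToolCall`, `singleCallArgv`, and `sequenceArgv` are hypothetical helper names.

```typescript
// One step in a --calls sequence, matching the JSON shape used above.
interface ToolCall {
  tool: string;
  args?: Record<string, unknown>;
}

// Argument vector for: open-computer-use call <tool> [--args JSON]
function singleCallArgv(tool: string, args?: Record<string, unknown>): string[] {
  const argv = ["call", tool];
  if (args !== undefined) argv.push("--args", JSON.stringify(args));
  return argv;
}

// Argument vector for: open-computer-use call --calls JSON
function sequenceArgv(calls: ToolCall[]): string[] {
  return ["call", "--calls", JSON.stringify(calls)];
}

const argv = singleCallArgv("get_app_state", { app: "TextEdit" });
// argv is ["call", "get_app_state", "--args", '{"app":"TextEdit"}']
```

The resulting array can be passed to a process launcher that takes arguments as a list (for example Node's `child_process.execFile`), so app names containing quotes or spaces need no extra escaping.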

Codex Plugin Installation

For Codex App (macOS):

bash

open-computer-use install-codex-plugin

MCP Tools Reference

list_apps

List all running applications.

Arguments: None

Returns: Array of app names

json

{
  "apps": ["Safari", "TextEdit", "Terminal"]
}

get_app_state

Get the UI element tree for an application.

Arguments:

`app` (string, required): Application name
`include_screenshot` (boolean, optional): Include a base64 screenshot

Returns: UI hierarchy with element metadata

json

{
  "app": "TextEdit",
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Untitled",
      "children": [...]
    }
  ],
  "screenshot": "data:image/png;base64,..."
}
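
A nested tree like this is easier to scan as an indented outline. The sketch below assumes only the fields visible in the example response (`element_index`, `role`, `title`, `children`); the `UIElement` type and `outline` function are illustrative and not provided by the tool.

```typescript
// Element node as shown in the example response; no other fields are assumed.
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  children?: UIElement[];
}

// Render the tree as indented "[index] role "title"" lines, one per element.
function outline(elements: UIElement[], depth = 0): string[] {
  const lines: string[] = [];
  for (const el of elements) {
    const title = el.title ? ` "${el.title}"` : "";
    lines.push(`${"  ".repeat(depth)}[${el.element_index}] ${el.role}${title}`);
    lines.push(...outline(el.children ?? [], depth + 1));
  }
  return lines;
}

const lines = outline([
  {
    element_index: 0,
    role: "AXWindow",
    title: "Untitled",
    children: [{ element_index: 1, role: "AXButton", title: "Close" }],
  },
]);
// lines[0] is '[0] AXWindow "Untitled"', lines[1] is '  [1] AXButton "Close"'
```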

click_element

Click a UI element by index.

Arguments:

`app` (string, required): Application name
`element_index` (number, required): Element index from `get_app_state`
`click_type` (string, optional): "single" (default) or "double"

Returns: Success confirmation

json

{
  "success": true,
  "element_index": 5
}

type_text

Type text into the focused field.

Arguments:

`app` (string, required): Application name
`text` (string, required): Text to type

Returns: Success confirmation

press_key

Press a keyboard key or key combination.

Arguments:

`app` (string, required): Application name
`key` (string, required): Key name (e.g., "Return", "Tab", "Command+S")

Supported keys: Return, Tab, Space, Delete, Escape, Arrow keys, Command+[key], etc.

take_screenshot

Capture the current screen.

Arguments:

`app` (string, optional): Application name to focus

Returns: Base64-encoded PNG
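
Before saving or forwarding a screenshot, a quick sanity check on the returned string can catch empty or truncated payloads. This sketch assumes the value arrives either as raw base64 or as a `data:image/png;base64,` URL like the one in the `get_app_state` example; the exact field that carries it in the `take_screenshot` response is not specified here, and both helper names are hypothetical.

```typescript
// Strip an optional "data:image/png;base64," prefix and return the raw base64 payload.
function pngBase64Payload(value: string): string {
  const prefix = "data:image/png;base64,";
  return value.startsWith(prefix) ? value.slice(prefix.length) : value;
}

// Every PNG starts with the same 8-byte signature; its first 60 bits
// always encode to these 10 base64 characters.
const PNG_BASE64_PREFIX = "iVBORw0KGg";

// True when the payload begins like a base64-encoded PNG file.
function looksLikePng(value: string): boolean {
  return pngBase64Payload(value).startsWith(PNG_BASE64_PREFIX);
}
```

This only inspects the header, so it cannot tell whether the rest of the image is intact.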

activate_app

Bring an application to the foreground.

Arguments:

`app` (string, required): Application name

Usage Patterns

Basic App Interaction

typescript

// From an MCP client or AI agent

// 1. List running apps
const apps = await call_tool("list_apps");

// 2. Get app UI state
const state = await call_tool("get_app_state", {
  app: "TextEdit",
  include_screenshot: true
});

// 3. Find and click a button
// (element_index 3 might be a "Save" button from state.elements)
await call_tool("click_element", {
  app: "TextEdit",
  element_index: 3
});

// 4. Type text
await call_tool("type_text", {
  app: "TextEdit",
  text: "Hello, world!"
});

// 5. Save with keyboard shortcut
await call_tool("press_key", {
  app: "TextEdit",
  key: "Command+S"
});

Sequence Execution

Create a JSON sequence file, `automation.json`:

json

[
  {
    "tool": "activate_app",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "get_app_state",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "type_text",
    "args": {
      "app": "TextEdit",
      "text": "This is automated text."
    }
  },
  {
    "tool": "press_key",
    "args": {
      "app": "TextEdit",
      "key": "Return"
    }
  },
  {
    "tool": "take_screenshot",
    "args": {"app": "TextEdit"}
  }
]

Run it:

bash

open-computer-use call --calls-file automation.json --sleep 1
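
Because a sequence is plain JSON, it can be checked before it ever touches the desktop. The sketch below validates each step against the required arguments listed in the MCP Tools Reference of this document; `validateSequence` is a hypothetical helper, and the tool itself may accept arguments or tools not covered here.

```typescript
interface ToolCall {
  tool: string;
  args?: Record<string, unknown>;
}

// Required argument names per tool, taken from the tools reference in this document.
const REQUIRED_ARGS = new Map<string, string[]>([
  ["list_apps", []],
  ["get_app_state", ["app"]],
  ["click_element", ["app", "element_index"]],
  ["type_text", ["app", "text"]],
  ["press_key", ["app", "key"]],
  ["take_screenshot", []],
  ["activate_app", ["app"]],
]);

// Return human-readable problems; an empty list means every step names a
// documented tool and supplies its required arguments.
function validateSequence(calls: ToolCall[]): string[] {
  const problems: string[] = [];
  calls.forEach((call, i) => {
    const required = REQUIRED_ARGS.get(call.tool);
    if (required === undefined) {
      problems.push(`step ${i}: unknown tool "${call.tool}"`);
      return;
    }
    for (const name of required) {
      if (call.args === undefined || !(name in call.args)) {
        problems.push(`step ${i}: "${call.tool}" is missing required argument "${name}"`);
      }
    }
  });
  return problems;
}
```

Running the check on the parsed contents of `automation.json` before invoking the CLI surfaces typos in tool names without triggering any UI actions.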

Finding Elements

When you call `get_app_state`, inspect the returned elements to find the one you need:

json

{
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Document",
      "children": [
        {
          "element_index": 1,
          "role": "AXButton",
          "title": "Close",
          "enabled": true
        },
        {
          "element_index": 2,
          "role": "AXTextArea",
          "value": "Current text content"
        }
      ]
    }
  ]
}

Use `element_index` from this tree when calling `click_element`.
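
Since elements can be nested at any depth, a recursive search is a convenient way to locate the index you need. This is a minimal sketch over the fields shown in the example tree; `UIElement` and `findElement` are illustrative names and not part of the tool.

```typescript
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  value?: string;
  enabled?: boolean;
  children?: UIElement[];
}

// Depth-first search over the element tree; returns the first match or undefined.
function findElement(
  elements: UIElement[],
  matches: (el: UIElement) => boolean,
): UIElement | undefined {
  for (const el of elements) {
    if (matches(el)) return el;
    const nested = findElement(el.children ?? [], matches);
    if (nested) return nested;
  }
  return undefined;
}

// The example tree from above.
const tree: UIElement[] = [
  {
    element_index: 0,
    role: "AXWindow",
    title: "Document",
    children: [
      { element_index: 1, role: "AXButton", title: "Close", enabled: true },
      { element_index: 2, role: "AXTextArea", value: "Current text content" },
    ],
  },
];

const closeButton = findElement(tree, (el) => el.role === "AXButton" && el.title === "Close");
// closeButton?.element_index is 1
```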

Cross-Platform Considerations

macOS: Requires Accessibility and Screen Recording permissions
Linux: Uses AT-SPI (accessibility toolkit)
Windows: Uses UI Automation API

All platforms use the same MCP interface, but element roles and properties may differ slightly.

Configuration

Environment Variables

No environment variables required for basic operation. Permissions are handled at the OS level.

Custom Sleep Between Operations

The default sleep is 1 second. Customize it with `--sleep`:

bash

open-computer-use call --calls-file seq.json --sleep 0.5
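
A custom client that issues calls itself, rather than through `--calls-file`, may want the same pacing. The sketch below pauses between consecutive calls; the `Executor` callback is a hypothetical stand-in for however your client invokes a tool, and the sketch does not claim to match the CLI's internal timing.

```typescript
interface ToolCall {
  tool: string;
  args?: Record<string, unknown>;
}

// Caller-supplied function that performs one tool call (hypothetical signature).
type Executor = (call: ToolCall) => Promise<unknown>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run calls in order, pausing `sleepSeconds` between consecutive calls
// (no pause before the first or after the last).
async function runWithSleep(
  calls: ToolCall[],
  execute: Executor,
  sleepSeconds = 1,
): Promise<unknown[]> {
  const results: unknown[] = [];
  let first = true;
  for (const call of calls) {
    if (!first) await sleep(sleepSeconds * 1000);
    first = false;
    results.push(await execute(call));
  }
  return results;
}
```

Shorter pauses speed up a run but give the UI less time to settle between steps, so it is worth testing the value against the target app.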

MCP Server Args

When configuring MCP manually, you can pass custom args:

json

{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"],
      "env": {}
    }
  }
}

Troubleshooting

Permission Denied (macOS)

Symptom: Cannot access UI elements or take screenshots.

Solution:

Run `open-computer-use doctor` to check permissions
Grant Accessibility permission in System Settings → Privacy & Security
Grant Screen Recording permission
Restart the terminal or agent

App Not Found

Symptom: `list_apps` doesn't show the target application.

Solution:

Ensure the app is running
Check the exact app name (case-sensitive): `open-computer-use call list_apps`
Some apps use different process names (e.g., "Google Chrome" vs "Chrome")
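
Because names are case-sensitive and may differ from what a user types, a client can resolve a loose name against the `list_apps` result before calling other tools. This is a minimal sketch; `resolveAppName` is a hypothetical helper, and the matching rules (exact, then case-insensitive, then a unique substring) are one reasonable choice and not the tool's own behaviour.

```typescript
// Pick the running app that best matches `query`: exact match first,
// then case-insensitive equality, then a unique case-insensitive substring match.
function resolveAppName(apps: string[], query: string): string | undefined {
  if (apps.includes(query)) return query;
  const q = query.toLowerCase();
  const sameIgnoringCase = apps.find((name) => name.toLowerCase() === q);
  if (sameIgnoringCase !== undefined) return sameIgnoringCase;
  const partial = apps.filter((name) => name.toLowerCase().includes(q));
  // Refuse to guess when several apps match.
  return partial.length === 1 ? partial[0] : undefined;
}

const running = ["Safari", "Google Chrome", "TextEdit"];
const chrome = resolveAppName(running, "chrome");
// chrome is "Google Chrome"
```

Returning `undefined` on ambiguous matches keeps the agent from acting on the wrong application.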

Element Index Invalid

Symptom: `click_element` fails with an invalid index.

Solution:

Refresh app state with `get_app_state` before clicking
Element indices can change when the UI updates
Use sequences to maintain state across operations
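
The refresh-before-click advice can be wrapped in a small retry helper. The sketch below assumes a caller-supplied `callTool` function (hypothetical signature) that resolves with the tool's parsed result and rejects when a call fails; how errors actually surface depends on your MCP client.

```typescript
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  children?: UIElement[];
}

// Caller-supplied tool invoker (hypothetical), e.g. a thin wrapper around an MCP client.
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<any>;

// Depth-first lookup of the first element with the given role and title.
function findByTitle(elements: UIElement[], role: string, title: string): UIElement | undefined {
  for (const el of elements) {
    if (el.role === role && el.title === title) return el;
    const nested = findByTitle(el.children ?? [], role, title);
    if (nested) return nested;
  }
  return undefined;
}

// Re-read the UI tree right before each click so the index is current,
// and look the element up again if the click is rejected.
async function clickByTitle(
  callTool: CallTool,
  app: string,
  role: string,
  title: string,
  attempts = 2,
): Promise<boolean> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const state = await callTool("get_app_state", { app });
    const target = findByTitle(state.elements ?? [], role, title);
    if (!target) continue;
    try {
      await callTool("click_element", { app, element_index: target.element_index });
      return true;
    } catch {
      // The index may have gone stale between the two calls; refresh and retry.
    }
  }
  return false;
}
```

Identifying the target by role and title rather than by a remembered index is what makes the retry safe: each attempt derives the index from a fresh tree.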

MCP Server Not Starting

Symptom: Agent can't connect to `open-computer-use`.

Solution:

bash

# Verify installation
which open-computer-use

# Test manual MCP mode
open-computer-use mcp

# Reinstall globally
npm i -g open-computer-use

# Check agent config file syntax
cat ~/.codex/config.toml # or relevant config
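
For clients whose config is JSON, the syntax check can be automated. The sketch below parses the text and compares the entry against the values documented under Manual MCP Configuration; `checkServerEntry` is a hypothetical helper, it does not apply to TOML configs, and a setup that deliberately uses a different command (such as an absolute path) will be flagged even though it may work.

```typescript
// Parse a JSON MCP client config and report problems with the open-computer-use entry.
function checkServerEntry(configText: string): string[] {
  let config: any;
  try {
    config = JSON.parse(configText);
  } catch (err) {
    return [`config is not valid JSON: ${(err as Error).message}`];
  }
  const entry = config?.mcpServers?.["open-computer-use"];
  if (entry === undefined || entry === null) {
    return ['no "open-computer-use" entry under "mcpServers"'];
  }
  const problems: string[] = [];
  if (entry.command !== "open-computer-use") {
    problems.push('"command" differs from the documented value "open-computer-use"');
  }
  if (!Array.isArray(entry.args) || entry.args[0] !== "mcp") {
    problems.push('"args" does not start with the documented value "mcp"');
  }
  return problems;
}
```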

Linux: AT-SPI Not Available

Symptom: Tools fail on Linux with accessibility errors.

Solution:

bash

# Install AT-SPI dependencies (Ubuntu/Debian)
sudo apt-get install at-spi2-core

# Enable accessibility
gsettings set org.gnome.desktop.interface toolkit-accessibility true

Advanced Usage

Programmatic Integration (TypeScript)

If building a custom MCP client or agent:

typescript

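import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Minimal shape of a node in the get_app_state element tree
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  children?: UIElement[];
}

// Invoke one tool through the CLI. execFile passes arguments as an array,
// so app names containing quotes or spaces need no shell escaping.
// (On Windows, npm command shims may additionally require the `shell` option.)
async function callTool(tool: string, args: object): Promise<any> {
  const { stdout } = await execFileAsync('open-computer-use', [
    'call', tool, '--args', JSON.stringify(args),
  ]);
  return JSON.parse(stdout);
}

async function automateApp(appName: string) {
  // Get app state
  const state = await callTool('get_app_state', { app: appName });

  // Find a button with a specific title among the children of top-level elements
  const button = (state.elements as UIElement[])
    .flatMap(e => e.children || [])
    .find(e => e.role === 'AXButton' && e.title === 'Submit');

  if (button) {
    // Click it. Separate `call` invocations may not share element_index state;
    // if this fails, run both steps as a single --calls sequence instead.
    await callTool('click_element', {
      app: appName,
      element_index: button.element_index,
    });
  }
}

await automateApp('Safari');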

Custom Skill Integration

When writing agent prompts or skills that use `open-computer-use`:

markdown

To interact with desktop apps:
1. Always list apps first to verify the target is running
2. Get app state to find element indices
3. Use element_index from state when clicking
4. Add small delays between operations (1s default)
5. Take screenshots to verify results

Example workflow:
- list_apps → verify "Safari" is running
- get_app_state(app="Safari") → find address bar element_index
- click_element(element_index=X) → focus address bar
- type_text(text="https://example.com") → enter URL
- press_key(key="Return") → navigate
