open-computer-use-automation

Skill by ara.so — Codex Skills collection.

Overview

open-computer-use is an open-source Computer Use service exposed over MCP (Model Context Protocol). It lets AI agents interact with desktop applications through accessibility APIs on macOS, Linux, and Windows. Inspired by OpenAI's Codex Computer Use, it provides non-intrusive automation using native accessibility features.
The project exposes tools to:
  • List running applications
  • Get application UI state and elements
  • Click elements, buttons, and UI components
  • Type text and press keys
  • Take screenshots
  • Manage app focus and windows

Installation

Global Installation

bash
npm i -g open-computer-use

macOS Permissions

On first run, grant Accessibility and Screen Recording permissions:
bash
open-computer-use
Follow the system prompts to enable permissions in System Settings.

MCP Integration

Install into your AI agent:

Codex

open-computer-use install-codex-mcp

Claude Code

open-computer-use install-claude-mcp

Gemini CLI (project scope)

open-computer-use install-gemini-mcp

Gemini CLI (user scope)

open-computer-use install-gemini-mcp --scope user

opencode

open-computer-use install-opencode-mcp

Manual MCP Configuration

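Add to your MCP client config (e.g., ~/.codex/config.toml, ~/.claude.json). A TOML config such as ~/.codex/config.toml needs the same command and args written in TOML syntax; the JSON form is: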
json
{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"]
    }
  }
}
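If you maintain the config from a script, the entry can be merged without disturbing servers that are already registered. The following is a minimal TypeScript sketch, assuming a JSON config with a top-level mcpServers object as shown above. McpServer, McpConfig, and addServer are illustrative names, not part of open-computer-use.

```typescript
// Illustrative types for a JSON-based MCP client config (assumed shape).
interface McpServer {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown;
}

// Return a copy of the config with one server entry added or replaced.
function addServer(config: McpConfig, name: string, server: McpServer): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: server },
  };
}

// Hypothetical existing config with one unrelated server.
const existing: McpConfig = {
  mcpServers: { other: { command: "other-server", args: [] } },
};

const updated = addServer(existing, "open-computer-use", {
  command: "open-computer-use",
  args: ["mcp"],
});

console.log(JSON.stringify(updated, null, 2));
```

Returning a copy rather than mutating the input keeps the original config available if the write to disk fails.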

Skill Installation

Install for Codex

npx skills add iFurySt/open-codex-computer-use -g -a codex --skill open-computer-use -y

Install for Claude Code

npx skills add iFurySt/open-codex-computer-use -g -a claude-code --skill open-computer-use -y

Update existing skill

npx skills update open-computer-use -g -y

List installed skills

npx skills ls -g -a codex | rg 'open-computer-use'

Core Commands

CLI Usage

Check permissions and system readiness

open-computer-use doctor

Call a single tool (returns MCP JSON)

open-computer-use call list_apps

Call with arguments

open-computer-use call get_app_state --args '{"app":"TextEdit"}'

Run a sequence of operations (maintains element_index state)

open-computer-use call --calls '[ {"tool":"get_app_state","args":{"app":"TextEdit"}}, {"tool":"press_key","args":{"app":"TextEdit","key":"Return"}} ]'

Run sequence from file with custom sleep between operations

open-computer-use call --calls-file sequence.json --sleep 0.5

Show help

open-computer-use -h

Codex Plugin Installation

For Codex App (macOS):
bash
open-computer-use install-codex-plugin

MCP Tools Reference

list_apps

List all running applications.
Arguments: None
Returns: Array of app names
json
{
  "apps": ["Safari", "TextEdit", "Terminal"]
}

get_app_state

Get the UI element tree for an application.
Arguments:
  • app (string, required): Application name
  • include_screenshot (boolean, optional): Include base64 screenshot
Returns: UI hierarchy with element metadata
json
{
  "app": "TextEdit",
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Untitled",
      "children": [...]
    }
  ],
  "screenshot": "data:image/png;base64,..."
}
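When include_screenshot is set, a client usually needs to separate the image payload from its prefix before decoding it. The following is a minimal TypeScript sketch, assuming the screenshot field is a base64 data URL as in the sample above. parseDataUrl is an illustrative helper, not part of open-computer-use, and the input string is a made-up stand-in for a real screenshot.

```typescript
// Split a data URL such as "data:image/png;base64,AAAA" into MIME type and payload.
// Returns undefined when the input is not a base64 data URL.
function parseDataUrl(dataUrl: string): { mimeType: string; base64: string } | undefined {
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(dataUrl);
  if (!match) return undefined;
  return { mimeType: match[1], base64: match[2] };
}

// Hypothetical value standing in for state.screenshot.
const parsed = parseDataUrl("data:image/png;base64,iVBORw0KGgo=");
console.log(parsed?.mimeType); // image/png
```

In Node.js the payload can then be decoded with Buffer.from(parsed.base64, "base64") and written to a file.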

click_element

Click a UI element by index.
Arguments:
  • app (string, required): Application name
  • element_index (number, required): Element index from get_app_state
  • click_type (string, optional): "single" (default) or "double"
Returns: Success confirmation
json
{
  "success": true,
  "element_index": 5
}

type_text

Type text into the focused field.
Arguments:
  • app (string, required): Application name
  • text (string, required): Text to type
Returns: Success confirmation

press_key

Press a keyboard key or key combination.
Arguments:
  • app (string, required): Application name
  • key (string, required): Key name (e.g., "Return", "Tab", "Command+S")
Supported keys: Return, Tab, Space, Delete, Escape, Arrow keys, Command+[key], etc.

take_screenshot

Capture the current screen.
Arguments:
  • app (string, optional): Application name to focus
Returns: Base64-encoded PNG

activate_app

Bring an application to the foreground.
Arguments:
  • app (string, required): Application name

Usage Patterns

Basic App Interaction

typescript
// From an MCP client or AI agent; call_tool is a placeholder for the client's tool-call method

// 1. List running apps
const apps = await call_tool("list_apps");

// 2. Get app UI state
const state = await call_tool("get_app_state", {
  app: "TextEdit",
  include_screenshot: true
});

// 3. Find and click a button
// (element_index 3 might be a "Save" button from state.elements)
await call_tool("click_element", {
  app: "TextEdit",
  element_index: 3
});

// 4. Type text
await call_tool("type_text", {
  app: "TextEdit",
  text: "Hello, world!"
});

// 5. Save with keyboard shortcut
await call_tool("press_key", {
  app: "TextEdit",
  key: "Command+S"
});

Sequence Execution

Create a JSON sequence file automation.json:
json
[
  {
    "tool": "activate_app",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "get_app_state",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "type_text",
    "args": {
      "app": "TextEdit",
      "text": "This is automated text."
    }
  },
  {
    "tool": "press_key",
    "args": {
      "app": "TextEdit",
      "key": "Return"
    }
  },
  {
    "tool": "take_screenshot",
    "args": {"app": "TextEdit"}
  }
]
Run it:
bash
open-computer-use call --calls-file automation.json --sleep 1
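Longer sequences are easier to generate than to write by hand. The following is a minimal TypeScript sketch, assuming each entry has the tool and args shape used in automation.json above. Call and buildTypingSequence are illustrative names, not part of open-computer-use.

```typescript
// Entry shape taken from the sequence file above.
interface Call {
  tool: string;
  args: Record<string, unknown>;
}

// Build a sequence that activates an app, then types each line followed by Return.
function buildTypingSequence(app: string, lines: string[]): Call[] {
  const calls: Call[] = [
    { tool: "activate_app", args: { app } },
    { tool: "get_app_state", args: { app } },
  ];
  for (const text of lines) {
    calls.push({ tool: "type_text", args: { app, text } });
    calls.push({ tool: "press_key", args: { app, key: "Return" } });
  }
  return calls;
}

const sequence = buildTypingSequence("TextEdit", ["First line", "Second line"]);

// Save this string as automation.json, then run it with --calls-file.
const fileContents = JSON.stringify(sequence, null, 2);
console.log(fileContents);
```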

Finding Elements

When you call get_app_state, inspect the returned elements to find the one you need:
json
{
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Document",
      "children": [
        {
          "element_index": 1,
          "role": "AXButton",
          "title": "Close",
          "enabled": true
        },
        {
          "element_index": 2,
          "role": "AXTextArea",
          "value": "Current text content"
        }
      ]
    }
  ]
}
Use element_index from this tree when calling click_element.
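The tree can nest several levels deep, so a lookup usually has to walk children recursively. The following is a minimal TypeScript sketch, assuming elements have the shape shown in the sample above. UIElement and findElement are illustrative names, not part of open-computer-use.

```typescript
// Illustrative shape based on the sample output above; real trees may carry more fields.
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  value?: string;
  enabled?: boolean;
  children?: UIElement[];
}

// Depth-first search for the first element matching a predicate.
function findElement(
  elements: UIElement[],
  matches: (el: UIElement) => boolean,
): UIElement | undefined {
  for (const el of elements) {
    if (matches(el)) return el;
    const nested = findElement(el.children ?? [], matches);
    if (nested) return nested;
  }
  return undefined;
}

// The sample tree from above.
const tree: UIElement[] = [
  {
    element_index: 0,
    role: "AXWindow",
    title: "Document",
    children: [
      { element_index: 1, role: "AXButton", title: "Close", enabled: true },
      { element_index: 2, role: "AXTextArea", value: "Current text content" },
    ],
  },
];

const closeButton = findElement(tree, (el) => el.role === "AXButton" && el.title === "Close");
console.log(closeButton?.element_index); // 1
```

Matching on role and title rather than a remembered index keeps the lookup valid when the UI changes between calls.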

Cross-Platform Considerations

  • macOS: Requires Accessibility and Screen Recording permissions
  • Linux: Uses AT-SPI (accessibility toolkit)
  • Windows: Uses UI Automation API
All platforms use the same MCP interface, but element roles and properties may differ slightly.

Configuration

Environment Variables

No environment variables are required for basic operation. Permissions are handled at the OS level.

Custom Sleep Between Operations

The default sleep between operations is 1 second. Customize it with --sleep:
bash
open-computer-use call --calls-file seq.json --sleep 0.5

MCP Server Args

When configuring MCP manually, you can pass custom args and, if needed, environment variables through env:
json
{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"],
      "env": {}
    }
  }
}

Troubleshooting

Permission Denied (macOS)

Symptom: Cannot access UI elements or take screenshots.
Solution:
  1. Run open-computer-use doctor to check permissions
  2. Grant Accessibility permission in System Settings → Privacy & Security
  3. Grant Screen Recording permission
  4. Restart the terminal or agent

App Not Found

Symptom: list_apps doesn't show the target application.
Solution:
  • Ensure the app is running
  • Check exact app name (case-sensitive):
    open-computer-use call list_apps
  • Some apps use different process names (e.g., "Google Chrome" vs "Chrome")
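Because names are case-sensitive and may differ from what a user types, a client can resolve the requested name against the list_apps output before calling other tools. The following is a minimal TypeScript sketch, assuming list_apps yields an array of name strings as in the reference above. resolveAppName is an illustrative helper, not part of open-computer-use.

```typescript
// Resolve a requested name against the names returned by list_apps.
// Order of preference: exact match, case-insensitive match, unique substring match.
function resolveAppName(requested: string, running: string[]): string | undefined {
  if (running.indexOf(requested) !== -1) return requested;

  const wanted = requested.toLowerCase();
  const sameIgnoringCase = running.filter((name) => name.toLowerCase() === wanted);
  if (sameIgnoringCase.length > 0) return sameIgnoringCase[0];

  // Accept a substring match only when it is unambiguous.
  const partial = running.filter((name) => name.toLowerCase().indexOf(wanted) !== -1);
  return partial.length === 1 ? partial[0] : undefined;
}

// Hypothetical list_apps result.
const running = ["Safari", "TextEdit", "Google Chrome"];
console.log(resolveAppName("textedit", running)); // TextEdit
console.log(resolveAppName("Chrome", running)); // Google Chrome
```

Returning undefined for ambiguous input leaves the decision to the caller instead of acting on the wrong application.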

Element Index Invalid

Symptom: click_element fails with an invalid index.
Solution:
  • Refresh app state with get_app_state before clicking
  • Element indices can change when UI updates
  • Use sequences to maintain state across operations

MCP Server Not Starting

Symptom: Agent can't connect to open-computer-use.
Solution:

Verify installation

which open-computer-use

Test manual MCP mode

open-computer-use mcp

Reinstall globally

npm i -g open-computer-use

Check agent config file syntax

cat ~/.codex/config.toml # or relevant config

Linux: AT-SPI Not Available

Symptom: Tools fail on Linux with accessibility errors.
Solution:

Install AT-SPI dependencies (Ubuntu/Debian)

sudo apt-get install at-spi2-core

Enable accessibility

gsettings set org.gnome.desktop.interface toolkit-accessibility true

Advanced Usage

Programmatic Integration (TypeScript)

If building a custom MCP client or agent:
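typescript
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  children?: UIElement[];
}

// Run one tool through the CLI. execFile passes arguments without a shell,
// so app names containing quotes or spaces need no escaping.
async function callTool(tool: string, args: object) {
  const { stdout } = await execFileAsync('open-computer-use', [
    'call', tool, '--args', JSON.stringify(args),
  ]);
  return JSON.parse(stdout);
}

async function automateApp(appName: string) {
  // Get app state (assumes the parsed output exposes `elements` directly)
  const state = await callTool('get_app_state', { app: appName });

  // Find a button with a specific title among first-level children
  const button = (state.elements as UIElement[])
    .flatMap((e) => e.children ?? [])
    .find((e) => e.role === 'AXButton' && e.title === 'Submit');

  if (button) {
    // Click it
    await callTool('click_element', {
      app: appName,
      element_index: button.element_index,
    });
  }
}

await automateApp('Safari');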

Custom Skill Integration

When writing agent prompts or skills that use open-computer-use:
markdown
To interact with desktop apps:
1. Always list apps first to verify the target is running
2. Get app state to find element indices
3. Use element_index from state when clicking
4. Add small delays between operations (1s default)
5. Take screenshots to verify results

Example workflow:
- list_apps → verify "Safari" is running
- get_app_state(app="Safari") → find address bar element_index
- click_element(element_index=X) → focus address bar
- type_text(text="https://example.com") → enter URL
- press_key(key="Return") → navigate

Related Tools

  • Cursor Motion: Separate macOS app for smooth cursor animations (download from releases page)
  • open-browser-use: Companion project for browser-specific automation

Resources
