open-computer-use-automation
Overview
The `open-computer-use` project exposes MCP tools to:
- List running applications
- Get application UI state and elements
- Click elements, buttons, and UI components
- Type text and press keys
- Take screenshots
- Manage app focus and windows
Installation
Global Installation
```bash
npm i -g open-computer-use
```
macOS Permissions
On first run, grant Accessibility and Screen Recording permissions:
```bash
open-computer-use
```
Follow the system prompts to enable the permissions in System Settings.
MCP Integration
Install into your AI agent:
```bash
# Codex
open-computer-use install-codex-mcp

# Claude Code
open-computer-use install-claude-mcp

# Gemini CLI (project scope)
open-computer-use install-gemini-mcp

# Gemini CLI (user scope)
open-computer-use install-gemini-mcp --scope user

# opencode
open-computer-use install-opencode-mcp
```
Manual MCP Configuration
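Add the server to your MCP client config. JSON-based clients such as Claude Code (`~/.claude.json`) use the shape below. Codex reads `~/.codex/config.toml`, which is TOML, so there the same `command` and `args` go under an `[mcp_servers.open-computer-use]` table instead:
```json
{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"]
    }
  }
}
```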
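If you manage client configs with a script, the entry above can be merged into an existing JSON config without disturbing other servers or settings. This is a minimal sketch: `McpConfig` and `addOpenComputerUseServer` are hypothetical names, and the types only mirror the JSON example above, not any client's full schema.

```typescript
// Hypothetical types mirroring the "mcpServers" JSON shape shown above
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other top-level client settings are preserved
}

// Return a new config with the open-computer-use entry added (or replaced),
// leaving other servers and top-level keys untouched. The input is not mutated.
function addOpenComputerUseServer(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "open-computer-use": { command: "open-computer-use", args: ["mcp"] },
    },
  };
}

const existing: McpConfig = {
  theme: "dark",
  mcpServers: { other: { command: "other-server", args: [] } },
};
console.log(JSON.stringify(addOpenComputerUseServer(existing), null, 2));
```

Reading and writing the file is left out. `~/.claude.json` also holds other Claude Code state, so the sketch spreads the existing object instead of replacing it.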
Skill Installation
```bash
# Install for Codex
npx skills add iFurySt/open-codex-computer-use -g -a codex --skill open-computer-use -y

# Install for Claude Code
npx skills add iFurySt/open-codex-computer-use -g -a claude-code --skill open-computer-use -y

# Update existing skill
npx skills update open-computer-use -g -y

# List installed skills (rg is ripgrep; grep works too)
npx skills ls -g -a codex | rg 'open-computer-use'
```
Core Commands
CLI Usage
```bash
# Check permissions and system readiness
open-computer-use doctor

# Call a single tool (returns MCP JSON)
open-computer-use call list_apps

# Call with arguments
open-computer-use call get_app_state --args '{"app":"TextEdit"}'

# Run a sequence of operations (maintains element_index state between steps)
open-computer-use call --calls '[
  {"tool":"get_app_state","args":{"app":"TextEdit"}},
  {"tool":"press_key","args":{"app":"TextEdit","key":"Return"}}
]'

# Run a sequence from a file, sleeping 0.5 s between operations
open-computer-use call --calls-file sequence.json --sleep 0.5

# Show help
open-computer-use -h
```
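Hand-quoting JSON inside `'...'` becomes fragile once the typed text contains quotes. One option is to build the `--calls` payload with `JSON.stringify` and pass it as a single argument, without going through a shell. This sketch does that. `Step` and `buildCallsArgv` are illustrative names, not part of the CLI, and pairing `--sleep` with inline `--calls` is an assumption, since the CLI examples only show it with `--calls-file`.

```typescript
// One step of a --calls sequence, mirroring the {"tool": ..., "args": ...} objects above
interface Step {
  tool: string;
  args?: Record<string, unknown>;
}

// Build the argument vector for `open-computer-use call --calls <json> [--sleep <seconds>]`.
// JSON.stringify does all escaping, so text containing quotes or newlines survives intact.
function buildCallsArgv(steps: Step[], sleepSeconds?: number): string[] {
  const argv = ["call", "--calls", JSON.stringify(steps)];
  if (sleepSeconds !== undefined) {
    // Assumption: --sleep is accepted with inline --calls as well as with --calls-file
    argv.push("--sleep", String(sleepSeconds));
  }
  return argv;
}

const argv = buildCallsArgv(
  [
    { tool: "get_app_state", args: { app: "TextEdit" } },
    { tool: "type_text", args: { app: "TextEdit", text: 'He said "hi"' } },
  ],
  0.5,
);
console.log(argv);
```

Pass the array to `execFile("open-computer-use", argv)` from `node:child_process`. Because no shell is involved, the single-quote wrapping from the bash examples is not needed.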
Codex Plugin Installation
For Codex App (macOS):
```bash
open-computer-use install-codex-plugin
```
MCP Tools Reference
list_apps
List all running applications.
Arguments: None
Returns: An object whose `apps` field is an array of app names
```json
{
  "apps": ["Safari", "TextEdit", "Terminal"]
}
```
get_app_state
Get the UI element tree for an application.
Arguments:
- (string, required): Application name
app - (boolean, optional): Include base64 screenshot
include_screenshot
Returns: UI hierarchy with element metadata
```json
{
  "app": "TextEdit",
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Untitled",
      "children": [...]
    }
  ],
  "screenshot": "data:image/png;base64,..."
}
```
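The tree nests through `children`, so a short recursive walk helps you see which `element_index` belongs to which control. This is a minimal sketch. The `UIElement` type covers only the fields that appear in this document's examples, and the `outline` helper is hypothetical.

```typescript
// Hypothetical type covering the element fields shown in the examples
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  value?: string;
  children?: UIElement[];
}

// Depth-first walk that renders one indented line per element, e.g. `  [1] AXButton "Close"`
function outline(elements: UIElement[], depth = 0): string[] {
  const lines: string[] = [];
  for (const el of elements) {
    const label = el.title ?? el.value ?? "";
    lines.push(`${"  ".repeat(depth)}[${el.element_index}] ${el.role}${label ? ` "${label}"` : ""}`);
    lines.push(...outline(el.children ?? [], depth + 1));
  }
  return lines;
}

const tree: UIElement[] = [
  {
    element_index: 0, role: "AXWindow", title: "Untitled",
    children: [
      { element_index: 1, role: "AXButton", title: "Close" },
      { element_index: 2, role: "AXTextArea", value: "" },
    ],
  },
];
console.log(outline(tree).join("\n"));
```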
click_element
Click a UI element by index.
Arguments:
- `app` (string, required): Application name
- `element_index` (number, required): Element index from `get_app_state`
- `click_type` (string, optional): `"single"` (default) or `"double"`
Returns: Success confirmation
```json
{
  "success": true,
  "element_index": 5
}
```
type_text
Type text into the focused field.
Arguments:
- `app` (string, required): Application name
- `text` (string, required): Text to type
Returns: Success confirmation
press_key
Press a keyboard key or key combination.
Arguments:
- `app` (string, required): Application name
- `key` (string, required): Key name (e.g., "Return", "Tab", "Command+S")
Supported keys: Return, Tab, Space, Delete, Escape, Arrow keys, Command+[key], etc.
take_screenshot
Capture the current screen.
Arguments:
- `app` (string, optional): Application to focus before capturing
Returns: Base64-encoded PNG
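`get_app_state` returns its screenshot as a `data:image/png;base64,...` URL. `take_screenshot` is documented only as returning a base64-encoded PNG, and whether that string carries the same prefix is not stated. The sketch below therefore accepts either form. It also checks for the PNG signature, since the base64 encoding of a PNG file always starts with `iVBORw0KGgo`. `toPngBase64` is a hypothetical helper name.

```typescript
const DATA_URL_PREFIX = "data:image/png;base64,";
// Base64 encoding of the 8-byte PNG file signature; every base64-encoded PNG starts with it
const PNG_BASE64_SIGNATURE = "iVBORw0KGgo";

// Accept a data URL or bare base64 and return bare base64, rejecting non-PNG data.
function toPngBase64(screenshot: string): string {
  const s = screenshot.trim();
  const base64 = s.startsWith(DATA_URL_PREFIX) ? s.slice(DATA_URL_PREFIX.length) : s;
  if (!base64.startsWith(PNG_BASE64_SIGNATURE)) {
    throw new Error("screenshot is not a base64-encoded PNG");
  }
  return base64;
}

console.log(toPngBase64("data:image/png;base64,iVBORw0KGgo="));
```

To save the image in Node, decode it with `Buffer.from(base64, "base64")` and pass the buffer to `fs.writeFileSync`.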
activate_app
Bring an application to the foreground.
Arguments:
- `app` (string, required): Application name
Usage Patterns
Basic App Interaction
```typescript
// From an MCP client or AI agent.
// `call_tool` stands in for your client's tool-calling function (hypothetical signature).
declare function call_tool(name: string, args?: Record<string, unknown>): Promise<any>;

// 1. List running apps
const apps = await call_tool("list_apps");
// 2. Get app UI state
const state = await call_tool("get_app_state", {
  app: "TextEdit",
  include_screenshot: true
});
// 3. Find and click an element to focus it
// (element_index 3 is illustrative; look up the real index, e.g. the text area, in state.elements)
await call_tool("click_element", {
  app: "TextEdit",
  element_index: 3
});
// 4. Type text into the focused field
await call_tool("type_text", {
  app: "TextEdit",
  text: "Hello, world!"
});
// 5. Save with keyboard shortcut
await call_tool("press_key", {
  app: "TextEdit",
  key: "Command+S"
});
```
Sequence Execution
Create a JSON sequence file, `automation.json`:
```json
[
  {
    "tool": "activate_app",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "get_app_state",
    "args": {"app": "TextEdit"}
  },
  {
    "tool": "type_text",
    "args": {
      "app": "TextEdit",
      "text": "This is automated text."
    }
  },
  {
    "tool": "press_key",
    "args": {
      "app": "TextEdit",
      "key": "Return"
    }
  },
  {
    "tool": "take_screenshot",
    "args": {"app": "TextEdit"}
  }
]
```
Run it:
```bash
open-computer-use call --calls-file automation.json --sleep 1
```
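A sequence file with a misspelled tool name or a missing `app` may fail only partway through a run. A quick static check can catch this before anything touches the UI. This is a sketch with a hypothetical `checkSequence` helper. Its tool list and required-`app` rules are taken from the MCP Tools Reference above, not from the CLI's own validation.

```typescript
// Tool names from the MCP Tools Reference
const KNOWN_TOOLS = new Set([
  "list_apps", "get_app_state", "click_element",
  "type_text", "press_key", "take_screenshot", "activate_app",
]);
// Tools whose "app" argument is documented as required
const REQUIRES_APP = new Set([
  "get_app_state", "click_element", "type_text", "press_key", "activate_app",
]);

// Return a list of problems in a sequence file's contents; an empty list means it looks well-formed.
function checkSequence(json: string): string[] {
  let steps: unknown;
  try {
    steps = JSON.parse(json);
  } catch (err) {
    return [`invalid JSON: ${(err as Error).message}`];
  }
  if (!Array.isArray(steps)) return ["top level must be an array of steps"];

  const problems: string[] = [];
  steps.forEach((step, i) => {
    const s = (step ?? {}) as { tool?: unknown; args?: unknown };
    if (typeof s.tool !== "string" || !KNOWN_TOOLS.has(s.tool)) {
      problems.push(`step ${i}: unknown tool ${JSON.stringify(s.tool)}`);
      return;
    }
    const args = s.args as { app?: unknown } | undefined;
    if (REQUIRES_APP.has(s.tool) && typeof args?.app !== "string") {
      problems.push(`step ${i}: ${s.tool} needs an "app" string`);
    }
  });
  return problems;
}
```

For example, `checkSequence(fs.readFileSync("automation.json", "utf8"))` should return an empty array for the file above.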
Finding Elements
When you call `get_app_state`, inspect the returned elements to find the one you need:
```json
{
  "elements": [
    {
      "element_index": 0,
      "role": "AXWindow",
      "title": "Document",
      "children": [
        {
          "element_index": 1,
          "role": "AXButton",
          "title": "Close",
          "enabled": true
        },
        {
          "element_index": 2,
          "role": "AXTextArea",
          "value": "Current text content"
        }
      ]
    }
  ]
}
```
Use the `element_index` from this tree when calling `click_element`.
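The lookup described above can be done with a recursive search by role and optional title. The search must descend into `children`, because buttons and text areas usually sit below the window element. This is a minimal sketch. `UIElement` and `findElementIndex` are hypothetical names, and the sample data copies the tree above.

```typescript
// Hypothetical element shape based on the tree above
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  value?: string;
  enabled?: boolean;
  children?: UIElement[];
}

// Depth-first search for the first element with the given role and, optionally, title.
function findElementIndex(elements: UIElement[], role: string, title?: string): number | undefined {
  for (const el of elements) {
    if (el.role === role && (title === undefined || el.title === title)) {
      return el.element_index;
    }
    const nested = findElementIndex(el.children ?? [], role, title);
    if (nested !== undefined) return nested; // compare to undefined: index 0 is valid
  }
  return undefined;
}

const elements: UIElement[] = [
  {
    element_index: 0, role: "AXWindow", title: "Document",
    children: [
      { element_index: 1, role: "AXButton", title: "Close", enabled: true },
      { element_index: 2, role: "AXTextArea", value: "Current text content" },
    ],
  },
];

const closeIndex = findElementIndex(elements, "AXButton", "Close");
if (closeIndex !== undefined) {
  console.log(JSON.stringify({ app: "TextEdit", element_index: closeIndex })); // {"app":"TextEdit","element_index":1}
}
```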
Cross-Platform Considerations
- macOS: Requires Accessibility and Screen Recording permissions
- Linux: Uses AT-SPI (Assistive Technology Service Provider Interface)
- Windows: Uses the UI Automation API
All platforms use the same MCP interface, but element roles and properties may differ slightly.
Configuration
Environment Variables
No environment variables required for basic operation. Permissions are handled at the OS level.
Custom Sleep Between Operations
The default sleep between operations is 1 second. Customize it with `--sleep` (in seconds):
```bash
open-computer-use call --calls-file seq.json --sleep 0.5
```
MCP Server Args
When configuring MCP manually, you can pass custom `args` and an `env` object of environment variables for the server process:
```json
{
  "mcpServers": {
    "open-computer-use": {
      "command": "open-computer-use",
      "args": ["mcp"],
      "env": {}
    }
  }
}
```
json
{
"mcpServers": {
"open-computer-use": {
"command": "open-computer-use",
"args": ["mcp"],
"env": {}
}
}
}Troubleshooting
故障排除
Permission Denied (macOS)
Symptom: Cannot access UI elements or take screenshots.
Solution:
- Run `open-computer-use doctor` to check permissions
- Grant Accessibility permission in System Settings → Privacy & Security
- Grant Screen Recording permission
- Restart the terminal or agent
App Not Found
Symptom: `list_apps` doesn't show the target application.
Solution:
- Ensure the app is running
- Check the exact app name (case-sensitive): `open-computer-use call list_apps`
- Some apps use different process names (e.g., "Google Chrome" vs "Chrome")
Element Index Invalid
Symptom: `click_element` fails with an invalid index.
Solution:
- Refresh the app state with `get_app_state` before clicking
- Element indices can change when the UI updates
- Use sequences (`--calls` or `--calls-file`) to maintain state across operations
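The advice above can be combined into a refresh-then-click helper. It looks the element up by role and title in a fresh `get_app_state` result immediately before each click, and retries once with a new state if the click is rejected. This is only a sketch. `CallTool` stands for whatever tool-calling function your MCP client exposes, and its signature here is an assumption. The single retry is an arbitrary choice.

```typescript
// Hypothetical signature for the tool-calling function your MCP client provides
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<any>;

interface UIElement { element_index: number; role: string; title?: string; children?: UIElement[]; }

// Depth-first search for an element by role and title
function findIndex(elements: UIElement[], role: string, title: string): number | undefined {
  for (const el of elements) {
    if (el.role === role && el.title === title) return el.element_index;
    const nested = findIndex(el.children ?? [], role, title);
    if (nested !== undefined) return nested;
  }
  return undefined;
}

// Re-read the UI right before each click so the index reflects the current tree,
// and retry with a fresh state if the click is rejected.
async function clickByTitle(callTool: CallTool, app: string, role: string, title: string, attempts = 2): Promise<void> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    const state = await callTool("get_app_state", { app });
    const index = findIndex(state.elements ?? [], role, title);
    if (index === undefined) throw new Error(`${role} "${title}" not found in ${app}`);
    try {
      await callTool("click_element", { app, element_index: index });
      return;
    } catch (err) {
      lastError = err; // likely a stale index; loop to refresh state and try again
    }
  }
  throw lastError;
}
```

Because `callTool` is passed in, the helper can be exercised against a fake client in tests before you wire it to a real MCP session.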
MCP Server Not Starting
Symptom: Agent can't connect to `open-computer-use`.
Solution:
```bash
# Verify installation
which open-computer-use

# Test MCP mode manually (the server waits for MCP messages on stdin; press Ctrl+C to stop)
open-computer-use mcp

# Reinstall globally
npm i -g open-computer-use

# Check agent config file syntax
cat ~/.codex/config.toml  # or the relevant config file
```
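For JSON-based clients, a quick programmatic check of the config file catches the most common mistakes: invalid JSON, a misspelled server key, or a missing `mcp` argument. This is a minimal sketch with a hypothetical `checkMcpConfig` helper. It expects the `mcpServers` layout from Manual MCP Configuration and does not apply to Codex's TOML file.

```typescript
// Return problems found in a JSON MCP client config (e.g. ~/.claude.json); empty means OK.
// Codex's ~/.codex/config.toml is TOML and would need a TOML parser instead.
function checkMcpConfig(text: string, serverName = "open-computer-use"): string[] {
  let config: any;
  try {
    config = JSON.parse(text);
  } catch (err) {
    return [`not valid JSON: ${(err as Error).message}`];
  }
  const entry = config?.mcpServers?.[serverName];
  if (!entry) return [`no mcpServers["${serverName}"] entry`];

  const problems: string[] = [];
  if (typeof entry.command !== "string" || entry.command === "") {
    problems.push("command must be a non-empty string");
  }
  if (!Array.isArray(entry.args) || !entry.args.includes("mcp")) {
    problems.push('args should include "mcp"');
  }
  return problems;
}
```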
Linux: AT-SPI Not Available
Symptom: Tools fail on Linux with accessibility errors.
Solution:
```bash
# Install AT-SPI dependencies (Ubuntu/Debian)
sudo apt-get install at-spi2-core

# Enable accessibility
gsettings set org.gnome.desktop.interface toolkit-accessibility true
```
Advanced Usage
Programmatic Integration (TypeScript)
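If you are building a custom MCP client or agent, you can also drive the CLI from code:
```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const execFileAsync = promisify(execFile);

// Element fields used below, taken from the get_app_state examples
interface UIElement {
  element_index: number;
  role: string;
  title?: string;
  children?: UIElement[];
}

// Run one tool via the CLI. execFile skips the shell, so quotes in app names
// or arguments cannot break the command line.
async function callTool(tool: string, args: object): Promise<any> {
  const { stdout } = await execFileAsync('open-computer-use', ['call', tool, '--args', JSON.stringify(args)]);
  // Assumes stdout is the tool's JSON result; unwrap it first if your version prints an MCP envelope
  return JSON.parse(stdout);
}

// Depth-first search over the whole tree (not only the first level of children)
function findElement(elements: UIElement[], match: (e: UIElement) => boolean): UIElement | undefined {
  for (const e of elements) {
    if (match(e)) return e;
    const hit = findElement(e.children ?? [], match);
    if (hit) return hit;
  }
  return undefined;
}

async function automateApp(appName: string) {
  const state = await callTool('get_app_state', { app: appName });
  const button = findElement(state.elements ?? [], e => e.role === 'AXButton' && e.title === 'Submit');
  if (button) {
    // Separate `call` invocations may not share element_index state; if this click
    // is rejected, run both steps together in one --calls sequence instead.
    await callTool('click_element', { app: appName, element_index: button.element_index });
  }
}

await automateApp('Safari');
```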
Custom Skill Integration
When writing agent prompts or skills that use `open-computer-use`:
```markdown
To interact with desktop apps:
1. Always list apps first to verify the target is running
2. Get app state to find element indices
3. Use element_index from the latest state when clicking
4. Add small delays between operations (1s default)
5. Take screenshots to verify results

Example workflow:
- list_apps → verify "Safari" is running
- get_app_state(app="Safari") → find the address bar's element_index
- click_element(app="Safari", element_index=X) → focus the address bar
- type_text(app="Safari", text="https://example.com") → enter the URL
- press_key(app="Safari", key="Return") → navigate
```
Related Tools
- Cursor Motion: Separate macOS app for smooth cursor animations (download from releases page)
- open-browser-use: Companion project for browser-specific automation