usecomputer
Desktop automation CLI for AI agents. Works on macOS, Linux (X11), and
Windows. Takes screenshots, clicks, types, scrolls, drags using native
platform APIs through a Zig binary — no Node.js required at runtime.
Always start with --help
**Always run `usecomputer --help` before using this tool.** The help output
is the source of truth for all commands, options, and examples. Never guess
command syntax; check help first.

When running help commands, read the full untruncated output. Never pipe
help through `head`, `tail`, or `sed`, or you will miss critical options.

```bash
usecomputer --help
usecomputer screenshot --help
usecomputer click --help
usecomputer drag --help
```

Install
```bash
npm install -g usecomputer
```

Requirements:
- macOS: Accessibility permission enabled for your terminal app
- Linux: X11 session with `DISPLAY` set (Wayland via XWayland works too)
- Windows: run in an interactive desktop session
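As a quick preflight for the Linux requirement, a script can confirm that `DISPLAY` is set before invoking the tool. This is a minimal sketch: `check_display` is a hypothetical helper name, not part of usecomputer, and it covers only the Linux requirement (macOS Accessibility permission and the Windows session type are not checked).

```bash
# Hypothetical helper, not part of usecomputer: fail early on Linux when
# DISPLAY is unset or empty. The OS name can be passed as $1 for testing;
# by default it comes from `uname -s`.
check_display() {
  os="${1:-$(uname -s)}"
  if [ "$os" = "Linux" ] && [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set; an X11 (or XWayland) session is required" >&2
    return 1
  fi
  return 0
}
```

A wrapper script might call `check_display || exit 1` before its first `usecomputer` command, so a missing X11 session is reported up front.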
Core loop: screenshot -> act -> screenshot
Every computer use session follows a feedback loop:

```
screenshot -> send to model -> model returns action -> execute action -> screenshot again
    ^                                                                          |
    |__________________________________________________________________________|
```

1. Take a screenshot with `usecomputer screenshot --json`
2. Send the screenshot image to the model
3. Model returns coordinates or an action (click, type, press, scroll)
4. Execute the action, passing the **exact** `--coord-map` from step 1
5. Take a fresh screenshot and go back to step 2
Full cycle example
```bash
# 1. take screenshot (always use --json to get coordMap)
usecomputer screenshot ./tmp/screen.png --json
# output: {"path":"./tmp/screen.png","coordMap":"0,0,3440,1440,1568,657",...}

# 2. send ./tmp/screen.png to the model

# 3. model says: "click the Save button at x=740 y=320"

# 4. click using the coord-map from the screenshot output
usecomputer click -x 740 -y 320 --coord-map "0,0,3440,1440,1568,657"

# 5. take a fresh screenshot to see what happened
usecomputer screenshot ./tmp/screen.png --json

# ... repeat
```
**Never skip `--coord-map`.** Screenshots are scaled (longest edge <= 1568px).
The coord-map maps screenshot-space pixels back to real desktop coordinates.
Without it, clicks land in wrong positions.
**Always take a fresh screenshot after each action.** The UI changes after
every click, scroll, or keystroke — menus open, pages scroll, dialogs appear.
Never reuse a stale screenshot.
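Because the coord-map has to be passed back verbatim, it helps to pull it out of the JSON output rather than retype it. The sketch below uses `sed` on a sample trimmed to the two fields shown in this document; `extract_coord_map` is a hypothetical helper name, and it assumes `coordMap` is a plain string field as in the sample output.

```bash
# Hypothetical helper: read screenshot JSON on stdin and print the coordMap
# string. Tolerates optional whitespace around the colon.
extract_coord_map() {
  sed -n 's/.*"coordMap"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample JSON trimmed to the fields shown above; real output has more fields.
sample='{"path":"./tmp/screen.png","coordMap":"0,0,3440,1440,1568,657"}'
coord_map=$(printf '%s\n' "$sample" | extract_coord_map)
echo "$coord_map"   # prints: 0,0,3440,1440,1568,657
```

In a real session the sample would be replaced by the output of `usecomputer screenshot ./tmp/screen.png --json`, and `"$coord_map"` passed to `--coord-map`. A JSON-aware tool such as `jq`, if installed, would be sturdier than a regular expression.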
Window-scoped screenshots
Full-desktop screenshots include everything — dock, menu bar, background
windows. For better accuracy, capture only the target application window.
This produces a smaller, more focused image the model can reason about.
Step 1: find the window ID
```bash
usecomputer window list --json
```

This returns an array of visible windows with their `id`, `ownerName`,
`title`, position, and size. Find the window you want to target.

Step 2: screenshot that window
```bash
usecomputer screenshot ./tmp/app.png --window 12345 --json
# output: {"path":"./tmp/app.png","coordMap":"200,100,1200,800,1568,1045",...}
```

The coord-map in the output is scoped to that window's region on screen.

Step 3: act using the coord-map
```bash
# model analyzes ./tmp/app.png and says click at x=400 y=220
usecomputer click -x 400 -y 220 --coord-map "200,100,1200,800,1568,1045"
```

The coord-map handles the translation from the window screenshot's pixel
space back to the correct desktop coordinates. The click lands on the
right spot even though the screenshot only showed one window.

Region screenshots
You can also capture an arbitrary rectangle of the screen:

```bash
usecomputer screenshot ./tmp/region.png --region "100,100,800,600" --json
```

The coord-map works the same way: pass it to subsequent pointer commands.
Coord-map explained
The coord-map is 6 comma-separated values emitted by every screenshot:

```
captureX,captureY,captureWidth,captureHeight,imageWidth,imageHeight
```

- captureX, captureY: top-left corner of the captured region in desktop coordinates
- captureWidth, captureHeight: size of the captured region in desktop pixels
- imageWidth, imageHeight: size of the output PNG (after scaling)

When you pass `--coord-map` to `click`, `hover`, `drag`, or `mouse move`,
the command maps your screenshot-space x,y coordinates back to the real
desktop position using these values.

Validating coordinates with debug-point
Before clicking, you can validate where the click would land:

```bash
usecomputer debug-point -x 400 -y 220 --coord-map "0,0,1600,900,1568,882"
```

This captures a screenshot and draws a red marker at the mapped coordinate.
Send the output image back to the model so it can see if the target is
correct and adjust if needed.
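To estimate by hand where a point should land, the coord-map arithmetic can be worked out directly. The sketch below assumes a plain linear scale (desktop = capture origin + screenshot coordinate × capture size / image size). This document does not spell out the exact formula or rounding that usecomputer applies, so treat the result as an estimate rather than the tool's actual output. `map_point` is a hypothetical helper name.

```bash
# Hypothetical helper: map a screenshot-space point to desktop coordinates,
# assuming linear scaling. Rounds to the nearest pixel (assumes the results
# are non-negative).
# $1 = screenshot x, $2 = screenshot y, $3 = coord-map string
map_point() {
  printf '%s\n' "$3" | awk -F, -v x="$1" -v y="$2" '{
    dx = $1 + x * $3 / $5   # captureX + x * captureWidth  / imageWidth
    dy = $2 + y * $4 / $6   # captureY + y * captureHeight / imageHeight
    printf "%d %d\n", dx + 0.5, dy + 0.5
  }'
}

# Window example: 1200x800 region at (200,100), scaled to 1568x1045
map_point 400 220 "200,100,1200,800,1568,1045"   # prints: 506 268
```

Under this assumption, a large gap between the estimate and the red `debug-point` marker points to a coord-map that does not belong to the screenshot the model analyzed.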
Quick examples
```bash
# screenshot the primary display
usecomputer screenshot ./tmp/screen.png --json

# screenshot a specific display (0-indexed)
usecomputer screenshot ./tmp/screen.png --display 1 --json

# click at screenshot coordinates
usecomputer click -x 600 -y 400 --coord-map "0,0,1600,900,1568,882"

# right-click
usecomputer click -x 600 -y 400 --button right --coord-map "..."

# double-click
usecomputer click -x 600 -y 400 --count 2 --coord-map "..."

# click with modifier keys held
usecomputer click -x 600 -y 400 --modifier option --coord-map "..."
usecomputer click -x 600 -y 400 --modifier cmd --modifier shift --coord-map "..."

# type text
usecomputer type "hello from usecomputer"

# type long text from stdin
cat ./notes.txt | usecomputer type --stdin --chunk-size 4000 --chunk-delay 15

# press a key
usecomputer press "enter"

# press a shortcut
usecomputer press "cmd+s"
usecomputer press "cmd+shift+p"

# press with repeat
usecomputer press "down" --count 10 --delay 30

# scroll
usecomputer scroll down 5
usecomputer scroll up 3
usecomputer scroll down 5 --at "400,300"

# drag (straight line)
usecomputer drag 100,200 500,600

# drag (curved path with bezier control point)
usecomputer drag 100,200 500,600 300,50

# drag with coord-map
usecomputer drag 100,200 500,600 --coord-map "..."

# mouse position
usecomputer mouse position --json

# list displays
usecomputer display list --json

# list windows
usecomputer window list --json

# list desktops with windows
usecomputer desktop list --windows --json
```

System prompt tips for accurate clicking
When using GPT-5.4 or Claude for computer use, keep the system prompt short
and task-focused. Verbose system prompts reduce click accuracy.
GPT-5.4: Use `detail: "original"` on screenshot inputs. This is the
single most important setting for click accuracy. Avoid `detail: "high"` or
`detail: "low"`.

Claude: Use the `computer_20251124` tool type with `display_width_px` and
`display_height_px` matching the screenshot dimensions from the coord-map
output.

General rules:
- Take a fresh screenshot after every action
- Always pass the coord-map from the screenshot the model analyzed
- If clicks land in wrong spots, use `debug-point` to diagnose
- If the model returns coordinates outside screenshot dimensions, re-send the screenshot and remind it of the image size
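The last rule can be enforced before any pointer command runs. The following is a minimal sketch, assuming 0-based pixel coordinates, so valid x values run from 0 to imageWidth - 1; that convention is an assumption, not something this document states. `point_in_image` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if (x, y) lies inside the screenshot
# image described by the last two coord-map fields (imageWidth, imageHeight).
# $1 = x, $2 = y, $3 = coord-map string
point_in_image() {
  printf '%s\n' "$3" | awk -F, -v x="$1" -v y="$2" \
    '{ exit !(x >= 0 && y >= 0 && x < $5 && y < $6) }'
}

# x=1700 exceeds the 1568-pixel image width, so this takes the else branch.
if point_in_image 1700 400 "0,0,1600,900,1568,882"; then
  echo "in bounds"
else
  echo "out of bounds: re-send the screenshot and restate the image size"
fi
```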
Troubleshooting
- Clicks land in wrong position: you probably forgot `--coord-map`, or you are passing a coord-map from a different screenshot than the one the model analyzed. Always use the coord-map from the most recent screenshot.
- Retina displays: usecomputer handles scaling internally via coord-map. Do not try to manually account for display scaling.
- Stale screenshots: the most common source of bugs. Always take a fresh screenshot after each action. The UI changes constantly.
- Permission errors on macOS: enable Accessibility permission for your terminal app in System Settings > Privacy & Security > Accessibility.
- X11 errors on Linux: ensure `DISPLAY` is set. For XWayland, screenshot falls back to XGetImage automatically if XShm fails.