usecomputer


Desktop automation CLI for AI agents. Works on macOS, Linux (X11), and Windows. Takes screenshots, clicks, types, scrolls, drags using native platform APIs through a Zig binary — no Node.js required at runtime.

Always start with --help

Always run `usecomputer --help` before using this tool. The help output is the source of truth for all commands, options, and examples. Never guess command syntax; check help first.
When running help commands, read the full untruncated output. Never pipe help through `head`, `tail`, or `sed`, or you will miss critical options.
bash
usecomputer --help
usecomputer screenshot --help
usecomputer click --help
usecomputer drag --help

Install

bash
npm install -g usecomputer
Requirements:
  • macOS: Accessibility permission enabled for your terminal app
  • Linux: X11 session with `DISPLAY` set (Wayland via XWayland works too)
  • Windows: run in an interactive desktop session
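
The requirements above can be partly checked before a session starts. The following is a minimal sketch, assuming the agent harness is written in Python. `preflight_notes` is a hypothetical helper, not part of usecomputer, and it only looks at what is visible from the platform name and environment variables; macOS Accessibility permission still has to be confirmed by hand.

```python
def preflight_notes(platform: str, env: dict) -> list:
    """Return reminders for the given platform and environment.

    Hypothetical helper that mirrors the requirements list and nothing more.
    `platform` is expected to look like sys.platform ("linux", "darwin", "win32").
    """
    notes = []
    if platform.startswith("linux") and not env.get("DISPLAY"):
        # usecomputer needs an X11 (or XWayland) session on Linux.
        notes.append("DISPLAY is not set; start an X11 or XWayland session")
    if platform == "darwin":
        # Accessibility permission cannot be read from the environment.
        notes.append("confirm Accessibility permission for your terminal app")
    return notes
```

A harness could call `preflight_notes(sys.platform, dict(os.environ))` and print each note before taking the first screenshot.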

Core loop: screenshot -> act -> screenshot

Every computer use session follows a feedback loop:
screenshot -> send to model -> model returns action -> execute action -> screenshot again
     ^                                                                        |
     |________________________________________________________________________|
  1. Take a screenshot with `usecomputer screenshot --json`
  2. Send the screenshot image to the model
  3. Model returns coordinates or an action (click, type, press, scroll)
  4. Execute the action, passing the exact `--coord-map` from step 1
  5. Take a fresh screenshot and go back to step 2
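
The five steps can be sketched as a small driver loop. This is an illustration only, assuming a Python harness: `run_session` and its three callables are hypothetical names, and how the model is called or how actions are turned into usecomputer commands is left to the caller.

```python
from typing import Callable, Optional


def run_session(
    take_screenshot: Callable[[], dict],
    ask_model: Callable[[str], Optional[dict]],
    execute: Callable[[dict, str], None],
    max_steps: int = 20,
) -> int:
    """Run the screenshot -> model -> action loop; return the number of actions executed.

    All three callables are hypothetical hooks supplied by the caller:
      take_screenshot() -> parsed --json output with "path" and "coordMap"
      ask_model(path)   -> an action dict, or None when the task is done
      execute(action, coord_map) -> runs the matching usecomputer command
    """
    steps = 0
    shot = take_screenshot()                  # step 1
    while steps < max_steps:
        action = ask_model(shot["path"])      # steps 2 and 3
        if action is None:
            break
        execute(action, shot["coordMap"])     # step 4: coord-map from the same screenshot
        steps += 1
        shot = take_screenshot()              # step 5: never reuse a stale screenshot
    return steps
```

The point of the structure is that the coord-map passed to `execute` always comes from the screenshot the model just looked at, and a new screenshot is taken after every action.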

Full cycle example

bash
# 1. take screenshot (always use --json to get coordMap)
usecomputer screenshot ./tmp/screen.png --json
# output: {"path":"./tmp/screen.png","coordMap":"0,0,3440,1440,1568,657",...}

# 2. send ./tmp/screen.png to the model

# 3. model says: "click the Save button at x=740 y=320"

# 4. click using the coord-map from the screenshot output
usecomputer click -x 740 -y 320 --coord-map "0,0,3440,1440,1568,657"

# 5. take a fresh screenshot to see what happened
usecomputer screenshot ./tmp/screen.png --json

# ... repeat

**Never skip `--coord-map`.** Screenshots are scaled (longest edge <= 1568px).
The coord-map maps screenshot-space pixels back to real desktop coordinates.
Without it, clicks land in wrong positions.

**Always take a fresh screenshot after each action.** The UI changes after
every click, scroll, or keystroke — menus open, pages scroll, dialogs appear.
Never reuse a stale screenshot.
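
To avoid copying the coord-map by hand, a harness can read it from the screenshot's JSON output and build the click command from it. A minimal sketch, assuming Python and assuming the JSON carries the `coordMap` field as shown in the sample output; `click_command` is a hypothetical helper.

```python
import json


def click_command(screenshot_json: str, x: int, y: int) -> list:
    """Build the argv for `usecomputer click` from a screenshot's --json output."""
    shot = json.loads(screenshot_json)
    coord_map = shot["coordMap"]  # e.g. "0,0,3440,1440,1568,657"
    return [
        "usecomputer", "click",
        "-x", str(x), "-y", str(y),
        "--coord-map", coord_map,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a shell string sidesteps quoting problems around the coord-map.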

Window-scoped screenshots

Full-desktop screenshots include everything — dock, menu bar, background windows. For better accuracy, capture only the target application window. This produces a smaller, more focused image the model can reason about.

Step 1: find the window ID

bash
usecomputer window list --json
This returns an array of visible windows with their `id`, `ownerName`, `title`, position, and size. Find the window you want to target.
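
Picking the window can be automated once the JSON is parsed. A minimal sketch, assuming Python and assuming each array entry exposes `id` and `ownerName` as described above; `find_window_id` is a hypothetical helper, and matching on `ownerName` alone is a simplification (an app with several windows would need a `title` check as well).

```python
import json
from typing import Optional


def find_window_id(window_list_json: str, owner_name: str) -> Optional[int]:
    """Return the id of the first window whose ownerName matches, else None."""
    for window in json.loads(window_list_json):
        if window.get("ownerName") == owner_name:
            return window["id"]
    return None
```

The returned id is what goes into `--window` in the next step.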

Step 2: screenshot that window

bash
usecomputer screenshot ./tmp/app.png --window 12345 --json
# output: {"path":"./tmp/app.png","coordMap":"200,100,1200,800,1568,1045",...}

The coord-map in the output is scoped to that window's region on screen.

Step 3: act using the coord-map

bash
# model analyzes ./tmp/app.png and says click at x=400 y=220
usecomputer click -x 400 -y 220 --coord-map "200,100,1200,800,1568,1045"

The coord-map handles the translation from the window screenshot's pixel space back to the correct desktop coordinates. The click lands on the right spot even though the screenshot only showed one window.

Region screenshots

You can also capture an arbitrary rectangle of the screen:
bash
usecomputer screenshot ./tmp/region.png --region "100,100,800,600" --json
The coord-map works the same way — pass it to subsequent pointer commands.

Coord-map explained

The coord-map is 6 comma-separated values emitted by every screenshot:
captureX,captureY,captureWidth,captureHeight,imageWidth,imageHeight
  • captureX, captureY — top-left corner of the captured region in desktop coordinates
  • captureWidth, captureHeight — size of the captured region in desktop pixels
  • imageWidth, imageHeight — size of the output PNG (after scaling)
When you pass `--coord-map` to `click`, `hover`, `drag`, or `mouse move`, the command maps your screenshot-space x,y coordinates back to the real desktop position using these values.
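
The mapping itself is not spelled out here, but the six values suggest a plain linear scale plus the capture offset. The sketch below works under that assumption; the tool's actual rounding or clamping may differ, and `parse_coord_map` and `to_desktop` are hypothetical names used only for illustration.

```python
def parse_coord_map(coord_map: str) -> tuple:
    """Split a coord-map string into its six numeric values."""
    values = tuple(float(v) for v in coord_map.split(","))
    if len(values) != 6:
        raise ValueError("coord-map must have 6 comma-separated values")
    return values


def to_desktop(x: float, y: float, coord_map: str) -> tuple:
    """Map screenshot-space (x, y) to desktop coordinates.

    Assumption: proportional scaling from image size to capture size,
    shifted by the capture origin.
    """
    cx, cy, cw, ch, iw, ih = parse_coord_map(coord_map)
    return (cx + x * cw / iw, cy + y * ch / ih)
```

Under this assumption, screenshot point x=400, y=220 with coord-map `200,100,1200,800,1568,1045` lands at roughly desktop (506, 268): the window's top-left offset (200, 100) plus the point scaled by 1200/1568 and 800/1045.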

Validating coordinates with debug-point

Before clicking, you can validate where the click would land:
bash
usecomputer debug-point -x 400 -y 220 --coord-map "0,0,1600,900,1568,882"
This captures a screenshot and draws a red marker at the mapped coordinate. Send the output image back to the model so it can see if the target is correct and adjust if needed.

Quick examples

bash
# screenshot the primary display
usecomputer screenshot ./tmp/screen.png --json

# screenshot a specific display (0-indexed)
usecomputer screenshot ./tmp/screen.png --display 1 --json

# click at screenshot coordinates
usecomputer click -x 600 -y 400 --coord-map "0,0,1600,900,1568,882"

# right-click
usecomputer click -x 600 -y 400 --button right --coord-map "..."

# double-click
usecomputer click -x 600 -y 400 --count 2 --coord-map "..."

# click with modifier keys held
usecomputer click -x 600 -y 400 --modifier option --coord-map "..."
usecomputer click -x 600 -y 400 --modifier cmd --modifier shift --coord-map "..."

# type text
usecomputer type "hello from usecomputer"

# type long text from stdin
cat ./notes.txt | usecomputer type --stdin --chunk-size 4000 --chunk-delay 15

# press a key
usecomputer press "enter"

# press a shortcut
usecomputer press "cmd+s"
usecomputer press "cmd+shift+p"

# press with repeat
usecomputer press "down" --count 10 --delay 30

# scroll
usecomputer scroll down 5
usecomputer scroll up 3
usecomputer scroll down 5 --at "400,300"

# drag (straight line)
usecomputer drag 100,200 500,600

# drag (curved path with bezier control point)
usecomputer drag 100,200 500,600 300,50

# drag with coord-map
usecomputer drag 100,200 500,600 --coord-map "..."

# mouse position
usecomputer mouse position --json

# list displays
usecomputer display list --json

# list windows
usecomputer window list --json

# list desktops with windows
usecomputer desktop list --windows --json
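
The curved drag takes a third point as a bezier control point. The source does not say how the path is interpolated; assuming a single control point means a quadratic Bezier curve, the shape of the path can be previewed with the sketch below. `quadratic_bezier` is a hypothetical helper for illustration and is not how usecomputer is known to compute the path.

```python
def quadratic_bezier(start, control, end, steps=20):
    """Return steps+1 points along a quadratic Bezier curve from start to end."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Bernstein weights for a quadratic curve; they always sum to 1.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((
            a * start[0] + b * control[0] + c * end[0],
            a * start[1] + b * control[1] + c * end[1],
        ))
    return points
```

For the drag example above, the arguments would be `quadratic_bezier((100, 200), (300, 50), (500, 600))`: the curve starts and ends at the drag endpoints and is pulled toward, but does not pass through, the control point.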

System prompt tips for accurate clicking

When using GPT-5.4 or Claude for computer use, keep the system prompt short and task-focused. Verbose system prompts reduce click accuracy.
GPT-5.4: Use `detail: "original"` on screenshot inputs. This is the single most important setting for click accuracy. Avoid `detail: "high"` or `detail: "low"`.
Claude: Use the `computer_20251124` tool type with `display_width_px` and `display_height_px` matching the screenshot dimensions from the coord-map output.
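
For Claude, the display size can be taken from the last two coord-map values (imageWidth, imageHeight). A minimal sketch, assuming Python; `claude_computer_tool` is a hypothetical helper, and the `"name": "computer"` field is an assumption based on Anthropic's computer-use examples rather than anything stated here.

```python
def claude_computer_tool(coord_map: str) -> dict:
    """Build a computer-use tool definition sized to the screenshot image."""
    parts = coord_map.split(",")
    # Values 5 and 6 of the coord-map are the scaled image width and height.
    image_width, image_height = int(parts[4]), int(parts[5])
    return {
        "type": "computer_20251124",
        "name": "computer",  # assumption, see lead-in
        "display_width_px": image_width,
        "display_height_px": image_height,
    }
```

If the screenshot dimensions change between steps (for example when switching from full-desktop to window-scoped captures), the tool definition should be rebuilt from the new coord-map.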
General rules:
  • Take a fresh screenshot after every action
  • Always pass the coord-map from the screenshot the model analyzed
  • If clicks land in wrong spots, use `debug-point` to diagnose
  • If the model returns coordinates outside screenshot dimensions, re-send the screenshot and remind it of the image size
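
The last rule can be enforced before any command runs. A minimal sketch, assuming Python and integer coord-map values as in the examples; `in_image_bounds` is a hypothetical helper.

```python
def in_image_bounds(x: int, y: int, coord_map: str) -> bool:
    """Check that (x, y) lies inside the screenshot described by coord_map."""
    parts = coord_map.split(",")
    image_width, image_height = int(parts[4]), int(parts[5])
    return 0 <= x < image_width and 0 <= y < image_height
```

When this returns `False`, the harness can skip the click, re-send the screenshot, and state the image size in the follow-up message.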

Troubleshooting

  1. Clicks land in wrong position: you probably forgot `--coord-map`, or you are passing a coord-map from a different screenshot than the one the model analyzed. Always use the coord-map from the most recent screenshot.
  2. Retina displays: usecomputer handles scaling internally via coord-map. Do not try to manually account for display scaling.
  3. Stale screenshots: the most common source of bugs. Always take a fresh screenshot after each action. The UI changes constantly.
  4. Permission errors on macOS: enable Accessibility permission for your terminal app in System Settings > Privacy & Security > Accessibility.
  5. X11 errors on Linux: ensure `DISPLAY` is set. For XWayland, screenshot falls back to XGetImage automatically if XShm fails.