computer-control
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseComputer Control Skill
计算机控制Skill
Use Claude's Computer Use API to see and control desktop
environments through screenshots and mouse/keyboard actions.
使用Claude的Computer Use API,通过截图和鼠标/键盘操作查看并控制桌面环境。
When To Use
适用场景
- Automating GUI-based workflows that lack CLI alternatives
- Testing web applications through visual interaction
- Filling forms, navigating menus, or interacting with desktop apps
- Building automation pipelines that need visual verification
- 自动化无CLI替代方案的基于GUI的工作流
- 通过视觉交互测试Web应用
- 填写表单、导航菜单或与桌面应用交互
- 构建需要视觉验证的自动化流水线
When NOT To Use
不适用场景
- Tasks achievable through CLI or API (no GUI needed)
- Browser automation better served by Playwright or CDP
Why this stays opt-in. Per docs/inclusive-defaults.md (TRUE-exception category 4), Computer Use takes screenshots and synthesizes keyboard/mouse input — cross-process side effects that must always be explicitly invoked, never default-on.
- 可通过CLI或API完成的任务(无需GUI)
- 更适合用Playwright或CDP实现的浏览器自动化
为何保持可选启用:根据docs/inclusive-defaults.md(TRUE例外类别4),计算机使用功能会截取屏幕截图并生成键盘/鼠标输入——这类跨进程副作用必须始终显式调用,绝不能默认开启。
Architecture
架构
The computer use system has three layers:
- Display Toolkit () - executes OS-level actions via xdotool/scrot on the real or virtual display
phantom.display - Agent Loop () - manages the conversation cycle between Claude API and the display toolkit
phantom.loop - CLI () - command-line interface for running tasks or checking environment readiness
phantom.cli
User Task
|
v
Agent Loop <----> Claude API (beta)
| |
v v
Display Toolkit tool_use responses
| (click, type, screenshot)
v
OS Commands (xdotool, scrot)
|
v
Display (X11 / Xvfb / WSLg)计算机使用系统分为三层:
- 显示工具包()- 通过xdotool/scrot在真实或虚拟显示器上执行操作系统级操作
phantom.display - Agent循环()- 管理Claude API与显示工具包之间的对话周期
phantom.loop - CLI()- 用于运行任务或检查环境就绪状态的命令行界面
phantom.cli
User Task
|
v
Agent Loop <----> Claude API (beta)
| |
v v
Display Toolkit tool_use responses
| (click, type, screenshot)
v
OS Commands (xdotool, scrot)
|
v
Display (X11 / Xvfb / WSLg)Quick Start
快速开始
Check environment
检查环境
bash
cd plugins/phantom
uv run python -m phantom.cli --checkbash
cd plugins/phantom
uv run python -m phantom.cli --checkRun a task
运行任务
bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"Use in Python
在Python中使用
python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop
result = run_loop(
task="Take a screenshot of the desktop",
api_key="sk-ant-...",
loop_config=LoopConfig(
model="claude-sonnet-4-6",
max_iterations=10,
),
display_config=DisplayConfig(width=1920, height=1080),
)
print(f"Done in {result.iterations} iterations")
print(result.final_text)python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop
result = run_loop(
task="Take a screenshot of the desktop",
api_key="sk-ant-...",
loop_config=LoopConfig(
model="claude-sonnet-4-6",
max_iterations=10,
),
display_config=DisplayConfig(width=1920, height=1080),
)
print(f"Done in {result.iterations} iterations")
print(result.final_text)API Versions
API版本
| Model | Tool Version | Beta Flag |
|---|---|---|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | | |
| Sonnet 4.5, Haiku 4.5, older | | |
The function handles this mapping
automatically based on the model name.
resolve_tool_version()| 模型 | 工具版本 | Beta标识 |
|---|---|---|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | | |
| Sonnet 4.5, Haiku 4.5, 旧版本 | | |
resolve_tool_version()Available Actions
可用操作
All versions:
- - capture display
screenshot - - click at
left_click[x, y] - - type text string
type - - press key combo (e.g.,
key)ctrl+s - - move cursor
mouse_move
Enhanced (20250124+):
- - scroll with direction and amount
scroll - - drag between coordinates
left_click_drag - ,
right_click,middle_click,double_clicktriple_click - - hold key for duration
hold_key - - pause between actions
wait
Latest (20251124):
- - inspect screen region at full resolution
zoom
所有版本:
- - 捕获显示器画面
screenshot - - 在
left_click位置左键点击[x, y] - - 输入文本字符串
type - - 按下组合键(例如:
key)ctrl+s - - 移动光标
mouse_move
增强版(20250124+):
- - 按方向和幅度滚动
scroll - - 在坐标间拖动
left_click_drag - 、
right_click、middle_click、double_clicktriple_click - - 按住按键一段时间
hold_key - - 在操作间暂停
wait
最新版(20251124):
- - 以全分辨率检查屏幕区域
zoom
Safety
安全注意事项
Computer use carries risks. Follow these guidelines:
- Use a sandbox: Run in Docker or a VM, not your main OS
- Limit access: Do not provide login credentials unless necessary, and never for banking or sensitive services
- Set iteration caps: Always use to prevent runaway API costs
max_iterations - Human approval: For actions with real-world consequences,
add confirmation callbacks via
on_action - Close sensitive apps: Claude sees the full screen via screenshots; close anything private before starting
计算机使用功能存在风险,请遵循以下准则:
- 使用沙箱环境:在Docker或虚拟机中运行,不要在主操作系统中使用
- 限制访问权限:除非必要,不要提供登录凭据,且绝不要用于银行或敏感服务
- 设置迭代上限:始终使用以避免API费用失控
max_iterations - 人工审批:对于有现实影响的操作,通过添加确认回调
on_action - 关闭敏感应用:Claude会通过截图看到整个屏幕;开始前请关闭所有隐私相关应用
Environment Requirements
环境要求
Linux (native or WSL2 with WSLg):
bash
sudo apt install xdotool scrot xclipHeadless (Docker/CI):
bash
undefinedLinux(原生或带WSLg的WSL2):
bash
sudo apt install xdotool scrot xclip无头环境(Docker/CI):
bash
undefinedInstall Xvfb for virtual display
安装Xvfb以提供虚拟显示器
sudo apt install xvfb xdotool scrot xclip
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
undefinedsudo apt install xvfb xdotool scrot xclip
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
undefinedPrompting Tips
提示词技巧
- Be specific about each step of the task
- Add "After each step, take a screenshot and verify" to catch mistakes early
- Use keyboard shortcuts when UI elements are hard to click
- Provide example screenshots for repeatable workflows
- Set a system prompt with domain-specific instructions
- 明确说明任务的每一步
- 添加“每一步完成后,截取屏幕截图并验证”以尽早发现错误
- 当UI元素难以点击时使用键盘快捷键
- 为可重复工作流提供示例截图
- 设置包含领域特定指令的系统提示词