computer-control

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Computer Control Skill

计算机控制Skill

Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions.
使用Claude的Computer Use API,通过截图和鼠标/键盘操作查看并控制桌面环境。

When To Use

适用场景

  • Automating GUI-based workflows that lack CLI alternatives
  • Testing web applications through visual interaction
  • Filling forms, navigating menus, or interacting with desktop apps
  • Building automation pipelines that need visual verification
  • 自动化无CLI替代方案的基于GUI的工作流
  • 通过视觉交互测试Web应用
  • 填写表单、导航菜单或与桌面应用交互
  • 构建需要视觉验证的自动化流水线

When NOT To Use

不适用场景

  • Tasks achievable through CLI or API (no GUI needed)
  • Browser automation better served by Playwright or CDP
Why this stays opt-in. Per docs/inclusive-defaults.md (TRUE-exception category 4), Computer Use takes screenshots and synthesizes keyboard/mouse input — cross-process side effects that must always be explicitly invoked, never default-on.
  • 可通过CLI或API完成的任务(无需GUI)
  • 更适合用Playwright或CDP实现的浏览器自动化
为何保持可选启用:根据docs/inclusive-defaults.md(TRUE例外类别4),计算机使用功能会截取屏幕截图并生成键盘/鼠标输入——这类跨进程副作用必须始终显式调用,绝不能默认开启。

Architecture

架构

The computer use system has three layers:
  1. Display Toolkit (
    phantom.display
    ) - executes OS-level actions via xdotool/scrot on the real or virtual display
  2. Agent Loop (
    phantom.loop
    ) - manages the conversation cycle between Claude API and the display toolkit
  3. CLI (
    phantom.cli
    ) - command-line interface for running tasks or checking environment readiness
User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)
计算机使用系统分为三层:
  1. 显示工具包
    phantom.display
    )- 通过xdotool/scrot在真实或虚拟显示器上执行操作系统级操作
  2. Agent循环
    phantom.loop
    )- 管理Claude API与显示工具包之间的对话周期
  3. CLI
    phantom.cli
    )- 用于运行任务或检查环境就绪状态的命令行界面
User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)

Quick Start

快速开始

Check environment

检查环境

bash
cd plugins/phantom
uv run python -m phantom.cli --check
bash
cd plugins/phantom
uv run python -m phantom.cli --check

Run a task

运行任务

bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"

Use in Python

在Python中使用

python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)

print(f"Done in {result.iterations} iterations")
print(result.final_text)
python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)

print(f"Done in {result.iterations} iterations")
print(result.final_text)

API Versions

API版本

ModelTool VersionBeta Flag
Opus 4.6, Sonnet 4.6, Opus 4.5
computer_20251124
computer-use-2025-11-24
Sonnet 4.5, Haiku 4.5, older
computer_20250124
computer-use-2025-01-24
The
resolve_tool_version()
function handles this mapping automatically based on the model name.
模型工具版本Beta标识
Opus 4.6, Sonnet 4.6, Opus 4.5
computer_20251124
computer-use-2025-11-24
Sonnet 4.5, Haiku 4.5, 旧版本
computer_20250124
computer-use-2025-01-24
resolve_tool_version()
函数会根据模型名称自动处理版本映射。

Available Actions

可用操作

All versions:
  • screenshot
    - capture display
  • left_click
    - click at
    [x, y]
  • type
    - type text string
  • key
    - press key combo (e.g.,
    ctrl+s
    )
  • mouse_move
    - move cursor
Enhanced (20250124+):
  • scroll
    - scroll with direction and amount
  • left_click_drag
    - drag between coordinates
  • right_click
    ,
    middle_click
    ,
    double_click
    ,
    triple_click
  • hold_key
    - hold key for duration
  • wait
    - pause between actions
Latest (20251124):
  • zoom
    - inspect screen region at full resolution
所有版本:
  • screenshot
    - 捕获显示器画面
  • left_click
    - 在
    [x, y]
    位置左键点击
  • type
    - 输入文本字符串
  • key
    - 按下组合键(例如:
    ctrl+s
  • mouse_move
    - 移动光标
增强版(20250124+):
  • scroll
    - 按方向和幅度滚动
  • left_click_drag
    - 在坐标间拖动
  • right_click
    middle_click
    double_click
    triple_click
  • hold_key
    - 按住按键一段时间
  • wait
    - 在操作间暂停
最新版(20251124):
  • zoom
    - 以全分辨率检查屏幕区域

Safety

安全注意事项

Computer use carries risks. Follow these guidelines:
  1. Use a sandbox: Run in Docker or a VM, not your main OS
  2. Limit access: Do not provide login credentials unless necessary, and never for banking or sensitive services
  3. Set iteration caps: Always use
    max_iterations
    to prevent runaway API costs
  4. Human approval: For actions with real-world consequences, add confirmation callbacks via
    on_action
  5. Close sensitive apps: Claude sees the full screen via screenshots; close anything private before starting
计算机使用功能存在风险,请遵循以下准则:
  1. 使用沙箱环境:在Docker或虚拟机中运行,不要在主操作系统中使用
  2. 限制访问权限:除非必要,不要提供登录凭据,且绝不要用于银行或敏感服务
  3. 设置迭代上限:始终使用
    max_iterations
    以避免API费用失控
  4. 人工审批:对于有现实影响的操作,通过
    on_action
    添加确认回调
  5. 关闭敏感应用:Claude会通过截图看到整个屏幕;开始前请关闭所有隐私相关应用

Environment Requirements

环境要求

Linux (native or WSL2 with WSLg):
bash
sudo apt install xdotool scrot xclip
Headless (Docker/CI):
bash
undefined
Linux(原生或带WSLg的WSL2):
bash
sudo apt install xdotool scrot xclip
无头环境(Docker/CI):
bash
undefined

Install Xvfb for virtual display

安装Xvfb以提供虚拟显示器

sudo apt install xvfb xdotool scrot xclip Xvfb :1 -screen 0 1920x1080x24 & export DISPLAY=:1
undefined
sudo apt install xvfb xdotool scrot xclip Xvfb :1 -screen 0 1920x1080x24 & export DISPLAY=:1
undefined

Prompting Tips

提示词技巧

  1. Be specific about each step of the task
  2. Add "After each step, take a screenshot and verify" to catch mistakes early
  3. Use keyboard shortcuts when UI elements are hard to click
  4. Provide example screenshots for repeatable workflows
  5. Set a system prompt with domain-specific instructions
  1. 明确说明任务的每一步
  2. 添加“每一步完成后,截取屏幕截图并验证”以尽早发现错误
  3. 当UI元素难以点击时使用键盘快捷键
  4. 为可重复工作流提供示例截图
  5. 设置包含领域特定指令的系统提示词