computer-use

Computer Use

Use this skill when the task should operate through Orca's desktop computer-use surface rather than native Codex computer tools, raw AppleScript, ad hoc screenshots, or direct app internals.

Preconditions

  • Prefer the public orca computer ... command.
  • In this Orca worktree, use ./config/scripts/orca-dev computer ... when testing the local dev runtime.
  • Prefer --json for agent-driven calls. Screenshot image bytes are omitted from JSON and written to screenshot.path when present.
  • Do not push, submit forms, send messages, buy items, delete data, or change account settings unless the user explicitly asked for that specific action.
  • If an app contains sensitive content, read only what the user requested and avoid unnecessary screenshots or logs.
Check runtime availability first:
bash
orca status --json
orca computer capabilities --json
For local development against this worktree:
bash
./config/scripts/orca-dev status --json
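The two entry points above can be wrapped so the same availability checks run against either runtime. This is a minimal sketch, not part of Orca: ORCA_USE_DEV, orca_cli, and check_runtime are hypothetical names invented for the example, and it assumes the dev script accepts the same subcommands as the public command.

```bash
# Hypothetical wrapper: ORCA_USE_DEV is an assumption of this sketch,
# not an Orca setting.
orca_cli() {
  if [ "${ORCA_USE_DEV:-0}" = "1" ]; then
    # Local dev runtime in this worktree.
    ./config/scripts/orca-dev "$@"
  else
    # Public command.
    orca "$@"
  fi
}

# Run both availability checks and stop at the first failure.
check_runtime() {
  orca_cli status --json && orca_cli computer capabilities --json
}
```

Calling check_runtime before any other step gives a single exit status to branch on; ORCA_USE_DEV=1 check_runtime would target the worktree script instead.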

Core Workflow

Use a snapshot-act-snapshot loop:
  1. Discover apps:
bash
orca computer list-apps --json
  2. Get a fresh state for the target app:
bash
orca computer get-app-state --app com.spotify.client --json
  3. Choose an element from that state.
  4. Perform one action:
bash
orca computer click --app com.spotify.client --element-index 42 --json
  5. Inspect the action result before deciding whether to act again. Actions return a fresh state:
bash
orca computer click --app com.spotify.client --element-index 42 --json
Element indexes are scoped to the current app state. They can go stale after navigation, focus changes, scrolling, window changes, or app re-rendering. Never carry indexes across unrelated steps without refreshing state.
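The loop above can be sketched as two small helpers that keep each snapshot on disk, so the index used for an action can always be traced to the state it came from. The function names and file names are hypothetical; only the orca commands themselves come from the steps above.

```bash
# Hypothetical helpers for this sketch.

# Save a fresh state for an app to a file.
# $1 = app selector, $2 = output file
snapshot() {
  orca computer get-app-state --app "$1" --json > "$2"
}

# Perform exactly one click and save its JSON result, which carries the
# fresh state to inspect before the next action.
# $1 = app selector, $2 = element index from the latest snapshot, $3 = output file
click_and_capture() {
  orca computer click --app "$1" --element-index "$2" --json > "$3"
}

# Typical order of calls:
#   snapshot com.spotify.client before.json
#   (choose an index from before.json)
#   click_and_capture com.spotify.client 42 after.json
#   (inspect after.json before acting again)
```

Keeping the choice of index outside both helpers is deliberate: the index has to be picked from the most recent state, never reused from an earlier one.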

App Selectors

Prefer bundle IDs returned by list-apps:
bash
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer get-app-state --app com.spotify.client --json
Names are acceptable when unambiguous:
bash
orca computer get-app-state --app Spotify --json
Use pid:<number> only when bundle ID or name matching is ambiguous:
bash
orca computer get-app-state --app pid:12345 --json
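The order of preference above (bundle ID, then name, then pid) can be written as a small helper. This is only a sketch: choose_app_selector is a hypothetical function, and the caller is assumed to pass an empty string for any value that is unknown or ambiguous.

```bash
# Hypothetical helper: print the --app value in the documented order of
# preference. Pass "" for anything unknown or ambiguous.
# $1 = bundle ID, $2 = app name, $3 = pid
choose_app_selector() {
  local bundle_id="$1" name="$2" pid="$3"
  if [ -n "$bundle_id" ]; then
    printf '%s\n' "$bundle_id"
  elif [ -n "$name" ]; then
    printf '%s\n' "$name"
  elif [ -n "$pid" ]; then
    # The pid form uses the pid:<number> prefix.
    printf 'pid:%s\n' "$pid"
  else
    # Nothing usable was supplied.
    return 1
  fi
}
```

For example, orca computer get-app-state --app "$(choose_app_selector "" "" 12345)" --json would fall through to the pid form.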

Commands

bash
orca computer permissions --json
orca computer capabilities --json
orca computer list-apps --json
orca computer list-windows --app <app> --json
orca computer get-app-state --app <app> --json
orca computer click --app <app> --element-index <index> --json
orca computer perform-secondary-action --app <app> --element-index <index> --action <name> --json
orca computer set-value --app <app> --element-index <index> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> (--element-index <index> | --x <x> --y <y>) --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
Use --no-screenshot only when pixels are not needed. Screenshots are often the only useful signal for Electron, WebView, or canvas-heavy apps with shallow accessibility trees.
Coordinates are window-local. Use coordinates from the latest screenshot/state for the same target window.
Use --text-stdin or --value-stdin for sensitive text so payloads do not land in shell history. On Linux and Windows, action payloads still pass through a short-lived local operation file.
bash
printf '%s' "$TEXT" | orca computer set-value --app <app> --element-index <index> --value-stdin --json
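The pipe above still needs the sensitive text to get into a variable without being typed as part of a command. One possible approach, sketched here under the assumption that a person is at the terminal, is to read the value with echo disabled and pipe it straight through. prompt_and_set_value is a hypothetical helper, not an Orca command.

```bash
# Hypothetical helper: read one line without echo, then pipe it to set-value
# so the text is never a command-line argument.
# $1 = app selector, $2 = element index from the latest state
prompt_and_set_value() {
  local app="$1" index="$2" secret status
  # Turn off terminal echo only when stdin really is a terminal.
  if [ -t 0 ]; then stty -echo; fi
  IFS= read -r secret
  status=$?
  # Restore echo and move to a new line after the hidden input.
  if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
  [ "$status" -eq 0 ] || return 1
  # printf '%s' adds no trailing newline to the payload.
  printf '%s' "$secret" | orca computer set-value --app "$app" --element-index "$index" --value-stdin --json
}
```

This keeps the value out of shell history only; as noted above, on Linux and Windows the payload still passes through a short-lived local operation file.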

Choosing Actions

Prefer semantic actions over raw keyboard input:
  • Use set-value for known editable fields.
  • Use click for buttons, tabs, menu items, checkboxes, and other direct controls.
  • Use perform-secondary-action only when the state lists a concrete action name and the user intent matches it.
  • Use type-text after focusing a field and confirming the app has a focused text receiver.
  • Use press-key for navigation keys, Return, Escape, shortcuts, or submitting a field after the state confirms the right target is active.
Why: keyboard input is process-targeted on macOS, but it still depends on the target app having a valid focused receiver. set-value targets the accessibility element directly and is more reliable when supported.

Foreground And Background

Some actions work while the app is in the background. Treat this as app-dependent:
  • set-value can work in the background when the app exposes a writable accessibility value.
  • click and accessibility actions may work in the background for some native controls.
  • type-text and press-key are targeted to the app process on macOS, but the app may ignore them unless it owns focus or already has an active text receiver.
If an action returns success but the UI did not change, do not repeat the same action blindly. Run get-app-state again, inspect the screenshot and tree, then switch to a more semantic action, or bring the target forward and focus it if needed.

Screenshots

get-app-state returns an accessibility tree and, by default, a screenshot. Use both:
  • Trust the tree for element indexes, names, roles, values, and actions.
  • Trust the screenshot for visual confirmation, especially in Electron and WebView apps.
  • If the tree is shallow, use screenshot evidence before deciding whether any action is safe.
  • If screenshot capture fails or returns no image, the app may be hidden, minimized, off-screen, or have no visible window.
Use --restore-window only when appropriate for the task:
bash
orca computer get-app-state --app <app> --restore-window --json

App-Specific Notes

Browsers

For Edge, Chrome, and similar browsers, prefer setting the address/search field directly:
bash
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer set-value --app com.microsoft.edgemac --element-index <addressBarIndex> --value "test123" --json
orca computer press-key --app com.microsoft.edgemac --key Return --json
orca computer get-app-state --app com.microsoft.edgemac --json
Do not assume raw typing went to the address bar. Confirm the field or page changed after pressing Return.

Spotify

Spotify state can update asynchronously after playback or network-backed search. After a playback click, run get-app-state before clicking again.
For search, prefer set-value on the search combobox, usually named like "What do you want to play?". type-text may only work when Spotify owns focus and that field is already focused.

Slack

Slack may expose a shallow accessibility tree while the screenshot contains the useful information. Reading visible Slack UI is acceptable when requested, but do not send messages or trigger workflows unless explicitly asked.

Error Handling

  • app_not_found: run list-apps and retry with the bundle ID.
  • element_not_found: the index is stale; run get-app-state again.
  • action_failed: inspect the element role/actions and try a more semantic action.
  • Empty tree or no screenshot: the app may have no visible window, be minimized, or be blocked by permissions.
  • Permission errors: the user needs to grant Accessibility or Screen Recording to Orca Computer Use. Run orca computer permissions --json, use the setup UI, then retry orca computer get-app-state --app <bundle> --json.
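The named error codes above map naturally onto a case statement. The sketch below is illustrative only: next_step_for_error is a hypothetical helper, the fallback branch is an assumption of this example, and because the text does not say where the code appears in the JSON output, the helper takes the code as a plain argument.

```bash
# Hypothetical helper: print the recovery step for a documented error code.
# $1 = error code, extracted by the caller
next_step_for_error() {
  case "$1" in
    app_not_found)
      echo "run list-apps, then retry with the bundle ID" ;;
    element_not_found)
      echo "index is stale: run get-app-state again" ;;
    action_failed)
      echo "inspect the element role/actions and try a more semantic action" ;;
    *)
      # Assumption of this sketch: unknown codes are not retried.
      echo "unrecognized code: stop and report" ;;
  esac
}
```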

Safety Checks

Before acting, classify the action:
  • Safe: read state, list apps, inspect screenshot, focus a search box, scroll, open a harmless tab.
  • Needs care: typing into a focused field, pressing Return, clicking a primary button.
  • Requires explicit user permission: sending messages, posting, purchasing, deleting, submitting forms, changing settings, signing in, or exposing secrets.
When uncertain, stop after get-app-state and report what is visible instead of acting.
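Part of this classification can be approximated from the subcommand alone. The sketch below gives only a lower bound: minimum_care_level is a hypothetical helper, and the mapping is an assumption of this example rather than Orca behavior. Whether an action requires explicit user permission depends on what the target element does (send, buy, delete), which the subcommand cannot reveal, so that check stays with whoever chooses the action.

```bash
# Hypothetical helper: print the least cautious level a subcommand could
# ever have. Real actions may need a stricter level.
# $1 = orca computer subcommand
minimum_care_level() {
  case "$1" in
    permissions|capabilities|list-apps|list-windows|get-app-state|scroll)
      # Read-only calls and scrolling.
      echo "safe" ;;
    *)
      # Clicks, typing, key presses, and anything unrecognized.
      echo "needs-care" ;;
  esac
}
```

Treating unknown subcommands as needs-care keeps the default conservative; a click that only focuses a search box can still be judged safe once the target element has been inspected.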