computer-use
Computer Use
Use this skill when the task should operate through Orca's desktop computer-use surface rather than native Codex computer tools, raw AppleScript, ad hoc screenshots, or direct app internals.
Preconditions
- Prefer the public `orca computer ...` command.
- In this Orca worktree, use `./config/scripts/orca-dev computer ...` when testing the local dev runtime.
- Prefer `--json` for agent-driven calls. Screenshot image bytes are omitted from JSON and written to `screenshot.path` when present.
- Do not push, submit forms, send messages, buy items, delete data, or change account settings unless the user explicitly asked for that specific action.
- If an app contains sensitive content, read only what the user requested and avoid unnecessary screenshots or logs.

Check runtime availability first:

```bash
orca status --json
orca computer capabilities --json
```

For local development against this worktree:

```bash
./config/scripts/orca-dev status --json
```
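The availability check can be wrapped in a small guard so later steps only run when the runtime responds. This is a minimal sketch: the function name `orca_ready` and the `ORCA_BIN` variable are hypothetical, and it assumes that a failing check exits non-zero, which the text above does not state.

```shell
# Hypothetical helper: ORCA_BIN selects the public CLI (default) or the
# local dev wrapper, e.g. ORCA_BIN=./config/scripts/orca-dev
ORCA_BIN="${ORCA_BIN:-orca}"

# Returns 0 only if both checks succeed.
# Assumption: an unavailable runtime is signalled by a non-zero exit status.
orca_ready() {
  "$ORCA_BIN" status --json >/dev/null 2>&1 || return 1
  "$ORCA_BIN" computer capabilities --json >/dev/null 2>&1 || return 1
}
```

A caller might write `orca_ready || { echo "orca runtime unavailable" >&2; exit 1; }` before any `orca computer` action.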
Core Workflow
Use a snapshot-act-snapshot loop:

- Discover apps:

```bash
orca computer list-apps --json
```

- Get a fresh state for the target app:

```bash
orca computer get-app-state --app com.spotify.client --json
```

- Choose an element from that state.
- Perform one action:

```bash
orca computer click --app com.spotify.client --element-index 42 --json
```

- Inspect the action result before deciding whether to act again. Actions return a fresh state:

```bash
orca computer click --app com.spotify.client --element-index 42 --json
```

Element indexes are scoped to the current app state. They can go stale after navigation, focus changes, scrolling, window changes, or app re-rendering. Never carry indexes across unrelated steps without refreshing state.
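One way to keep the loop honest is to save every state to a file, so the next index is always chosen from the most recent result. The sketch below uses two hypothetical helpers, `snapshot` and `click_once`. It relies only on the commands shown above and assumes they exit non-zero on failure.

```shell
# Hypothetical helper: save a fresh state for <app> to <file>.
# Usage: snapshot <app> <file>
snapshot() {
  orca computer get-app-state --app "$1" --json >"$2"
}

# Hypothetical helper: perform exactly one click and save the state that
# the action returns, so the next index is chosen from that file only.
# Usage: click_once <app> <element-index> <file>
click_once() {
  orca computer click --app "$1" --element-index "$2" --json >"$3"
}
```

A typical sequence would be `snapshot com.spotify.client before.json`, pick an index from `before.json`, run `click_once com.spotify.client <index> after.json`, then inspect `after.json` before choosing any further index.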
App Selectors
Prefer bundle IDs returned by `list-apps`:

```bash
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer get-app-state --app com.spotify.client --json
```

Names are acceptable when unambiguous:

```bash
orca computer get-app-state --app Spotify --json
```

Use `pid:<number>` only when bundle ID or name matching is ambiguous:

```bash
orca computer get-app-state --app pid:12345 --json
```
Commands
```bash
orca computer permissions --json
orca computer capabilities --json
orca computer list-apps --json
orca computer list-windows --app <app> --json
orca computer get-app-state --app <app> --json
orca computer click --app <app> --element-index <index> --json
orca computer perform-secondary-action --app <app> --element-index <index> --action <name> --json
orca computer set-value --app <app> --element-index <index> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> (--element-index <index> | --x <x> --y <y>) --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
```

Use `--no-screenshot` only when pixels are not needed. Screenshots are often the only useful signal for Electron, WebView, or canvas-heavy apps with shallow accessibility trees.

Coordinates are window-local. Use coordinates from the latest screenshot/state for the same target window.

Use `--text-stdin` or `--value-stdin` for sensitive text so payloads do not land in shell history. On Linux and Windows, action payloads still pass through a short-lived local operation file.

```bash
printf '%s' "$TEXT" | orca computer set-value --app <app> --element-index <index> --value-stdin --json
```
Choosing Actions
Prefer semantic actions over raw keyboard input:

- Use `set-value` for known editable fields.
- Use `click` for buttons, tabs, menu items, checkboxes, and other direct controls.
- Use `perform-secondary-action` only when the state lists a concrete action name and the user intent matches it.
- Use `type-text` after focusing a field and confirming the app has a focused text receiver.
- Use `press-key` for navigation keys, Return, Escape, shortcuts, or submitting a field after the state confirms the right target is active.

Why: keyboard input is process-targeted on macOS, but it still depends on the target app having a valid focused receiver. `set-value` targets the accessibility element directly and is more reliable when supported.
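The preference order above can be written down as a lookup, which is handy when an agent script has to justify its choice. This is only a sketch: `suggest_action` and its kind labels are hypothetical names for illustration, not values that `orca` emits. The fallback to `get-app-state` for unknown kinds is also an assumption, chosen so that an unsure caller refreshes state instead of acting.

```shell
# Hypothetical mapping from a coarse target kind to the preferred command.
# The kind labels are illustrative; they are not produced by orca itself.
suggest_action() {
  case "$1" in
    editable-field)                 echo "set-value" ;;
    button|tab|menu-item|checkbox)  echo "click" ;;
    named-secondary-action)         echo "perform-secondary-action" ;;
    focused-text-receiver)          echo "type-text" ;;
    navigation-key|shortcut|submit) echo "press-key" ;;
    *)                              echo "get-app-state" ;;  # unsure: refresh state first
  esac
}
```

For example, `suggest_action editable-field` prints `set-value`, matching the first rule in the list.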
Foreground And Background
Some actions work while the app is in the background. Treat this as app-dependent:

- `set-value` can work in the background when the app exposes a writable accessibility value.
- `click` and accessibility actions may work in the background for some native controls.
- `type-text` and `press-key` are targeted to the app process on macOS, but the app may ignore them unless it owns focus or already has an active text receiver.

If an action returns success but the UI did not change, do not repeat the same action blindly. Run `get-app-state` again, inspect the screenshot/tree, then switch to a more semantic action or bring/focus the target if needed.
Screenshots
From `get-app-state` output:

- Trust the tree for element indexes, names, roles, values, and actions.
- Trust the screenshot for visual confirmation, especially in Electron and WebView apps.
- If the tree is shallow, use screenshot evidence before deciding whether any action is safe.
- If screenshot capture fails or returns no image, the app may be hidden, minimized, off-screen, or have no visible window.

Use restore only when appropriate for the task:

```bash
orca computer get-app-state --app <app> --restore-window --json
```
App-Specific Notes
Browsers
For Edge, Chrome, and similar browsers, prefer setting the address/search field directly:

```bash
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer set-value --app com.microsoft.edgemac --element-index <addressBarIndex> --value "test123" --json
orca computer press-key --app com.microsoft.edgemac --key Return --json
orca computer get-app-state --app com.microsoft.edgemac --json
```

Do not assume raw typing went to the address bar. Confirm the field or page changed after pressing Return.
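The same sequence can be wrapped so that each step stops the chain on failure and the final state is always printed for confirmation. This is a sketch only: `browser_go` and its argument order are hypothetical, and it assumes the commands exit non-zero when they fail.

```shell
# Hypothetical wrapper around the address-bar sequence shown above.
# Usage: browser_go <app> <address-bar-index> <text>
# The index must come from a get-app-state call made just before this one.
browser_go() {
  app="$1"; index="$2"; text="$3"
  # Set the field directly instead of relying on raw typing.
  orca computer set-value --app "$app" --element-index "$index" --value "$text" --json >/dev/null || return 1
  # Submit the field.
  orca computer press-key --app "$app" --key Return --json >/dev/null || return 1
  # Print a fresh state so the caller can confirm the field or page changed.
  orca computer get-app-state --app "$app" --json
}
```

The function does not decide whether navigation succeeded; the caller still has to compare the printed state against what was expected.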
Spotify
Spotify state can update asynchronously after playback or network-backed search. After a playback click, run `get-app-state` before clicking again.

For search, prefer `set-value` on the search combobox, usually named like `What do you want to play?`. `type-text` may only work when Spotify owns focus and that field is already focused.
Slack
Slack may expose a shallow accessibility tree while the screenshot contains the useful information. Reading visible Slack UI is acceptable when requested, but do not send messages or trigger workflows unless explicitly asked.
Error Handling
- `app_not_found`: run `list-apps` and retry with the bundle ID.
- `element_not_found`: the index is stale; run `get-app-state` again.
- `action_failed`: inspect the element role/actions and try a more semantic action.
- Empty tree or no screenshot: the app may have no visible window, be minimized, or be blocked by permissions.
- Permission errors: the user needs to grant Accessibility or Screen Recording to `Orca Computer Use`. Run `orca computer permissions --json`, use the setup UI, then retry `orca computer get-app-state --app <bundle> --json`.
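The recovery rules above amount to a small dispatch table. The sketch below assumes the caller has already extracted the error code from the JSON result; how that code appears in the output is not specified here, so the extraction is left out. `next_step` is a hypothetical name.

```shell
# Hypothetical dispatcher: print the recommended follow-up for an error code.
# Usage: next_step <error-code> <app>
# Assumption: the caller has already pulled <error-code> out of the result.
next_step() {
  case "$1" in
    app_not_found)     echo "orca computer list-apps --json" ;;
    element_not_found) echo "orca computer get-app-state --app $2 --json" ;;
    action_failed)     echo "inspect element role/actions, then try a more semantic action" ;;
    *)                 echo "unrecognized error: stop and report" ;;
  esac
}
```

The function only prints a suggestion. Running the suggested command is left to the caller, so no action is repeated automatically.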
Safety Checks
Before acting, classify the action:

- Safe: read state, list apps, inspect screenshot, focus a search box, scroll, open a harmless tab.
- Needs care: typing into a focused field, pressing Return, clicking a primary button.
- Requires explicit user permission: sending messages, posting, purchasing, deleting, submitting forms, changing settings, signing in, or exposing secrets.

When uncertain, stop after `get-app-state` and report what is visible instead of acting.
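A script that plans actions could encode this classification as a gate. In the sketch below, `classify_action` and its intent labels are hypothetical. Treating unknown intents as requiring permission is an assumption, made to match the rule of stopping when uncertain.

```shell
# Hypothetical gate: map an intent label to one of the three classes above.
# The labels are illustrative names, not orca commands.
classify_action() {
  case "$1" in
    read-state|list-apps|inspect-screenshot|focus-search|scroll|open-tab)
      echo "safe" ;;
    type-into-field|press-return|click-primary)
      echo "needs-care" ;;
    send-message|post|purchase|delete|submit-form|change-settings|sign-in|expose-secret)
      echo "requires-permission" ;;
    *)
      # Assumption: anything unrecognized is handled as the strictest class.
      echo "requires-permission" ;;
  esac
}
```

A caller might proceed only when the result is `safe`, or when the user explicitly requested that specific action.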