computer-use-playbook

Computer Use Playbook

Overview

Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, and escalate to visual/native automation only when required. For browser MCP workflows, treat `tab_id` as a required handle for all stateful actions.

Playbook Structure

Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
Filesystem use: shell-native operations for deterministic file/process work.
Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.

Decision Order

Identify the active surface: browser page, filesystem/process, or native desktop UI.
For browser pages, use browser MCP tools first and keep a strict `tab_id` contract.
For filesystem/process work, use shell/system tools first (`rg`, `ls`, `find`, etc.).
Escalate to vision or native UI automation only when deterministic methods are insufficient.
If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
Verify each critical step with state checks plus screenshot evidence.
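
The decision order above can be condensed into a small dispatcher. This is a minimal Python sketch, not part of any tool API. The `Surface` enum, the `choose_method` name, and the blocker labels are hypothetical. Whatever method it picks, each step still needs state checks plus screenshot evidence.

```python
from enum import Enum
from typing import Optional


class Surface(Enum):
    BROWSER = "browser"
    FILESYSTEM = "filesystem"
    NATIVE = "native"


# Hypothetical blocker labels for the gate types named in the playbook.
HUMAN_GATES = {"login", "captcha", "security"}


def choose_method(surface: Surface, deterministic_ok: bool = True,
                  blocker: Optional[str] = None) -> str:
    """Return the first method to try, following the playbook's decision order."""
    # Gates that need explicit human intent override everything else.
    if blocker in HUMAN_GATES:
        return "human-in-the-loop"
    # Escalate only when the deterministic path for this surface is insufficient.
    if not deterministic_ok:
        return "vision/native UI automation"
    if surface is Surface.BROWSER:
        return "browser MCP tools (explicit tab_id)"
    if surface is Surface.FILESYSTEM:
        return "shell tools (rg, ls, find)"
    return "native UI automation"
```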

Browser Automation (Major Track)

Use browser tools with a DOM-first approach for browser flows. Avoid jumping to native desktop clicks while the target is still reachable through browser tools.

Preferred sequence:

Call `open_tab` and capture the returned `tab_id`.
Call `navigate_to(tab_id, url)` for explicit page transitions.
Use `dom_snapshot(tab_id, ...)` or `run_script(tab_id, ...)` to identify the target.
Call `run_script(tab_id, ...)` to perform the action (click/type/submit).
Use `read_page(tab_id, ...)` or `run_script(tab_id, ...)` to verify URL/title/content.
Call `screenshot(tab_id, ...)` to capture evidence.
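
Here is a minimal sketch of the preferred sequence. It assumes a hypothetical Python wrapper whose method names mirror the browser MCP tools. The signatures and return shapes are assumptions: `open_tab` returning a bare `tab_id` string, and `read_page` returning a dict with `url` and `content`. Adapt them to the real tool responses. What matters is that every call after `open_tab` receives the same explicit `tab_id`.

```python
from typing import Any, Protocol


class BrowserTools(Protocol):
    """Hypothetical wrapper: names mirror the playbook's MCP tools,
    but these signatures and return types are assumptions."""
    def open_tab(self) -> str: ...
    def navigate_to(self, tab_id: str, url: str) -> None: ...
    def run_script(self, tab_id: str, script: str) -> Any: ...
    def read_page(self, tab_id: str) -> dict: ...
    def screenshot(self, tab_id: str) -> str: ...


def browser_step(tools: BrowserTools, url: str, find_js: str,
                 action_js: str, expected_text: str) -> dict:
    tab_id = tools.open_tab()                    # capture the handle once
    tools.navigate_to(tab_id, url)               # explicit page transition
    if not tools.run_script(tab_id, find_js):    # identify the target
        raise RuntimeError(f"target not found on {url}")
    tools.run_script(tab_id, action_js)          # click/type/submit
    page = tools.read_page(tab_id)               # verify resulting state
    if expected_text not in page.get("content", ""):
        raise RuntimeError("post-action verification failed")
    shot = tools.screenshot(tab_id)              # visual evidence
    return {"tab_id": tab_id, "url": page.get("url"), "screenshot": shot}
```

Raising on a failed lookup or verification, rather than continuing, keeps a fragile flow from acting on an unknown page state.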

Session behavior guidance:

always pass `tab_id` to `navigate_to`, `read_page`, `screenshot`, `dom_snapshot`, `run_script`, and `close_tab`.
never rely on implicit active-tab behavior.
if a click opens a new tab/window, call `list_tabs`, detect the new `tab_id`, and continue explicitly on that `tab_id`.
keep a local map of `purpose -> tab_id` when handling multiple tabs.
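
Tab bookkeeping stays deterministic with two pieces: a set difference over `list_tabs` results, and an explicit purpose map. This sketch assumes the `list_tabs` output has already been reduced to a list of `tab_id` strings. `detect_new_tab` and `TabMap` are hypothetical helpers.

```python
from typing import Dict, Iterable, Optional


def detect_new_tab(before: Iterable[str], after: Iterable[str]) -> Optional[str]:
    """Return the single tab_id present in `after` but not in `before`."""
    new_ids = set(after) - set(before)
    if len(new_ids) != 1:
        # Zero or several new tabs: refuse to guess and re-inspect instead.
        return None
    return new_ids.pop()


class TabMap:
    """Local purpose -> tab_id map; never falls back to an 'active' tab."""

    def __init__(self) -> None:
        self._tabs: Dict[str, str] = {}

    def bind(self, purpose: str, tab_id: str) -> None:
        self._tabs[purpose] = tab_id

    def get(self, purpose: str) -> str:
        if purpose not in self._tabs:
            raise KeyError(f"no tab registered for purpose {purpose!r}")
        return self._tabs[purpose]

    def forget(self, tab_id: str) -> None:
        # Call after close_tab so a stale handle is never reused.
        self._tabs = {p: t for p, t in self._tabs.items() if t != tab_id}
```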

Escalation triggers:

dynamic overlays not stable via selectors,
canvas/rendered controls,
consent dialogs where selector path is inconsistent,
native picker launched from browser (file upload dialog).

Do not overuse fallback:

if a browser tool can do it, stay in browser tools.
use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).
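
The escalation triggers and the "do not overuse fallback" rule can be written as one predicate. `TargetInfo` and its fields are hypothetical observations you would gather from `dom_snapshot` and screenshots. The rule is: stay in browser tools unless the target is off-DOM, drawn on a canvas, or not reliably selectable.

```python
from dataclasses import dataclass


@dataclass
class TargetInfo:
    # Hypothetical observations gathered from dom_snapshot / screenshots.
    in_dom: bool
    selector_stable: bool = True
    canvas_rendered: bool = False
    os_dialog: bool = False  # e.g. a native file picker opened by the page


def needs_native_fallback(t: TargetInfo) -> bool:
    """True only when browser tools cannot reliably reach the target."""
    if t.os_dialog or t.canvas_rendered:
        return True  # cross-app boundary or no DOM element to act on
    # Dynamic overlays or consent dialogs whose selectors keep shifting.
    return not (t.in_dom and t.selector_stable)
```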

File Explorer and Filesystem Automation

Prefer shell-native methods before GUI clicking.

Use shell when possible:

search files: `rg --files`, `find`
move/copy/rename and create directories: `mv`, `cp`, `mkdir`
inspect metadata: `ls -la`, `stat`

Use native UI only when the workflow is GUI-only:

OS file picker from browser/app,
drag-drop interactions not scriptable via API,
app-specific explorer panes.
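
Below is a sketch of a deterministic move step. It stays shell-native (`mkdir -p`, `mv -n`) and then verifies the result, in the spirit of the verification standard. It is wrapped in Python only to keep the examples in one language. `move_into` is a hypothetical helper, and it assumes a POSIX system with `mkdir` and `mv` on `PATH`.

```python
import os
import subprocess
from pathlib import Path


def move_into(src: str, dest_dir: str) -> Path:
    """Create dest_dir if needed, move src into it, then verify the outcome."""
    # mkdir -p is idempotent: no error if the directory already exists.
    subprocess.run(["mkdir", "-p", dest_dir], check=True)
    # mv -n refuses to overwrite an existing file at the destination.
    subprocess.run(["mv", "-n", src, dest_dir], check=True)
    target = Path(dest_dir) / Path(src).name
    # A skipped move is not always reported as an error, so check the result.
    if not target.exists() or os.path.exists(src):
        raise RuntimeError(f"move not applied: {src} -> {target}")
    return target
```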

Native UI Automation

Use native UI automation for interactions outside the application's DOM or API.

Typical tools:

`xdotool` for key/click/type, `xprop`/`xwininfo` for window targeting.

Guidelines:

ensure window focus before typing,
prefer keyboard-driven deterministic paths,
keep retries bounded and observable,
re-check application state after each action.
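
Here is a minimal sketch of these guidelines using real `xdotool` subcommands: `search --name`, `windowactivate --sync`, `getactivewindow`, and `type --delay`. It finds the window, activates it, confirms focus before typing, and gives up after a bounded number of attempts. It assumes an X11 session, where `search` and `getactivewindow` both print decimal window IDs. `focus_and_type` and the injectable `run` callable are hypothetical; `run` exists so the logic can be exercised without a display.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cmd(argv: List[str]) -> str:
    """Default runner. check=False because `xdotool search` exits non-zero
    when nothing matches; callers verify state instead of exit codes."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout.strip()


def focus_and_type(window_name: str, text: str, run: Runner = run_cmd,
                   attempts: int = 3, delay_s: float = 0.5) -> str:
    """Activate a window by name, confirm focus, then type. Retries are bounded."""
    for attempt in range(1, attempts + 1):
        ids = run(["xdotool", "search", "--name", window_name]).split()
        if ids:
            wid = ids[0]  # first match; narrow the name pattern if ambiguous
            run(["xdotool", "windowactivate", "--sync", wid])
            # Re-check state: type only if this window really has focus.
            if run(["xdotool", "getactivewindow"]) == wid:
                run(["xdotool", "type", "--delay", "50", text])
                return wid
        # Observable retry: log every failed attempt.
        print(f"focus attempt {attempt}/{attempts} failed for {window_name!r}")
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"could not focus window {window_name!r}")
```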

Human-in-the-loop rules

Pause and ask for user intervention when blocked by:

login/2FA challenges,
CAPTCHA or anti-bot checkpoints,
legal/security confirmation screens that require explicit human intent.

When waiting for user action:

explain exactly what the user must do and where.
issue an audible notification using `speak` so the user notices immediately.
wait, then re-check state (`url`, `title`, element visibility, screenshot) before continuing.
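
The wait-and-recheck rule can be sketched as a bounded polling loop. `notify` stands in for however the `speak` tool is invoked; its exact interface is not specified here. `state_ok` is a hypothetical callback that re-reads URL, title, or element visibility. The run continues on a verified state, not on the user's word alone.

```python
import time
from typing import Callable


def wait_for_human(instructions: str,
                   notify: Callable[[str], None],
                   state_ok: Callable[[], bool],
                   timeout_s: float = 600.0,
                   poll_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Tell the user what to do, then poll state until verified or timed out."""
    notify(instructions)  # e.g. a wrapper around the `speak` tool (assumed)
    deadline = clock() + timeout_s
    while clock() < deadline:
        # state_ok should re-check url/title/element visibility itself.
        if state_ok():
            return True
        sleep(poll_s)
    return False
```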

Special Cases

Consent dialogs

DOM-first click (`Accept all` / `Reject all` / localized variants).
if selector fails but button is visible, use coordinate/native fallback.
confirm modal is not visible and main interaction path works.
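
For the DOM-first consent click, a small matcher over visible button labels keeps locale handling explicit. The labels could be texts collected with `dom_snapshot`, for example. The label lists below are illustrative, not exhaustive, and `pick_consent_button` is a hypothetical helper. Whether to accept or reject is left to the caller.

```python
from typing import Iterable, Optional

# Illustrative label sets only; extend per site and locale.
CONSENT_LABELS = {
    "accept": ["accept all", "alle akzeptieren", "tout accepter", "aceptar todo"],
    "reject": ["reject all", "alle ablehnen", "tout refuser", "rechazar todo"],
}


def pick_consent_button(button_texts: Iterable[str], choice: str) -> Optional[int]:
    """Return the index of the first button whose label matches `choice`."""
    wanted = CONSENT_LABELS[choice]
    for i, raw in enumerate(button_texts):
        label = " ".join(raw.split()).casefold()  # normalize whitespace and case
        if label in wanted:
            return i
    return None  # no match: consider the coordinate/native fallback if visible
```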

CAPTCHA / anti-bot challenges

do not attempt bypass logic.
capture evidence and report blocked state clearly.
require human-in-the-loop completion.
notify the user with `speak` when intervention is required.
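
Automation can help with detection, not bypass: recognize the checkpoint early, stop, and hand off. The marker strings below are illustrative assumptions, so a miss does not prove there is no challenge. `looks_like_challenge` is a hypothetical helper.

```python
from typing import Iterable

# Illustrative markers only; real challenge pages vary, so a miss means
# "unknown", not "no challenge".
CHALLENGE_MARKERS = ("captcha", "challenges.cloudflare.com", "verify you are human")


def looks_like_challenge(page_text: str, frame_srcs: Iterable[str]) -> bool:
    """Detect (never solve) an anti-bot checkpoint so the run can hand off."""
    haystacks = [page_text.casefold()] + [src.casefold() for src in frame_srcs]
    return any(marker in h for h in haystacks for marker in CHALLENGE_MARKERS)
```

When it returns `True`, capture a screenshot, report the blocked state, and switch to the human-in-the-loop flow instead of retrying.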

Login and account security gates

try normal DOM steps first to fill the username/password fields and submit.
if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
after user confirms completion, re-snapshot and continue from verified page state.
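
Here is a sketch of the "normal DOM steps" for a username/password form. It builds one script for `run_script(tab_id, ...)` that fills both fields, fires `input` events, and clicks submit. The selectors and the `build_login_script` helper are hypothetical. Values are embedded with `json.dumps` so quotes cannot break the script. Some frameworks need more than a direct `value` assignment. Credentials should come from the user or a secret store and never be logged.

```python
import json


def build_login_script(user_selector: str, pass_selector: str,
                       submit_selector: str, username: str, password: str) -> str:
    """Return JS for run_script(tab_id, ...) that fills and submits a login form."""
    # json.dumps yields a valid JS array literal, so quotes in values are safe.
    args = json.dumps([user_selector, pass_selector, submit_selector,
                       username, password])
    return (
        "(() => {\n"
        f"  const [u, p, s, user, pw] = {args};\n"
        "  const fill = (sel, v) => {\n"
        "    const el = document.querySelector(sel);\n"
        "    if (!el) return false;\n"
        "    el.value = v;\n"
        "    // Many frameworks only notice changes announced via input events.\n"
        "    el.dispatchEvent(new Event('input', { bubbles: true }));\n"
        "    return true;\n"
        "  };\n"
        "  if (!fill(u, user) || !fill(p, pw)) return 'missing-field';\n"
        "  const btn = document.querySelector(s);\n"
        "  if (!btn) return 'missing-submit';\n"
        "  btn.click();\n"
        "  return 'submitted';\n"
        "})()"
    )
```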

File uploads

use DOM file input assignment if available.
if native picker opens, switch to native UI automation.
verify upload appears in page/app state.
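
One way to do DOM file input assignment from `run_script(tab_id, ...)` is the standard `DataTransfer` API. It lets a page script attach in-memory bytes to an `<input type="file">` without opening the OS picker. This is an assumption about how the playbook's tooling does it; some browser tools expose a dedicated file-setting call instead. `build_upload_script` is hypothetical. Sites that ignore programmatic assignment still need the native-picker fallback.

```python
import base64
import json


def build_upload_script(input_selector: str, filename: str, data: bytes,
                        mime: str = "application/octet-stream") -> str:
    """Return JS that attaches `data` as a File to the matching file input."""
    # Bytes travel as base64 inside a JSON array literal.
    args = json.dumps([input_selector, filename, mime,
                       base64.b64encode(data).decode("ascii")])
    return (
        "(() => {\n"
        f"  const [sel, name, mime, b64] = {args};\n"
        "  const input = document.querySelector(sel);\n"
        "  if (!input) return 'missing-input';\n"
        "  const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));\n"
        "  const dt = new DataTransfer();\n"
        "  dt.items.add(new File([bytes], name, { type: mime }));\n"
        "  input.files = dt.files;\n"
        "  input.dispatchEvent(new Event('change', { bubbles: true }));\n"
        "  return input.files.length === 1 ? 'attached' : 'not-attached';\n"
        "})()"
    )
```

The returned status string is only a first check. Still verify that the upload appears in page or app state, as the steps above require.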

Verification Standard

Every important step should end with both:

state evidence (URL/title/content/element state), and
visual evidence (screenshot path).

If blocked, report:

attempted method,
blocker reason,
evidence collected,
next safe fallback.
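
The verification standard and the blocked report map onto two small records. The field names follow the lists above. The classes themselves (`StepEvidence`, `BlockedReport`) and the default fallback value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepEvidence:
    step: str
    url: Optional[str] = None
    title: Optional[str] = None
    content: Optional[str] = None        # e.g. a confirming text snippet
    element_state: Optional[str] = None  # e.g. "modal hidden"
    screenshot_path: Optional[str] = None

    def is_complete(self) -> bool:
        """Both kinds of evidence: some state check and a screenshot path."""
        has_state = any([self.url, self.title, self.content, self.element_state])
        return has_state and self.screenshot_path is not None


@dataclass
class BlockedReport:
    attempted_method: str
    blocker_reason: str
    evidence: List[str] = field(default_factory=list)
    next_safe_fallback: str = "human-in-the-loop"

    def render(self) -> str:
        return "\n".join([
            f"attempted method: {self.attempted_method}",
            f"blocker reason: {self.blocker_reason}",
            "evidence collected: " + (", ".join(self.evidence) or "none"),
            f"next safe fallback: {self.next_safe_fallback}",
        ])
```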

Learning Library Structure

Use `references/learnings/` as the canonical knowledge base.

`references/learnings/index.md`: topic registry and folder convention.
`references/learnings/general/`: cross-task lessons.
`references/learnings/<topic-slug>/`: topic-specific lessons and experience log.

Topic folder convention:

`lessons.md` for stable workflow rules.
`experience-log.md` for incremental run learnings.

Continuous Learning Loop (Required)

Treat each real run as training data for future runs.

Before starting similar work:

Load `references/learnings/index.md`.
Map the task to a topic slug (for example `google-flow`).
Load `references/learnings/general/experience-log.md`.
Load topic files when present: `references/learnings/<topic-slug>/lessons.md` and `references/learnings/<topic-slug>/experience-log.md`.
If the topic folder does not exist, create it with `lessons.md` and `experience-log.md`.
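
Here is a sketch of the "before starting" steps. It derives a topic slug, creates the topic folder and its two files if missing, and reads everything that exists. The slug rule (lowercase, with non-alphanumerics collapsed to hyphens) and the `load_learnings` helper are assumptions. The playbook itself only gives `google-flow` as an example slug.

```python
import re
from pathlib import Path
from typing import Dict


def topic_slug(name: str) -> str:
    """Lowercase, hyphen-separated slug, e.g. 'Google Flow' -> 'google-flow'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def load_learnings(root: Path, topic: str) -> Dict[str, str]:
    """Read index, general log and topic files, creating the topic folder if absent."""
    learn = root / "references" / "learnings"
    topic_dir = learn / topic_slug(topic)
    topic_dir.mkdir(parents=True, exist_ok=True)
    for name in ("lessons.md", "experience-log.md"):
        (topic_dir / name).touch(exist_ok=True)  # creates only if missing

    files = {
        "index": learn / "index.md",
        "general": learn / "general" / "experience-log.md",
        "lessons": topic_dir / "lessons.md",
        "log": topic_dir / "experience-log.md",
    }
    # Missing shared files are returned as empty rather than invented.
    return {k: p.read_text(encoding="utf-8") if p.exists() else ""
            for k, p in files.items()}
```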

During execution:

Capture the failure signal and the exact step where it appears.
Record the minimal fix that resolved it.
Execute one action at a time where UI state is fragile.

After completion (or meaningful failure):

Append a short run note to `references/learnings/<topic-slug>/experience-log.md`.

Include: date, context, failure signal, root cause, fix pattern, reusable rule.
Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
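
Here is a sketch of the "after completion" step. It appends a dated entry with the listed fields and skips the entry when the same reusable rule is already recorded. The Markdown entry layout and the `append_run_note` helper are hypothetical. Actually updating a prior rule, rather than skipping, is left to whoever edits the log.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_run_note(log_path: Path, context: str, failure_signal: str,
                    root_cause: str, fix_pattern: str, reusable_rule: str,
                    day: Optional[date] = None) -> bool:
    """Append a short dated entry; return False if the rule is already logged."""
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    rule_line = f"- reusable rule: {reusable_rule}"
    if rule_line in existing.splitlines():
        # Duplicate rule: refine the earlier entry instead of repeating it.
        return False
    entry = "\n".join([
        f"## {(day or date.today()).isoformat()}",
        f"- context: {context}",
        f"- failure signal: {failure_signal}",
        f"- root cause: {root_cause}",
        f"- fix pattern: {fix_pattern}",
        rule_line,
        "",
    ])
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(("\n" if existing else "") + entry)
    return True
```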

References

Load `references/computer-use-techniques.md` for command snippets and fallback templates. Load `references/learnings/index.md` to select the right topic folder. Load `references/learnings/general/experience-log.md` for cross-task patterns. Load `references/learnings/google-flow/lessons.md` when automating Google Flow video creation. Load `references/learnings/google-flow/experience-log.md` for incremental Google Flow learnings.
