autobrowse

AutoBrowse — Self-Improving Browser Skill

Build reliable browser automation skills through iterative experimentation. An inner agent browses the site (`evaluate.ts`). You, the outer agent, read what happened and improve the instructions (`strategy.md`). Repeat until it passes consistently.

Entry Points

Invocation is flexible — both explicit flags and free-form natural language work:

```
/autobrowse --task google-flights
/autobrowse --task google-flights --iterations 10 --env remote
/autobrowse --tasks google-flights,amazon-add-to-cart
/autobrowse --all
```

Also fine — parse freely:

```
/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill
```


When the user drops a URL or free-form instruction instead of `--task <name>`:
- If an existing task in `${WORKSPACE}/tasks/` clearly matches the site/intent, use it.
- Otherwise, pick a short kebab-case name, create `${WORKSPACE}/tasks/<name>/task.md` from `${CLAUDE_SKILL_DIR}/references/example-task.md`, fill in the URL/goal based on what the user said, and proceed. Tell the user the chosen name in one line.
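
As an illustration of the naming step, here is a minimal sketch that derives a kebab-case name mechanically from a URL or a free-form instruction. The `task_slug` helper and its 40-character cap are assumptions made for this example; in practice the outer agent may simply choose a shorter, more descriptive name by hand.

```bash
# Hypothetical helper: derive a kebab-case task name from a URL or free-form text.
task_slug() {
  printf '%s\n' "$1" \
    | sed -e 's#^[a-zA-Z]*://##' -e 's#/*$##' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' \
    | cut -c1-40 \
    | sed -e 's/^-//' -e 's/-$//'
}

# task_slug "https://flights.google.com/"   prints flights-google-com
# task_slug "Book a flight on delta.com"    prints book-a-flight-on-delta-com
```

The result can then be compared against the directory names under `${WORKSPACE}/tasks/` before a new task is created.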

---

How to run

Step 1 — Parse arguments and orient

Check what was passed:

- `--task <name>` → single task mode
- `--tasks a,b,c` or `--all` → multi-task mode (spawn sub-agents)
- `--iterations N` → how many evaluate → improve cycles (default: 5)
- `--env local|remote` → browser environment (default: local; use remote for bot-protected sites)

If the user passed free-form text instead, map it to one of the above before continuing.
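
The mapping above can be sketched as a small argument parser. This is only an illustration under stated assumptions: the function name and the variables `MODE`, `TASKS`, `ALL`, `ITERATIONS`, `ENV_NAME`, and `FREEFORM` are invented for the example, and the defaults (5 iterations, local environment) are taken from the list above.

```bash
# Sketch: map explicit flags to single/multi mode; collect anything else as free-form text.
parse_autobrowse_args() {
  MODE=""; TASKS=""; ALL=0; ITERATIONS=5; ENV_NAME="local"; FREEFORM=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --task)       [ $# -ge 2 ] || return 1; MODE="single"; TASKS=$2; shift 2 ;;
      --tasks)      [ $# -ge 2 ] || return 1; MODE="multi";  TASKS=$2; shift 2 ;;
      --all)        MODE="multi"; ALL=1; shift ;;
      --iterations) [ $# -ge 2 ] || return 1; ITERATIONS=$2; shift 2 ;;
      --env)
        [ $# -ge 2 ] || return 1
        case "$2" in local|remote) ENV_NAME=$2 ;; *) return 1 ;; esac
        shift 2 ;;
      *)            FREEFORM="${FREEFORM:+$FREEFORM }$1"; shift ;;
    esac
  done
  # No explicit flag given: the caller still has to map the free-form text to a task.
  if [ -z "$MODE" ] && [ -n "$FREEFORM" ]; then MODE="freeform"; fi
  return 0
}
```

When `MODE` ends up as `freeform`, the text in `FREEFORM` still needs to be resolved to an existing or newly created task before the loop starts.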

Step 2 — Set up the workspace

All training artifacts (task definitions, strategy iterations, traces, reports) live in a workspace directory in the current working directory, NOT inside `~/.claude/skills/`. This keeps the inner agent's file writes out of Claude's home dir and away from permission friction.

Default workspace: `${CWD}/autobrowse/`

```bash
mkdir -p ./autobrowse/tasks ./autobrowse/traces ./autobrowse/reports
```

If the task file (`./autobrowse/tasks/<task>/task.md`) doesn't exist yet, scaffold it:

```bash
mkdir -p ./autobrowse/tasks/<task>
cp ${CLAUDE_SKILL_DIR}/references/example-task.md ./autobrowse/tasks/<task>/task.md
```

Then edit `task.md` to describe the URL, inputs, steps, and expected JSON output.
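
The setup commands can be wrapped so that an existing `task.md` is never overwritten on a repeat run. The following is a minimal sketch; `scaffold_task` and its three positional arguments are hypothetical names chosen for this example, not part of the skill.

```bash
# Sketch: create the workspace and copy the template only when task.md is missing.
# Arguments: workspace dir, task name, path to the template file.
scaffold_task() {
  ws=$1; task=$2; template=$3
  mkdir -p "$ws/tasks/$task" "$ws/traces" "$ws/reports" || return 1
  if [ ! -f "$ws/tasks/$task/task.md" ]; then
    cp "$template" "$ws/tasks/$task/task.md" || return 1
    echo "scaffolded $task"
  else
    echo "exists $task"
  fi
}

# Example, using the paths from this step:
# scaffold_task ./autobrowse google-flights "${CLAUDE_SKILL_DIR}/references/example-task.md"
```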


The skill source at `${CLAUDE_SKILL_DIR}` stays read-only — only `./autobrowse/` in CWD gets written to during training. Graduation (final step) writes a single file to `~/.claude/skills/<task>/SKILL.md`.

List available tasks:
```bash
ls ./autobrowse/tasks/
```

Step 3 — Multi-task: spawn parallel sub-agents

If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:

"You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.

When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md; write a self-contained skill.

At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."

Spawn all sub-agents in parallel, wait for all to complete, then collect their summaries and write the session report.

For single task, skip this step and run the loop directly below.

The Loop (run this for each task)

Iteration start

Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the template if not; see Step 2). `strategy.md` is auto-created empty by the harness on first run.

Requirements

`ANTHROPIC_API_KEY` must be in the environment (or in a `.env` file in CWD; `evaluate.mjs` auto-loads it). If missing, the harness prints a clear error and exits; don't hunt for keys in other paths.
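
For a quick pre-flight check before launching the harness, something like the following could be used. It is a sketch only: `has_api_key` is a hypothetical helper, and it assumes the `.env` file uses plain `KEY=value` lines. The harness performs its own check regardless.

```bash
# Sketch: succeed if ANTHROPIC_API_KEY is set in the environment or defined in a .env file.
# Argument: path to the .env file (defaults to ./.env).
has_api_key() {
  env_file=${1:-.env}
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi
  # Require at least one character after the equals sign.
  [ -f "$env_file" ] && grep -q '^ANTHROPIC_API_KEY=..*' "$env_file"
}

# Example:
# has_api_key || echo "ANTHROPIC_API_KEY is not set and ./.env does not define it"
```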

Run the inner agent

```bash
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
```

Or, for bot-protected sites:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
```


This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.

Read the trace

```bash
cat ./autobrowse/traces/<task-name>/latest/summary.md
```

The summary has duration, cost, turns, the decision log, and the final JSON output.

If the agent failed or got stuck, look deeper:

- Read `./autobrowse/traces/<task-name>/latest/trace.json` and search for the failure turn
- Read screenshots around the failure point with the Read tool

Form one hypothesis

Find the exact turn where things went wrong. What single heuristic would have prevented it?

Examples:

- "After clicking the dropdown, wait 1s: options animate in before they're clickable"
- "Navigate directly to `/pay-invoice/` and skip the landing page entirely"
- "Use `browse fill #field_3 value` not `browse type`: this field clears on focus"
- "The page shows a spinner at turn 8: add `browse wait timeout 2000` before snapshot"

Update strategy.md

Edit `./autobrowse/tasks/<task-name>/strategy.md`. Keep everything that worked. Fix the specific failure. Add a concrete heuristic.

Good strategies have:

- Fast path: direct URL or shortcuts to skip exploration
- Step-by-step workflow: exact sequence with timing notes
- Site-specific knowledge: selector IDs, form field names, success indicators
- Failure recovery: what to do when X goes wrong

Judge the result

Read the new summary. Did it pass, or at least make clear progress?

- Pass or progress → keep, next iteration
- No progress or regression → revert strategy.md to the previous version and try a different hypothesis
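
Reverting is only possible if the previous version was saved before the edit. A minimal sketch, assuming a sibling `.prev` copy is acceptable (the text does not say whether the harness keeps its own history, so both helper names and the suffix are illustrative):

```bash
# Sketch: snapshot strategy.md before editing so a regression can be rolled back.
snapshot_strategy() { cp "$1" "$1.prev"; }
revert_strategy()   { cp "$1.prev" "$1"; }

# Usage inside one iteration:
#   snapshot_strategy ./autobrowse/tasks/<task-name>/strategy.md
#   ...edit strategy.md, re-run evaluate.mjs, read summary.md...
#   revert_strategy ./autobrowse/tasks/<task-name>/strategy.md   # only on no progress or regression
```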

After all iterations — publish if ready

If the task passed on 2+ of the last 3 iterations or has reached the max iteration limit, install it as a Claude Code skill. Do not just copy strategy.md — the skill must be self-contained and useful to someone who has never seen this codebase. If graduating at max iterations without a clean pass, note the known failure point but still document everything learned.
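
The graduation rule can be expressed as a small check over the per-iteration results. This is a sketch under assumptions: `should_graduate` is a hypothetical helper that takes the iteration cap followed by one `pass` or `fail` word per completed iteration, oldest first.

```bash
# Sketch: graduate when 2+ of the last 3 results are "pass",
# or when the number of completed iterations has reached the cap.
should_graduate() {
  max_iters=$1; shift
  total=$#
  # Keep only the last three results.
  while [ $# -gt 3 ]; do shift; done
  passes=0
  for r in "$@"; do
    if [ "$r" = "pass" ]; then passes=$((passes + 1)); fi
  done
  [ "$passes" -ge 2 ] || [ "$total" -ge "$max_iters" ]
}

# Example:
# should_graduate 5 fail pass pass && echo "install the skill"
```

Graduating at the cap without a clean pass still calls for noting the known failure point in the installed skill.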

Install by writing to `~/.claude/skills/<task-name>/SKILL.md`:

```bash
mkdir -p ~/.claude/skills/<task-name>
```

Use this structure for the SKILL.md:

```markdown
---
name: <task-name>
description: <1-2 sentences describing what this skill does and when to use it. Include trigger keywords.>
---
```

<Task Title> — Browser Skill

Purpose

<1-2 sentences: what this automates and why it exists.>

When to Use

<When this skill should be used.>

Browse CLI Reference

The inner agent uses the `browse` CLI. Key commands for this task:

- `browse stop`: kill existing session (always run before switching to remote)
- `browse env remote`: start a fresh Browserbase cloud session
- `browse newpage <url>`: open URL in a new tab (required in remote mode; `browse open` fails with "no page available")
- `browse open <url>`: navigate existing tab (local mode only)
- `browse wait load`: wait for page to finish loading
- `browse wait timeout <ms>`: wait a fixed amount of time for spinners or animations
- `browse wait selector "<selector>"`: wait for an element to become visible
- `browse get title`: verify you're on the right page
- `browse get text body`: extract all visible text (preferred for content extraction)
- `browse snapshot`: get accessibility tree; each node has a ref in `[X-Y]` format (e.g. `[0-5]`, `[2-147]`)
- `browse click [X-Y]`: click element by ref from the latest snapshot (include the brackets)

Never use `--session <name>` flags in SKILL.md. Named sessions are a parallel-run workaround; they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.

Workflow

Step 1 — Start session

<exact browse commands, in order>

Step 2 — Navigate

<exact URL and verification steps>

Step 3 — Extract

<exact extraction commands>

Step 4 — Output

<the JSON to produce, following the schema below>

Site-Specific Gotchas

<List of all key heuristics learned during the iterations. This is the core value of the skill.>

Failure Recovery

<What to do when navigation fails, the session misbehaves, or extraction returns invalid results>

Expected Output

```json
<paste the exact expected output schema from task.md>
```

After writing the SKILL.md, confirm it's installed:

```bash
ls ~/.claude/skills/<task-name>/SKILL.md
```

The skill is now available as `/<task-name>` in Claude Code.

Final report (multi-task mode)

After all sub-agents complete, print a markdown table:

| Task | Iterations | Final Status | Graduated | Cost |
| --- | --- | --- | --- | --- |
| google-flights | 5 | ✅ pass | yes | $0.42 |
| amazon-add-to-cart | 5 | ❌ fail | no | $1.20 |

Then write a persistent session report to `./autobrowse/reports/` so there's a durable record of the run inside the workspace:

```bash
mkdir -p ./autobrowse/reports
```

Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:

AutoBrowse Session Report

Date: <ISO date>
Tasks: <comma-separated list>
Environment: remote|local
Total cost: $X.XX

Results

| Task | Iterations | Pass Rate | Final Status | Graduated | Cost |
| --- | --- | --- | --- | --- | --- |
| ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |

Per-Task Learnings

<task-name>

- Key insight 1: <what the agent learned>
- Key insight 2: <another heuristic>
- Failure mode fixed: <what was failing and how it was resolved>

Iteration Log

<task-name>

| Iter | Turns | Cost | Status | Hypothesis tested |
| --- | --- | --- | --- | --- |
| 1 | 79 | $18.75 | ❌ fail | baseline |
| 2 | 9 | $0.26 | ✅ pass | session contamination fix |
| ... | ... | ... | ... | ... |

---

Rules

- Only edit `strategy.md`: never touch `task.md` (unless creating it from the template) or `evaluate.mjs`
- Stay in the workspace: all training writes go to `./autobrowse/`, never to `~/.claude/skills/autobrowse/`. The skill source is read-only.
- One hypothesis per iteration: test one change at a time
- Build on wins: keep what worked, add to it
- Trust the trace: the inner agent shows exactly what it saw and did
- Graduate to `~/.claude/skills/`: the only file you write there is the final graduated `SKILL.md`
