AutoBrowse — Self-Improving Browser Skill

Build reliable browser automation skills through iterative experimentation. An inner agent browses the site (`evaluate.ts`). You, the outer agent, read what happened and improve the instructions (`strategy.md`). Repeat until it passes consistently.

Entry Points

Invocation is flexible — both explicit flags and free-form natural language work:

/autobrowse --task google-flights
/autobrowse --task google-flights --iterations 10 --env remote
/autobrowse --tasks google-flights,amazon-add-to-cart
/autobrowse --all

Also fine — parse freely:

/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill


When the user drops a URL or free-form instruction instead of `--task <name>`:
- If an existing task in `${WORKSPACE}/tasks/` clearly matches the site/intent, use it.
- Otherwise, pick a short kebab-case name, create `${WORKSPACE}/tasks/<name>/task.md` from `${CLAUDE_SKILL_DIR}/references/example-task.md`, fill in the URL/goal based on what the user said, and proceed. Tell the user the chosen name in one line.
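
One way to pick the name mechanically is to normalize whatever the user typed. The sketch below is only an illustration: `task_name_from_input` is a hypothetical helper, not part of the skill, and the 40-character cap is an arbitrary assumption. A more descriptive hand-picked name (such as `google-flights`) may well be the better choice.

```bash
# Hypothetical helper: turn a URL or free-form instruction into a kebab-case name.
task_name_from_input() {
  printf '%s\n' "$1" \
    | sed -E 's#^[a-zA-Z]+://##; s#^www\.##' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g' \
    | cut -c1-40 \
    | sed -E 's/^-+//; s/-+$//'
}

task_name_from_input "https://flights.google.com/"   # prints: flights-google-com
task_name_from_input "Book a flight on delta.com"    # prints: book-a-flight-on-delta-com
```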

---


How to run

Step 1 — Parse arguments and orient

Check what was passed:

- `--task <name>` → single task mode
- `--tasks a,b,c` or `--all` → multi-task mode (spawn sub-agents)
- `--iterations N` → how many evaluate → improve cycles (default: 5)
- `--env local|remote` → browser environment (default: local; use remote for bot-protected sites)

If the user passed free-form text instead, map it to one of the above before continuing.
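
A minimal sketch of that flag handling, assuming a plain shell wrapper. The variable names `MODE`, `TASKS`, `ALL_TASKS`, `ITERATIONS`, and `ENV_NAME` are hypothetical; only the flags and their defaults come from the list above.

```bash
# Sketch only: parse the documented flags, falling back to the documented defaults.
parse_autobrowse_args() {
  MODE=""; TASKS=""; ALL_TASKS=0
  ITERATIONS=5        # documented default
  ENV_NAME="local"    # documented default
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --task)       MODE="single"; TASKS="$2"; shift 2 ;;
      --tasks)      MODE="multi";  TASKS="$2"; shift 2 ;;
      --all)        MODE="multi";  ALL_TASKS=1; shift ;;
      --iterations) ITERATIONS="$2"; shift 2 ;;
      --env)        ENV_NAME="$2"; shift 2 ;;
      *)            echo "free-form input, map it to a flag first: $1" >&2; return 1 ;;
    esac
  done
}

parse_autobrowse_args --task google-flights --iterations 10 --env remote
echo "$MODE $TASKS $ITERATIONS $ENV_NAME"   # prints: single google-flights 10 remote
```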

Step 2 — Set up the workspace

All training artifacts (task definitions, strategy iterations, traces, reports) live in a workspace directory in the current working directory, NOT inside `~/.claude/skills/`. This keeps the inner agent's file writes out of Claude's home dir and away from permission friction.

Default workspace: `${CWD}/autobrowse/`

```bash
mkdir -p ./autobrowse/tasks ./autobrowse/traces ./autobrowse/reports
```

If the task file (`./autobrowse/tasks/<task>/task.md`) doesn't exist yet, scaffold it:

```bash
mkdir -p ./autobrowse/tasks/<task>
cp "${CLAUDE_SKILL_DIR}/references/example-task.md" ./autobrowse/tasks/<task>/task.md
```

Then edit `task.md` to describe the URL, inputs, steps, and expected JSON output.
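
The scaffolding commands above can be folded into one idempotent helper. This is a sketch under assumptions: `scaffold_task` is a hypothetical function name, and it relies on `CLAUDE_SKILL_DIR` being set as it is elsewhere in this document.

```bash
# Sketch: create the workspace and copy the template only if task.md is missing,
# so an already edited task.md is never overwritten.
scaffold_task() {
  task="$1"
  workspace="${2:-./autobrowse}"
  mkdir -p "$workspace/tasks/$task" "$workspace/traces" "$workspace/reports"
  if [ ! -f "$workspace/tasks/$task/task.md" ]; then
    cp "${CLAUDE_SKILL_DIR}/references/example-task.md" "$workspace/tasks/$task/task.md"
  fi
}

# Example: scaffold_task google-flights ./autobrowse
```

Running the helper a second time is a no-op for `task.md`, which matters once the file has been filled in by hand.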

The skill source at `${CLAUDE_SKILL_DIR}` stays read-only — only `./autobrowse/` in CWD gets written to during training. Graduation (final step) writes a single file to `~/.claude/skills/<task>/SKILL.md`.

List available tasks:
```bash
ls ./autobrowse/tasks/
```

Step 3 — Multi-task: spawn parallel sub-agents

If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:

"You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.

When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md; write a self-contained skill.

At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."

Spawn all sub-agents in parallel, wait for all to complete, then collect their summaries and write the session report.

For a single task, skip this step and run the loop directly below.

The Loop (run this for each task)

Iteration start

Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the template if not; see Step 2). `strategy.md` is auto-created empty by the harness on first run.

Requirements

`ANTHROPIC_API_KEY` must be in the environment (or in a `.env` file in CWD, which `evaluate.mjs` auto-loads). If missing, the harness prints a clear error and exits; don't hunt for keys in other paths.
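
If a check before launching the harness is useful, something like the following could work. It is a sketch, not part of the skill: `has_api_key` is a hypothetical helper, and it assumes the `.env` file uses plain `KEY=value` lines. It looks only at the two locations named above.

```bash
# Hypothetical preflight: succeed if ANTHROPIC_API_KEY is set in the environment
# or defined in the given .env file (default: ./.env).
has_api_key() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi
  env_file="${1:-.env}"
  # Assumes a plain KEY=value line with a non-empty value.
  [ -f "$env_file" ] && grep -Eq '^ANTHROPIC_API_KEY=.+' "$env_file"
}

if has_api_key; then
  echo "ANTHROPIC_API_KEY found"
else
  echo "ANTHROPIC_API_KEY missing: export it or add it to ./.env" >&2
fi
```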

Run the inner agent

```bash
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
```

or for bot-protected sites:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
```

This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.

Read the trace

```bash
cat ./autobrowse/traces/<task-name>/latest/summary.md
```

The summary has duration, cost, turns, the decision log, and the final JSON output.

If the agent failed or got stuck, look deeper:

- Read `./autobrowse/traces/<task-name>/latest/trace.json` and search for the failure turn
- Read screenshots around the failure point with the Read tool

Form one hypothesis

Find the exact turn where things went wrong. What single heuristic would have prevented it?

Examples:

- "After clicking the dropdown, wait 1s: options animate in before they're clickable"
- "Navigate directly to `/pay-invoice/` and skip the landing page entirely"
- "Use `browse fill #field_3 value`, not `browse type`: this field clears on focus"
- "The page shows a spinner at turn 8: add `browse wait timeout 2000` before snapshot"

Update strategy.md

Edit `./autobrowse/tasks/<task-name>/strategy.md`. Keep everything that worked. Fix the specific failure. Add a concrete heuristic.

Good strategies have:

- Fast path: direct URL or shortcuts to skip exploration
- Step-by-step workflow: exact sequence with timing notes
- Site-specific knowledge: selector IDs, form field names, success indicators
- Failure recovery: what to do when X goes wrong

Judge the result

After the next run, read the new summary. Did it pass, or at least make clear progress?

- Pass or progress → keep, next iteration
- No progress or regression → revert `strategy.md` to the previous version and try a different hypothesis
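
Reverting is only possible if the previous version was saved somewhere. A minimal sketch, assuming a `.prev` copy next to the file (a hypothetical convention; version control would work equally well):

```bash
# Sketch: snapshot strategy.md before editing it, restore the snapshot on regression.
backup_strategy() {
  cp "$1" "$1.prev"
}

revert_strategy() {
  # Discards the current edit and restores the last snapshot.
  cp "$1.prev" "$1"
}

# Example flow for one iteration:
#   backup_strategy ./autobrowse/tasks/<task-name>/strategy.md
#   ...edit strategy.md, run evaluate.mjs, read summary.md...
#   revert_strategy ./autobrowse/tasks/<task-name>/strategy.md   # only on no progress or regression
```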

After all iterations — publish if ready

If the task passed on 2+ of the last 3 iterations or has reached the max iteration limit, install it as a Claude Code skill. Do not just copy strategy.md — the skill must be self-contained and useful to someone who has never seen this codebase. If graduating at max iterations without a clean pass, note the known failure point but still document everything learned.
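
The graduation rule above can be written as a small predicate. This is a sketch under assumptions: `should_graduate` is a hypothetical helper, and the space-separated `pass`/`fail` history (oldest first) is an invented input format, not something the harness emits.

```bash
# Sketch: graduate if 2+ of the last 3 results passed, or the iteration limit is reached.
should_graduate() {
  results="$1"
  max_iterations="$2"
  # $results is intentionally unquoted so each word lands on its own line.
  total=$(printf '%s\n' $results | grep -c . || true)
  recent_passes=$(printf '%s\n' $results | tail -n 3 | grep -c '^pass$' || true)
  [ "$recent_passes" -ge 2 ] || [ "$total" -ge "$max_iterations" ]
}

should_graduate "fail pass pass" 5 && echo "graduate"          # prints: graduate
should_graduate "pass fail fail" 5 || echo "keep iterating"    # prints: keep iterating
```

Note that the second branch of the condition means a task can graduate without a clean pass, which is why the known failure point has to be documented in that case.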

Install by writing to `~/.claude/skills/<task-name>/SKILL.md`:

```bash
mkdir -p ~/.claude/skills/<task-name>
```

Use this structure for the SKILL.md:

```markdown
---
name: <task-name>
description: <1-2 sentences describing what this skill does and when to use it. Include trigger keywords.>
---
```

<Task Title> — Browser Skill

Purpose

<1-2 sentences: what this automates and why it exists.>

When to Use

<when this skill should be used>

Browse CLI Reference

The inner agent uses the `browse` CLI. Key commands for this task:

- `browse stop`: kill existing session (always run before switching to remote)
- `browse open <url> --remote`: start a fresh Browserbase cloud session and navigate
- `browse open <url> --local`: start a clean local browser and navigate
- `browse tab new <url>`: open URL in a new tab
- `browse wait load`: wait for page to finish loading
- `browse wait timeout <ms>`: wait a fixed amount of time for spinners or animations
- `browse wait selector "<selector>"`: wait for an element to become visible
- `browse get title`: verify you're on the right page
- `browse get text body`: extract all visible text (preferred for content extraction)
- `browse snapshot`: get accessibility tree; each node has a ref in `[X-Y]` format (e.g. `[0-5]`, `[2-147]`)
- `browse click [X-Y]`: click element by ref from the latest snapshot (include the brackets)

**Never use `--session <name>` flags in SKILL.md.** Named sessions are a parallel-run workaround, and they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.

Workflow

Step 1 — Start session

<exact browse commands, in order>

Step 2 — Navigate

<exact URL and verification steps>

Step 3 — Extract

<exact extraction commands>

Step 4 — Output

<the JSON to produce, following the schema below>

Site-Specific Gotchas

<a list of every key heuristic learned across the iterations. This is the core value of the skill.>

Failure Recovery

<what to do when navigation fails, the session is contaminated, or extraction returns invalid results>

Expected Output

```json
<paste the exact expected output schema from task.md>
```

After writing the SKILL.md, confirm it's installed:
```bash
ls ~/.claude/skills/<task-name>/SKILL.md
```

The skill is now available as `/<task-name>` in Claude Code.

Final report (multi-task mode)

After all sub-agents complete, print a markdown table:

| Task | Iterations | Final Status | Graduated | Cost |
|---|---|---|---|---|
| google-flights | 5 | ✅ pass | yes | $0.42 |
| amazon-add-to-cart | 5 | ❌ fail | no | $1.20 |

Then write a persistent session report to `./autobrowse/reports/` so there's a durable record of the run inside the workspace:

```bash
mkdir -p ./autobrowse/reports
```

Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:

AutoBrowse Session Report

- Date: <ISO date>
- Tasks: <comma-separated list>
- Environment: remote|local
- Total cost: $X.XX

Results

| Task | Iterations | Pass Rate | Final Status | Graduated | Cost |
|---|---|---|---|---|---|
| ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |

Per-Task Learnings

<task-name>

- Key insight 1: <what the agent learned>
- Key insight 2: <another heuristic>
- Failure mode fixed: <what was failing and how it was resolved>

Iteration Log

<task-name>

| Iter | Turns | Cost | Status | Hypothesis tested |
|---|---|---|---|---|
| 1 | 79 | $18.75 | ❌ fail | baseline |
| 2 | 9 | $0.26 | ✅ pass | session contamination fix |
| ... | ... | ... | ... | ... |
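
The timestamped report path named earlier in this section can be built with `date`. A sketch, with assumptions: `report_path` is a hypothetical helper, and joining multiple task names with `+` is only one possible reading of the `<tasks>` placeholder.

```bash
# Sketch: build ./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md from the current time.
report_path() {
  stamp=$(date +%Y-%m-%d-%H-%M)
  # Assumption: commas in the task list are replaced so the name stays one token.
  tasks=$(printf '%s' "$1" | tr ',' '+')
  printf './autobrowse/reports/%s-%s.md\n' "$stamp" "$tasks"
}

report_path "google-flights,amazon-add-to-cart"
```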

---

Rules

- Only edit `strategy.md`: never touch `task.md` (unless creating it from the template) or `evaluate.mjs`
- Stay in the workspace: all training writes go to `./autobrowse/`, never to `~/.claude/skills/autobrowse/`. The skill source is read-only.
- One hypothesis per iteration: test one change at a time
- Build on wins: keep what worked, add to it
- Trust the trace: the inner agent shows exactly what it saw and did
- Graduate to `~/.claude/skills/`: the only file you write there is the final graduated `SKILL.md`
