autobrowse

AutoBrowse — Self-Improving Browser Skill

Build reliable browser automation skills through iterative experimentation. An inner agent browses the site (`evaluate.mjs`). You, the outer agent, read what happened and improve the instructions (`strategy.md`). Repeat until the task passes consistently.

Entry Points

Invocation is flexible — both explicit flags and free-form natural language work:
/autobrowse --task google-flights
/autobrowse --task google-flights --iterations 10 --env remote
/autobrowse --tasks google-flights,amazon-add-to-cart
/autobrowse --all

Also fine — parse freely:

/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill

When the user drops a URL or free-form instruction instead of `--task <name>`:
- If an existing task in `${WORKSPACE}/tasks/` clearly matches the site/intent, use it.
- Otherwise, pick a short kebab-case name, create `${WORKSPACE}/tasks/<name>/task.md` from `${CLAUDE_SKILL_DIR}/references/example-task.md`, fill in the URL/goal based on what the user said, and proceed. Tell the user the chosen name in one line.
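
As a rough illustration of the naming step, free-form text or a URL could be normalized into kebab-case as sketched here. The helper name `kebab_name` is hypothetical, and in practice the outer agent would still pick a shorter, more meaningful name by judgment; this only shows the mechanical normalization.

```bash
# Hypothetical helper: normalize free-form text or a URL into kebab-case.
kebab_name() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's#^https\{0,1\}://##' \
          -e 's/[^a-z0-9]\{1,\}/-/g' \
          -e 's/^-//' -e 's/-$//'
}

kebab_name "https://flights.google.com/"   # flights-google-com
kebab_name "Book a flight on delta.com"    # book-a-flight-on-delta-com
```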

---

How to run

Step 1 — Parse arguments and orient

Check what was passed:
- `--task <name>` → single-task mode
- `--tasks a,b,c` or `--all` → multi-task mode (spawn sub-agents)
- `--iterations N` → how many evaluate → improve cycles (default: 5)
- `--env local|remote` → browser environment (default: `local`; use `remote` for bot-protected sites)

If the user passed free-form text instead, map it to one of the above before continuing.
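
The flag handling above could be sketched in shell roughly as follows. The function name and the variable names (`MODE`, `TASKS`, `ITERATIONS`, `ENV_NAME`) are assumptions made for this illustration; the skill itself leaves parsing to the outer agent, and the sketch does no validation of missing values.

```bash
# Sketch: parse the documented flags into shell variables.
# All names here are illustrative, not part of the skill.
parse_autobrowse_args() {
  MODE=""; TASKS=""; ITERATIONS=5; ENV_NAME="local"   # documented defaults
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --task)       MODE="single"; TASKS="$2"; shift 2 ;;
      --tasks)      MODE="multi";  TASKS="$2"; shift 2 ;;
      --all)        MODE="multi";  TASKS="*";  shift ;;
      --iterations) ITERATIONS="$2"; shift 2 ;;
      --env)        ENV_NAME="$2";   shift 2 ;;
      # Anything else is free-form text that still has to be mapped to a task.
      *)            MODE="freeform"; TASKS="$*"; break ;;
    esac
  done
}

parse_autobrowse_args --task google-flights --iterations 10 --env remote
echo "$MODE $TASKS $ITERATIONS $ENV_NAME"   # single google-flights 10 remote
```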

Step 2 — Set up the workspace

All training artifacts (task definitions, strategy iterations, traces, reports) live in a workspace directory under the current working directory, NOT inside `~/.claude/skills/`. This keeps the inner agent's file writes out of Claude's home dir and away from permission friction.

Default workspace: `${CWD}/autobrowse/`

```bash
mkdir -p ./autobrowse/tasks ./autobrowse/traces ./autobrowse/reports
```

If the task file (`./autobrowse/tasks/<task>/task.md`) doesn't exist yet, scaffold it:

```bash
mkdir -p ./autobrowse/tasks/<task>
cp "${CLAUDE_SKILL_DIR}/references/example-task.md" ./autobrowse/tasks/<task>/task.md
```

Then edit `task.md` to describe the URL, inputs, steps, and expected JSON output.


The skill source at `${CLAUDE_SKILL_DIR}` stays read-only — only `./autobrowse/` in CWD gets written to during training. Graduation (final step) writes a single file to `~/.claude/skills/<task>/SKILL.md`.

List available tasks:
```bash
ls ./autobrowse/tasks/
```

Step 3 — Multi-task: spawn parallel sub-agents

If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:

"You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.

When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md; write a self-contained skill.

At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."

Spawn all sub-agents in parallel, wait for all to complete, then collect their summaries and write the session report.

For a single task, skip this step and run the loop directly below.

The Loop (run this for each task)

Iteration start

Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the template if not; see Step 2). `strategy.md` is auto-created empty by the harness on first run.

Requirements

- `ANTHROPIC_API_KEY` must be in the environment (or in a `.env` file in CWD; `evaluate.mjs` auto-loads it). If it is missing, the harness prints a clear error and exits; don't hunt for keys in other paths.
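
The harness performs this check itself, so nothing extra is required. Purely to illustrate the two places the key may live, a pre-flight check could look like the sketch below; the function name `has_api_key` and the assumed `.env` line format (`ANTHROPIC_API_KEY=value`) are not taken from the skill.

```bash
# Sketch: succeed if ANTHROPIC_API_KEY is set in the environment
# or defined in a .env file (default: ./.env). Names are illustrative.
has_api_key() {
  [ -n "${ANTHROPIC_API_KEY:-}" ] && return 0
  [ -f "${1:-.env}" ] && grep -q '^ANTHROPIC_API_KEY=..*' "${1:-.env}"
}

if has_api_key; then
  echo "ANTHROPIC_API_KEY found"
else
  echo "ANTHROPIC_API_KEY missing: set it in the environment or in ./.env" >&2
fi
```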

Run the inner agent

```bash
node "${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs" --task <task-name> --workspace ./autobrowse
```

or, for bot-protected sites:

```bash
node "${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs" --task <task-name> --workspace ./autobrowse --env remote
```

This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.

Read the trace

```bash
cat ./autobrowse/traces/<task-name>/latest/summary.md
```

The summary has duration, cost, turns, the decision log, and the final JSON output.

If the agent failed or got stuck, look deeper:
- Read `./autobrowse/traces/<task-name>/latest/trace.json` and search for the failure turn
- Read screenshots around the failure point with the Read tool

Form one hypothesis

Find the exact turn where things went wrong. What single heuristic would have prevented it?

Examples:
- "After clicking the dropdown, wait 1s: options animate in before they're clickable"
- "Navigate directly to `/pay-invoice/` and skip the landing page entirely"
- "Use `browse fill #field_3 value`, not `browse type`: this field clears on focus"
- "The page shows a spinner at turn 8: add `browse wait timeout 2000` before snapshot"

Update strategy.md

Edit `./autobrowse/tasks/<task-name>/strategy.md`. Keep everything that worked. Fix the specific failure. Add a concrete heuristic.

Good strategies have:
- Fast path: direct URL or shortcuts to skip exploration
- Step-by-step workflow: exact sequence with timing notes
- Site-specific knowledge: selector IDs, form field names, success indicators
- Failure recovery: what to do when X goes wrong

Judge the result

Read the new summary. Did it pass? Did it make clear progress?
- Pass or progress → keep the change and move to the next iteration
- No progress or regression → revert strategy.md to the previous version and try a different hypothesis
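
Reverting requires that the previous version still exists somewhere. One simple way to arrange that is to snapshot the file before each edit, as in this sketch. The `.prev` backup name and both helper names are assumptions; the skill does not prescribe a mechanism, and version control would work equally well.

```bash
# Sketch: keep a one-step backup of strategy.md so a regression can be rolled back.
snapshot_strategy() { cp "$1" "$1.prev"; }   # call before editing
revert_strategy()   { cp "$1.prev" "$1"; }   # call on no progress or regression

# Demo on a throwaway file standing in for ./autobrowse/tasks/<task-name>/strategy.md
demo_dir=$(mktemp -d)
strategy="$demo_dir/strategy.md"
echo "Fast path: go directly to /pay-invoice/" > "$strategy"

snapshot_strategy "$strategy"
echo "Wait 1s after clicking the dropdown" >> "$strategy"   # the new hypothesis
revert_strategy "$strategy"                                  # it regressed, so roll back
cat "$strategy"   # Fast path: go directly to /pay-invoice/
```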

After all iterations — publish if ready

If the task passed on 2+ of the last 3 iterations or has reached the max iteration limit, install it as a Claude Code skill. Do not just copy strategy.md: the skill must be self-contained and useful to someone who has never seen this codebase. If graduating at max iterations without a clean pass, note the known failure point but still document everything learned.
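
The graduation rule can be stated as a small predicate. In this sketch the function name `should_graduate` and the `pass`/`fail` tokens are illustrative; the history is passed as a space-separated string, oldest result first.

```bash
# Sketch: graduate if 2+ of the last 3 results passed,
# or if the number of completed iterations has reached the limit.
should_graduate() (
  total=$(printf '%s\n' $1 | grep -c . || true)
  recent_passes=$(printf '%s\n' $1 | tail -n 3 | grep -c '^pass$' || true)
  [ "$recent_passes" -ge 2 ] || [ "$total" -ge "$2" ]
)

should_graduate "fail pass pass" 5 && echo "graduate"        # 2 of the last 3 passed
should_graduate "fail fail pass" 5 || echo "keep iterating"  # 1 of the last 3, and 3 < 5 runs
```

A history of five failures with a limit of 5 also returns success, which corresponds to graduating at the limit and documenting the known failure point.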

Install by writing to `~/.claude/skills/<task-name>/SKILL.md`:

```bash
mkdir -p ~/.claude/skills/<task-name>
```

Use this structure for the SKILL.md:

---
name: <task-name>
description: <1-2 sentences describing what this skill does and when to use it. Include trigger keywords.>
---

<Task Title> — Browser Skill

Purpose

<1-2 sentences: what this automates and why it exists.>

When to Use

<When should someone reach for this skill.>

Browse CLI Reference

The inner agent uses the `browse` CLI. Key commands for this task:
- `browse stop`: kill existing session (always run before switching to remote)
- `browse env remote`: start a fresh Browserbase cloud session
- `browse newpage <url>`: open URL in a new tab (required in remote mode; `browse open` fails with "no page available")
- `browse open <url>`: navigate existing tab (local mode only)
- `browse wait load`: wait for page to finish loading
- `browse wait timeout <ms>`: wait a fixed amount of time for spinners or animations
- `browse wait selector "<selector>"`: wait for an element to become visible
- `browse get title`: verify you're on the right page
- `browse get text body`: extract all visible text (preferred for content extraction)
- `browse snapshot`: get accessibility tree; each node has a ref in `[X-Y]` format (e.g. `[0-5]`, `[2-147]`)
- `browse click [X-Y]`: click element by ref from the latest snapshot (include the brackets)

Never use `--session <name>` flags in SKILL.md. Named sessions are a parallel-run workaround; they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.

Workflow

Step 1 — Start session

<exact browse commands in order>

Step 2 — Navigate

<exact URL and verification steps>

Step 3 — Extract

<exact extraction commands>

Step 4 — Output

<what JSON to emit, referencing the schema below>

Site-Specific Gotchas

<Bullet list of every hard-won heuristic from the iterations. This is the core value of the skill.>

Failure Recovery

<What to do when navigation fails, session is contaminated, or extraction returns garbage>

Expected Output

```json
<paste the exact expected output schema from task.md>
```

After writing the SKILL.md, confirm it's installed:

```bash
ls ~/.claude/skills/<task-name>/SKILL.md
```

The skill is now available as `/<task-name>` in Claude Code.

Final report (multi-task mode)

After all sub-agents complete, print a markdown table:

| Task | Iterations | Final Status | Graduated | Cost |
|------|------------|--------------|-----------|------|
| google-flights | 5 | ✅ pass | yes | $0.42 |
| amazon-add-to-cart | 5 | ❌ fail | no | $1.20 |

Then write a persistent session report to `./autobrowse/reports/` so there's a durable record of the run inside the workspace:

```bash
mkdir -p ./autobrowse/reports
```

Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:

AutoBrowse Session Report

Date: <ISO date>
Tasks: <comma-separated list>
Environment: remote|local
Total cost: $X.XX

Results

| Task | Iterations | Pass Rate | Final Status | Graduated | Cost |
|------|------------|-----------|--------------|-----------|------|
| ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |

Per-Task Learnings

<task-name>

- Key insight 1: <what the agent learned>
- Key insight 2: <another heuristic>
- Failure mode fixed: <what was failing and how it was resolved>

Iteration Log

<task-name>

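| Iter | Turns | Cost | Status | Hypothesis tested |
|------|-------|------|--------|-------------------|
| 1 | 79 | $18.75 | ❌ fail | baseline |
| 2 | 9 | $0.26 | ✅ pass | session contamination fix |
| ... | ... | ... | ... | ... |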

---
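
The report filename pattern `YYYY-MM-DD-HH-MM-<tasks>.md` can be produced with `date`, as in this sketch. The helper name `report_path` is hypothetical, and how several task names are joined inside `<tasks>` is not specified in the source, so the function simply takes whatever string you settle on.

```bash
# Sketch: build the session report path from the current local time and a task label.
report_path() {
  printf './autobrowse/reports/%s-%s.md\n' "$(date +%Y-%m-%d-%H-%M)" "$1"
}

report_path "google-flights"
```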

Rules

- Only edit `strategy.md`: never touch `task.md` (unless creating it from the template) or `evaluate.mjs`
- Stay in the workspace: all training writes go to `./autobrowse/`, never to `~/.claude/skills/autobrowse/`. The skill source is read-only.
- One hypothesis per iteration: test one change at a time
- Build on wins: keep what worked, add to it
- Trust the trace: the inner agent shows exactly what it saw and did
- Graduate to `~/.claude/skills/`: the only file you write there is the final graduated `SKILL.md`