autobrowse
AutoBrowse - Self-Improving Browser Skill
Build reliable browser automation skills through iterative experimentation. An inner agent browses the site (`evaluate.ts`). You, the outer agent, read what happened and improve the instructions (`strategy.md`). Repeat until it passes consistently.
Entry Points
Invocation is flexible — both explicit flags and free-form natural language work:
/autobrowse --task google-flights
/autobrowse --task google-flights --iterations 10 --env remote
/autobrowse --tasks google-flights,amazon-add-to-cart
/autobrowse --all
Free-form input also works; parse it freely:
/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill
When the user drops a URL or free-form instruction instead of `--task <name>`:
- If an existing task in `${WORKSPACE}/tasks/` clearly matches the site/intent, use it.
- Otherwise, pick a short kebab-case name, create `${WORKSPACE}/tasks/<name>/task.md` from `${CLAUDE_SKILL_DIR}/references/example-task.md`, fill in the URL/goal based on what the user said, and proceed. Tell the user the chosen name in one line.
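As a rough illustration of the naming step, the sketch below derives a kebab-case name from a URL or phrase and checks whether a task with that name already exists in the workspace. The helper names `slugify` and `task_exists` are hypothetical, and the mechanical slug is only a starting point: the outer agent would normally pick something shorter by judgment (for example `google-flights`).

```bash
# Hypothetical helper: turn a URL or free-form phrase into a kebab-case name.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's#^[a-z]*://##' -e 's#^www\.##' \
          -e 's#[^a-z0-9]\{1,\}#-#g' -e 's#^-##' -e 's#-$##'
}

# Hypothetical helper: succeed if the workspace already has a task with this name.
# Arguments: workspace directory, task name.
task_exists() {
  [ -f "$1/tasks/$2/task.md" ]
}
```

For instance, `slugify "https://flights.google.com/"` yields `flights-google-com`, which can then be passed to `task_exists ./autobrowse <name>` before scaffolding a new task.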
---
How to run
Step 1 - Parse arguments and orient
Check what was passed:
- `--task <name>` → single task mode
- `--tasks a,b,c` or `--all` → multi-task mode (spawn sub-agents)
- `--iterations N` → how many evaluate → improve cycles (default: 5)
- `--env local|remote` → browser environment (default: local; use remote for bot-protected sites)
If the user passed free-form text instead, map it to one of the above before continuing.
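The flag handling described above can be sketched as a small shell function. This is an illustration under assumptions, not the skill's actual parser: the function name and the variables `MODE`, `TASKS`, `ITERATIONS`, and `BROWSER_ENV` are invented for the example, and free-form text is only flagged, since mapping it onto a task is a judgment call for the agent.

```bash
# Sketch only: function and variable names are illustrative, not part of the skill.
parse_autobrowse_args() {
  MODE=""; TASKS=""; ITERATIONS=5; BROWSER_ENV="local"   # documented defaults
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --task|--tasks|--iterations|--env)
        # These flags need a value; bail out instead of shifting past the end.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
          --task)       MODE="single"; TASKS="$2" ;;
          --tasks)      MODE="multi";  TASKS="$2" ;;
          --iterations) ITERATIONS="$2" ;;
          --env)        BROWSER_ENV="$2" ;;
        esac
        shift 2 ;;
      --all)
        # Multi-task mode over every task; the caller lists the tasks directory itself.
        MODE="multi"; TASKS=""
        shift ;;
      *)
        # Free-form text: mark it so the agent maps it onto a task by hand.
        [ -n "$MODE" ] || MODE="freeform"
        shift ;;
    esac
  done
}
```

Calling `parse_autobrowse_args --task google-flights --iterations 10 --env remote` leaves `MODE=single`, `TASKS=google-flights`, `ITERATIONS=10`, and `BROWSER_ENV=remote`.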
Step 2 - Set up the workspace
All training artifacts (task definitions, strategy iterations, traces, reports) live in a workspace directory in the current working directory, NOT inside `~/.claude/skills/`. This keeps the inner agent's file writes out of Claude's home dir and away from permission friction.
Default workspace: `${CWD}/autobrowse/`
mkdir -p ./autobrowse/tasks ./autobrowse/traces ./autobrowse/reports
If the task directory (`./autobrowse/tasks/<task>/task.md`) doesn't exist yet, scaffold it:
mkdir -p ./autobrowse/tasks/<task>
cp ${CLAUDE_SKILL_DIR}/references/example-task.md ./autobrowse/tasks/<task>/task.md
Then edit task.md to describe the URL, inputs, steps, and expected JSON output.
The skill source at `${CLAUDE_SKILL_DIR}` stays read-only — only `./autobrowse/` in CWD gets written to during training. Graduation (final step) writes a single file to `~/.claude/skills/<task>/SKILL.md`.
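The scaffolding commands above could be wrapped in a small guard so that a re-run never overwrites a task.md that has already been edited. This is a minimal sketch; the function name `scaffold_task` and its argument order are assumptions of the example.

```bash
# Sketch: create the workspace layout and copy the template only if task.md is missing.
# Arguments: workspace directory, task name, path to the template file.
scaffold_task() {
  workspace="$1"; task="$2"; template="$3"
  mkdir -p "$workspace/tasks/$task" "$workspace/traces" "$workspace/reports"
  if [ ! -f "$workspace/tasks/$task/task.md" ]; then
    cp "$template" "$workspace/tasks/$task/task.md"
  fi
}
```

A typical call would be `scaffold_task ./autobrowse <task> "${CLAUDE_SKILL_DIR}/references/example-task.md"`.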
List available tasks:
```bash
ls ./autobrowse/tasks/
```
Step 3 - Multi-task: spawn parallel sub-agents
If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:
"You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.
When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md; write a self-contained skill.
At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."
Spawn all sub-agents in parallel, wait for all to complete, then collect their summaries and write the session report.
For single task, skip this step and run the loop directly below.
The Loop (run this for each task)
Iteration start
Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the template if not; see Step 2). `strategy.md` is auto-created empty by the harness on first run.
Requirements
- `ANTHROPIC_API_KEY` must be in the environment (or in a `.env` file in CWD; `evaluate.mjs` auto-loads it). If missing, the harness prints a clear error and exits; don't hunt for keys in other paths.
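A pre-flight check limited to the two documented locations might look like the sketch below. The harness already reports a missing key on its own, so this is optional. How the harness actually parses `.env` is not stated here, so the plain `ANTHROPIC_API_KEY=value` line format and the function name `has_api_key` are assumptions.

```bash
# Sketch: succeed if ANTHROPIC_API_KEY is set in the environment or in a .env file.
# Assumes a plain "ANTHROPIC_API_KEY=value" line; the harness may parse .env differently.
# Argument: path to the .env file (defaults to ./.env).
has_api_key() {
  env_file="${1:-.env}"
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi
  [ -f "$env_file" ] && grep -q '^ANTHROPIC_API_KEY=.' "$env_file"
}
```

Running `has_api_key || echo "set ANTHROPIC_API_KEY first"` before the first evaluation avoids starting a run that will exit immediately.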
Run the inner agent
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
or for bot-protected sites:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.
Read the trace
cat ./autobrowse/traces/<task-name>/latest/summary.md
The summary has duration, cost, turns, the decision log, and the final JSON output.
If the agent failed or got stuck, look deeper:
- Read `./autobrowse/traces/<task-name>/latest/trace.json` and search for the failure turn
- Read screenshots around the failure point with the Read tool
Form one hypothesis
Find the exact turn where things went wrong. What single heuristic would have prevented it?
Examples:
- "After clicking the dropdown, wait 1s: options animate in before they're clickable"
- "Navigate directly to `/pay-invoice/` and skip the landing page entirely"
- "Use `browse fill #field_3 value`, not `browse type`: this field clears on focus"
- "The page shows a spinner at turn 8: add `browse wait timeout 2000` before snapshot"
Update strategy.md
Edit `./autobrowse/tasks/<task-name>/strategy.md`. Keep everything that worked. Fix the specific failure. Add a concrete heuristic.
Good strategies have:
- Fast path: direct URL or shortcuts to skip exploration
- Step-by-step workflow: exact sequence with timing notes
- Site-specific knowledge: selector IDs, form field names, success indicators
- Failure recovery: what to do when X goes wrong
Judge the result
Read the new summary. Did it pass? Make clear progress?
- Pass or progress → keep, next iteration
- No progress or regression → revert strategy.md to the previous version and try a different hypothesis
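Reverting only works if the previous version was saved before the edit. One simple way to do that is sketched below; the `.prev` suffix and both function names are assumptions of this example, not a harness convention.

```bash
# Sketch: keep one previous copy of strategy.md so a failed hypothesis can be rolled back.
# The ".prev" suffix is an assumption of this example, not a harness convention.
snapshot_strategy() {
  cp "$1" "$1.prev"
}

# Restore the snapshot; fails (non-zero status) if no snapshot was taken.
revert_strategy() {
  [ -f "$1.prev" ] && cp "$1.prev" "$1"
}
```

Call `snapshot_strategy ./autobrowse/tasks/<task-name>/strategy.md` before each edit, and `revert_strategy` on the same path when the next summary shows no progress or a regression.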
After all iterations — publish if ready
If the task passed on 2+ of the last 3 iterations or has reached the max iteration limit, install it as a Claude Code skill. Do not just copy strategy.md — the skill must be self-contained and useful to someone who has never seen this codebase. If graduating at max iterations without a clean pass, note the known failure point but still document everything learned.
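The graduation rule above (2 or more passes in the last 3 iterations, or the iteration limit reached) can be written as a small predicate. This is a sketch; the function name and the convention of passing one `pass` or `fail` word per iteration are assumptions of the example.

```bash
# Sketch: decide whether a task is ready to graduate.
# Arguments: max iterations, then one "pass" or "fail" per completed iteration, oldest first.
should_graduate() {
  max="$1"; shift
  total="$#"
  # Drop everything except the last three results.
  while [ "$#" -gt 3 ]; do shift; done
  passes=0
  for result in "$@"; do
    if [ "$result" = "pass" ]; then
      passes=$((passes + 1))
    fi
  done
  # Graduate on 2+ passes in the last 3 runs, or when the iteration limit is reached.
  [ "$passes" -ge 2 ] || [ "$total" -ge "$max" ]
}
```

For example, `should_graduate 5 fail pass pass` succeeds, while `should_graduate 5 pass fail fail` does not, because only one of the last three runs passed and the limit of 5 has not been reached.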
Install by writing to `~/.claude/skills/<task-name>/SKILL.md`:
mkdir -p ~/.claude/skills/<task-name>
Use this structure for the SKILL.md:
---
name: <task-name>
description: <1-2 sentences describing what this skill does and when to use it. Include trigger keywords.>
---
<Task Title> - Browser Skill
Purpose
<1-2 sentences: what this automates and why it exists.>
When to Use
<When should someone reach for this skill.>
Browse CLI Reference
The inner agent uses the `browse` CLI. Key commands for this task:
- `browse stop`: kill existing session (always run before switching to remote)
- `browse env remote`: start a fresh Browserbase cloud session
- `browse newpage <url>`: open URL in a new tab (required in remote mode; `browse open` fails with "no page available")
- `browse open <url>`: navigate existing tab (local mode only)
- `browse wait load`: wait for page to finish loading
- `browse wait timeout <ms>`: wait a fixed amount of time for spinners or animations
- `browse wait selector "<selector>"`: wait for an element to become visible
- `browse get title`: verify you're on the right page
- `browse get text body`: extract all visible text (preferred for content extraction)
- `browse snapshot`: get accessibility tree; each node has a ref in `[X-Y]` format (e.g. `[0-5]`, `[2-147]`)
- `browse click [X-Y]`: click element by ref from the latest snapshot (include the brackets)
Never use `--session <name>` flags in SKILL.md. Named sessions are a parallel-run workaround; they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.
Workflow
Step 1 - Start session
<exact browse commands in order>
Step 2 - Navigate
<exact URL and verification steps>
Step 3 - Extract
<exact extraction commands>
Step 4 - Output
<what JSON to emit, referencing the schema below>
Site-Specific Gotchas
<Bullet list of every hard-won heuristic from the iterations. This is the core value of the skill.>
Failure Recovery
<What to do when navigation fails, session is contaminated, or extraction returns garbage>
Expected Output
<paste the exact expected output schema from task.md>
After writing the SKILL.md, confirm it's installed:
```bash
ls ~/.claude/skills/<task-name>/SKILL.md
```
The skill is now available as `/<task-name>` in Claude Code.
Final report (multi-task mode)
After all sub-agents complete, print a markdown table:
| Task | Iterations | Final Status | Graduated | Cost |
|---|---|---|---|---|
| google-flights | 5 | ✅ pass | yes | $0.42 |
| amazon-add-to-cart | 5 | ❌ fail | no | $1.20 |
Then write a persistent session report to `./autobrowse/reports/` so there's a durable record of the run inside the workspace:
mkdir -p ./autobrowse/reports
Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:
AutoBrowse Session Report
Date: <ISO date>
Tasks: <comma-separated list>
Environment: remote|local
Total cost: $X.XX
Results
| Task | Iterations | Pass Rate | Final Status | Graduated | Cost |
|---|---|---|---|---|---|
| ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |
Per-Task Learnings
<task-name>
- Key insight 1: <what the agent learned>
- Key insight 2: <another heuristic>
- Failure mode fixed: <what was failing and how it was resolved>
Iteration Log
<task-name>
| Iter | Turns | Cost | Status | Hypothesis tested |
|---|---|---|---|---|
| 1 | 79 | $18.75 | ❌ fail | baseline |
| 2 | 9 | $0.26 | ✅ pass | session contamination fix |
| ... | ... | ... | ... | ... |
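The report file name pattern `YYYY-MM-DD-HH-MM-<tasks>.md` used for the session report could be assembled as in the sketch below. The function name `report_path` is hypothetical, and joining multiple task names with `+` is a choice made for this example; the source does not say how `<tasks>` should be rendered.

```bash
# Sketch: build the session report path from a timestamp and a comma-separated task list.
# Arguments: workspace directory, timestamp (YYYY-MM-DD-HH-MM), comma-separated tasks.
# Joining task names with "+" is an assumption of this example.
report_path() {
  workspace="$1"; stamp="$2"; tasks="$3"
  joined="$(printf '%s' "$tasks" | tr ',' '+')"
  printf '%s/reports/%s-%s.md\n' "$workspace" "$stamp" "$joined"
}
```

A call such as `report_path ./autobrowse "$(date +%Y-%m-%d-%H-%M)" google-flights,amazon-add-to-cart` gives a path under `./autobrowse/reports/` that sorts chronologically.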
---
Rules
- Only edit `strategy.md`: never touch `task.md` (unless creating it from the template) or `evaluate.mjs`
- Stay in the workspace: all training writes go to `./autobrowse/`, never to `~/.claude/skills/autobrowse/`. The skill source is read-only.
- One hypothesis per iteration: test one change at a time
- Build on wins: keep what worked, add to it
- Trust the trace: the inner agent shows exactly what it saw and did
- Graduate to `~/.claude/skills/`: the only file you write there is the final graduated `SKILL.md`