Loading...
Loading...
Self-improving browser automation via the auto-research loop. Iteratively runs a browsing task, reads the trace, and improves the navigation skill (strategy.md) until it reliably passes. Supports parallel runs across multiple tasks using sub-agents. Use when you want to build or improve browser automation skills for specific website tasks.
npx skill4agent add browserbase/skills autobrowseevaluate.tsstrategy.md/autobrowse --task google-flights
/autobrowse --task google-flights --iterations 10 --env remote
/autobrowse --tasks google-flights,amazon-add-to-cart
/autobrowse --all
# Also fine — parse freely:
/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill--task <name>${WORKSPACE}/tasks/${WORKSPACE}/tasks/<name>/task.md${CLAUDE_SKILL_DIR}/references/example-task.md--task <name>--tasks a,b,c--all--iterations N--env local|remote~/.claude/skills/${CWD}/autobrowse/mkdir -p ./autobrowse/tasks ./autobrowse/traces ./autobrowse/reports./autobrowse/tasks/<task>/task.mdmkdir -p ./autobrowse/tasks/<task>
cp ${CLAUDE_SKILL_DIR}/references/example-task.md ./autobrowse/tasks/<task>/task.md
# Then edit task.md to describe the URL, inputs, steps, and expected JSON output${CLAUDE_SKILL_DIR}./autobrowse/~/.claude/skills/<task>/SKILL.mdls ./autobrowse/tasks/"You are running the autobrowse skill for task. Workspace:<name>(e.g.<absolute-path-to-workspace>). Run/path/to/project/autobrowseiterations of: evaluate → read trace → improve strategy.md → repeat. Use<N>. Pass--env <env>to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.--workspace <workspace>When graduating, install the skill towith proper agentskills frontmatter (name + description). Do not just copy strategy.md — write a self-contained skill.~/.claude/skills/<task-name>/SKILL.mdAt the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."
./autobrowse/tasks/<task>/task.mdstrategy.mdANTHROPIC_API_KEY.envevaluate.mjsnode ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
# or for bot-protected sites:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote./autobrowse/traces/<task>/latest/cat ./autobrowse/traces/<task-name>/latest/summary.md./autobrowse/traces/<task-name>/latest/trace.json/pay-invoice/browse fill #field_3 valuebrowse typebrowse wait timeout 2000./autobrowse/tasks/<task-name>/strategy.md~/.claude/skills/<task-name>/SKILL.mdmkdir -p ~/.claude/skills/<task-name>---
name: <task-name>
description: <1-2 sentences describing what this skill does and when to use it. Include trigger keywords.>
---
# <Task Title> — Browser Skill
## Purpose
<1-2 sentences: what this automates and why it exists.>
## When to Use
<When should someone reach for this skill.>
## Browse CLI Reference
The inner agent uses the `browse` CLI. Key commands for this task:
- `browse stop` — kill existing session (always run before switching to remote)
- `browse env remote` — start a fresh Browserbase cloud session
- `browse newpage <url>` — open URL in a new tab (required in remote mode — `browse open` fails with "no page available")
- `browse open <url>` — navigate existing tab (local mode only)
- `browse wait load` — wait for page to finish loading
- `browse wait timeout <ms>` — wait a fixed amount of time for spinners or animations
- `browse wait selector "<selector>"` — wait for an element to become visible
- `browse get title` — verify you're on the right page
- `browse get text body` — extract all visible text (preferred for content extraction)
- `browse snapshot` — get accessibility tree; each node has a ref in `[X-Y]` format (e.g. `[0-5]`, `[2-147]`)
- `browse click [X-Y]` — click element by ref from the latest snapshot (include the brackets)
**Never use `--session <name>` flags in SKILL.md.** Named sessions are a parallel-run workaround — they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.
## Workflow
### Step 1 — Start session
<exact browse commands in order>
### Step 2 — Navigate
<exact URL and verification steps>
### Step 3 — Extract
<exact extraction commands>
### Step 4 — Output
<what JSON to emit, referencing the schema below>
## Site-Specific Gotchas
<Bullet list of every hard-won heuristic from the iterations. This is the core value of the skill.>
## Failure Recovery
<What to do when navigation fails, session is contaminated, or extraction returns garbage>
## Expected Output
```json
<paste the exact expected output schema from task.md>
After writing the SKILL.md, confirm it's installed:
```bash
ls ~/.claude/skills/<task-name>/SKILL.md/<task-name>| Task | Iterations | Final Status | Graduated | Cost |
|---|---|---|---|---|
| google-flights | 5 | ✅ pass | yes | $0.42 |
| amazon-add-to-cart | 5 | ❌ fail | no | $1.20 |
./autobrowse/reports/mkdir -p ./autobrowse/reports./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md# AutoBrowse Session Report
**Date:** <ISO date>
**Tasks:** <comma-separated list>
**Environment:** remote|local
**Total cost:** $X.XX
## Results
| Task | Iterations | Pass Rate | Final Status | Graduated | Cost |
|------|-----------|-----------|--------------|-----------|------|
| ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |
## Per-Task Learnings
### <task-name>
- **Key insight 1:** <what the agent learned>
- **Key insight 2:** <another heuristic>
- **Failure mode fixed:** <what was failing and how it was resolved>
## Iteration Log
### <task-name>
| Iter | Turns | Cost | Status | Hypothesis tested |
|------|-------|------|--------|-------------------|
| 1 | 79 | $18.75 | ❌ fail | baseline |
| 2 | 9 | $0.26 | ✅ pass | session contamination fix |
| ... | ... | ... | ... | ... |strategy.mdtask.mdevaluate.mjs./autobrowse/~/.claude/skills/autobrowse/~/.claude/skills/SKILL.md