agent-browser — AI Browser Automation CLI

Rust-native CLI by Vercel Labs. Persistent daemon per session, CDP-based, deterministic ref selectors (`@e1`, `@e2`), JSON output. Built for AI agents.

Repo: github.com/vercel-labs/agent-browser | License: Apache 2.0

Install

```bash
# npm (recommended)
npm install -g agent-browser
agent-browser install  # downloads Chrome for Testing

# homebrew
brew install agent-browser && agent-browser install

# cargo
cargo install agent-browser && agent-browser install

# Linux: also install system deps
agent-browser install --with-deps
```

Upgrade: `agent-browser upgrade`

Core Concept: Refs

Every `snapshot` assigns deterministic refs (`@e1`, `@e2`, ...) to DOM elements. Use refs instead of CSS selectors: they're stable within a snapshot, semantic, and LLM-friendly.

```bash
agent-browser open https://example.com
agent-browser snapshot -i          # interactive elements only
# Output:
# - heading "Example" [ref=e1]
# - link "More info" [ref=e2]
# - button "Submit" [ref=e3]

agent-browser click @e3            # click Submit
```

**After any DOM change (navigation, AJAX, form submit), re-snapshot to get fresh refs.**
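
When a script rather than an LLM needs the refs, they can be pulled out of the text snapshot. The following is a minimal sketch that assumes the `[ref=eN]` line format shown in the sample output above; `list_refs` is a hypothetical helper, not part of agent-browser.

```bash
# list_refs: read snapshot text on stdin and print one @ref per line.
# Assumes each element line carries a single [ref=eN] marker,
# as in the sample output; lines without a marker are skipped.
list_refs() {
  sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p'
}
```

Piping a snapshot through it, as in `agent-browser snapshot -i | list_refs`, would yield the refs in document order, ready to pass to `click` or `fill`.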

Essential Commands

Navigation

```bash
agent-browser open <url>
agent-browser back|forward|reload
```

Snapshot (primary way to "see" the page)

```bash
agent-browser snapshot              # full accessibility tree
agent-browser snapshot -i           # interactive elements only (buttons, inputs, links)
agent-browser snapshot -c           # compact (remove empty nodes)
agent-browser snapshot -d 3         # limit depth
agent-browser snapshot -s "#main"   # scope to selector
agent-browser snapshot --json       # machine-readable
```

Interaction

```bash
agent-browser click <ref|selector>
agent-browser fill <ref> "text"         # clear + type (for inputs)
agent-browser type <ref> "text"         # type without clearing
agent-browser select <ref> "option"     # dropdown
agent-browser check|uncheck <ref>       # checkbox
agent-browser hover <ref>
agent-browser press Enter|Tab|Escape    # keyboard
agent-browser press Control+a           # key combo
agent-browser scroll down 500           # scroll page
agent-browser upload <ref> file.pdf     # file upload
```

Data Extraction

```bash
agent-browser get text <ref>        # element text
agent-browser get html <ref>        # innerHTML
agent-browser get value <ref>       # input value
agent-browser get attr <ref> href   # attribute
agent-browser get title             # page title
agent-browser get url               # current URL
```

Screenshot & PDF

```bash
agent-browser screenshot [path]         # to file or temp
agent-browser screenshot --full         # full page scroll capture
agent-browser screenshot --annotate     # numbered labels matching refs
agent-browser pdf output.pdf
```

Wait

```bash
agent-browser wait <ref>                    # wait for element visible
agent-browser wait 2000                     # wait ms
agent-browser wait --load networkidle       # wait for network idle
agent-browser wait --url "**/dashboard"     # wait for URL pattern
agent-browser wait --text "Success"         # wait for text
agent-browser wait <ref> --state hidden     # wait for element to disappear
```

Semantic Locators (alternative to refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find testid "login-btn" click
```

Session Management

```bash
agent-browser session list              # active sessions
agent-browser --session myapp <cmd>     # named session
agent-browser close                     # close current
agent-browser close --all               # close all
```
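
Because the daemon outlives individual commands, a script can guarantee cleanup by wrapping its work in a helper that always closes the session. This is a minimal sketch; `run_then_close` and the `AB_BIN` override variable are hypothetical names used for illustration, not agent-browser features.

```bash
# run_then_close: run the given command, then always close the browser
# session, and return the command's original exit status.
# AB_BIN is a hypothetical override for the binary name (handy for dry runs).
run_then_close() {
  status=0
  "$@" || status=$?
  "${AB_BIN:-agent-browser}" close
  return "$status"
}
```

For example, `run_then_close sh scrape.sh` would close the session whether or not the (hypothetical) `scrape.sh` succeeds, and still report its exit code to the caller.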

Tabs

```bash
agent-browser tab                   # list tabs
agent-browser tab new [url]         # new tab
agent-browser tab 2                 # switch to tab 2
agent-browser tab close
```

AI Agent Workflow Pattern

Standard loop for any AI agent:

```bash
# 1. Navigate
agent-browser open https://target.com

# 2. Observe (snapshot → LLM reads → decides action)
agent-browser snapshot -i --json

# 3. Act (LLM picks ref from snapshot)
agent-browser fill @e2 "search query"
agent-browser click @e3

# 4. Wait for result
agent-browser wait --load networkidle

# 5. Re-observe (new snapshot after DOM changed)
agent-browser snapshot -i --json

# 6. Extract or continue
agent-browser get text @e5
```

Repeat steps 2-5 until the task is complete. Always run `close` when done.
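
Steps 3 to 5 of the loop can be bundled so that every action is followed by a wait and a fresh snapshot, which keeps stale refs out of the next decision. The sketch below is one way to do that; `act_and_observe` and the `AB_BIN` override are hypothetical names, and only the subcommands already shown in the loop are used.

```bash
# act_and_observe: perform one action, wait for the page to settle,
# then print a fresh interactive snapshot as JSON.
# Stops early (status 1) if the action or the wait fails.
# AB_BIN is a hypothetical override for the binary name.
act_and_observe() {
  "${AB_BIN:-agent-browser}" "$@" || return 1
  "${AB_BIN:-agent-browser}" wait --load networkidle || return 1
  "${AB_BIN:-agent-browser}" snapshot -i --json
}
```

A call such as `act_and_observe fill @e2 "search query"` would hand the agent the post-action snapshot in a single step.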

Batch Execution

When the exact sequence is known upfront:

```bash
cat << 'EOF' | agent-browser batch --json
[
  ["open", "https://example.com/login"],
  ["fill", "@e1", "user@example.com"],
  ["fill", "@e2", "password123"],
  ["click", "@e3"],
  ["wait", "--load", "networkidle"],
  ["screenshot", "result.png"]
]
EOF
```

`--bail` stops on the first error.
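
For batches kept in files, a small wrapper can check the file and stop at the first failing command. This is a sketch under assumptions: `run_batch` and `AB_BIN` are hypothetical names, and passing `--bail` directly after `batch --json` is an assumed flag placement that should be checked against `agent-browser batch --help`.

```bash
# run_batch: feed a JSON batch file to agent-browser and stop on first error.
# $1: path to a file holding a JSON array of command arrays.
# Returns 2 if the file cannot be read.
# ASSUMPTION: --bail is accepted after "batch --json".
run_batch() {
  if [ ! -r "$1" ]; then
    echo "batch file not readable: $1" >&2
    return 2
  fi
  "${AB_BIN:-agent-browser}" batch --json --bail < "$1"
}
```

Usage would look like `run_batch login-steps.json`, where `login-steps.json` is a hypothetical file containing the array from the example above.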

Authentication & State Persistence

```bash
# Save credentials (encrypted vault; LLM never sees password)
echo "pass" | agent-browser auth save myapp \
  --url https://app.example.com/login \
  --username user@example.com --password-stdin

# Login with saved creds
agent-browser auth login myapp

# Auto-persist session (cookies, localStorage, IndexedDB)
agent-browser --session-name myapp open https://app.example.com
# ... interact ...
agent-browser close  # state auto-saved to ~/.agent-browser/sessions/

# Next run: auto-restored, already logged in
agent-browser --session-name myapp open https://app.example.com/dashboard
```

Manual state save/load:
```bash
agent-browser state save auth.json
agent-browser state load auth.json
```
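
A saved state file holds live session data such as cookies, so it should stay out of version control. The helper below is a minimal sketch of one way to enforce that; `ignore_state_file` is a hypothetical name, and it assumes the existing `.gitignore`, if any, ends with a newline.

```bash
# ignore_state_file: add a state file to a .gitignore unless already listed.
# $1: state file name (e.g. auth.json)
# $2: path to the .gitignore (defaults to ./.gitignore)
ignore_state_file() {
  gitignore=${2:-.gitignore}
  touch "$gitignore" || return 1
  # -x matches whole lines, -F treats the name as a literal string
  grep -qxF -- "$1" "$gitignore" || printf '%s\n' "$1" >> "$gitignore"
}
```

Running `ignore_state_file auth.json` right after `agent-browser state save auth.json` keeps the two steps together; calling it twice leaves a single entry.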

Network Control

```bash
agent-browser network requests                  # list tracked requests
agent-browser network requests --type xhr,fetch # filter
agent-browser network route "**/analytics" --abort    # block tracking
agent-browser network route "**/api/*" --body '{"mock":true}'  # mock response
agent-browser network har start && agent-browser network har stop output.har
```

Device Emulation

```bash
agent-browser set device "iPhone 14"
agent-browser set viewport 1920 1080        # desktop
agent-browser set viewport 1920 1080 2      # retina
agent-browser set media dark                # dark mode
agent-browser set geo 52.2297 21.0122       # Warsaw
```

Configuration

Priority: CLI flags > env vars > `./agent-browser.json` > `~/.agent-browser/config.json`

Key env vars:
```bash
AGENT_BROWSER_SESSION=myapp           # default session
AGENT_BROWSER_HEADED=1                # show browser window
AGENT_BROWSER_EXECUTABLE_PATH=/path   # custom Chrome
AGENT_BROWSER_PROXY=http://host:port  # proxy
AGENT_BROWSER_DEFAULT_TIMEOUT=25000   # operation timeout (max 30000)
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000   # daemon auto-shutdown
AGENT_BROWSER_ENCRYPTION_KEY=<64hex>  # encrypt state files
AGENT_BROWSER_ALLOWED_DOMAINS=a.com,b.com  # restrict navigation
AGENT_BROWSER_MAX_OUTPUT=50000        # truncate output (prevent context flooding)
```

Config file (`agent-browser.json`):
```json
{
  "headed": false,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "userAgent": "my-agent/1.0",
  "screenshotDir": "./shots",
  "colorScheme": "dark"
}
```

Key CLI flags: `--json`, `--session <name>`, `--profile <name|path>`, `--headed`, `--proxy <url>`, `--ignore-https-errors`, `--annotate`, `--engine lightpanda`, `--provider <cloud>`, `--content-boundaries`, `--no-auto-dialog`, `--debug`
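
The `<64hex>` placeholder for `AGENT_BROWSER_ENCRYPTION_KEY` suggests a 64-character hexadecimal string, which corresponds to 32 random bytes. A minimal sketch for producing one with standard Unix tools follows; `gen_encryption_key` is a hypothetical helper, and the reading of `<64hex>` as "64 lowercase hex characters" is an assumption.

```bash
# gen_encryption_key: print 32 random bytes as 64 lowercase hex characters.
# ASSUMPTION: agent-browser accepts the key in this form.
gen_encryption_key() {
  dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

The result could then be exported once per environment, for example `export AGENT_BROWSER_ENCRYPTION_KEY=$(gen_encryption_key)`, and stored in a secret manager rather than in the repository.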

Safety Features for AI

```bash
--content-boundaries          # wrap untrusted page content in markers
--max-output 50000            # prevent context window flooding
--allowed-domains a.com,b.com # restrict navigation to trusted domains
--action-policy policy.json   # gate destructive actions
--confirm-actions eval,download  # require approval for sensitive ops
```

Cloud Providers

For scalable/CI deployments: `--provider <name>`

Supported: AgentCore, Browserbase, Browserless, BrowserUse, Kernel

Gotchas

1. Refs invalidate on DOM change: always re-snapshot after navigation/AJAX/form submit
2. Daemon persists: run `agent-browser close` explicitly to avoid orphaned processes
3. `networkidle` unreliable on SPAs: prefer `wait --text "X"` or `wait --url "pattern"`
4. Timeout max 30s: keep `DEFAULT_TIMEOUT` under 30000ms
5. State files contain tokens: use `ENCRYPTION_KEY` or add to `.gitignore`
6. Shadow DOM not in snapshots: only light DOM visible
7. Large pages: use `snapshot -i -c -d 3` to limit output size
8. Chrome profile lock: close Chrome before using `--profile` with same profile
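
The timeout ceiling in item 4 is easy to exceed when the value comes from another script or a user setting. The sketch below caps a requested value at the 30000 ms maximum noted in the Configuration section; `clamp_timeout` is a hypothetical helper and expects a whole number of milliseconds.

```bash
# clamp_timeout: print the requested timeout in ms, capped at 30000.
# $1: requested timeout in milliseconds (integer)
clamp_timeout() {
  if [ "$1" -gt 30000 ]; then
    echo 30000
  else
    echo "$1"
  fi
}
```

It could be used when exporting the variable, for example `export AGENT_BROWSER_DEFAULT_TIMEOUT=$(clamp_timeout "$requested_ms")`, where `requested_ms` is whatever the calling script supplies.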