agent-browser

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Agent-Browser Driver

Agent-Browser 驱动

The orchestrator routed you here. Use these mechanics to execute your plan.
Control web pages and Electron desktop apps via the
agent-browser
CLI. Uses Playwright under the hood with a headless Chromium instance managed by a background daemon.
编排器将您引导至此。使用这些机制来执行您的计划。
通过
agent-browser
CLI控制网页和Electron桌面应用。底层基于Playwright,由后台守护进程管理无头Chromium实例。

When to use

适用场景

  • Automating web app flows (login, form fill, data extraction, visual QA)
  • Driving Electron apps (VS Code, Slack, Discord, Figma, Notion, Spotify)
  • Visual verification -- screenshots and annotated element overlays
  • DOM-level assertions where terminal snapshots are irrelevant
If the target is a terminal TUI, use tuistory or true-input instead.
  • 自动化Web应用流程(登录、表单填写、数据提取、视觉QA)
  • 驱动Electron应用(VS Code、Slack、Discord、Figma、Notion、Spotify)
  • 视觉验证——截图和带标注的元素覆盖层
  • 终端快照无关的DOM级断言
如果目标是终端TUI,请改用tuistorytrue-input

Prerequisites

前置条件

bash
agent-browser install   # one-time: downloads bundled Chromium
For Electron apps, the target app must be launched with
--remote-debugging-port=<port>
.
bash
agent-browser install   # 一次性操作:下载捆绑的Chromium
对于Electron应用,目标应用必须以
--remote-debugging-port=<port>
参数启动。

Core workflow

核心工作流

Every interaction follows the same loop:
bash
agent-browser open <url>
agent-browser snapshot -i          # interactive elements only -> refs like @e1, @e2
agent-browser click @e3            # interact using refs
agent-browser snapshot -i          # re-snapshot (refs invalidate after navigation/DOM changes)
agent-browser close                # always close when done
每次交互都遵循相同的循环:
bash
agent-browser open <url>
agent-browser snapshot -i          # 仅显示交互元素 -> 生成@e1、@e2这类引用
agent-browser click @e3            # 使用引用进行交互
agent-browser snapshot -i          # 重新生成快照(导航/DOM变更后引用会失效)
agent-browser close                # 操作完成后务必关闭

Command chaining

命令链式调用

Commands share a persistent daemon, so
&&
chaining is safe:
bash
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
Chain when you don't need intermediate output. Run separately when you need to parse refs before acting.
命令共享持久化守护进程,因此使用
&&
链式调用是安全的:
bash
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
无需中间输出时使用链式调用;需要先解析引用再执行操作时,请分开运行命令。

Command reference

命令参考

Navigation

导航

CommandPurpose
open <url>
Navigate (auto-prepends
https://
if no protocol)
back
/
forward
/
reload
History navigation
close
Shut down browser session
connect <port>
Attach to a running browser/Electron app via CDP
命令用途
open <url>
导航(如果无协议则自动添加
https://
back
/
forward
/
reload
历史导航
close
关闭浏览器会话
connect <port>
通过CDP连接到运行中的浏览器/Electron应用

Snapshot (page analysis)

快照(页面分析)

CommandPurpose
snapshot
Full accessibility tree
snapshot -i
Interactive elements only (recommended default)
snapshot -i -C
Include cursor-interactive elements (onclick divs)
snapshot -c
Compact output
snapshot -d <n>
Limit tree depth
snapshot -s "<selector>"
Scope to CSS selector
命令用途
snapshot
完整可访问性树
snapshot -i
仅显示交互元素(推荐默认选项)
snapshot -i -C
包含光标可交互元素(带onclick的div)
snapshot -c
精简输出
snapshot -d <n>
限制树深度
snapshot -s "<selector>"
限定CSS选择器范围

Interactions (use @refs from snapshot)

交互操作(使用快照生成的@引用)

CommandPurpose
click @e1
Click (
dblclick
for double-click)
fill @e2 "text"
Clear field and type
type @e2 "text"
Type without clearing
press Enter
Press key (combos:
Control+a
)
keyboard type "text"
Type at current focus (no ref needed)
keyboard inserttext "text"
Insert without key events (Electron custom inputs)
hover @e1
Hover
check @e1
/
uncheck @e1
Toggle checkbox
select @e1 "value"
Select dropdown option
scroll down 500
Scroll page (
--selector
for containers)
scrollintoview @e1
Scroll element into view
drag @e1 @e2
Drag and drop
upload @e1 file.pdf
Upload file
命令用途
click @e1
点击(
dblclick
表示双击)
fill @e2 "text"
清空字段并输入文本
type @e2 "text"
直接输入文本(不清空原有内容)
press Enter
按键(组合键示例:
Control+a
keyboard type "text"
在当前焦点位置输入文本(无需引用)
keyboard inserttext "text"
插入文本(无按键事件,适用于Electron自定义输入框)
hover @e1
悬停
check @e1
/
uncheck @e1
切换复选框状态
select @e1 "value"
选择下拉选项
scroll down 500
滚动页面(
--selector
参数可指定容器)
scrollintoview @e1
滚动至元素可见
drag @e1 @e2
拖拽操作
upload @e1 file.pdf
上传文件

Semantic locators (when refs are unreliable)

语义定位(引用不可靠时使用)

bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find testid "submit-btn" click
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find testid "submit-btn" click

Get information

获取信息

CommandPurpose
get text @e1
Element text (
get text body > page.txt
for full page)
get html @e1
innerHTML
get value @e1
Input value
get attr @e1 href
Element attribute
get title
/
get url
Page title / URL
get count ".item"
Count matching elements
命令用途
get text @e1
获取元素文本(
get text body > page.txt
可获取整页文本)
get html @e1
获取innerHTML
get value @e1
获取输入框值
get attr @e1 href
获取元素属性
get title
/
get url
获取页面标题/URL
get count ".item"
统计匹配元素数量

Check state

状态检查

bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1

Wait

等待

CommandPurpose
wait @e1
Wait for element
wait 2000
Wait milliseconds
wait --text "Success"
Wait for text
wait --url "**/dashboard"
Wait for URL pattern
wait --load networkidle
Wait for network idle (best for slow pages)
wait --fn "window.ready"
Wait for JS condition
命令用途
wait @e1
等待元素出现
wait 2000
等待指定毫秒数
wait --text "Success"
等待文本出现
wait --url "**/dashboard"
等待URL匹配指定模式
wait --load networkidle
等待网络空闲(适用于加载缓慢的页面)
wait --fn "window.ready"
等待JS条件满足

JavaScript (eval)

JavaScript(执行)

bash
agent-browser eval 'document.title'
bash
agent-browser eval 'document.title'

Complex JS -- use --stdin to avoid shell quoting issues

复杂JS代码 -- 使用--stdin避免shell引号问题

agent-browser eval --stdin <<'EVALEOF' JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href)) EVALEOF
undefined
agent-browser eval --stdin <<'EVALEOF' JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href)) EVALEOF
undefined

Diff (compare page states)

差异对比(页面状态对比)

bash
agent-browser diff snapshot                          # current vs last snapshot
agent-browser diff snapshot --baseline before.txt    # current vs saved file
agent-browser diff screenshot --baseline before.png  # visual pixel diff
agent-browser diff url <url1> <url2>                 # compare two pages
bash
agent-browser diff snapshot                          # 当前快照与上一次快照对比
agent-browser diff snapshot --baseline before.txt    # 当前快照与保存的文件对比
agent-browser diff screenshot --baseline before.png  # 视觉像素差异对比
agent-browser diff url <url1> <url2>                 # 对比两个页面

Dialogs

对话框

bash
agent-browser dialog accept [text]  # accept alert/confirm/prompt
agent-browser dialog dismiss        # dismiss dialog
bash
agent-browser dialog accept [text]  # 确认警告/确认/提示对话框
agent-browser dialog dismiss        # 取消对话框

Tabs & frames

标签页与框架

bash
agent-browser tab                 # list tabs
agent-browser tab new [url]       # new tab
agent-browser tab 2               # switch to tab by index
agent-browser tab close           # close current tab
agent-browser frame "#iframe"     # switch to iframe
agent-browser frame main          # back to main frame
bash
agent-browser tab                 # 列出所有标签页
agent-browser tab new [url]       # 新建标签页
agent-browser tab 2               # 通过索引切换标签页
agent-browser tab close           # 关闭当前标签页
agent-browser frame "#iframe"     # 切换到iframe
agent-browser frame main          # 返回主框架

Screenshots & recording

截图与录制

bash
agent-browser screenshot                      # save to temp directory
agent-browser screenshot path.png             # save to specific path
agent-browser screenshot --full               # full-page screenshot
agent-browser screenshot --annotate           # annotated with numbered element labels
agent-browser pdf output.pdf                  # save as PDF
--annotate
overlays numbered labels on interactive elements. Each label
[N]
maps to ref
@eN
, enabling both visual verification and immediate interaction.
Video recording:
bash
agent-browser record start ./demo.webm
bash
agent-browser screenshot                      # 保存到临时目录
agent-browser screenshot path.png             # 保存到指定路径
agent-browser screenshot --full               # 整页截图
agent-browser screenshot --annotate           # 带编号元素标注的截图
agent-browser pdf output.pdf                  # 保存为PDF
--annotate
参数会在交互元素上叠加编号标签。每个标签
[N]
对应引用
@eN
,既支持视觉验证也支持即时交互。
视频录制:
bash
agent-browser record start ./demo.webm

... perform actions ...

... 执行操作 ...

agent-browser record stop agent-browser record restart ./take2.webm # stop current + start new

Recording creates a fresh context but preserves cookies/storage. Explore first, then start recording for smooth demos.
agent-browser record stop agent-browser record restart ./take2.webm # 停止当前录制并开始新录制

录制会创建新上下文,但保留Cookie和存储信息。建议先探索页面,再开始录制以获得流畅的演示效果。

Ref lifecycle

引用生命周期

Refs (
@e1
,
@e2
, ...) are invalidated whenever the page changes. Always re-snapshot after:
  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading (dropdowns, modals)
bash
agent-browser click @e5           # navigates
agent-browser snapshot -i         # MUST re-snapshot
agent-browser click @e1           # use new refs
引用(
@e1
@e2
...)会在页面变更时失效。以下操作后务必重新生成快照:
  • 点击链接/按钮触发导航
  • 表单提交
  • 动态内容加载(下拉菜单、模态框)
bash
agent-browser click @e5           # 触发导航
agent-browser snapshot -i         # 必须重新生成快照
agent-browser click @e1           # 使用新的引用

Electron app automation

Electron应用自动化

Any Electron app supports
--remote-debugging-port
since it's built on Chromium.
所有Electron应用都支持
--remote-debugging-port
参数,因为它们基于Chromium构建。

Launch and connect

启动与连接

bash
undefined
bash
undefined

macOS

macOS

open -a "Slack" --args --remote-debugging-port=9222
open -a "Slack" --args --remote-debugging-port=9222

Linux

Linux

slack --remote-debugging-port=9222
slack --remote-debugging-port=9222

Then connect

然后连接

sleep 3 agent-browser connect 9222 agent-browser snapshot -i

**The app must be quit first** if already running -- the flag only takes effect at launch.
sleep 3 agent-browser connect 9222 agent-browser snapshot -i

**如果应用已运行,必须先退出再重新启动**——该参数仅在启动时生效。

Tab management in Electron

Electron中的标签页管理

Electron apps often have multiple windows/webviews:
bash
agent-browser tab                        # list targets
agent-browser tab 2                      # switch by index
agent-browser tab --url "*settings*"     # switch by URL pattern
Electron应用通常包含多个窗口/webview:
bash
agent-browser tab                        # 列出所有目标
agent-browser tab 2                      # 通过索引切换
agent-browser tab --url "*settings*"     # 通过URL模式切换

Electron troubleshooting

Electron故障排查

ProblemFix
"Connection refused"Ensure app was launched with
--remote-debugging-port
; quit and relaunch if already running
Connect fails after launch
sleep 3
before connecting; app needs time to initialize
Elements missing from snapshotTry
snapshot -i -C
; use
tab
to switch to the correct webview
Cannot type in fieldsUse
keyboard type "text"
or
keyboard inserttext "text"
for custom input components
Dark mode lostSet
AGENT_BROWSER_COLOR_SCHEME=dark
or use
--color-scheme dark
问题解决方法
"Connection refused"确保应用以
--remote-debugging-port
参数启动;如果已运行则退出后重新启动
启动后连接失败连接前执行
sleep 3
;应用需要时间初始化
快照中缺少元素尝试
snapshot -i -C
;使用
tab
命令切换到正确的webview
无法在字段中输入文本对自定义输入组件使用
keyboard type "text"
keyboard inserttext "text"
深色模式丢失设置
AGENT_BROWSER_COLOR_SCHEME=dark
或使用
--color-scheme dark
参数

State persistence

状态持久化

Save and restore cookies/localStorage across sessions:
bash
agent-browser open https://app.example.com/login
跨会话保存和恢复Cookie/localStorage:
bash
agent-browser open https://app.example.com/login

... login flow ...

... 登录流程 ...

agent-browser state save auth.json
agent-browser state save auth.json

Later: load saved state

后续操作:加载保存的状态

agent-browser state load auth.json agent-browser open https://app.example.com/dashboard

Auto-save/restore with named sessions:

```bash
agent-browser --session-name myapp open https://app.example.com
agent-browser state load auth.json agent-browser open https://app.example.com/dashboard

使用命名会话自动保存/恢复状态:

```bash
agent-browser --session-name myapp open https://app.example.com

state auto-saved on close, auto-loaded on next launch with same --session-name

关闭时自动保存状态,下次使用相同--session-name启动时自动加载

undefined
undefined

Sessions

会话管理

The browser persists via a background daemon. One session is the default.
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
agent-browser --session test1 close
agent-browser --session test2 close
Each
--session
spawns a separate Chromium process (~300 MB). Prefer navigating within a single session. Exception: controlling multiple Electron apps on different CDP ports.
浏览器通过后台守护进程持久化运行。默认使用一个会话。
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
agent-browser --session test1 close
agent-browser --session test2 close
每个
--session
会启动一个独立的Chromium进程(约300 MB内存占用)。优先在单个会话内导航。例外情况:控制不同CDP端口上的多个Electron应用。

Global options

全局选项

FlagPurpose
--session <name>
Isolated browser session
--headed
Show browser window
--cdp <port>
Connect via CDP
--auto-connect
Auto-discover running Chrome
--proxy <url>
Use proxy server
--color-scheme dark
Force dark/light mode
--ignore-https-errors
Accept self-signed certs
--allow-file-access
Enable
file://
URLs
--json
JSON output for parsing
参数用途
--session <name>
独立浏览器会话
--headed
显示浏览器窗口
--cdp <port>
通过CDP连接
--auto-connect
自动发现运行中的Chrome
--proxy <url>
使用代理服务器
--color-scheme dark
强制深色/浅色模式
--ignore-https-errors
接受自签名证书
--allow-file-access
启用
file://
URL
--json
输出JSON格式以便解析

Debugging

调试

bash
agent-browser --headed open example.com   # visible browser
agent-browser console                     # view console messages
agent-browser errors                      # view page errors
agent-browser highlight @e1               # highlight element
bash
agent-browser --headed open example.com   # 显示可见浏览器
agent-browser console                     # 查看控制台消息
agent-browser errors                      # 查看页面错误
agent-browser highlight @e1               # 高亮元素

Gotchas

注意事项

  • Invisible-to-snapshot elements.
    contenteditable
    divs and custom components may not appear in accessibility snapshots. Use
    eval
    to interact:
    bash
    agent-browser eval --stdin <<'EVALEOF'
    const el = document.querySelector("[contenteditable]");
    el.focus();
    el.textContent = "hello";
    el.dispatchEvent(new Event('input', { bubbles: true }));
    EVALEOF
  • Unstable class names. Never hardcode CSS-in-JS class names (
    sc-*
    ,
    css-*
    ). Find elements by text content,
    cursor: pointer
    style, or
    testid
    instead.
  • SPA loading delays. Single-page apps may take 5-10s to render after navigation. Double-wait:
    wait --load networkidle
    then
    wait 5000
    .
  • Flag ordering. Global flags (
    --headers
    ,
    --session
    ,
    --cdp
    ) must come before the subcommand:
    agent-browser --headers '{}' open <url>
    .
  • 快照不可见元素
    contenteditable
    div和自定义组件可能不会出现在可访问性快照中。使用
    eval
    进行交互:
    bash
    agent-browser eval --stdin <<'EVALEOF'
    const el = document.querySelector("[contenteditable]");
    el.focus();
    el.textContent = "hello";
    el.dispatchEvent(new Event('input', { bubbles: true }));
    EVALEOF
  • 不稳定的类名:永远不要硬编码CSS-in-JS类名(如
    sc-*
    css-*
    )。改用文本内容、
    cursor: pointer
    样式或
    testid
    查找元素。
  • SPA加载延迟:单页应用导航后可能需要5-10秒渲染。双重等待:
    wait --load networkidle
    后再执行
    wait 5000
  • 参数顺序:全局参数(如
    --headers
    --session
    --cdp
    )必须放在子命令之前:
    agent-browser --headers '{}' open <url>

Critical rules

关键规则

  1. Always take screenshots for visual QA. Text snapshots miss layout, styling, alignment, and z-index issues. Use
    screenshot --annotate
    when you need both visual proof and element refs.
  2. One session by default. Navigate between pages with
    open <url>
    instead of creating new sessions.
  3. Always close when done.
    agent-browser close
    frees the Chromium process.
  4. Re-snapshot after every navigation. Refs are invalidated.
  1. 视觉QA务必截图:文本快照无法发现布局、样式、对齐和z-index问题。需要同时获取视觉证明和元素引用时,使用
    screenshot --annotate
  2. 默认使用单个会话:使用
    open <url>
    在页面间导航,而非创建新会话。
  3. 操作完成后务必关闭
    agent-browser close
    会释放Chromium进程。
  4. 每次导航后重新生成快照:引用会失效。