agent-browser
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAgent-Browser Driver
Agent-Browser 驱动
The orchestrator routed you here. Use these mechanics to execute your plan.
Control web pages and Electron desktop apps via the CLI. Uses Playwright under the hood with a headless Chromium instance managed by a background daemon.
agent-browser编排器将您引导至此。使用这些机制来执行您的计划。
通过 CLI控制网页和Electron桌面应用。底层基于Playwright,由后台守护进程管理无头Chromium实例。
agent-browserWhen to use
适用场景
- Automating web app flows (login, form fill, data extraction, visual QA)
- Driving Electron apps (VS Code, Slack, Discord, Figma, Notion, Spotify)
- Visual verification -- screenshots and annotated element overlays
- DOM-level assertions where terminal snapshots are irrelevant
If the target is a terminal TUI, use tuistory or true-input instead.
- 自动化Web应用流程(登录、表单填写、数据提取、视觉QA)
- 驱动Electron应用(VS Code、Slack、Discord、Figma、Notion、Spotify)
- 视觉验证——截图和带标注的元素覆盖层
- 终端快照无关的DOM级断言
如果目标是终端TUI,请改用tuistory或true-input。
Prerequisites
前置条件
bash
agent-browser install # one-time: downloads bundled ChromiumFor Electron apps, the target app must be launched with .
--remote-debugging-port=<port>bash
agent-browser install # 一次性操作:下载捆绑的Chromium对于Electron应用,目标应用必须以参数启动。
--remote-debugging-port=<port>Core workflow
核心工作流
Every interaction follows the same loop:
bash
agent-browser open <url>
agent-browser snapshot -i # interactive elements only -> refs like @e1, @e2
agent-browser click @e3 # interact using refs
agent-browser snapshot -i # re-snapshot (refs invalidate after navigation/DOM changes)
agent-browser close # always close when done每次交互都遵循相同的循环:
bash
agent-browser open <url>
agent-browser snapshot -i # 仅显示交互元素 -> 生成@e1、@e2这类引用
agent-browser click @e3 # 使用引用进行交互
agent-browser snapshot -i # 重新生成快照(导航/DOM变更后引用会失效)
agent-browser close # 操作完成后务必关闭Command chaining
命令链式调用
Commands share a persistent daemon, so chaining is safe:
&&bash
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -iChain when you don't need intermediate output. Run separately when you need to parse refs before acting.
命令共享持久化守护进程,因此使用链式调用是安全的:
&&bash
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i无需中间输出时使用链式调用;需要先解析引用再执行操作时,请分开运行命令。
Command reference
命令参考
Navigation
导航
| Command | Purpose |
|---|---|
| Navigate (auto-prepends |
| History navigation |
| Shut down browser session |
| Attach to a running browser/Electron app via CDP |
| 命令 | 用途 |
|---|---|
| 导航(如果无协议则自动添加 |
| 历史导航 |
| 关闭浏览器会话 |
| 通过CDP连接到运行中的浏览器/Electron应用 |
Snapshot (page analysis)
快照(页面分析)
| Command | Purpose |
|---|---|
| Full accessibility tree |
| Interactive elements only (recommended default) |
| Include cursor-interactive elements (onclick divs) |
| Compact output |
| Limit tree depth |
| Scope to CSS selector |
| 命令 | 用途 |
|---|---|
| 完整可访问性树 |
| 仅显示交互元素(推荐默认选项) |
| 包含光标可交互元素(带onclick的div) |
| 精简输出 |
| 限制树深度 |
| 限定CSS选择器范围 |
Interactions (use @refs from snapshot)
交互操作(使用快照生成的@引用)
| Command | Purpose |
|---|---|
| Click ( |
| Clear field and type |
| Type without clearing |
| Press key (combos: |
| Type at current focus (no ref needed) |
| Insert without key events (Electron custom inputs) |
| Hover |
| Toggle checkbox |
| Select dropdown option |
| Scroll page ( |
| Scroll element into view |
| Drag and drop |
| Upload file |
| 命令 | 用途 |
|---|---|
| 点击( |
| 清空字段并输入文本 |
| 直接输入文本(不清空原有内容) |
| 按键(组合键示例: |
| 在当前焦点位置输入文本(无需引用) |
| 插入文本(无按键事件,适用于Electron自定义输入框) |
| 悬停 |
| 切换复选框状态 |
| 选择下拉选项 |
| 滚动页面( |
| 滚动至元素可见 |
| 拖拽操作 |
| 上传文件 |
Semantic locators (when refs are unreliable)
语义定位(引用不可靠时使用)
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find testid "submit-btn" clickbash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find testid "submit-btn" clickGet information
获取信息
| Command | Purpose |
|---|---|
| Element text ( |
| innerHTML |
| Input value |
| Element attribute |
| Page title / URL |
| Count matching elements |
| 命令 | 用途 |
|---|---|
| 获取元素文本( |
| 获取innerHTML |
| 获取输入框值 |
| 获取元素属性 |
| 获取页面标题/URL |
| 统计匹配元素数量 |
Check state
状态检查
bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1Wait
等待
| Command | Purpose |
|---|---|
| Wait for element |
| Wait milliseconds |
| Wait for text |
| Wait for URL pattern |
| Wait for network idle (best for slow pages) |
| Wait for JS condition |
| 命令 | 用途 |
|---|---|
| 等待元素出现 |
| 等待指定毫秒数 |
| 等待文本出现 |
| 等待URL匹配指定模式 |
| 等待网络空闲(适用于加载缓慢的页面) |
| 等待JS条件满足 |
JavaScript (eval)
JavaScript(执行)
bash
agent-browser eval 'document.title'bash
agent-browser eval 'document.title'Complex JS -- use --stdin to avoid shell quoting issues
复杂JS代码 -- 使用--stdin避免shell引号问题
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href))
EVALEOF
undefinedagent-browser eval --stdin <<'EVALEOF'
JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href))
EVALEOF
undefinedDiff (compare page states)
差异对比(页面状态对比)
bash
agent-browser diff snapshot # current vs last snapshot
agent-browser diff snapshot --baseline before.txt # current vs saved file
agent-browser diff screenshot --baseline before.png # visual pixel diff
agent-browser diff url <url1> <url2> # compare two pagesbash
agent-browser diff snapshot # 当前快照与上一次快照对比
agent-browser diff snapshot --baseline before.txt # 当前快照与保存的文件对比
agent-browser diff screenshot --baseline before.png # 视觉像素差异对比
agent-browser diff url <url1> <url2> # 对比两个页面Dialogs
对话框
bash
agent-browser dialog accept [text] # accept alert/confirm/prompt
agent-browser dialog dismiss # dismiss dialogbash
agent-browser dialog accept [text] # 确认警告/确认/提示对话框
agent-browser dialog dismiss # 取消对话框Tabs & frames
标签页与框架
bash
agent-browser tab # list tabs
agent-browser tab new [url] # new tab
agent-browser tab 2 # switch to tab by index
agent-browser tab close # close current tab
agent-browser frame "#iframe" # switch to iframe
agent-browser frame main # back to main framebash
agent-browser tab # 列出所有标签页
agent-browser tab new [url] # 新建标签页
agent-browser tab 2 # 通过索引切换标签页
agent-browser tab close # 关闭当前标签页
agent-browser frame "#iframe" # 切换到iframe
agent-browser frame main # 返回主框架Screenshots & recording
截图与录制
bash
agent-browser screenshot # save to temp directory
agent-browser screenshot path.png # save to specific path
agent-browser screenshot --full # full-page screenshot
agent-browser screenshot --annotate # annotated with numbered element labels
agent-browser pdf output.pdf # save as PDF--annotate[N]@eNVideo recording:
bash
agent-browser record start ./demo.webmbash
agent-browser screenshot # 保存到临时目录
agent-browser screenshot path.png # 保存到指定路径
agent-browser screenshot --full # 整页截图
agent-browser screenshot --annotate # 带编号元素标注的截图
agent-browser pdf output.pdf # 保存为PDF--annotate[N]@eN视频录制:
bash
agent-browser record start ./demo.webm... perform actions ...
... 执行操作 ...
agent-browser record stop
agent-browser record restart ./take2.webm # stop current + start new
Recording creates a fresh context but preserves cookies/storage. Explore first, then start recording for smooth demos.agent-browser record stop
agent-browser record restart ./take2.webm # 停止当前录制并开始新录制
录制会创建新上下文,但保留Cookie和存储信息。建议先探索页面,再开始录制以获得流畅的演示效果。Ref lifecycle
引用生命周期
Refs (, , ...) are invalidated whenever the page changes. Always re-snapshot after:
@e1@e2- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
bash
agent-browser click @e5 # navigates
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # use new refs引用(、...)会在页面变更时失效。以下操作后务必重新生成快照:
@e1@e2- 点击链接/按钮触发导航
- 表单提交
- 动态内容加载(下拉菜单、模态框)
bash
agent-browser click @e5 # 触发导航
agent-browser snapshot -i # 必须重新生成快照
agent-browser click @e1 # 使用新的引用Electron app automation
Electron应用自动化
Any Electron app supports since it's built on Chromium.
--remote-debugging-port所有Electron应用都支持参数,因为它们基于Chromium构建。
--remote-debugging-portLaunch and connect
启动与连接
bash
undefinedbash
undefinedmacOS
macOS
open -a "Slack" --args --remote-debugging-port=9222
open -a "Slack" --args --remote-debugging-port=9222
Linux
Linux
slack --remote-debugging-port=9222
slack --remote-debugging-port=9222
Then connect
然后连接
sleep 3
agent-browser connect 9222
agent-browser snapshot -i
**The app must be quit first** if already running -- the flag only takes effect at launch.sleep 3
agent-browser connect 9222
agent-browser snapshot -i
**如果应用已运行,必须先退出再重新启动**——该参数仅在启动时生效。Tab management in Electron
Electron中的标签页管理
Electron apps often have multiple windows/webviews:
bash
agent-browser tab # list targets
agent-browser tab 2 # switch by index
agent-browser tab --url "*settings*" # switch by URL patternElectron应用通常包含多个窗口/webview:
bash
agent-browser tab # 列出所有目标
agent-browser tab 2 # 通过索引切换
agent-browser tab --url "*settings*" # 通过URL模式切换Electron troubleshooting
Electron故障排查
| Problem | Fix |
|---|---|
| "Connection refused" | Ensure app was launched with |
| Connect fails after launch | |
| Elements missing from snapshot | Try |
| Cannot type in fields | Use |
| Dark mode lost | Set |
| 问题 | 解决方法 |
|---|---|
| "Connection refused" | 确保应用以 |
| 启动后连接失败 | 连接前执行 |
| 快照中缺少元素 | 尝试 |
| 无法在字段中输入文本 | 对自定义输入组件使用 |
| 深色模式丢失 | 设置 |
State persistence
状态持久化
Save and restore cookies/localStorage across sessions:
bash
agent-browser open https://app.example.com/login跨会话保存和恢复Cookie/localStorage:
bash
agent-browser open https://app.example.com/login... login flow ...
... 登录流程 ...
agent-browser state save auth.json
agent-browser state save auth.json
Later: load saved state
后续操作:加载保存的状态
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
Auto-save/restore with named sessions:
```bash
agent-browser --session-name myapp open https://app.example.comagent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
使用命名会话自动保存/恢复状态:
```bash
agent-browser --session-name myapp open https://app.example.comstate auto-saved on close, auto-loaded on next launch with same --session-name
关闭时自动保存状态,下次使用相同--session-name启动时自动加载
undefinedundefinedSessions
会话管理
The browser persists via a background daemon. One session is the default.
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
agent-browser --session test1 close
agent-browser --session test2 closeEach spawns a separate Chromium process (~300 MB). Prefer navigating within a single session. Exception: controlling multiple Electron apps on different CDP ports.
--session浏览器通过后台守护进程持久化运行。默认使用一个会话。
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
agent-browser --session test1 close
agent-browser --session test2 close每个会启动一个独立的Chromium进程(约300 MB内存占用)。优先在单个会话内导航。例外情况:控制不同CDP端口上的多个Electron应用。
--sessionGlobal options
全局选项
| Flag | Purpose |
|---|---|
| Isolated browser session |
| Show browser window |
| Connect via CDP |
| Auto-discover running Chrome |
| Use proxy server |
| Force dark/light mode |
| Accept self-signed certs |
| Enable |
| JSON output for parsing |
| 参数 | 用途 |
|---|---|
| 独立浏览器会话 |
| 显示浏览器窗口 |
| 通过CDP连接 |
| 自动发现运行中的Chrome |
| 使用代理服务器 |
| 强制深色/浅色模式 |
| 接受自签名证书 |
| 启用 |
| 输出JSON格式以便解析 |
Debugging
调试
bash
agent-browser --headed open example.com # visible browser
agent-browser console # view console messages
agent-browser errors # view page errors
agent-browser highlight @e1 # highlight elementbash
agent-browser --headed open example.com # 显示可见浏览器
agent-browser console # 查看控制台消息
agent-browser errors # 查看页面错误
agent-browser highlight @e1 # 高亮元素Gotchas
注意事项
- Invisible-to-snapshot elements. divs and custom components may not appear in accessibility snapshots. Use
contenteditableto interact:evalbashagent-browser eval --stdin <<'EVALEOF' const el = document.querySelector("[contenteditable]"); el.focus(); el.textContent = "hello"; el.dispatchEvent(new Event('input', { bubbles: true })); EVALEOF - Unstable class names. Never hardcode CSS-in-JS class names (,
sc-*). Find elements by text content,css-*style, orcursor: pointerinstead.testid - SPA loading delays. Single-page apps may take 5-10s to render after navigation. Double-wait: then
wait --load networkidle.wait 5000 - Flag ordering. Global flags (,
--headers,--session) must come before the subcommand:--cdp.agent-browser --headers '{}' open <url>
- 快照不可见元素:div和自定义组件可能不会出现在可访问性快照中。使用
contenteditable进行交互:evalbashagent-browser eval --stdin <<'EVALEOF' const el = document.querySelector("[contenteditable]"); el.focus(); el.textContent = "hello"; el.dispatchEvent(new Event('input', { bubbles: true })); EVALEOF - 不稳定的类名:永远不要硬编码CSS-in-JS类名(如、
sc-*)。改用文本内容、css-*样式或cursor: pointer查找元素。testid - SPA加载延迟:单页应用导航后可能需要5-10秒渲染。双重等待:后再执行
wait --load networkidle。wait 5000 - 参数顺序:全局参数(如、
--headers、--session)必须放在子命令之前:--cdp。agent-browser --headers '{}' open <url>
Critical rules
关键规则
- Always take screenshots for visual QA. Text snapshots miss layout, styling, alignment, and z-index issues. Use when you need both visual proof and element refs.
screenshot --annotate - One session by default. Navigate between pages with instead of creating new sessions.
open <url> - Always close when done. frees the Chromium process.
agent-browser close - Re-snapshot after every navigation. Refs are invalidated.
- 视觉QA务必截图:文本快照无法发现布局、样式、对齐和z-index问题。需要同时获取视觉证明和元素引用时,使用。
screenshot --annotate - 默认使用单个会话:使用在页面间导航,而非创建新会话。
open <url> - 操作完成后务必关闭:会释放Chromium进程。
agent-browser close - 每次导航后重新生成快照:引用会失效。