BE SURE TO CLEAN UP SCREENSHOTS AFTER YOU ARE DONE WITH EVERYTHING

IF THIS NEEDS TO BE INSTALLED

npm install -g agent-browser
agent-browser install # to get chromium downloaded

agent-browser open example.com
agent-browser snapshot                     # Get accessibility tree with refs
agent-browser click @e2                    # Click by ref from snapshot
agent-browser fill @e3 "test@example.com"  # Fill by ref
agent-browser get text @e1                 # Get text by ref
agent-browser screenshot page.png
agent-browser close
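
The note at the top asks for screenshots to be cleaned up when the work is done. One way to make that hard to forget is to write every screenshot into a throwaway directory and delete it when the shell exits. This is only a minimal sketch: the `AB` override variable, `SHOT_DIR`, and the helpers `shot` and `cleanup_shots` are hypothetical names, and only the `screenshot [path]` command comes from this document.

```shell
# Command to run; AB is a hypothetical override (useful for dry runs).
AB="${AB:-agent-browser}"

# Throwaway directory for this run's screenshots (hypothetical helper state).
SHOT_DIR="$(mktemp -d)"

# Remove the directory and everything in it.
cleanup_shots() {
  rm -rf "$SHOT_DIR"
}

# Run the cleanup when the shell exits, including after an error.
trap cleanup_shots EXIT

# Usage: shot <name>
# Takes a screenshot into the throwaway directory and prints its path.
shot() {
  $AB screenshot "$SHOT_DIR/$1.png" >/dev/null && printf '%s\n' "$SHOT_DIR/$1.png"
}
```

With this in place, `shot login` would save `login.png` under `$SHOT_DIR`, and nothing is left behind once the script ends.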

Traditional Selectors (also supported)

agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"

Commands

Core Commands

agent-browser open <url>              # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page)
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript
agent-browser close                   # Close browser (aliases: quit, exit)

Get Info

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box

Check State

agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked

Find Elements (Semantic Locators)

agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match

Actions: click, fill, check, hover, text

Examples:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Wait

agent-browser wait <selector>                    # Wait for element to be visible
agent-browser wait <ms>                          # Wait for time (milliseconds)
agent-browser wait --text "Welcome"              # Wait for text to appear
agent-browser wait --url "**/dash"               # Wait for URL pattern
agent-browser wait --load networkidle            # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

Load states: load, domcontentloaded, networkidle

Mouse Control

agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button (left/right/middle)
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel

Browser Settings

agent-browser set viewport <w> <h>      # Set viewport size
agent-browser set device <name>         # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>       # Set geolocation
agent-browser set offline [on|off]      # Toggle offline mode
agent-browser set headers <json>        # Extra HTTP headers
agent-browser set credentials <u> <p>   # HTTP basic auth
agent-browser set media [dark|light]    # Emulate color scheme

Cookies & Storage

agent-browser cookies                     # Get all cookies
agent-browser cookies set <name> <val>    # Set cookie
agent-browser cookies clear               # Clear cookies

agent-browser storage local               # Get all localStorage
agent-browser storage local <key>         # Get specific key
agent-browser storage local set <k> <v>   # Set value
agent-browser storage local clear         # Clear all

agent-browser storage session # Same for sessionStorage

Network

agent-browser network route <url>                # Intercept requests
agent-browser network route <url> --abort        # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network unroute [url]              # Remove routes
agent-browser network requests                   # View tracked requests
agent-browser network requests --filter api      # Filter requests

Tabs & Windows

agent-browser tab                 # List tabs
agent-browser tab new [url]       # New tab (optionally with URL)
agent-browser tab <n>             # Switch to tab n
agent-browser tab close [n]       # Close tab
agent-browser window new          # New window

Frames

agent-browser frame <sel>         # Switch to iframe
agent-browser frame main          # Back to main frame

Dialogs

agent-browser dialog accept [text]    # Accept (with optional prompt text)
agent-browser dialog dismiss          # Dismiss

Debug

agent-browser trace start [path]      # Start recording trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser console                 # View console messages
agent-browser console --clear         # Clear console
agent-browser errors                  # View page errors
agent-browser errors --clear          # Clear errors
agent-browser highlight <sel>         # Highlight element
agent-browser state save <path>       # Save auth state
agent-browser state load <path>       # Load auth state

Navigation

agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page

Setup

agent-browser install                 # Download Chromium browser
agent-browser install --with-deps     # Also install system deps (Linux)

Sessions

Run multiple isolated browser instances:

# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-browser session

Each session has its own:

Browser instance
Cookies and storage
Navigation history
Authentication state
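
As an illustration of that isolation, the following sketch opens a different URL in each of several named sessions. `open_in_sessions` and the `AB` override variable are hypothetical names; the `--session` flag and the `open` command are the ones shown in this section.

```shell
# Command to run; AB is a hypothetical override (useful for dry runs).
AB="${AB:-agent-browser}"

# Usage: open_in_sessions name=url [name=url ...]
# Opens each URL in its own named session and stops at the first failure.
open_in_sessions() {
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    url="${pair#*=}"     # text after the first '='
    $AB --session "$name" open "$url" || return 1
  done
}
```

For example, `open_in_sessions agent1=site-a.com agent2=site-b.com` would issue the same two commands as the "Different sessions" example.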

Snapshot Options

The snapshot command supports filtering to reduce output size:

agent-browser snapshot                # Full accessibility tree
agent-browser snapshot -i             # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c             # Compact (remove empty structural elements)
agent-browser snapshot -d 3           # Limit depth to 3 levels
agent-browser snapshot -s "#main"     # Scope to CSS selector
agent-browser snapshot -i -c -d 5     # Combine options

Option                     Description
-i, --interactive          Only show interactive elements (buttons, links, inputs)
-c, --compact              Remove empty structural elements
-d, --depth <n>            Limit tree depth
-s, --selector <sel>       Scope to CSS selector

Options

Option                     Description
--session <name>           Use isolated session (or AGENT_BROWSER_SESSION env)
--headers <json>           Set HTTP headers scoped to the URL's origin
--executable-path <path>   Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env)
--json                     JSON output (for agents)
--full, -f                 Full page screenshot
--name, -n                 Locator name filter
--exact                    Exact text match
--headed                   Show browser window (not headless)
--cdp <port>               Connect via Chrome DevTools Protocol
--debug                    Debug output

Selectors

Refs (Recommended for AI)

Refs provide deterministic element selection from snapshots:

# 1. Get snapshot with refs
agent-browser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]

# 2. Use refs to interact
agent-browser click @e2                    # Click the button
agent-browser fill @e3 "test@example.com"  # Fill the textbox
agent-browser get text @e1                 # Get heading text
agent-browser hover @e4                    # Hover the link

Why use refs?

Deterministic: Ref points to exact element from snapshot
Fast: No DOM re-query needed
AI-friendly: Snapshot + ref workflow is optimal for LLMs
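
To make the snapshot-then-ref workflow concrete, here is a rough sketch that looks up the ref for an element by its role and name in plain-text snapshot output. It assumes the line format shown in the sample output above (`- role "name" [ref=eN]`); `ref_for` is a hypothetical helper, not part of agent-browser.

```shell
# Usage: agent-browser snapshot | ref_for 'button "Submit"'
# Reads snapshot text on stdin and prints the first matching ref as @eN.
# Prints nothing when no line matches.
ref_for() {
  grep -F -- "- $1 [ref=" | head -n 1 | sed -n 's/.*\[ref=\([^]]*\)].*/@\1/p'
}
```

The printed value can be passed straight to a command that accepts a ref, such as `agent-browser click`.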

CSS Selectors

agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"

Text & XPath

agent-browser click "text=Submit"
agent-browser click "xpath=//button"

Semantic Locators

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

Agent Mode

Use --json for machine-readable output:

agent-browser snapshot --json

Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}

agent-browser get text @e1 --json
agent-browser is visible @e2 --json
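
A script that consumes `--json` output will usually want to branch on the `success` field. The following is a deliberately crude sketch that uses a text match rather than a JSON parser, assuming the compact `{"success":true,...}` shape shown above; a real JSON parser would be more robust. `json_ok` is a hypothetical helper name.

```shell
# Usage: agent-browser is visible @e2 --json | json_ok && echo "command succeeded"
# Exits 0 when stdin contains "success":true (whitespace after the colon is allowed).
json_ok() {
  grep -q '"success":[[:space:]]*true'
}
```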

Optimal AI Workflow

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot

# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
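
Steps 3 and 4 can be paired in a small wrapper so that every action is followed by a fresh snapshot. This is a sketch under stated assumptions: `act_and_resnapshot` and the `AB` override variable are hypothetical names, and re-snapshotting after every action is a simplification (the workflow only calls for it when the page changed).

```shell
# Command to run; AB is a hypothetical override (useful for dry runs).
AB="${AB:-agent-browser}"

# Usage: act_and_resnapshot <action> <ref> [value]
# Step 3: run one ref-based action. Step 4: print a new interactive
# snapshot as JSON in case the page changed.
act_and_resnapshot() {
  $AB "$@" || return 1
  $AB snapshot -i --json
}
```

For example, `act_and_resnapshot fill @e3 "input text"` would run the `fill` command and then `snapshot -i --json`.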

Headed Mode

Show the browser window for debugging:

agent-browser open example.com --headed

This opens a visible browser window instead of running headless.

Authenticated Sessions

Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:

# Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'

# Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2

# Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com

This is useful for:

Skipping login flows - Authenticate via headers instead of UI
Switching users - Start new sessions with different auth tokens
API testing - Access protected endpoints directly
Security - Headers are scoped to the origin, not leaked to other domains

To set headers for multiple origins, use --headers with each open command:

agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'

For global headers (all domains), use set headers:

agent-browser set headers '{"X-Custom-Header": "value"}'
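
The `--headers` examples in this section inline the token in a JSON literal. A small helper can build that JSON from a variable instead, which keeps the quoting in one place. This is a minimal sketch: `bearer_headers` and the `API_TOKEN` variable in the usage line are hypothetical names.

```shell
# Usage: agent-browser open api.example.com --headers "$(bearer_headers "$API_TOKEN")"
# Prints a JSON object with a single Authorization header.
# Assumes the token contains no double quotes or backslashes (no JSON escaping is done).
bearer_headers() {
  printf '{"Authorization": "Bearer %s"}\n' "$1"
}
```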

Custom Browser Executable

Use a custom browser executable instead of the bundled Chromium. This is useful for:

Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
System browsers: Use an existing Chrome/Chromium installation
Custom builds: Use modified browser builds

CLI Usage

# Via flag
agent-browser --executable-path /path/to/chromium open example.com

# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
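
When the browser location differs between machines, a script might probe a few candidate paths before setting the environment variable. The sketch below shows one way to do that; `first_executable` is a hypothetical helper, and any paths passed to it are the caller's guesses, not locations agent-browser is known to check.

```shell
# Usage: first_executable <path> [<path> ...]
# Prints the first argument that is an executable regular file.
# Returns non-zero when none of the candidates qualifies.
first_executable() {
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

The result can then be supplied through the variable shown above, for example `AGENT_BROWSER_EXECUTABLE_PATH="$(first_executable /usr/bin/chromium /usr/bin/google-chrome)" agent-browser open example.com`, where both paths are illustrative only.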

Serverless Example (Vercel/AWS Lambda)

import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';

export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // ... use browser
}
