interact-with-browser
BE SURE TO CLEAN UP SCREENSHOTS AFTER YOU ARE DONE WITH EVERYTHING
IF agent-browser NEEDS TO BE INSTALLED:
npm install -g agent-browser
agent-browser install # Download Chromium
agent-browser open example.com
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1 # Get text by ref
agent-browser screenshot page.png
agent-browser close
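One way to honor the cleanup reminder at the top is to write screenshots into a temporary directory and remove it when the script exits. The sketch below assumes a POSIX shell with `mktemp`; the `shots` variable name is illustrative, and the `agent-browser` calls run only if the CLI is on the PATH.

```shell
# Create a throwaway directory for this run's screenshots.
shots=$(mktemp -d)

# Delete the directory, and every screenshot in it, when the shell exits.
trap 'rm -rf "$shots"' EXIT

# Run the browser steps only where agent-browser is installed.
if command -v agent-browser >/dev/null 2>&1; then
  agent-browser open example.com
  agent-browser screenshot "$shots/page.png"
  agent-browser close
fi
```

Because the trap fires on exit, the screenshots are removed even if an earlier step fails.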
Traditional Selectors (also supported)
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
Commands
Core Commands
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel> # Click element
agent-browser dblclick <sel> # Double-click element
agent-browser focus <sel> # Focus element
agent-browser type <sel> <text> # Type into element
agent-browser fill <sel> <text> # Clear and fill
agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key> # Hold key down
agent-browser keyup <key> # Release key
agent-browser hover <sel> # Hover element
agent-browser select <sel> <val> # Select dropdown option
agent-browser check <sel> # Check checkbox
agent-browser uncheck <sel> # Uncheck checkbox
agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt> # Drag and drop
agent-browser upload <sel> <files> # Upload files
agent-browser screenshot [path] # Take screenshot (--full for full page)
agent-browser pdf <path> # Save as PDF
agent-browser snapshot # Accessibility tree with refs (best for AI)
agent-browser eval <js> # Run JavaScript
agent-browser close # Close browser (aliases: quit, exit)
Get Info
agent-browser get text <sel> # Get text content
agent-browser get html <sel> # Get innerHTML
agent-browser get value <sel> # Get input value
agent-browser get attr <sel> <attr> # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count <sel> # Count matching elements
agent-browser get box <sel> # Get bounding box
Check State
agent-browser is visible <sel> # Check if visible
agent-browser is enabled <sel> # Check if enabled
agent-browser is checked <sel> # Check if checked
Find Elements (Semantic Locators)
agent-browser find role <role> <action> [value] # By ARIA role
agent-browser find text <text> <action> # By text content
agent-browser find label <label> <action> [value] # By label
agent-browser find placeholder <ph> <action> [value] # By placeholder
agent-browser find alt <text> <action> # By alt text
agent-browser find title <text> <action> # By title attr
agent-browser find testid <id> <action> [value] # By data-testid
agent-browser find first <sel> <action> [value] # First match
agent-browser find last <sel> <action> [value] # Last match
agent-browser find nth <n> <sel> <action> [value] # Nth match
Actions: click, fill, check, hover, text
Examples:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Wait
agent-browser wait <selector> # Wait for element to be visible
agent-browser wait <ms> # Wait for time (milliseconds)
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true" # Wait for JS condition
Load states: load, domcontentloaded, networkidle
Mouse Control
agent-browser mouse move <x> <y> # Move mouse
agent-browser mouse down [button] # Press button (left/right/middle)
agent-browser mouse up [button] # Release button
agent-browser mouse wheel <dy> [dx] # Scroll wheel
Browser Settings
agent-browser set viewport <w> <h> # Set viewport size
agent-browser set device <name> # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng> # Set geolocation
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set headers <json> # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light] # Emulate color scheme
Cookies & Storage
agent-browser cookies # Get all cookies
agent-browser cookies set <name> <val> # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local <key> # Get specific key
agent-browser storage local set <k> <v> # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Same for sessionStorage
Network
agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body <json> # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
Tabs & Windows
agent-browser tab # List tabs
agent-browser tab new [url] # New tab (optionally with URL)
agent-browser tab <n> # Switch to tab n
agent-browser tab close [n] # Close tab
agent-browser window new # New window
Frames
agent-browser frame <sel> # Switch to iframe
agent-browser frame main # Back to main frame
Dialogs
agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss
Debug
agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight <sel> # Highlight element
agent-browser state save <path> # Save auth state
agent-browser state load <path> # Load auth state
Navigation
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
Setup
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
Sessions
Run multiple isolated browser instances:
Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
List active sessions
agent-browser session list
Output:
Active sessions:
-> default
agent1
Show current session
agent-browser session
Each session has its own:
Browser instance
Cookies and storage
Navigation history
Authentication state
Snapshot Options
The snapshot command supports filtering to reduce output size:
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
Option                 Description
-i, --interactive      Only show interactive elements (buttons, links, inputs)
-c, --compact          Remove empty structural elements
-d, --depth <n>        Limit tree depth
-s, --selector <sel>   Scope to CSS selector
Options
Option                      Description
--session <name>            Use isolated session (or AGENT_BROWSER_SESSION env)
--headers <json>            Set HTTP headers scoped to the URL's origin
--executable-path <path>    Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env)
--json                      JSON output (for agents)
--full, -f                  Full page screenshot
--name, -n                  Locator name filter
--exact                     Exact text match
--headed                    Show browser window (not headless)
--cdp <port>                Connect via Chrome DevTools Protocol
--debug                     Debug output
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
1. Get snapshot with refs
agent-browser snapshot
Output:
- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]
2. Use refs to interact
agent-browser click @e2 # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1 # Get heading text
agent-browser hover @e4 # Hover the link
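When a script rather than a model picks the target, the ref can be pulled out of the snapshot text with standard tools. This is a minimal sketch, assuming each element line follows the `- role "name" [ref=eN]` shape shown in the sample output. `ref_for` is a hypothetical helper, not part of agent-browser, and names containing regex metacharacters would need escaping.

```shell
# Sample snapshot text, copied from the output shown above.
# In real use this would be: snapshot=$(agent-browser snapshot)
snapshot='- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]'

# ref_for ROLE NAME: print "@<ref>" for the first line matching role and name.
ref_for() {
  printf '%s\n' "$snapshot" |
    sed -n "s/^ *- $1 \"$2\" \[ref=\([^]]*\)].*/@\1/p" |
    head -n 1
}

ref_for button Submit   # prints @e2
ref_for textbox Email   # prints @e3
```

The printed value can be passed straight to a command, for example `agent-browser click "$(ref_for button Submit)"`.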
Why use refs?
Deterministic: Ref points to exact element from snapshot
Fast: No DOM re-query needed
AI-friendly: Snapshot + ref workflow is optimal for LLMs
CSS Selectors
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
Text & XPath
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
Semantic Locators
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
Agent Mode
Use --json for machine-readable output:
agent-browser snapshot --json
Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
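A script can branch on the `success` field before trusting the rest of a `--json` response. The check below is a deliberately crude sketch: it assumes the compact `"success":true` spelling shown in the sample response, and `json_ok` is a hypothetical helper. A real JSON parser is the more robust choice.

```shell
# json_ok RESPONSE: succeed only if the response text reports "success":true.
json_ok() {
  case $1 in
    *'"success":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Trimmed sample in the shape shown above; in real use:
#   response=$(agent-browser snapshot --json)
response='{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"}}}}'

if json_ok "$response"; then
  echo "ok"
else
  echo "failed" >&2
fi
```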
Optimal AI Workflow
1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs
2. AI identifies target refs from snapshot
3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
4. Get new snapshot if page changed
agent-browser snapshot -i --json
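The four steps can be collected into one script. The sketch below wraps the CLI in a hypothetical `ab` function that, by default, only prints each command, so the sequence can be reviewed before anything touches a browser. Setting `DRY_RUN=0` would run the real commands. The refs `@e2` and `@e3` are the ones from the example and would normally come from step 2.

```shell
# Dry-run wrapper: print each command unless DRY_RUN=0 is set.
# Printing joins arguments with spaces, so quoting is not preserved in the output.
DRY_RUN=${DRY_RUN:-1}
ab() {
  if [ "$DRY_RUN" = 1 ]; then
    printf 'agent-browser %s\n' "$*"
  else
    agent-browser "$@"
  fi
}

workflow() {
  ab open example.com
  ab snapshot -i --json      # step 1: the agent parses the tree and refs
  ab click @e2               # step 3: act on the refs chosen in step 2
  ab fill @e3 "input text"
  ab snapshot -i --json      # step 4: refresh refs after the page changed
}

workflow
```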
Headed Mode
Show the browser window for debugging:
agent-browser open example.com --headed
This opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2
Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
This is useful for:
Skipping login flows - Authenticate via headers instead of UI
Switching users - Start new sessions with different auth tokens
API testing - Access protected endpoints directly
Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
For global headers (all domains), use set headers:
agent-browser set headers '{"X-Custom-Header": "value"}'
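In scripts, the token usually comes from the environment rather than a literal. Here is a minimal sketch of building the `--headers` JSON with `printf`, assuming a hypothetical `API_TOKEN` variable and a token with no quotes or backslashes in it (those would need JSON escaping).

```shell
# API_TOKEN is a hypothetical variable; a placeholder is used when it is unset.
token=${API_TOKEN:-placeholder-token}

# Build the JSON object expected by --headers.
headers=$(printf '{"Authorization": "Bearer %s"}' "$token")

# With the CLI installed, the header would then be scoped to this origin:
#   agent-browser open api.example.com --headers "$headers"
```

Keeping the token out of the script itself also keeps it out of version control.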
Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
System browsers: Use an existing Chrome/Chromium installation
Custom builds: Use modified browser builds
CLI Usage
Via flag
agent-browser --executable-path /path/to/chromium open example.com
Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
Serverless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';
export async function handler() {
const browser = new BrowserManager();
await browser.launch({
executablePath: await chromium.executablePath(),
headless: true,
});
// ... use browser
}