agent-browser: CLI Browser Automation

Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
Setup Check

```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"
```

Install if needed

```bash
npm install -g agent-browser
agent-browser install  # Downloads Chromium
```

Core Workflow

The snapshot + ref pattern is optimal for LLMs:
- Navigate to URL
- Snapshot to get interactive elements with refs
- Interact using refs (@e1, @e2, etc.)
- Re-snapshot after navigation or DOM changes

```bash
# Step 1: Open URL
agent-browser open https://example.com

# Step 2: Get interactive elements with refs
agent-browser snapshot -i --json

# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"

# Step 4: Re-snapshot after changes
agent-browser snapshot -i
```
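
The snapshot and interact steps above can be chained in a script rather than read by eye. The sketch below is one possible way to do that, assuming the text snapshot prints one element per line in the form `- button "Submit" [ref=e1]` (the shape of the sample output in this document). `ref_for` is a hypothetical helper name, not part of agent-browser.

```bash
# ref_for ROLE NAME
# Read snapshot text on stdin and print the first matching ref as @eN.
# Assumption: snapshot lines look like: - button "Submit" [ref=e1]
ref_for() {
  role=$1
  name=$2
  grep -F -- "$role \"$name\" [ref=" |
    sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p' |
    head -n 1
}

# Usage (needs agent-browser and an open page):
#   ref=$(agent-browser snapshot -i | ref_for button "Sign in")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

Because refs are reassigned on each snapshot, a lookup like this should run against a fresh snapshot after every navigation or DOM change.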

Key Commands

Navigation

```bash
agent-browser open <url>  # Navigate to URL
agent-browser back        # Go back
agent-browser forward     # Go forward
agent-browser reload      # Reload page
agent-browser close       # Close browser
```

Snapshots (Essential for AI)

```bash
agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -i --json  # JSON output for parsing
agent-browser snapshot -c         # Compact (remove empty elements)
agent-browser snapshot -d 3       # Limit depth
```

Interactions

```bash
agent-browser click @e1            # Click element
agent-browser dblclick @e1         # Double-click
agent-browser fill @e1 "text"      # Clear and fill input
agent-browser type @e1 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser hover @e1            # Hover element
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "option"  # Select dropdown option
agent-browser scroll down 500      # Scroll (up/down/left/right)
agent-browser scrollintoview @e1   # Scroll element into view
```

Get Information

```bash
agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get element HTML
agent-browser get value @e1       # Get input value
agent-browser get attr href @e1   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count "button"  # Count matching elements
```

Screenshots & PDFs

```bash
agent-browser screenshot                    # Viewport screenshot
agent-browser screenshot --full             # Full page
agent-browser screenshot output.png         # Save to file
agent-browser screenshot --full output.png  # Full page to file
agent-browser pdf output.pdf                # Save as PDF
```

Wait

```bash
agent-browser wait @e1     # Wait for element
agent-browser wait 2000    # Wait milliseconds
agent-browser wait "text"  # Wait for text to appear
```
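
A fixed delay such as `wait 2000` can be either too short or needlessly long. As a rough sketch, the helper below polls `agent-browser get url` until it equals an expected URL, pausing with `agent-browser wait` between polls. It assumes `get url` prints only the URL on stdout; `wait_for_url` and its arguments are hypothetical names used for illustration.

```bash
# wait_for_url EXPECTED [TRIES] [PAUSE_MS]
# Poll the current URL until it equals EXPECTED.
# Returns 0 on a match, 1 after TRIES polls without one
# (defaults: 10 polls, 500 ms pause after each failed poll).
# Assumption: `agent-browser get url` prints just the URL.
wait_for_url() {
  expected=$1
  tries=${2:-10}
  pause_ms=${3:-500}
  while [ "$tries" -gt 0 ]; do
    [ "$(agent-browser get url)" = "$expected" ] && return 0
    tries=$((tries - 1))
    agent-browser wait "$pause_ms" >/dev/null
  done
  return 1
}

# Usage:
#   agent-browser click @e3
#   wait_for_url "https://app.example.com/dashboard" || echo "still not redirected"
```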

Semantic Locators (Alternative to Refs)

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"
```

Sessions (Parallel Browsers)

```bash
# Run multiple independent browser sessions
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com

# List active sessions
agent-browser session list
```
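
The two `open` commands above run one after the other. If the sessions really are independent, they could be started concurrently with shell background jobs. The sketch below assumes that concurrent invocations with different `--session` names do not interfere and that agent-browser exits non-zero on failure; neither is confirmed here. `open_all` is a hypothetical helper name.

```bash
# open_all NAME=URL...
# Open each URL in its own named session, all in parallel.
# Returns non-zero if any of the open commands failed.
open_all() {
  pids=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "="
    agent-browser --session "$name" open "$url" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage:
#   open_all browser1=https://site1.com browser2=https://site2.com
#   agent-browser session list
```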

Examples

Login Flow

```bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in
```

Search and Extract

```bash
agent-browser open https://news.ycombinator.com
agent-browser snapshot -i --json
# Parse JSON to find story links

agent-browser get text @e12  # Get headline text
agent-browser click @e12     # Click to open story
```

Form Filling

```bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit button
agent-browser screenshot confirmation.png
```
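
When a form has many text fields, the repeated `fill` calls can be driven from a list of ref/value pairs. This is a minimal sketch, assuming agent-browser exits non-zero when a `fill` fails; `fill_all` is a hypothetical helper name.

```bash
# fill_all REF=VALUE...
# Run `agent-browser fill REF VALUE` for each pair, stopping at the first failure.
fill_all() {
  for pair in "$@"; do
    agent-browser fill "${pair%%=*}" "${pair#*=}" || return 1
  done
}

# Usage:
#   fill_all @e1="John Doe" @e2="john@example.com"
```

Each pair is split at the first `=`, so values may themselves contain `=`.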

Debug Mode

```bash
# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
```
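
Typing `--headed` on every command is easy to forget mid-session. One option is a small wrapper that adds the flag when an environment variable is set. This is only a sketch: `ab` and `AB_HEADED` are hypothetical names for this example, not agent-browser features.

```bash
# ab ARGS...
# Run agent-browser, adding --headed when AB_HEADED is set and non-empty.
# AB_HEADED is a made-up variable for this sketch, not an agent-browser setting.
ab() {
  # The expansion is deliberately unquoted so it vanishes when AB_HEADED is empty.
  agent-browser ${AB_HEADED:+--headed} "$@"
}

# Usage:
#   ab open https://example.com               # headless
#   AB_HEADED=1 ab open https://example.com   # visible window
```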

JSON Output

Add `--json` for structured output:

```bash
agent-browser snapshot -i --json
```

Returns:

```json
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    },
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
  }
}
```
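
The `refs` object in this output can be filtered with a JSON tool instead of being read by eye. The sketch below uses `jq`, which is a separate program and an assumption of this example (agent-browser does not ship it). It also assumes the JSON has exactly the `data.refs` shape shown in the sample; `refs_by_role` is a hypothetical helper name.

```bash
# refs_by_role ROLE
# Read `agent-browser snapshot -i --json` output on stdin and print one @ref
# per line for every element whose role equals ROLE.
# Requires jq; assumes the .data.refs shape from the sample output.
refs_by_role() {
  jq -r --arg role "$1" \
    '.data.refs | to_entries[] | select(.value.role == $role) | "@" + .key'
}

# Usage:
#   agent-browser snapshot -i --json | refs_by_role button
```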

vs Playwright MCP

| Feature | agent-browser (CLI) | Playwright MCP |
|---|---|---|
| Interface | Bash commands | MCP tools |
| Selection | Refs (@e1) | Refs (e1) |
| Output | Text/JSON | Tool responses |
| Parallel | Sessions | Tabs |
| Best for | Quick automation | Tool integration |
Use agent-browser when:
- You prefer Bash-based workflows
- You want simpler CLI commands
- You need quick one-off automation
Use Playwright MCP when:
- You need deep MCP tool integration
- You want tool-based responses
- You're building complex automation