agent-browser

agent-browser: CLI Browser Automation

Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

Setup Check

bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"

Install if needed

bash
npm install -g agent-browser
agent-browser install  # Downloads Chromium

Core Workflow

The snapshot + ref pattern is optimal for LLMs:
  1. Navigate to URL
  2. Snapshot to get interactive elements with refs
  3. Interact using refs (@e1, @e2, etc.)
  4. Re-snapshot after navigation or DOM changes
bash
# Step 1: Open URL
agent-browser open https://example.com

# Step 2: Get interactive elements with refs
agent-browser snapshot -i --json

# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"

# Step 4: Re-snapshot after changes
agent-browser snapshot -i
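
Step 2 produces the refs that step 3 consumes, so a script needs some way to pull them out of the snapshot text. The sketch below is one possible approach, not part of agent-browser: `list_refs` is a hypothetical helper, and it assumes refs are printed as `[ref=eN]`, as in the sample output elsewhere in this document.

```bash
# Hypothetical helper (not an agent-browser command): read snapshot text on
# stdin and print each ref on its own line in the @eN form used by commands
# such as click and fill.
# Assumption: refs appear in the snapshot as [ref=eN].
list_refs() {
  grep -o '\[ref=e[0-9][0-9]*\]' | sed -e 's/^\[ref=/@/' -e 's/]$//'
}
```

Under those assumptions it could be used as `agent-browser snapshot -i | list_refs`. Refs should be re-read after every new snapshot, since step 4 exists precisely because they may change.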

Key Commands

Navigation

bash
agent-browser open <url>       # Navigate to URL
agent-browser back             # Go back
agent-browser forward          # Go forward
agent-browser reload           # Reload page
agent-browser close            # Close browser

Snapshots (Essential for AI)

bash
agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -i --json    # JSON output for parsing
agent-browser snapshot -c           # Compact (remove empty elements)
agent-browser snapshot -d 3         # Limit depth

Interactions

bash
agent-browser click @e1                    # Click element
agent-browser dblclick @e1                 # Double-click
agent-browser fill @e1 "text"              # Clear and fill input
agent-browser type @e1 "text"              # Type without clearing
agent-browser press Enter                  # Press key
agent-browser hover @e1                    # Hover element
agent-browser check @e1                    # Check checkbox
agent-browser uncheck @e1                  # Uncheck checkbox
agent-browser select @e1 "option"          # Select dropdown option
agent-browser scroll down 500              # Scroll (up/down/left/right)
agent-browser scrollintoview @e1           # Scroll element into view

Get Information

bash
agent-browser get text @e1          # Get element text
agent-browser get html @e1          # Get element HTML
agent-browser get value @e1         # Get input value
agent-browser get attr href @e1     # Get attribute
agent-browser get title             # Get page title
agent-browser get url               # Get current URL
agent-browser get count "button"    # Count matching elements

Screenshots & PDFs

bash
agent-browser screenshot                      # Viewport screenshot
agent-browser screenshot --full               # Full page
agent-browser screenshot output.png           # Save to file
agent-browser screenshot --full output.png    # Full page to file
agent-browser pdf output.pdf                  # Save as PDF

Wait

bash
agent-browser wait @e1              # Wait for element
agent-browser wait 2000             # Wait milliseconds
agent-browser wait "text"           # Wait for text to appear

Semantic Locators (Alternative to Refs)

bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"

Sessions (Parallel Browsers)

bash
# Run multiple independent browser sessions
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com

# List active sessions
agent-browser session list
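
When the list of sites is not known in advance, the per-session commands can be generated in a loop. This is only a sketch: `open_in_sessions` is a hypothetical wrapper, and the `AGENT_BROWSER` variable is an assumption introduced here so the binary can be swapped out (for a dry run with `echo`, say). It is not an agent-browser setting.

```bash
# Hypothetical wrapper: open each URL argument in its own named session
# (browser1, browser2, ...), using the --session flag shown above.
# AGENT_BROWSER is an assumed override for the binary name, not an
# agent-browser option; it defaults to the real CLI.
open_in_sessions() {
  local ab="${AGENT_BROWSER:-agent-browser}"
  local i=1
  local url
  for url in "$@"; do
    "$ab" --session "browser$i" open "$url"
    i=$((i + 1))
  done
}
```

For example, `open_in_sessions https://site1.com https://site2.com` would issue the same two commands as above. The loop runs them one after another; the sessions themselves remain independent.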

Examples

Login Flow

bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in

Search and Extract

bash
agent-browser open https://news.ycombinator.com
agent-browser snapshot -i --json
# Parse JSON to find story links
agent-browser get text @e12  # Get headline text
agent-browser click @e12     # Click to open story

Form Filling

bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit button
agent-browser screenshot confirmation.png

Debug Mode

bash
# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
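
Because `--headed` has to be repeated on every command, a small shell function can toggle it from the environment. The following is a sketch under stated assumptions: `ab`, `AB_HEADED`, and `AGENT_BROWSER` are hypothetical names introduced here and are not read by agent-browser itself.

```bash
# Hypothetical wrapper: forward all arguments to agent-browser and add
# --headed when AB_HEADED is set to a non-empty value.
# AB_HEADED and AGENT_BROWSER are assumptions of this sketch, not
# settings that agent-browser reads.
ab() {
  local bin="${AGENT_BROWSER:-agent-browser}"
  if [ -n "${AB_HEADED:-}" ]; then
    "$bin" --headed "$@"
  else
    "$bin" "$@"
  fi
}
```

With this in place, `AB_HEADED=1 ab open https://example.com` would run with a visible window, while plain `ab snapshot -i` stays headless.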

JSON Output

Add --json for structured output:
bash
agent-browser snapshot -i --json
Returns:
json
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    },
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
  }
}
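
The `refs` map makes it possible to look up an element by role and name instead of reading the snapshot by eye. A minimal sketch, assuming `jq` is installed and that the payload has the `data.refs` shape shown in the sample; `ref_for` is a hypothetical helper, not an agent-browser command.

```bash
# Hypothetical helper: read a --json snapshot on stdin and print the @ref of
# every element whose role and accessible name both match.
# Assumptions: jq is available, and the JSON has the data.refs shape shown
# in the sample output.
ref_for() {
  jq -r --arg role "$1" --arg name "$2" '
    .data.refs
    | to_entries[]
    | select(.value.role == $role and .value.name == $name)
    | "@" + .key
  '
}
```

It could be combined with an interaction along the lines of `agent-browser click "$(agent-browser snapshot -i --json | ref_for button "Submit")"`. If several elements share a role and name, more than one ref is printed, so a caller may want to check for that case.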

vs Playwright MCP

Feature     agent-browser (CLI)   Playwright MCP
Interface   Bash commands         MCP tools
Selection   Refs (@e1)            Refs (e1)
Output      Text/JSON             Tool responses
Parallel    Sessions              Tabs
Best for    Quick automation      Tool integration

Use agent-browser when:
  • You prefer Bash-based workflows
  • You want simpler CLI commands
  • You need quick one-off automation
Use Playwright MCP when:
  • You need deep MCP tool integration
  • You want tool-based responses
  • You're building complex automation