agent-browser

Agent Browser

Browser automation using the agent-browser CLI, a fast, headless browser automation tool for AI agents.

Installation

bash
npm install -g agent-browser
agent-browser install  # Install browser binaries

Quick Start

bash
# Navigate to a URL
agent-browser open https://example.com
# Get accessibility snapshot (shows refs like @e1, @e2)
agent-browser snapshot -i
# Click using ref from snapshot
agent-browser click @e2
# Type into an element
agent-browser fill @e3 "hello world"
# Take screenshot
agent-browser screenshot output.png

Workflow Pattern

  1. Open - Navigate to the target URL
  2. Snapshot - Get the accessibility tree to see available elements
  3. Interact - Use refs (@e1, @e2, etc.) to interact with elements
  4. Verify - Take a snapshot or screenshot to verify state
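
The four steps can be sketched as a single shell function. This is a minimal illustration, not part of the tool: `run_workflow` and the `AB` variable are hypothetical names, and `AB` exists only so the sequence can be dry-run with `AB=echo`. In real use an agent reads the snapshot output to pick the ref rather than knowing it in advance.

```bash
#!/usr/bin/env bash
# AB is a hypothetical override for the CLI binary; set AB=echo for a dry run.
AB="${AB:-agent-browser}"

# run_workflow <url> <ref> <screenshot-path>
# Runs Open -> Snapshot -> Interact -> Verify, stopping at the first failure.
run_workflow() {
  local url="$1" ref="$2" shot="$3"
  "$AB" open "$url"        || return 1  # 1. Open the target URL
  "$AB" snapshot -i        || return 1  # 2. Snapshot: list interactive elements and refs
  "$AB" click "$ref"       || return 1  # 3. Interact using a ref from the snapshot
  "$AB" screenshot "$shot"              # 4. Verify with a screenshot
}
```

For example, `run_workflow https://example.com @e2 output.png` issues the same commands as the Quick Start, in order.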

Core Commands

See references/commands.md for the complete command reference.

Navigation

bash
agent-browser open <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page

Interaction

bash
agent-browser click <sel>          # Click element (or @ref)
agent-browser fill <sel> <text>    # Clear and fill
agent-browser press <key>          # Press key (Enter, Tab, etc.)
agent-browser select <sel> <val>   # Select dropdown option

Getting Information

bash
agent-browser snapshot             # Accessibility tree with refs
agent-browser snapshot -i          # Interactive elements only
agent-browser get text <sel>       # Get element text
agent-browser get url              # Get current URL

Capture

bash
agent-browser screenshot [path]    # Take screenshot
agent-browser screenshot --full    # Full page screenshot
agent-browser pdf <path>           # Save as PDF

Sessions

Use sessions to maintain browser state across commands:
bash
agent-browser --session myproject open https://example.com
agent-browser --session myproject snapshot
agent-browser --session myproject click @e1
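
Repeating `--session myproject` on every command is easy to get wrong, so one option is a small wrapper. The sketch below assumes only the `--session <name>` flag shown above; `ab_session`, `AB`, and `SESSION` are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# AB and SESSION are illustrative variables, not part of agent-browser.
AB="${AB:-agent-browser}"
SESSION="${SESSION:-myproject}"

# ab_session <command> [args...]
# Runs any agent-browser subcommand inside the shared session.
ab_session() {
  "$AB" --session "$SESSION" "$@"
}
```

With this in place, `ab_session open https://example.com` followed by `ab_session click @e1` targets the same browser state.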

Selectors

  • Refs: @e1, @e2 (from snapshot output) - preferred
  • CSS: #id, .class, div > span
  • Text: text=Submit
  • Role: role=button[name="Submit"]
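
To make the four forms concrete, here is a hypothetical helper that labels a selector string by its prefix. It only mirrors the list above and is not how the CLI itself parses selectors; `selector_kind` is an invented name. A script could use it to log a warning when something other than a ref is passed.

```bash
#!/usr/bin/env bash
# selector_kind <selector>
# Hypothetical helper: prints which of the four selector forms a string uses.
selector_kind() {
  case "$1" in
    @e[0-9]*) echo "ref"  ;;  # e.g. @e1, taken from snapshot output (preferred)
    text=*)   echo "text" ;;  # e.g. text=Submit
    role=*)   echo "role" ;;  # e.g. role=button[name="Submit"]
    *)        echo "css"  ;;  # anything else is treated as CSS, e.g. #id
  esac
}
```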

Best Practices

  1. Always snapshot first - Get the accessibility tree before interacting
  2. Use refs - Prefer @e1 refs from snapshot over CSS selectors
  3. Use sessions - Maintain state across multiple commands
  4. Wait appropriately - Use wait for dynamic content
  5. Verify actions - Snapshot or screenshot after interactions
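
The five practices can be combined in one routine, sketched below. `submit_text`, `ab`, `AB`, and `SESSION` are hypothetical names. The argument passed to `wait` is an assumption, since this document names the command but not its syntax; check references/commands.md for the exact form.

```bash
#!/usr/bin/env bash
# AB and SESSION are illustrative variables; set AB=echo for a dry run.
AB="${AB:-agent-browser}"
SESSION="${SESSION:-myproject}"

# Practice 3: every command goes through the same session.
ab() { "$AB" --session "$SESSION" "$@"; }

# submit_text <ref> <text> <wait-target>
submit_text() {
  local ref="$1" text="$2" wait_for="$3"
  ab snapshot -i          || return 1  # 1. Snapshot before interacting
  ab fill "$ref" "$text"  || return 1  # 2. Address the element by ref
  ab press Enter          || return 1
  ab wait "$wait_for"     || return 1  # 4. Wait for dynamic content (argument form assumed)
  ab snapshot -i                       # 5. Verify the resulting state
}
```

A call such as `submit_text @e3 "hello world" "#results"` would fill the field, submit it, and print a fresh snapshot once the awaited element is present; `#results` is a placeholder selector.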