core-agent-browser

Browser Automation with agent-browser

Priority Note

For fetching Rust/crate information, use this priority order:
  1. rust-learner skill - Orchestrates actionbook + browser-fetcher
  2. actionbook MCP - Pre-computed selectors for known sites
  3. agent-browser CLI - Direct browser automation (last resort)
Use agent-browser directly only when:
  • actionbook has no pre-computed selectors for the target site
  • You need interactive browser testing/automation
  • You need screenshots or form filling

Quick start

bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser

Core workflow

  1. Navigate: agent-browser open <url>
  2. Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
  3. Interact using refs from the snapshot
  4. Re-snapshot after navigation or significant DOM changes
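
The four steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of agent-browser itself: the function names `open_and_snapshot` and `click_and_resnapshot` and the `AGENT_BROWSER` override variable are hypothetical, and only commands listed in this document are used.

```bash
# Command to invoke; override AGENT_BROWSER (hypothetical variable) for dry runs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Steps 1-2: navigate, then list interactive elements with their refs.
open_and_snapshot() {
  local url="$1"
  "$AGENT_BROWSER" open "$url" || return 1
  "$AGENT_BROWSER" snapshot -i
}

# Steps 3-4: click a ref taken from the last snapshot, then snapshot
# again, as step 4 recommends after navigation or DOM changes.
click_and_resnapshot() {
  local ref="$1"
  "$AGENT_BROWSER" click "$ref" || return 1
  "$AGENT_BROWSER" snapshot -i
}
```

A typical session would call `open_and_snapshot https://example.com`, read the printed refs, and then call `click_and_resnapshot @e1` with whichever ref the snapshot actually reported.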

Commands

Navigation

bash
agent-browser open <url>      # Navigate to URL
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser close           # Close browser

Snapshot (page analysis)

bash
agent-browser snapshot        # Full accessibility tree
agent-browser snapshot -i     # Interactive elements only (recommended)
agent-browser snapshot -c     # Compact output
agent-browser snapshot -d 3   # Limit depth to 3

Interactions (use @refs from snapshot)

bash
agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key
agent-browser press Control+a     # Key combination
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown
agent-browser scroll down 500     # Scroll page
agent-browser scrollintoview @e1  # Scroll element into view

Get information

bash
agent-browser get text @e1        # Get element text
agent-browser get value @e1       # Get input value
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
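
The `get` commands are convenient for capturing values into shell variables. The sketch below assumes that `get title` and `get url` print their result to standard output; the function name `page_summary` and the `AGENT_BROWSER` override variable are hypothetical.

```bash
# Command to invoke; override AGENT_BROWSER (hypothetical variable) for dry runs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Print the current page's title and URL as one tab-separated line.
# Assumes both get commands write their value to stdout.
page_summary() {
  local title url
  title="$("$AGENT_BROWSER" get title)" || return 1
  url="$("$AGENT_BROWSER" get url)" || return 1
  printf '%s\t%s\n' "$title" "$url"
}
```

Calling `page_summary` after `agent-browser open <url>` gives a one-line record that is easy to log or compare between steps.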

Screenshots

bash
agent-browser screenshot          # Screenshot to stdout
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full   # Full page

Wait

bash
agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --load networkidle      # Wait for network idle
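
Waits are usually paired with the action that triggers a page change. A possible helper, shown here as a sketch, clicks a ref, waits for the network to go idle, and optionally waits for confirmation text. The function name `click_and_settle` and the `AGENT_BROWSER` override variable are hypothetical; the commands and flags are the ones listed above.

```bash
# Command to invoke; override AGENT_BROWSER (hypothetical variable) for dry runs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# click_and_settle REF [TEXT]
# Click REF, wait for network idle, then (if TEXT is given) wait for TEXT.
click_and_settle() {
  local ref="$1" text="${2:-}"
  "$AGENT_BROWSER" click "$ref" || return 1
  "$AGENT_BROWSER" wait --load networkidle || return 1
  if [ -n "$text" ]; then
    "$AGENT_BROWSER" wait --text "$text" || return 1
  fi
}
```

For example, `click_and_settle @e3 "Success"` would click the element and return only after both waits complete, or fail as soon as any step fails.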

Semantic locators (alternative to refs)

bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
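
Because semantic locators do not depend on refs, they can be chained without taking a snapshot first. The sketch below wraps the three `find` forms above into a login helper. The function name `login`, the `AGENT_BROWSER` override variable, and the label and button names ("Email", "Password", "Submit") are illustrative assumptions about the target page.

```bash
# Command to invoke; override AGENT_BROWSER (hypothetical variable) for dry runs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# login EMAIL PASSWORD
# Fill two labelled fields and click the submit button, stopping at the
# first command that fails. Label and button names are assumptions.
login() {
  local email="$1" password="$2"
  "$AGENT_BROWSER" find label "Email" fill "$email" &&
    "$AGENT_BROWSER" find label "Password" fill "$password" &&
    "$AGENT_BROWSER" find role button click --name "Submit"
}
```

Note that arguments passed this way are visible in the process list, so a real password is better read from a protected source than typed on the command line.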

Example: Form submission

bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i    # Check result
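
When scripting this flow, the ref for a given element can be pulled out of the snapshot text instead of being hard-coded. The following is a sketch under two assumptions: that the snapshot prints one element per line, and that each line carries a `[ref=...]` marker as in the example output above. The actual output format may differ, and the function name `ref_for` is hypothetical.

```bash
# ref_for NAME
# Read snapshot text on stdin and print the ref (as @eN) of the first
# line that contains "NAME" in double quotes. Prints nothing if no line matches.
ref_for() {
  grep -F "\"$1\"" |
    sed -n 's/.*\[ref=\([A-Za-z0-9]*\)\].*/@\1/p' |
    head -n 1
}
```

Usage might look like `ref="$(agent-browser snapshot -i | ref_for "Email")"` followed by `agent-browser fill "$ref" "user@example.com"`, with a check that `$ref` is non-empty before using it.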