browser-automation

Browser Automation

Available Tools

  • browser_act(instruction, starting_url?): Execute browser actions using natural language (click, type, scroll, select). Use starting_url to navigate to a page and act in a single call.
  • browser_get_page_info(url?, text?, tables?, links?): Get page structure and DOM data (fast, no AI). Use url to navigate first; text=True for full text, tables=True for table data, links=True for all links.
  • browser_manage_tabs(action, tab_index?, url?): Switch, close, or create browser tabs.
  • browser_save_screenshot(filename): Save a screenshot of the current page to the workspace.

When to Use

Use browser automation when the task genuinely requires it:
  • UI interactions: Filling forms, clicking buttons, navigating multi-step workflows
  • Login-required pages: Accessing content behind authentication that APIs cannot reach
  • Dynamic/JS-heavy pages: Content rendered client-side that plain HTTP requests can't capture
  • Human-like browsing needed: Sites that block bots or require realistic interaction patterns
  • Scraping structured data: When no API exists and the data must be extracted from rendered pages
Prefer web search or url_fetcher for general information lookup, news, or publicly accessible pages — browser automation is slower and heavier. Reserve it for tasks where simpler tools are insufficient.
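
The criteria above can be read as a simple decision rule. The sketch below is only an illustration: the Task fields and the choose_tool helper are hypothetical names invented for this example, not part of any tool listed here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task; each flag mirrors one bullet above."""
    ui_interaction: bool = False        # forms, buttons, multi-step workflows
    login_required: bool = False        # content behind authentication
    js_rendered: bool = False           # content rendered client-side
    needs_human_like_browsing: bool = False
    scrape_rendered_data: bool = False  # no API, data only in the rendered page


def choose_tool(task: Task) -> str:
    """Return 'browser' only when at least one criterion applies."""
    if any((
        task.ui_interaction,
        task.login_required,
        task.js_rendered,
        task.needs_human_like_browsing,
        task.scrape_rendered_data,
    )):
        return "browser"
    # Default to the lighter tools for general lookups and public pages.
    return "web_search_or_url_fetcher"
```

With these assumptions, a plain news lookup (Task()) falls through to the lighter tools, while Task(login_required=True) selects the browser.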

Tool Selection

  • browser_act: UI interactions (click, type, scroll, form fill). Use starting_url to open a page and act in one call.
  • browser_get_page_info: Fast page structure check and optional content extraction (<300ms). Use url to navigate first.
  • browser_manage_tabs: Switch/close/create tabs (view tabs via get_page_info).
  • browser_save_screenshot: Save milestone screenshots (search results, confirmations, key data).
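
One way these tools might fit together in a single task is sketched below. FakeBrowser is a hypothetical stand-in that only records calls so the example runs on its own; its method names and parameters mirror the tool signatures listed in this document, but the bodies and return values are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeBrowser:
    """Hypothetical recorder; the real tools are provided by the environment."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def browser_get_page_info(self, url: Optional[str] = None, text: bool = False,
                              tables: bool = False, links: bool = False) -> None:
        self.calls.append(("browser_get_page_info", {"url": url}))

    def browser_act(self, instruction: str, starting_url: Optional[str] = None) -> None:
        self.calls.append(("browser_act", {"instruction": instruction}))

    def browser_save_screenshot(self, filename: str) -> None:
        self.calls.append(("browser_save_screenshot", {"filename": filename}))


def search_and_capture(browser: FakeBrowser, url: str, query: str, shot: str) -> None:
    # 1. Fast structure check, navigating to the page first.
    browser.browser_get_page_info(url=url)
    # 2. UI interaction described in natural language.
    browser.browser_act(instruction=f"1. Type '{query}' in search 2. Click search button")
    # 3. Milestone screenshot of the search results.
    browser.browser_save_screenshot(shot)
```

The ordering (inspect, act, capture) is a design choice for this sketch rather than a requirement stated in the source.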

browser_act Best Practice

  • Combine up to 3 predictable steps: "1. Type 'laptop' in search 2. Click search button 3. Click first result"
  • Use starting_url when opening a fresh page: browser_act(instruction='Search for laptops', starting_url='https://amazon.com')
  • On failure: check the screenshot to see current state, then retry from that point
  • For visual creation (diagrams, drawings), prefer code/text input methods over mouse interactions
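
The "up to 3 predictable steps" guideline can be applied mechanically when a workflow has more steps. The helper below is a minimal sketch: build_instructions and MAX_STEPS_PER_CALL are hypothetical names for this example, and each returned string is meant to be passed as the instruction argument of one browser_act call.

```python
from typing import List

MAX_STEPS_PER_CALL = 3  # limit taken from the guideline above


def build_instructions(steps: List[str], max_steps: int = MAX_STEPS_PER_CALL) -> List[str]:
    """Group steps into numbered instruction strings of at most max_steps each."""
    batches: List[str] = []
    for start in range(0, len(steps), max_steps):
        chunk = steps[start:start + max_steps]
        # Numbering restarts at 1 in every batch, matching the example format.
        batches.append(" ".join(f"{i}. {step}" for i, step in enumerate(chunk, start=1)))
    return batches
```

Keeping batches short also makes the failure advice easier to follow: if one call fails, only that batch has to be retried from the state shown in the screenshot.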

browser_get_page_info Best Practice

  • Use url to navigate and inspect in one call: browser_get_page_info(url='https://example.com', tables=True)
  • Use text=True to get full page text content (useful for reading article text)
  • Use tables=True to extract structured table data from the page
  • Use links=True to get all links on the page (up to 200)
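
As a rough illustration of choosing flags, the sketch below builds the keyword arguments for a browser_get_page_info call from a reading goal. The goal names and the page_info_kwargs helper are hypothetical; only the url, text, tables, and links parameters come from the tool description.

```python
from typing import Any, Dict, Optional

# Hypothetical goal names mapped to the flags described above.
FLAG_FOR_GOAL = {"article": "text", "table": "tables", "links": "links"}


def page_info_kwargs(goal: str = "structure", url: Optional[str] = None) -> Dict[str, Any]:
    """Return keyword arguments for one browser_get_page_info call."""
    kwargs: Dict[str, Any] = {}
    if url is not None:
        kwargs["url"] = url  # navigate and inspect in the same call
    flag = FLAG_FOR_GOAL.get(goal)
    if flag is not None:
        kwargs[flag] = True  # request only the extra content that is needed
    return kwargs
```

For example, page_info_kwargs("table", "https://example.com") yields the same arguments as the one-call example in the first bullet, while the default goal requests only the page structure.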