browser-tools

Browser Tools

Chrome DevTools Protocol tools for agent-assisted web automation. These tools connect to Chrome running on :9222 with remote debugging enabled.

Setup

Run once before first use:
bash
cd {baseDir}/browser-tools
npm install

Start Chrome

bash
{baseDir}/browser-start.js              # Fresh profile
{baseDir}/browser-start.js --profile    # Copy user's profile (cookies, logins)
Launch Chrome with remote debugging on :9222. Use --profile to preserve the user's authentication state.

Navigate

bash
{baseDir}/browser-nav.js https://example.com
{baseDir}/browser-nav.js https://example.com --new
Navigate to URLs. Use the --new flag to open in a new tab instead of reusing the current tab.

Evaluate JavaScript

bash
{baseDir}/browser-eval.js 'document.title'
{baseDir}/browser-eval.js 'document.querySelectorAll("a").length'
Execute JavaScript in the active tab. The code runs in an async context. Use this to extract data, inspect page state, or perform DOM operations programmatically.

Screenshot

bash
{baseDir}/browser-screenshot.js
Capture the current viewport and return the path to a temporary file. Use this to visually inspect page state or verify UI changes.

Pick Elements

bash
{baseDir}/browser-pick.js "Click the submit button"
IMPORTANT: Use this tool when the user wants to select specific DOM elements on the page. This launches an interactive picker that lets the user click elements to select them. The user can select multiple elements (Cmd/Ctrl+Click) and press Enter when done. The tool returns CSS selectors for the selected elements.
Common use cases:
  • User says "I want to click that button" → Use this tool to let them select it
  • User says "extract data from these items" → Use this tool to let them select the elements
  • When you need specific selectors but the page structure is complex or ambiguous
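Once the picker has returned selectors, they can be fed back into browser-eval.js. The following is a minimal sketch, assuming the selectors are already available as an array of strings (the exact output format of browser-pick.js is not described here). The names extractBySelectors and doc are hypothetical; doc stands in for document so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: collect the trimmed text of every element matched by
// each selector. `doc` is any object exposing querySelectorAll; in the page
// itself this would be `document`.
function extractBySelectors(doc, selectors) {
  return selectors.map((selector) => ({
    selector,
    texts: Array.from(doc.querySelectorAll(selector)).map((el) =>
      (el.textContent || '').trim()
    ),
  }));
}
```

Inside the page, the function body could be wrapped in an IIFE, called with document, and its result passed through JSON.stringify so the whole extraction happens in a single browser-eval.js call.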

Cookies

bash
{baseDir}/browser-cookies.js
Display all cookies for the current tab including domain, path, httpOnly, and secure flags. Use this to debug authentication issues or inspect session state.

Extract Page Content

bash
{baseDir}/browser-content.js https://example.com
Navigate to a URL and extract readable content as markdown. Uses Mozilla Readability for article extraction and Turndown for HTML-to-markdown conversion. Works on pages with JavaScript-rendered content (it waits for the page to load).

When to Use

  • Testing frontend code in a real browser
  • Interacting with pages that require JavaScript
  • When user needs to visually see or interact with a page
  • Debugging authentication or session issues
  • Scraping dynamic content that requires JS execution

Efficiency Guide

DOM Inspection Over Screenshots

Don't take screenshots to see page state. Do parse the DOM directly:
javascript
// Get page structure
document.body.innerHTML.slice(0, 5000)

// Find interactive elements
Array.from(document.querySelectorAll('button, input, [role="button"]')).map(e => ({
  id: e.id,
  text: e.textContent.trim(),
  class: e.className
}))
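On large pages the raw list above can get long. As a sketch of one way to keep the output compact, the helper below caps the number of items and the length of each text. The name summarizeInteractive, the limits, and the extended selector list are illustrative assumptions, not part of the tools.

```javascript
// Hypothetical helper: compact summary of interactive elements.
// `root` is any object exposing querySelectorAll (in the page: `document`).
function summarizeInteractive(root, maxItems = 50, maxText = 80) {
  const nodes = Array.from(
    root.querySelectorAll('button, input, select, textarea, a[href], [role="button"]')
  );
  return {
    total: nodes.length, // full count, even when items are capped
    items: nodes.slice(0, maxItems).map((el) => ({
      tag: (el.tagName || '').toLowerCase(),
      id: el.id || null,
      // collapse runs of whitespace and truncate long labels
      text: (el.textContent || '').trim().replace(/\s+/g, ' ').slice(0, maxText),
      class: el.className || null,
    })),
  };
}
```

Returning the total alongside the capped items makes it clear when the list was truncated, so a follow-up call can narrow the selector instead of dumping the whole page.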

Complex Scripts in Single Calls

Wrap everything in an IIFE to run multi-statement code:
javascript
(function() {
  // Multiple operations
  const data = document.querySelector('#target').textContent;
  const buttons = document.querySelectorAll('button');
  
  // Interactions
  buttons[0].click();
  
  // Return results
  return JSON.stringify({ data, buttonCount: buttons.length });
})()

Batch Interactions

Don't make separate calls for each click. Do batch them:
javascript
(function() {
  const actions = ["btn1", "btn2", "btn3"];
  actions.forEach(id => document.getElementById(id).click());
  return "Done";
})()
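One limitation of the batch above is that a missing id makes getElementById return null, so the call throws and the remaining clicks never run. A hedged variant that skips missing elements and reports them might look like the sketch below. The name clickAll and the doc parameter are hypothetical; doc stands in for document.

```javascript
// Hypothetical helper: click every element whose id is listed, and report
// which ids could not be found instead of throwing on the first miss.
function clickAll(doc, ids) {
  const missing = [];
  for (const id of ids) {
    const el = doc.getElementById(id);
    if (el) {
      el.click();
    } else {
      missing.push(id);
    }
  }
  return JSON.stringify({ clicked: ids.length - missing.length, missing });
}
```

The returned JSON tells the caller in the same round trip whether the page matched expectations, which avoids a second call just to check what happened.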

Typing/Input Sequences

javascript
(function() {
  const text = "HELLO";
  for (const char of text) {
    document.getElementById("key-" + char).click();
  }
  document.getElementById("submit").click();
  return "Submitted: " + text;
})()
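The example above drives an on-screen keyboard by clicking one element per character. For an ordinary text field, a sketch along the following lines may fit better: it sets the value and dispatches input and change events so that listeners on the page are notified. The name typeInto is hypothetical, and some frameworks track field values in ways that this simple approach may not satisfy.

```javascript
// Hypothetical helper: put text into a form field and notify listeners.
// `input` is an element such as document.querySelector('input[name="q"]').
function typeInto(input, text) {
  input.focus?.(); // focus if the element supports it
  input.value = text;
  // bubbling events so delegated handlers higher in the tree also fire
  input.dispatchEvent(new Event('input', { bubbles: true }));
  input.dispatchEvent(new Event('change', { bubbles: true }));
  return input.value;
}
```

Reading the value back at the end gives a quick check that the field accepted the text before a submit button is clicked in the same call.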

Reading App/Game State

Extract structured state in one call:
javascript
(function() {
  const state = {
    score: document.querySelector('.score')?.textContent,
    status: document.querySelector('.status')?.className,
    items: Array.from(document.querySelectorAll('.item')).map(el => ({
      text: el.textContent,
      active: el.classList.contains('active')
    }))
  };
  return JSON.stringify(state, null, 2);
})()

Waiting for Updates

If the DOM updates asynchronously after an action, add a short delay in bash before reading the new state:
bash
sleep 0.5 && {baseDir}/browser-eval.js '...'
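A fixed sleep can be too short or needlessly long. Since evaluated code is described as running in an async context, an alternative is to poll inside the page until a condition holds. This is a minimal sketch under that assumption; waitFor, timeoutMs, and intervalMs are hypothetical names, and whether browser-eval.js waits for the returned promise should be verified before relying on it.

```javascript
// Hypothetical helper: call `check` repeatedly until it returns a truthy
// value, or give up after `timeoutMs` and resolve to null.
async function waitFor(check, { timeoutMs = 3000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = check();
    if (value) return value;
    if (Date.now() >= deadline) return null;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the page, check would typically be a small function that queries the DOM, for example one that returns the text of a result element once it exists. Resolving to null on timeout, rather than throwing, lets the caller decide whether to retry or fall back to a screenshot.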

Investigate Before Interacting

Always start by understanding the page structure:
javascript
(function() {
  return {
    title: document.title,
    forms: document.forms.length,
    buttons: document.querySelectorAll('button').length,
    inputs: document.querySelectorAll('input').length,
    mainContent: document.body.innerHTML.slice(0, 3000)
  };
})()
Then target specific elements based on what you find.
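As one possible follow-up to the overview, when it reports one or more forms, a second call can list each form's fields before anything is filled in. The sketch below assumes a hypothetical helper named describeForms; doc stands in for document.

```javascript
// Hypothetical helper: describe every form on the page and its fields.
// `doc` is any object exposing a `forms` collection (in the page: `document`).
function describeForms(doc) {
  return Array.from(doc.forms).map((form, index) => ({
    index,
    id: form.id || null,
    action: form.action || null,
    method: (form.method || 'get').toLowerCase(),
    fields: Array.from(form.elements).map((el) => ({
      tag: (el.tagName || '').toLowerCase(),
      name: el.name || null,
      type: el.type || null,
    })),
  }));
}
```

The field names and types returned here are usually enough to build precise selectors for the next interaction.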