browser-tools
Browser Tools
Chrome DevTools Protocol tools for agent-assisted web automation. These tools connect to Chrome running on :9222 with remote debugging enabled.
Setup
Run once before first use:
bash
cd {baseDir}/browser-tools
npm install
Start Chrome
bash
{baseDir}/browser-start.js # Fresh profile
{baseDir}/browser-start.js --profile # Copy user's profile (cookies, logins)
Launch Chrome with remote debugging on :9222. Use --profile to preserve the user's authentication state.
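Before running the other tools, it can help to confirm that Chrome is actually listening on the debugging port. The following is a minimal sketch, assuming Chrome runs on localhost and Node 18+ is available (for the global fetch). /json/version is Chrome's standard remote-debugging HTTP endpoint; the helper names debuggerUrl and isChromeListening are hypothetical and not part of browser-tools.

```javascript
// Hypothetical helper, not part of browser-tools: builds the URL of Chrome's
// remote-debugging version endpoint. The localhost default is an assumption.
function debuggerUrl(port = 9222, host = "localhost") {
  return `http://${host}:${port}/json/version`;
}

// Resolves to true if the debugging port answers with a 2xx response.
// fetchImpl is injectable so the check can be exercised without a browser.
async function isChromeListening(port = 9222, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(debuggerUrl(port));
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
}

// Example use:
//   isChromeListening().then(ok => console.log(ok ? "reachable" : "not reachable"));
```

If the check fails, start Chrome with browser-start.js first and retry.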
Navigate
bash
{baseDir}/browser-nav.js https://example.com
{baseDir}/browser-nav.js https://example.com --new
Navigate to URLs. Use the --new flag to open the URL in a new tab instead of reusing the current tab.
Evaluate JavaScript
bash
{baseDir}/browser-eval.js 'document.title'
{baseDir}/browser-eval.js 'document.querySelectorAll("a").length'
Execute JavaScript in the active tab. The code runs in an async context. Use this to extract data, inspect page state, or perform DOM operations programmatically.
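Because the snippet is passed as a single shell argument, quoting matters once the JavaScript itself contains single quotes. The following is a minimal sketch of a hypothetical helper (shellQuote is not part of browser-tools) that wraps a snippet in POSIX-shell single quotes when a command line is built from a script.

```javascript
// Hypothetical helper: quote a JavaScript snippet for a POSIX shell.
// Inside single quotes the shell treats every character literally, so the
// only character needing care is the single quote itself: close the quoted
// string, emit an escaped quote, then reopen the string.
function shellQuote(snippet) {
  return "'" + snippet.replace(/'/g, "'\\''") + "'";
}

// A snippet with only double quotes passes through unchanged inside the quotes.
console.log(shellQuote('document.querySelectorAll("a").length'));
// A snippet with single quotes gets each one rewritten.
console.log(shellQuote("document.querySelector('h1')?.textContent"));
```

The quoted result can then be appended to the browser-eval.js command line as a single argument.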
Screenshot
bash
{baseDir}/browser-screenshot.js
Capture the current viewport and return the path to a temporary file. Use this to visually inspect page state or verify UI changes.
Pick Elements
bash
{baseDir}/browser-pick.js "Click the submit button"
IMPORTANT: Use this tool when the user wants to select specific DOM elements on the page. It launches an interactive picker that lets the user click elements to select them. The user can select multiple elements (Cmd/Ctrl+Click) and press Enter when done. The tool returns CSS selectors for the selected elements.
Common use cases:
- User says "I want to click that button" → Use this tool to let them select it
- User says "extract data from these items" → Use this tool to let them select the elements
- When you need specific selectors but the page structure is complex or ambiguous
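Once the picker has returned selectors, they can be fed back into browser-eval.js to read the chosen elements. The following is a minimal sketch: the function name readSelected and the example selectors are hypothetical, and the doc parameter stands in for the page's document so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: given CSS selectors (such as those returned by the
// picker), collect the trimmed text of every matching element.
// `doc` is anything with querySelectorAll, normally the page's `document`.
function readSelected(doc, selectors) {
  return selectors.map(selector => ({
    selector,
    texts: Array.from(doc.querySelectorAll(selector), el => el.textContent.trim()),
  }));
}

// In the page (via browser-eval.js) this could be called as, for example:
//   JSON.stringify(readSelected(document, ["#submit", ".item"]))
```

Returning the selector alongside its texts keeps the output readable when several elements were picked at once.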
Cookies
bash
{baseDir}/browser-cookies.js
Display all cookies for the current tab, including the domain, path, httpOnly, and secure flags. Use this to debug authentication issues or inspect session state.
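For comparison, page JavaScript can only read cookies through document.cookie, which never includes httpOnly cookies; that is one reason a protocol-level tool is useful when debugging sessions. The following is a minimal sketch of a hypothetical helper (parseCookieString is not part of browser-tools) that parses a document.cookie string so the result can be compared against the tool's output.

```javascript
// Hypothetical helper: parse a `document.cookie`-style string ("a=1; b=2")
// into a plain object. httpOnly cookies never appear in document.cookie, so
// a cookie missing here may be httpOnly (or scoped to a different path).
function parseCookieString(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip empty or malformed fragments
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    try {
      cookies[name] = decodeURIComponent(value);
    } catch {
      cookies[name] = value; // keep the raw value if it is not valid percent-encoding
    }
  }
  return cookies;
}

// In the page: JSON.stringify(parseCookieString(document.cookie))
```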
Extract Page Content
bash
{baseDir}/browser-content.js https://example.com
Navigate to a URL and extract its readable content as Markdown. Uses Mozilla Readability for article extraction and Turndown for HTML-to-Markdown conversion. Works on pages with JavaScript-rendered content (it waits for the page to load).
When to Use
- Testing frontend code in a real browser
- Interacting with pages that require JavaScript
- When the user needs to see or interact with a page
- Debugging authentication or session issues
- Scraping dynamic content that requires JS execution
Efficiency Guide
DOM Inspection Over Screenshots
Don't take screenshots just to check page state; parse the DOM directly instead:
javascript
// Get page structure
document.body.innerHTML.slice(0, 5000)
// Find interactive elements
Array.from(document.querySelectorAll('button, input, [role="button"]')).map(e => ({
  id: e.id,
  text: e.textContent.trim(),
  class: e.className
}))
Complex Scripts in Single Calls
Wrap everything in an IIFE to run multi-statement code:
javascript
(function() {
  // Multiple operations (null if #target is absent, rather than throwing)
  const data = document.querySelector('#target')?.textContent ?? null;
  const buttons = document.querySelectorAll('button');
  // Interactions (skipped if the page has no buttons)
  buttons[0]?.click();
  // Return results
  return JSON.stringify({ data, buttonCount: buttons.length });
})()
Batch Interactions
Don't make a separate call for each click; batch them in a single call:
javascript
(function() {
  const actions = ["btn1", "btn2", "btn3"];
  actions.forEach(id => document.getElementById(id).click());
  return "Done";
})()
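A slightly more defensive variant reports which IDs were not found instead of throwing on the first missing element. The following is a minimal sketch: clickAll is a hypothetical name, and doc stands in for the page's document so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: click each element by ID and report what happened,
// rather than letting one missing element abort the whole batch.
function clickAll(doc, ids) {
  const clicked = [];
  const missing = [];
  for (const id of ids) {
    const el = doc.getElementById(id);
    if (el) {
      el.click();
      clicked.push(id);
    } else {
      missing.push(id);
    }
  }
  return { clicked, missing };
}

// In the page: JSON.stringify(clickAll(document, ["btn1", "btn2", "btn3"]))
```

The returned object shows in one round trip whether any follow-up investigation is needed.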
Typing/Input Sequences
javascript
(function() {
  const text = "HELLO";
  // Click the element with ID "key-<char>" for each character
  for (const char of text) {
    document.getElementById("key-" + char).click();
  }
  document.getElementById("submit").click();
  return "Submitted: " + text;
})()
Reading App/Game State
Extract structured state in one call:
javascript
(function() {
  const state = {
    score: document.querySelector('.score')?.textContent,
    status: document.querySelector('.status')?.className,
    items: Array.from(document.querySelectorAll('.item')).map(el => ({
      text: el.textContent,
      active: el.classList.contains('active')
    }))
  };
  return JSON.stringify(state, null, 2);
})()
Waiting for Updates
If the DOM updates after an action, add a short delay in bash before the next call:
bash
sleep 0.5 && {baseDir}/browser-eval.js '...'
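A fixed sleep can be too short on a slow page or needlessly long on a fast one. Since browser-eval.js runs code in an async context, one alternative is to poll for the expected element inside the page. The following is a minimal sketch, assuming the async context permits await; waitFor is a hypothetical helper, and doc stands in for the page's document.

```javascript
// Hypothetical helper: poll until `selector` matches or the timeout expires.
// Resolves with the matching element, or null on timeout.
async function waitFor(doc, selector, timeoutMs = 2000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const el = doc.querySelector(selector);
    if (el) return el;
    if (Date.now() >= deadline) return null;
    // Yield to the page so pending DOM updates can run before the next check
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// In the page: (await waitFor(document, ".result"))?.textContent
```

The selector ".result" is only an illustration; the timeout keeps a missing element from blocking the call indefinitely.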
Investigate Before Interacting
Always start by understanding the page structure:
javascript
(function() {
  return {
    title: document.title,
    forms: document.forms.length,
    buttons: document.querySelectorAll('button').length,
    inputs: document.querySelectorAll('input').length,
    mainContent: document.body.innerHTML.slice(0, 3000)
  };
})()
Then target specific elements based on what you find.