browser-scrape
Browser Scraping
Extract structured data from web pages using browser automation.
When to use
When you need to gather information from web pages that require JavaScript rendering, authentication, or dynamic content loading.
Steps
- Open page: call `mcp__claude-flow__browser_open` with the target URL
- Wait for content: call `mcp__claude-flow__browser_wait` for dynamic content to load
- Get accessibility tree: call `mcp__claude-flow__browser_snapshot` for structured page content
- Extract text: call `mcp__claude-flow__browser_get-text` with CSS selectors
- Run queries: call `mcp__claude-flow__browser_eval` with JavaScript to extract structured data
- Paginate: use `browser_click` on next/load-more buttons, then repeat extraction
- Close: call `mcp__claude-flow__browser_close` when done
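The steps above can be sketched as an ordered plan of tool calls. This is only an illustration, not the tools' documented interface: the tool names are copied as they appear in the list, but the argument names (`url`, `selector`, `script`), the `ToolCall` type, and the helpers `build_extract_script` and `build_scrape_plan` are assumptions introduced here. The plan is static, so `max_pages` caps pagination instead of detecting the last page.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    """One planned tool invocation: tool name plus hypothetical arguments."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


def build_extract_script(row_selector: str, fields: Dict[str, str]) -> str:
    """Build a JavaScript expression that maps each matched row to an object.

    `fields` maps an output key to a CSS selector evaluated inside the row.
    """
    # json.dumps produces valid JavaScript literals for plain strings and dicts.
    return (
        "Array.from(document.querySelectorAll(" + json.dumps(row_selector) + "))"
        ".map(row => Object.fromEntries(Object.entries(" + json.dumps(fields) + ")"
        ".map(([key, sel]) => "
        "[key, row.querySelector(sel)?.textContent.trim() ?? null])))"
    )


def _extraction_calls(row_selector: str, script: str) -> List[ToolCall]:
    """Steps 2 to 5: wait, snapshot, get text, run the query (fresh objects)."""
    return [
        ToolCall("mcp__claude-flow__browser_wait", {"selector": row_selector}),
        ToolCall("mcp__claude-flow__browser_snapshot"),
        ToolCall("mcp__claude-flow__browser_get-text", {"selector": row_selector}),
        ToolCall("mcp__claude-flow__browser_eval", {"script": script}),
    ]


def build_scrape_plan(
    url: str,
    row_selector: str,
    fields: Dict[str, str],
    next_selector: Optional[str] = None,
    max_pages: int = 1,
) -> List[ToolCall]:
    """Return the open / extract / paginate / close sequence as a list."""
    script = build_extract_script(row_selector, fields)
    plan = [ToolCall("mcp__claude-flow__browser_open", {"url": url})]
    for page in range(max_pages):
        plan.extend(_extraction_calls(row_selector, script))
        # Stop paginating when there is no next button or the cap is reached.
        if next_selector is None or page == max_pages - 1:
            break
        plan.append(ToolCall("browser_click", {"selector": next_selector}))
    plan.append(ToolCall("mcp__claude-flow__browser_close"))
    return plan


if __name__ == "__main__":
    # Hypothetical target: a product listing with a "next" link.
    for call in build_scrape_plan(
        "https://example.com/products",
        "li.product",
        {"title": "h2", "price": ".price"},
        next_selector="a.next",
        max_pages=2,
    ):
        print(call.tool, call.args)
```

Keeping the plan as data makes the ordering easy to review before anything touches a live page; an agent or wrapper would then issue each call in turn and collect the `browser_eval` results.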
Best practices
- Prefer `browser_snapshot` (accessibility tree) over raw HTML for structured extraction
- Use `browser_eval` with `document.querySelectorAll` for bulk extraction
- Add `browser_wait` between page loads to avoid timing issues
- Respect robots.txt and rate limits
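The robots.txt and rate-limit point can be enforced before any page is opened. Below is a minimal sketch using Python's standard `urllib.robotparser`. The `PoliteGate` helper, the `example-scraper` user agent, and the sample robots.txt are hypothetical, and the one-second default delay is an arbitrary choice, not a recommended value.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice this is fetched from the site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


class PoliteGate:
    """Checks robots.txt rules and spaces out successive page loads."""

    def __init__(
        self,
        robots_txt: str,
        user_agent: str,
        min_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._parser = RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        self._user_agent = user_agent
        # Use the site's Crawl-delay when it is stricter than our own minimum.
        site_delay = self._parser.crawl_delay(user_agent)
        self._delay = max(min_delay, float(site_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self._parser.can_fetch(self._user_agent, url)

    def wait_turn(self) -> None:
        """Block until at least the configured delay has passed since the last load."""
        now = self._clock()
        if self._last is not None:
            remaining = self._delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    gate = PoliteGate(SAMPLE_ROBOTS, "example-scraper")
    for url in ("https://example.com/products", "https://example.com/private/x"):
        if gate.allowed(url):
            gate.wait_turn()
            print("ok to open:", url)
        else:
            print("skipping (disallowed):", url)
```

A scraper would call `allowed(url)` before opening a page and `wait_turn()` between page loads, including each pagination click.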