browser-screenshot

Skill: Browser Screenshot

Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.

Overview

This skill handles the full pipeline:
  1. Research the best page to screenshot (web search, fetch)
  2. Navigate to the right page in the browser
  3. Locate the target element/region on the page
  4. Capture a focused, cropped screenshot of just that region

Hard Rule: No Full-Screen Screenshots

NEVER output an uncropped full-viewport or full-page screenshot as a final result. Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.

Step 0: Research — Find the Right Page First

Before opening anything in the browser, figure out which page to screenshot. Use WebSearch and WebFetch tools (not the browser) for this research phase — they're faster and don't require tab management.

Page Selection Strategy

The right page depends on the context of the article and how recent/notable the subject is:

| Subject Type | Best Page to Find | How to Find It |
|---|---|---|
| New model/feature launch (< 6 months) | Official blog post announcing it | WebSearch `"<model name>" site:<vendor-domain> blog` |
| Established product (> 6 months) | Product landing page or docs overview | WebSearch `"<model name>" official page` |
| Open-source model | HuggingFace model card or GitHub repo | Direct URL: `huggingface.co/<org>/<model>` |
| API service | API documentation page | WebSearch `"<service name>" API docs` |

What Makes a Good Screenshot Source

  • Official blog posts are ideal: they have hero images, prominent titles, and concise descriptions designed for sharing
  • Product landing pages work well: hero sections with taglines and key features
  • HuggingFace model cards are reliable for open-source models: consistent layout, model name + description always at top
  • API docs are an acceptable fallback: they show the product name and key specs

Pre-Flight URL Validation

Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:
WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate
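
The triage rule above can be sketched as a small shell function. `triage_candidate` and `host_of` are hypothetical helpers, not part of WebFetch. The sketch assumes you already have the HTTP status code and the final URL after redirects, and it approximates "unrelated page" as "different host", which is only a rough heuristic (a harmless `www.` redirect would also be flagged).

```bash
# Hypothetical helper: extract the host from a URL.
host_of() {
  # Strip the scheme, then everything from the first "/" onward.
  local rest="${1#*://}"
  printf '%s\n' "${rest%%/*}"
}

# Hypothetical helper: decide whether a candidate URL is worth opening.
# Arguments: requested URL, HTTP status code, final URL after redirects.
triage_candidate() {
  local requested="$1" status="$2" final="$3"
  if [ "$status" -ge 400 ]; then
    echo "skip"   # 404 and other error responses
  elif [ "$(host_of "$requested")" != "$(host_of "$final")" ]; then
    echo "skip"   # redirected to a different host (assumed unrelated)
  else
    echo "open"
  fi
}

triage_candidate "https://example.com/blog/a" 404 "https://example.com/blog/a"   # prints: skip
triage_candidate "https://example.com/blog/a" 200 "https://example.com/blog/a"   # prints: open
```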

Region Selection Strategy

Think about what the article reader needs to see in this screenshot:

| Article Context | What to Capture | Target Region |
|---|---|---|
| Introducing a model in a lineup | Model name + key tagline/description | Blog hero section or HF model card header |
| Comparing capabilities | Feature highlights or spec table | Blog section showing specs/features |
| Discussing a specific feature | The feature description | Relevant section heading + 1-2 paragraphs |
| Showing a product/service | Brand identity + value prop | Landing page hero (title + subtitle + visual) |

The screenshot should make the reader think "ah, that's what this model/product is", not "what am I looking at?"

Step 1: Navigate to the Target Page

Always Start by Listing Tabs

```bash
agent-browser --auto-connect tab list
```
Check if the page is already open. Reuse existing tabs: they have login sessions and correct state.

Navigation by Input Type

| User Provides | Strategy |
|---|---|
| Direct URL | `agent-browser --auto-connect open <url>` |
| Search query | `open https://www.google.com/search?q=<encoded-query>` → find and click the best result |
| Platform + topic | Construct platform search URL (see below) → locate target content |
| Vague description | Google search → evaluate results → navigate to best match |

Platform-Specific Search URLs

| Platform | Search URL Pattern |
|---|---|
| Reddit | `https://www.reddit.com/search/?q=<query>` |
| X / Twitter | `https://x.com/search?q=<query>` |
| LinkedIn | `https://www.linkedin.com/search/results/content/?keywords=<query>` |
| Hacker News | `https://hn.algolia.com/?q=<query>` |
| GitHub | `https://github.com/search?q=<query>` |
| YouTube | `https://www.youtube.com/results?search_query=<query>` |
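
The `<query>` placeholder in these patterns needs to be percent-encoded before the URL is opened. A minimal sketch of one way to do that in the shell follows. `urlencode` and `search_url` are hypothetical helpers, the short platform keys (`reddit`, `x`, `hn`, ...) are made up for this example, and the encoder assumes ASCII input.

```bash
# Hypothetical helper: percent-encode a string (ASCII input assumed).
urlencode() {
  local s="$1" out="" rest c
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                      # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;              # everything else: %XX
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build a search URL from the patterns in the table.
search_url() {
  local platform="$1" q
  q=$(urlencode "$2")
  case "$platform" in
    reddit)   echo "https://www.reddit.com/search/?q=$q" ;;
    x)        echo "https://x.com/search?q=$q" ;;
    linkedin) echo "https://www.linkedin.com/search/results/content/?keywords=$q" ;;
    hn)       echo "https://hn.algolia.com/?q=$q" ;;
    github)   echo "https://github.com/search?q=$q" ;;
    youtube)  echo "https://www.youtube.com/results?search_query=$q" ;;
    *)        echo "unknown platform: $platform" >&2; return 1 ;;
  esac
}

search_url reddit "rust async"   # prints: https://www.reddit.com/search/?q=rust%20async
```

The resulting URL can then be passed to `agent-browser --auto-connect open`.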

Wait for Page Load

After navigation, wait for content to settle:
```bash
agent-browser --auto-connect wait --load networkidle
```
Note: Some sites (Reddit, X, LinkedIn) never reach `networkidle`. If `open` already shows the page title in its output, skip the wait. Use `wait 2000` as a safe alternative.
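
One way to encode the note above is a small lookup that picks the wait arguments from the URL's host. `wait_args_for` is a hypothetical helper, and the host list simply mirrors the three sites named in the note; it is not exhaustive.

```bash
# Hypothetical helper: choose the arguments for "wait" based on the host.
wait_args_for() {
  local rest="${1#*://}"        # drop the scheme
  local host="${rest%%/*}"      # drop the path
  case "$host" in
    reddit.com|*.reddit.com|x.com|*.x.com|linkedin.com|*.linkedin.com)
      echo "2000" ;;                 # these sites never reach networkidle
    *)
      echo "--load networkidle" ;;
  esac
}

wait_args_for "https://www.reddit.com/r/programming/"   # prints: 2000
wait_args_for "https://example.com/blog/post"           # prints: --load networkidle
```

The output is meant to be word-split on purpose, as in `agent-browser --auto-connect wait $(wait_args_for "$url")`.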

Step 2: Locate the Target Region

This is the critical step. The goal is to find a CSS selector that precisely wraps the content to capture.

Primary Method: DOM Selector Discovery

  1. Take an annotated screenshot to understand the page layout:
    ```bash
    agent-browser --auto-connect screenshot --annotate
    ```
  2. Take a snapshot to see the page's accessibility tree:
    ```bash
    agent-browser --auto-connect snapshot -i
    ```
  3. Identify the target container element. Look for:
    • Semantic HTML containers: `<article>`, `<main>`, `<section>`
    • Platform-specific components (see Platform Selectors)
    • Data attributes: `[data-testid="..."]`, `[data-id="..."]`
  4. Verify with `get box` to confirm the element has a reasonable bounding box:
    ```bash
    agent-browser --auto-connect get box "<selector>"
    ```
    This returns `{ x, y, width, height }`. Sanity-check:
    • Width should be > 100px and < viewport width
    • Height should be > 50px
    • If the box is the entire page, the selector is too broad, so refine it
  5. If the selector is hard to find, use `eval` to explore the DOM:
    ```bash
    agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
    ```
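
The sanity checks in step 4 can be expressed as a small function. This is a sketch only: `check_box` is a hypothetical helper, and it assumes you have already pulled `width` and `height` out of the `get box` result and know the viewport width. `awk` is used so that fractional values compare correctly.

```bash
# Hypothetical helper: apply the bounding-box sanity checks.
# Arguments: box width, box height, viewport width (all in CSS px; floats allowed).
check_box() {
  awk -v w="$1" -v h="$2" -v vw="$3" 'BEGIN {
    if (w <= 100) { print "too narrow"; exit 1 }
    if (w >= vw)  { print "too wide: selector is probably too broad"; exit 1 }
    if (h <= 50)  { print "too short"; exit 1 }
    print "ok"
  }'
}

check_box 680 520 1280    # prints: ok
check_box 1280 4000 1280  # prints: too wide: selector is probably too broad
```

The function also returns a non-zero exit status on failure, so it can gate the capture step with `check_box ... || exit 1`.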

Platform Selectors

Common container selectors for popular platforms:

| Platform | Target | Typical Selector |
|---|---|---|
| Reddit | A post | `shreddit-post`, `[data-testid="post-container"]` |
| X / Twitter | A tweet | `article[data-testid="tweet"]` |
| LinkedIn | A feed post | `.feed-shared-update-v2` |
| Hacker News | A story + comments | `#hnmain .fatitem` |
| GitHub | A repo card | `[data-hpc]`, `.repository-content` |
| YouTube | Video player area | `#player-container-outer` |
| Generic article | Main content | `article`, `main`, `[role="main"]`, `.post-content`, `.article-body` |

These selectors may change over time. Always verify with `get box` before using.

Multiple Matching Elements

If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:
```bash
# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"

# Use nth-child or :first-of-type, or a more specific selector

# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
  const text = posts[i].textContent.substring(0, 80);
  console.log(i, text);
}
EOF
```

Then target a specific one using `:nth-of-type(N)` or a unique parent selector.

---

Step 3: Capture the Focused Screenshot

Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets)

Best when the target element fits within the viewport.
```bash
# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500

# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png
```

Then crop using the bounding box (see [Cropping](#cropping)).

Method B: Full-Page Screenshot + Crop (For Any Size Target)

Best when the target might be larger than the viewport or when precise cropping is needed.
```bash
# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png

# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }
```

Then crop (see [Cropping](#cropping)).

Cropping

Use ImageMagick (`magick` on IMv7; `convert` is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.

Retina Display Handling

Critical: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:
  1. Detect the scale factor by comparing the viewport size with the actual image dimensions:
    ```bash
    # Check actual image dimensions
    magick identify /tmp/screenshot.png
    # → 3456x1880 means 2x scale on a 1728x940 viewport
    ```
  2. Multiply `get box` coordinates by the scale factor before cropping:
    ```bash
    # get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
    # For 2x Retina, actual image coordinates are:
    SCALE=2
    X=$((200 * SCALE))
    Y=$((450 * SCALE))
    W=$((680 * SCALE))
    H=$((520 * SCALE))
    PADDING=$((16 * SCALE))
    ```
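
Rather than hard-coding `SCALE=2`, the factor can be derived from the two widths. The sketch below assumes you already have both numbers: the image width (for example from `magick identify -format '%w' <file>`) and the viewport width (for example from `eval "window.innerWidth"`). `scale_factor` is a hypothetical helper name.

```bash
# Hypothetical helper: scale factor = image width / viewport width.
# awk is used so that fractional factors (e.g. 1.5) are not truncated.
scale_factor() {
  awk -v img="$1" -v vp="$2" 'BEGIN { printf "%g\n", img / vp }'
}

scale_factor 3456 1728   # prints: 2
scale_factor 1280 1280   # prints: 1
```

Note that shell `$(( ))` arithmetic is integer-only, so a fractional factor would need `awk` (or similar) for the coordinate multiplication as well.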

Crop Command

```bash
magick /tmp/browser-screenshot-full.png \
  -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
  +repage \
  <output-path>.png
```
Important: `get box` returns floating-point values. Round them to integers before passing to ImageMagick.
Padding: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.
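
Because shell arithmetic cannot handle the floating-point values that `get box` returns, one option is to compute the whole geometry string in `awk`. The following is a sketch under assumptions: `crop_geometry` is a hypothetical helper, it takes the box values, scale factor, and padding as plain arguments, and it clamps negative offsets to 0 without shrinking the width or height.

```bash
# Hypothetical helper: build an ImageMagick -crop geometry (WxH+X+Y).
# Arguments: x y width height (viewport px, floats allowed), scale, padding (viewport px).
crop_geometry() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v s="$5" -v p="$6" 'BEGIN {
    # Apply padding, scale to image pixels, and round to the nearest integer.
    left   = int((x - p) * s + 0.5)
    top    = int((y - p) * s + 0.5)
    width  = int((w + 2 * p) * s + 0.5)
    height = int((h + 2 * p) * s + 0.5)
    # Clamp offsets so padding never pushes them off the image.
    if (left < 0) left = 0
    if (top  < 0) top  = 0
    printf "%dx%d+%d+%d\n", width, height, left, top
  }'
}

crop_geometry 200 450 680 520 2 16   # prints: 1424x1104+368+868
crop_geometry 312 80 656 420 1 16    # prints: 688x452+296+64
```

The first call reproduces the Retina example values (`W + PADDING*2 = 1424`, `X - PADDING = 368`, and so on); the result can be passed directly as `-crop "$(crop_geometry ...)"`.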

Output Path

  • If the user specifies an output path, use that
  • Otherwise, save to a descriptive name in the current directory, e.g., `reddit-post-screenshot.png`, `tweet-screenshot.png`
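
A descriptive default name can be derived from a short label. This is only an illustration: `default_output_name` is a hypothetical helper, and the `-screenshot.png` suffix simply follows the two example names above.

```bash
# Hypothetical helper: turn a label into a lowercase, dash-separated filename.
default_output_name() {
  local slug
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//')      # trim a leading or trailing dash
  printf '%s-screenshot.png\n' "$slug"
}

default_output_name "Reddit post"   # prints: reddit-post-screenshot.png
```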

Step 4: Verify the Result

After cropping, read the output image to verify it captured the right content:
```bash
# Use the Read tool to visually inspect the cropped screenshot
```

If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.

---

Fallback: Visual Highlight Confirmation

When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use JS-injected highlighting to visually confirm before cropping.

How It Works

  1. Inject a highlight border on the candidate element:
    ```bash
    agent-browser --auto-connect eval --stdin <<'EOF'
    (function() {
      const el = document.querySelector('<selector>');
      if (!el) { console.log('NOT_FOUND'); return; }
      el.style.outline = '4px solid red';
      el.style.outlineOffset = '2px';
      el.scrollIntoView({ block: 'center' });
    })();
    EOF
    ```
  2. Take a screenshot and visually inspect:
    ```bash
    agent-browser --auto-connect screenshot /tmp/highlight-check.png
    ```
    Read the screenshot to check if the red border surrounds the correct content.
  3. If correct, remove the highlight and proceed with cropping:
    ```bash
    agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"
    ```
  4. If wrong, try the next candidate or refine the selector, re-highlight, and re-check.
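
Selectors that contain quotes (such as `article[data-testid="tweet"]`) are easy to break when pasted into the `'<selector>'` placeholder by hand. A possible way around this is to generate the snippet with the selector escaped as a JavaScript string literal. Both `js_string` and `highlight_js` are hypothetical helpers written for this sketch.

```bash
# Hypothetical helper: quote a string as a double-quoted JavaScript literal.
js_string() {
  # Escape backslashes first, then double quotes, and wrap in double quotes.
  printf '"%s"\n' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# Hypothetical helper: print the highlight snippet for a given selector.
highlight_js() {
  cat <<EOF
(function() {
  const el = document.querySelector($(js_string "$1"));
  if (!el) { console.log('NOT_FOUND'); return; }
  el.style.outline = '4px solid red';
  el.style.outlineOffset = '2px';
  el.scrollIntoView({ block: 'center' });
})();
EOF
}

highlight_js 'article[data-testid="tweet"]'
```

Assuming `eval --stdin` reads the script from standard input as in step 1, the output could be piped straight in: `highlight_js "$selector" | agent-browser --auto-connect eval --stdin`.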

When to Use This Fallback

  • The page has complex/nested components and you're not sure which container is right
  • Multiple similar elements exist and you need to pick the correct one
  • The user's description is vague ("that chart in the middle of the page")
  • The `get box` result looks suspicious (too large, too small, zero-sized)

Page Preparation: Clean Up Before Capture

Before taking the final screenshot, clean up the page for a better result:
```bash
# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  // Common cookie/popup selectors
  const selectors = [
    '[class*="cookie"] button',
    '[class*="consent"] button',
    '[class*="banner"] [class*="close"]',
    '[class*="modal"] [class*="close"]',
    '[class*="popup"] [class*="close"]',
    '[aria-label="Close"]',
    '[data-testid="close"]'
  ];
  selectors.forEach(sel => {
    document.querySelectorAll(sel).forEach(el => {
      if (el.offsetParent !== null) el.click();
    });
  });

  // Hide fixed/sticky elements that overlay content (nav bars, banners)
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if ((style.position === 'fixed' || style.position === 'sticky') &&
        el.tagName !== 'HTML' && el.tagName !== 'BODY') {
      el.style.display = 'none';
    }
  });
})();
EOF
```

> **Use with caution**: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.

Cookie Banners That Won't Dismiss

Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS `click()` or `remove()`. Don't waste time with multiple JS attempts. Instead:
  1. Crop it out: if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach.
  2. Scroll past it: scroll the target content away from the banner area before capturing.

Viewport Sizing

For consistent, high-quality screenshots, set the viewport before capturing:
```bash
# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800

# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900

# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600
```

Choose a viewport width that makes the target content render cleanly: not too cramped, not too stretched.

---

Complete Example: Screenshot a Reddit Post

User: "Screenshot the top post on r/programming"
```bash
# 1. List existing tabs
agent-browser --auto-connect tab list

# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000

# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"

# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500

# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }

# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png

# 7. Crop with padding (16px at 1x scale; multiply every value by the scale factor on Retina)
magick /tmp/reddit-raw.png \
  -crop 688x452+296+64 +repage \
  reddit-post-screenshot.png

# 8. Verify by reading the output image
```

---

Key Commands Quick Reference

| Command | Purpose |
|---|---|
| `tab list` | List open tabs |
| `open <url>` | Navigate to URL |
| `wait 2000` | Wait for content to settle |
| `snapshot -i` | See interactive elements |
| `screenshot --annotate` | Visual overview with labels |
| `screenshot --full <path>` | Full-page screenshot |
| `get box "<selector>"` | Get element bounding box |
| `scrollintoview "<sel>"` | Scroll element into view |
| `eval <js>` | Run JavaScript in page |
| `set viewport <w> <h>` | Set viewport dimensions |

Troubleshooting

`get box` returns null or zero-sized

  • The selector doesn't match any element. Use `get count "<selector>"` to verify.
  • The element may be hidden or not yet rendered. Try `wait 2000` and retry.

Cropped image is blank or wrong area

  • The full-page screenshot coordinates may differ from viewport coordinates. Use `screenshot --full` with `get box` (they use the same coordinate system).
  • Check if the page has horizontal scroll: `get box` x values may be offset.

Target element is inside an iframe

  • `get box` and `snapshot -i` cannot see inside iframes.
  • Use `eval` to access iframe content:
    ```bash
    agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"
    ```
    Note: Only works for same-origin iframes.

`open` succeeded but page content is wrong

  • The browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation:
    ```bash
    agent-browser --auto-connect eval "document.location.href"
    ```
  • If the URL is wrong, use `tab list` to find the correct tab and `tab goto <N>` to switch.

Screenshot command times out on fonts

  • Some pages (e.g., Google developer docs) hang on `document.fonts.ready`. Force-resolve it first:
    ```bash
    agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
    ```
    Then retry the screenshot.

Page has lazy-loaded content

  • Scroll down to trigger loading before taking the screenshot:
    ```bash
    agent-browser --auto-connect scroll down 1000
    agent-browser --auto-connect wait 1500
    agent-browser --auto-connect scroll up 1000
    ```