browser-screenshot

Skill: Browser Screenshot

Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.

Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.

Overview

This skill handles the full pipeline:

1. Research the best page to screenshot (web search, fetch)
2. Navigate to the right page in the browser
3. Locate the target element/region on the page
4. Capture a focused, cropped screenshot of just that region

Hard Rule: No Full-Screen Screenshots

NEVER output an uncropped full-viewport or full-page screenshot as a final result. Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.

Step 0: Research — Find the Right Page First

Before opening anything in the browser, figure out which page to screenshot. Use WebSearch and WebFetch tools (not the browser) for this research phase — they're faster and don't require tab management.

Page Selection Strategy

The right page depends on the context of the article and how recent/notable the subject is:

| Subject Type | Best Page to Find | How to Find It |
| --- | --- | --- |
| New model/feature launch (< 6 months) | Official blog post announcing it | WebSearch `"<model name>" site:<vendor-domain> blog` |
| Established product (> 6 months) | Product landing page or docs overview | WebSearch `"<model name>" official page` |
| Open-source model | HuggingFace model card or GitHub repo | Direct URL: `huggingface.co/<org>/<model>` |
| API service | API documentation page | WebSearch `"<service name>" API docs` |

What Makes a Good Screenshot Source

- Official blog posts are ideal: they have hero images, prominent titles, and concise descriptions designed for sharing
- Product landing pages work well: hero sections with taglines and key features
- HuggingFace model cards are reliable for open-source models: consistent layout, with the model name and description always at the top
- API docs are an acceptable fallback: they show the product name and key specs

Pre-Flight URL Validation

Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:

WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate

Region Selection Strategy

Think about what the article reader needs to see in this screenshot:

| Article Context | What to Capture | Target Region |
| --- | --- | --- |
| Introducing a model in a lineup | Model name + key tagline/description | Blog hero section or HF model card header |
| Comparing capabilities | Feature highlights or spec table | Blog section showing specs/features |
| Discussing a specific feature | The feature description | Relevant section heading + 1-2 paragraphs |
| Showing a product/service | Brand identity + value prop | Landing page hero (title + subtitle + visual) |

The screenshot should make the reader think "ah, that's what this model/product is" — not "what am I looking at?"

Step 1: Navigate to the Target Page

Always Start by Listing Tabs

```bash
agent-browser --auto-connect tab list
```

Check if the page is already open. Reuse existing tabs — they have login sessions and correct state.

Navigation by Input Type

| User Provides | Strategy |
| --- | --- |
| Direct URL | `agent-browser --auto-connect open <url>` |
| Search query | `open https://www.google.com/search?q=<encoded-query>` → find and click the best result |
| Platform + topic | Construct platform search URL (see below) → locate target content |
| Vague description | Google search → evaluate results → navigate to best match |

Platform-Specific Search URLs

| Platform | Search URL Pattern |
| --- | --- |
| Reddit | `https://www.reddit.com/search/?q=<query>` |
| X / Twitter | `https://x.com/search?q=<query>` |
| LinkedIn | `https://www.linkedin.com/search/results/content/?keywords=<query>` |
| Hacker News | `https://hn.algolia.com/?q=<query>` |
| GitHub | `https://github.com/search?q=<query>` |
| YouTube | `https://www.youtube.com/results?search_query=<query>` |
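
The `<query>` placeholders in these patterns need percent-encoding before they go into a URL. A minimal sketch in POSIX shell, assuming `od` and `awk` are available; `urlencode` and `search_url` are hypothetical helper names, not part of agent-browser:

```bash
# Percent-encode every byte that is not an RFC 3986 unreserved character
# (letters, digits, ".", "_", "~", "-"). Hypothetical helper.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d '\n' | awk '
    BEGIN {
      # Map the hex code of each printable ASCII byte to its character.
      for (n = 32; n < 127; n++) chr[sprintf("%02X", n)] = sprintf("%c", n)
    }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        hex = toupper($i)
        c = (hex in chr) ? chr[hex] : ""
        # Keep unreserved characters as-is; encode everything else.
        out = out ((c ~ /^[A-Za-z0-9._~-]$/) ? c : "%" hex)
      }
      printf "%s\n", out
    }'
}

# Build a Google search URL from a raw query string (hypothetical helper).
search_url() {
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "$1")"
}

search_url 'rust async runtime'
# https://www.google.com/search?q=rust%20async%20runtime
```

The same `urlencode` output can be substituted into any of the platform patterns in the table.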

Wait for Page Load

After navigation, wait for content to settle:

```bash
agent-browser --auto-connect wait --load networkidle
```

Note: Some sites (Reddit, X, LinkedIn) never reach `networkidle`. If `open` already shows the page title in its output, skip the wait. Use `wait 2000` as a safe alternative.

Step 2: Locate the Target Region

This is the critical step. The goal is to find a CSS selector that precisely wraps the content to capture.

Primary Method: DOM Selector Discovery

1. Take an annotated screenshot to understand the page layout:
   ```bash
   agent-browser --auto-connect screenshot --annotate
   ```
2. Take a snapshot to see the page's accessibility tree:
   ```bash
   agent-browser --auto-connect snapshot -i
   ```
3. Identify the target container element. Look for:
   - Semantic HTML containers: `<article>`, `<main>`, `<section>`
   - Platform-specific components (see Platform Selectors)
   - Data attributes: `[data-testid="..."]`, `[data-id="..."]`
4. Verify with `get box` to confirm the element has a reasonable bounding box:
   ```bash
   agent-browser --auto-connect get box "<selector>"
   ```
   This returns `{ x, y, width, height }`. Sanity-check:
   - Width should be > 100px and < viewport width
   - Height should be > 50px
   - If the box is the entire page, the selector is too broad; refine it
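
The width and height checks can be scripted once the numbers have been read from the `get box` output. A minimal sketch; `box_is_sane` is a hypothetical helper, and the viewport width is assumed to be known already:

```bash
# box_is_sane WIDTH HEIGHT VIEWPORT_WIDTH
# Succeeds only if width is in (100, viewport width) and height is above 50.
# awk does the comparison because get box may return floating-point values.
box_is_sane() {
  awk -v w="$1" -v h="$2" -v vw="$3" 'BEGIN {
    ok = (w + 0 > 100) && (w + 0 < vw + 0) && (h + 0 > 50)
    exit ok ? 0 : 1
  }'
}

if box_is_sane 680 520 1280; then
  echo "box looks reasonable"
else
  echo "box is suspicious; refine the selector"
fi
```

A box as wide as the viewport fails the check, which is a cheap signal that the selector matched a page-level container.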

If the selector is hard to find, use `eval` to explore the DOM:

```bash
agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
```

Platform Selectors

Common container selectors for popular platforms:

| Platform | Target | Typical Selector |
| --- | --- | --- |
| Reddit | A post | `shreddit-post`, `[data-testid="post-container"]` |
| X / Twitter | A tweet | `article[data-testid="tweet"]` |
| LinkedIn | A feed post | `.feed-shared-update-v2` |
| Hacker News | A story + comments | `#hnmain .fatitem` |
| GitHub | A repo card | `[data-hpc]`, `.repository-content` |
| YouTube | Video player area | `#player-container-outer` |
| Generic article | Main content | `article`, `main`, `[role="main"]`, `.post-content`, `.article-body` |

These selectors may change over time. Always verify with `get box` before using.

Multiple Matching Elements

If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:

```bash
# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"

# Use nth-child or :first-of-type, or a more specific selector

# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
  const text = posts[i].textContent.substring(0, 80);
  console.log(i, text);
}
EOF
```

Then target a specific one using `:nth-of-type(N)` or a unique parent selector.

---

Step 3: Capture the Focused Screenshot

Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets)

Best when the target element fits within the viewport.

```bash
# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500

# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png
```

Then crop using the bounding box (see [Cropping](#cropping)).

Method B: Full-Page Screenshot + Crop (For Any Size Target)

Best when the target might be larger than the viewport or when precise cropping is needed.

```bash
# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png

# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }
```

Then crop (see [Cropping](#cropping)).

Cropping

Use ImageMagick (`magick` on IMv7; `convert` is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.

Retina Display Handling

Critical: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:

1. Detect the scale factor: Compare viewport size vs actual image dimensions:
   ```bash
   # Check actual image dimensions
   magick identify /tmp/screenshot.png
   # → 3456x1880 means 2x scale on a 1728x940 viewport
   ```
2. Multiply `get box` coordinates by the scale factor before cropping:
   ```bash
   # get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
   # For 2x Retina, actual image coordinates are:
   SCALE=2
   X=$((200 * SCALE))
   Y=$((450 * SCALE))
   W=$((680 * SCALE))
   H=$((520 * SCALE))
   PADDING=$((16 * SCALE))
   ```
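
The scale factor does not have to be hard-coded. A small sketch that derives it from the two widths; `scale_factor` is a hypothetical helper, and both widths are assumed to have been read separately (the image width, for example, with `magick identify -format '%w'`):

```bash
# scale_factor IMAGE_WIDTH VIEWPORT_WIDTH
# Prints the ratio of image pixels to viewport pixels (2 on a 2x Retina display).
scale_factor() {
  awk -v iw="$1" -v vw="$2" 'BEGIN {
    if (vw + 0 <= 0) exit 1   # guard against division by zero
    printf "%g\n", iw / vw
  }'
}

SCALE=$(scale_factor 3456 1728)
echo "scale factor: $SCALE"   # scale factor: 2
```

A fractional result such as 1.5 cannot be used in `$(( ))` arithmetic, so in that case the multiplication would need `awk` or `bc` instead.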

Crop Command

```bash
magick /tmp/browser-screenshot-full.png \
  -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
  +repage \
  <output-path>.png
```

Important: `get box` returns floating-point values. Round them to integers before passing to ImageMagick.

Padding: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.
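
Scaling, padding, and rounding can be folded into one step. A minimal sketch, assuming the box values and scale factor are already known; `crop_geometry` is a hypothetical helper, and clamping negative offsets to 0 is an added precaution rather than something the workflow above requires:

```bash
# crop_geometry X Y WIDTH HEIGHT SCALE PADDING
# Takes the box and padding in viewport px, prints an ImageMagick
# geometry string (WxH+X+Y) in image px, rounded to integers.
crop_geometry() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v s="$5" -v p="$6" 'BEGIN {
    # Apply padding in viewport px, scale to image px, round to nearest.
    cx = int((x - p) * s + 0.5); if (cx < 0) cx = 0
    cy = int((y - p) * s + 0.5); if (cy < 0) cy = 0
    cw = int((w + 2 * p) * s + 0.5)
    ch = int((h + 2 * p) * s + 0.5)
    printf "%dx%d+%d+%d\n", cw, ch, cx, cy
  }'
}

GEOMETRY=$(crop_geometry 200 450 680 520 2 16)
echo "$GEOMETRY"   # 1424x1104+368+868
# magick /tmp/browser-screenshot-full.png -crop "$GEOMETRY" +repage out.png
```

With the 2x example values this yields the same numbers as the `$(( ))` arithmetic in the crop command, and it also accepts floating-point box values directly.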

Output Path

- If the user specifies an output path, use that
- Otherwise, save to a descriptive name in the current directory, e.g., `reddit-post-screenshot.png`, `tweet-screenshot.png`

Step 4: Verify the Result

After cropping, read the output image to verify it captured the right content:

```bash
# Use the Read tool to visually inspect the cropped screenshot
```

If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.

---

Fallback: Visual Highlight Confirmation

When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use JS-injected highlighting to visually confirm before cropping.

How It Works

1. Inject a highlight border on the candidate element:
   ```bash
   agent-browser --auto-connect eval --stdin <<'EOF'
   (function() {
     const el = document.querySelector('<selector>');
     if (!el) { console.log('NOT_FOUND'); return; }
     el.style.outline = '4px solid red';
     el.style.outlineOffset = '2px';
     el.scrollIntoView({ block: 'center' });
   })();
   EOF
   ```
2. Take a screenshot and visually inspect:
   ```bash
   agent-browser --auto-connect screenshot /tmp/highlight-check.png
   ```
   Read the screenshot to check if the red border surrounds the correct content.
3. If correct, remove the highlight and proceed with cropping:
   ```bash
   agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"
   ```
4. If wrong, try the next candidate or refine the selector, re-highlight, and re-check.

When to Use This Fallback

- The page has complex/nested components and you're not sure which container is right
- Multiple similar elements exist and you need to pick the correct one
- The user's description is vague ("that chart in the middle of the page")
- The `get box` result looks suspicious (too large, too small, zero-sized)

Page Preparation: Clean Up Before Capture

Before taking the final screenshot, clean up the page for a better result:

```bash
# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  // Common cookie/popup selectors
  const selectors = [
    '[class*="cookie"] button',
    '[class*="consent"] button',
    '[class*="banner"] [class*="close"]',
    '[class*="modal"] [class*="close"]',
    '[class*="popup"] [class*="close"]',
    '[aria-label="Close"]',
    '[data-testid="close"]'
  ];
  selectors.forEach(sel => {
    document.querySelectorAll(sel).forEach(el => {
      if (el.offsetParent !== null) el.click();
    });
  });

  // Hide fixed/sticky elements that overlay content (nav bars, banners)
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if ((style.position === 'fixed' || style.position === 'sticky') &&
        el.tagName !== 'HTML' && el.tagName !== 'BODY') {
      el.style.display = 'none';
    }
  });
})();
EOF
```

> **Use with caution**: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.

Cookie Banners That Won't Dismiss

Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS `click()` or `remove()`. Don't waste time with multiple JS attempts. Instead:

- Crop it out: if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach.
- Scroll past it: scroll the target content away from the banner area before capturing.

Viewport Sizing

For consistent, high-quality screenshots, set the viewport before capturing:

```bash
# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800

# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900

# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600
```

Choose a viewport width that makes the target content render cleanly: not too cramped, not too stretched.

---

Complete Example: Screenshot a Reddit Post

User: "Screenshot the top post on r/programming"

```bash
# 1. List existing tabs
agent-browser --auto-connect tab list

# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000

# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"

# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500

# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }

# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png

# 7. Crop with 16px padding (values shown for a 1x display;
#    multiply by the scale factor on Retina)
magick /tmp/reddit-raw.png \
  -crop 688x452+296+64 +repage \
  reddit-post-screenshot.png

# 8. Verify by reading the output image
```

---

Key Commands Quick Reference

| Command | Purpose |
| --- | --- |
| `tab list` | List open tabs |
| `open <url>` | Navigate to URL |
| `wait 2000` | Wait for content to settle |
| `snapshot -i` | See interactive elements |
| `screenshot --annotate` | Visual overview with labels |
| `screenshot --full <path>` | Full-page screenshot |
| `get box "<selector>"` | Get element bounding box |
| `scrollintoview "<sel>"` | Scroll element into view |
| `eval <js>` | Run JavaScript in page |
| `set viewport <w> <h>` | Set viewport dimensions |

Troubleshooting

`get box` returns null or zero-sized

- The selector doesn't match any element. Use `get count "<selector>"` to verify.
- The element may be hidden or not yet rendered. Try `wait 2000` and retry.

Cropped image is blank or wrong area

- The full-page screenshot coordinates may differ from viewport coordinates. Use `screenshot --full` with `get box` (they use the same coordinate system).
- Check if the page has horizontal scroll; `get box` x values may be offset.

Target element is inside an iframe

- `get box` and `snapshot -i` cannot see inside iframes.
- Use `eval` to access iframe content:
  ```bash
  agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"
  ```
- Note: Only works for same-origin iframes.

`open` succeeded but page content is wrong

- The browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation:
  ```bash
  agent-browser --auto-connect eval "document.location.href"
  ```
- If the URL is wrong, use `tab list` to find the correct tab and `tab goto <N>` to switch.

Screenshot command times out on fonts

- Some pages (e.g., Google developer docs) hang on `document.fonts.ready`. Force-resolve it first:
  ```bash
  agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
  ```
- Then retry the screenshot.

Page has lazy-loaded content

Scroll down to trigger loading before taking the screenshot:

```bash
agent-browser --auto-connect scroll down 1000
agent-browser --auto-connect wait 1500
agent-browser --auto-connect scroll up 1000
```
