cf-browser

Cloudflare Browser Rendering

Browse and scrape the web via Cloudflare's Browser Rendering REST API. Every call is a single POST request — no browser setup, no Puppeteer scripts.

Prerequisites

Requires two environment variables (confirm they're set before making calls):

- `CF_ACCOUNT_ID`: Cloudflare account ID
- `CF_API_TOKEN`: API token with the Browser Rendering - Edit permission
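
A quick pre-flight check can catch a missing variable before the first request is sent. The sketch below is illustrative only: the function name `require_cf_env` is an assumption and is not part of the helper script.

```bash
# Hypothetical pre-flight check (not part of cfbr.sh):
# succeeds only when both variables are set and non-empty.
require_cf_env() {
  missing=""
  # ${VAR:-} expands to an empty string when VAR is unset or empty.
  [ -n "${CF_ACCOUNT_ID:-}" ] || missing="$missing CF_ACCOUNT_ID"
  [ -n "${CF_API_TOKEN:-}" ] || missing="$missing CF_API_TOKEN"
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

# Possible usage:
#   require_cf_env && cfbr.sh markdown '{"url":"https://target-site.com/listings"}'
```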

Helper script

Use cfbr.sh for all API calls. It handles auth headers and the base URL:

bash

# JSON endpoints
cfbr.sh <endpoint> '<json_body>'

# Screenshot (binary): optional third argument is the output filename
cfbr.sh screenshot '<json_body>' output.png
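
The script itself is not reproduced here. As a rough idea of what a wrapper of this kind might do, the sketch below builds the request URL and sends the POST with `curl`. It is not the actual `cfbr.sh`: the function names `cfbr_url` and `cfbr_call` are hypothetical, and the URL layout is assumed from Cloudflare's public REST API (`/client/v4/accounts/<account_id>/browser-rendering/<endpoint>`).

```bash
# Hypothetical sketch of a wrapper like cfbr.sh; the real script may differ.
# Assumes CF_ACCOUNT_ID and CF_API_TOKEN are set and curl is installed.

# Build the REST URL for an endpoint name (markdown, scrape, json, ...).
cfbr_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/browser-rendering/%s\n' \
    "$CF_ACCOUNT_ID" "$1"
}

# POST a JSON body ($2) to an endpoint ($1). An optional third argument names
# an output file for binary responses such as screenshots.
cfbr_call() {
  if [ -n "${3:-}" ]; then
    curl -sS -X POST "$(cfbr_url "$1")" \
      -H "Authorization: Bearer $CF_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d "$2" -o "$3"
  else
    curl -sS -X POST "$(cfbr_url "$1")" \
      -H "Authorization: Bearer $CF_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d "$2"
  fi
}
```

Keeping URL construction in its own function makes it easy to check the target address without sending a request.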

Choosing an endpoint

| Goal | Endpoint | When to use |
| --- | --- | --- |
| Read page content for analysis | `markdown` | Default choice: clean and token-efficient |
| Extract specific elements | `scrape` | You know the CSS selectors for what you need |
| Extract structured data with AI | `json` | You need typed objects but don't know the exact selectors |
| Get full rendered DOM | `content` | You need raw HTML for parsing or debugging |
| Discover pages / crawl | `links` | You are building a sitemap or finding subpages |
| Visual inspection | `screenshot` | You need to see the page layout or debug visually |
| DOM + visual in one shot | `snapshot` | You need both HTML and a screenshot |

For full endpoint details and parameters, see api.md.

Scraping workflow

Follow this sequence when scraping a site for structured data (e.g. rental listings, product catalogs, job boards):

1. Reconnaissance — understand the page

Start with `markdown` to see what content is on the page and how it's structured:

bash

cfbr.sh markdown '{"url":"https://target-site.com/listings", "gotoOptions":{"waitUntil":"networkidle0"}}'

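If the page is an SPA or loads content dynamically, `networkidle0` waits until the network has gone idle (no open connections for about 500 ms), which usually means the JavaScript-driven content has finished loading. If you know a specific element that signals content is ready, use `waitForSelector` instead; it's usually faster: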

json

{"url":"...", "waitForSelector": ".listing-card"}

2. Discover structure — find the selectors

From the markdown/HTML, identify repeating patterns (listing cards, table rows, etc.) and their CSS selectors. If the structure is unclear from markdown alone, use `screenshot` to inspect the page visually:

bash

cfbr.sh screenshot '{"url":"https://target-site.com/listings", "screenshotOptions":{"fullPage":true}, "gotoOptions":{"waitUntil":"networkidle0"}}' listings.png

3. Extract — pull structured data

Option A: CSS selectors (when you know the DOM structure)

bash

cfbr.sh scrape '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "elements": [
    {"selector": ".listing-card .title"},
    {"selector": ".listing-card .price"},
    {"selector": ".listing-card .address"},
    {"selector": ".listing-card a"}
  ]
}'

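The `scrape` endpoint returns `text`, `html`, `attributes` (including `href`), and position/dimensions for each match. Correlate results across selectors by index (first title matches first price, etc.). This only works when every card contains each element; if one card has no price, later prices shift up by one position.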
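
Index-based correlation amounts to zipping parallel lists. The sketch below assumes the text values for each selector have already been pulled out of the response into files with one value per line, in DOM order; it does not parse the JSON itself. The function name `zip_by_index` and the file names are hypothetical.

```bash
# Hypothetical helper: pair values from two files by line number.
# Each file holds one extracted text value per line, in DOM order.
zip_by_index() {
  # Arithmetic expansion strips the padding some wc implementations add.
  count_a=$(( $(wc -l < "$1") ))
  count_b=$(( $(wc -l < "$2") ))
  # Differing counts mean some card lacks an element, so indexes no longer line up.
  if [ "$count_a" -ne "$count_b" ]; then
    echo "Result counts differ ($count_a vs $count_b); index correlation is unsafe" >&2
    return 1
  fi
  # paste joins line N of the first file with line N of the second, tab-separated.
  paste "$1" "$2"
}

# Possible usage (titles.txt and prices.txt are placeholder names):
#   zip_by_index titles.txt prices.txt
```

Refusing to pair lists of different lengths is a deliberate choice here: a silent misalignment is harder to spot than an explicit error.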

Option B: AI extraction (when structure is complex or unknown)

bash

cfbr.sh json '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "prompt": "Extract all rental listings with title, price, address, bedrooms, and link",
  "response_format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "listings": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "address": {"type": "string"},
              "bedrooms": {"type": "string"},
              "url": {"type": "string"}
            },
            "required": ["title", "price"]
          }
        }
      }
    }
  }
}'

Prefer `scrape` when selectors are clear; it's deterministic and incurs no Workers AI charges. Use `json` when the page structure is messy or you need semantic interpretation (this incurs Workers AI charges).

4. Paginate — get all results

Use `links` to find pagination URLs:

bash

cfbr.sh links '{"url":"https://target-site.com/listings"}'

Look for `?page=2`, `next`, or load-more patterns. Repeat extraction for each page.

Infinite-scroll pages are a limitation: the API is stateless (one request = one browser session), so there's no way to scroll, wait for new content to load, and then extract in a single call. For these pages, look for an underlying API or URL parameters (e.g. `?page=2`, `?offset=20`) that serve paginated data directly.
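
When a site does paginate by query parameter, the page URLs can be generated up front and fed to the extraction step one at a time. The sketch below assumes a `?page=<n>` parameter, which must be verified on the target site; the function name `page_urls` is hypothetical.

```bash
# Hypothetical helper: print listing URLs for pages 1..N, assuming the site
# paginates with a ?page=<n> query parameter (verify this on the target site).
page_urls() {
  base=$1   # listing URL without a query string
  last=$2   # number of the last page to generate
  n=1
  while [ "$n" -le "$last" ]; do
    printf '%s?page=%s\n' "$base" "$n"
    n=$((n + 1))
  done
}

# Possible usage: run the same extraction against each generated URL.
#   page_urls https://target-site.com/listings 3 | while IFS= read -r url; do
#     cfbr.sh markdown "{\"url\":\"$url\"}"
#   done
```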

5. Handle obstacles

SPA / empty results: add `"gotoOptions": {"waitUntil": "networkidle0"}` or `"waitForSelector": "<selector>"`.

Slow pages: increase the navigation timeout (in milliseconds) with `"gotoOptions": {"timeout": 60000}`.

Heavy pages: strip unnecessary resources with `rejectResourceTypes`.

json

{"rejectResourceTypes": ["image", "stylesheet", "font", "media"]}

Auth-gated pages: pass session cookies in `cookies`.

json

{"cookies": [{"name": "session", "value": "abc123", "domain": "target-site.com", "path": "/"}]}

Bot detection: Cloudflare Browser Rendering is always identified as a bot. The `userAgent` field changes the User-Agent string the site sees but will not bypass bot protection. If a site blocks the request, there is no workaround via this API.
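
These options are plain fields in the same request body, so they can be combined. As an illustration, the sketch below assembles a body for text-only extraction from a slow, heavy SPA. The option names are the ones listed above; the function `text_only_body` is hypothetical, and it assumes the URL contains no double quotes or backslashes, since it does no JSON escaping.

```bash
# Hypothetical helper: build a request body that waits for network idle,
# allows a longer navigation timeout, and skips non-text resources.
# $1 = page URL (assumed to need no JSON escaping)
# $2 = navigation timeout in milliseconds (defaults to 60000)
text_only_body() {
  printf '{"url":"%s","gotoOptions":{"waitUntil":"networkidle0","timeout":%s},"rejectResourceTypes":["image","stylesheet","font","media"]}\n' \
    "$1" "${2:-60000}"
}

# Possible usage:
#   cfbr.sh markdown "$(text_only_body https://target-site.com/listings)"
```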

Tips

- `markdown` is the best default for content extraction: it's clean, compact, and LLM-ready.
- Always use `networkidle0` or `waitForSelector` on any modern site; without one of them you may get incomplete content.
- `rejectResourceTypes` can substantially speed up text-only operations. Strip images, fonts, and stylesheets when you only need text.
- `scrape` results are ordered by DOM position, so you can correlate across selectors by array index.
- For large scraping jobs, process pages sequentially to stay within rate limits.
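
Sequential processing can be as simple as a loop that handles one URL at a time and pauses between requests. The sketch below is one way to do that; the function name `process_sequentially` and the `CFBR_DELAY` variable are assumptions for illustration, not settings of the API or the helper script, and a suitable delay depends on your plan's limits.

```bash
# Hypothetical runner: read URLs from stdin and handle them one at a time,
# pausing between requests. CFBR_DELAY (seconds, default 1) is an assumed knob.
process_sequentially() {
  # "$@" is the command to run for each URL; the URL is appended as its last argument.
  while IFS= read -r url; do
    # Redirect stdin so the command cannot consume the remaining URL list.
    "$@" "$url" </dev/null || echo "failed: $url" >&2
    sleep "${CFBR_DELAY:-1}"
  done
}

# Possible usage (urls.txt is a placeholder file with one URL per line):
#   fetch_markdown() { cfbr.sh markdown "{\"url\":\"$1\"}"; }
#   process_sequentially fetch_markdown < urls.txt
```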
