cf-browser

Cloudflare Browser Rendering

Browse and scrape the web via Cloudflare's Browser Rendering REST API. Every call is a single POST request — no browser setup, no Puppeteer scripts.

Prerequisites

Requires two environment variables (confirm they are set before making calls):
  • `CF_ACCOUNT_ID`: your Cloudflare account ID
  • `CF_API_TOKEN`: an API token with the Browser Rendering - Edit permission
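A minimal bash sketch of that check. `require_env` is a hypothetical helper, not part of `cfbr.sh`; it reports every missing variable instead of stopping at the first one:

```bash
# Hypothetical helper (not part of cfbr.sh): fail if any named variable is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable whose name is in $name.
    if [[ -z ${!name:-} ]]; then
      printf 'missing environment variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call it near the top of a script, e.g. `require_env CF_ACCOUNT_ID CF_API_TOKEN || exit 1`.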

Helper script

Use `cfbr.sh` for all API calls. It handles the auth headers and the base URL:

```bash
# JSON endpoints
cfbr.sh <endpoint> '<json_body>'

# Screenshot (binary): optional third arg for output filename
cfbr.sh screenshot '<json_body>' output.png
```
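The contents of `cfbr.sh` are not shown here, so the following is only a hypothetical sketch of what such a wrapper might look like, written as bash functions. It assumes the account-scoped REST base URL `https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/<endpoint>` with Bearer-token auth; `cfbr_url` and `cfbr` are illustrative names:

```bash
# Hypothetical sketch of a cfbr.sh-style wrapper; the real script may differ.

# Build the endpoint URL (assumed account-scoped base URL).
cfbr_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/browser-rendering/%s\n' \
    "${CF_ACCOUNT_ID:?CF_ACCOUNT_ID is not set}" "$1"
}

# cfbr ENDPOINT JSON_BODY [OUTPUT_FILE]
cfbr() {
  local endpoint=$1 body=$2 out=${3:-} url
  url=$(cfbr_url "$endpoint") || return 1
  # Every call is a single authenticated JSON POST; abort if the token is missing.
  local args=(-sS -X POST "$url"
    -H "Authorization: Bearer ${CF_API_TOKEN:?CF_API_TOKEN is not set}"
    -H "Content-Type: application/json"
    --data "$body")
  if [[ -n $out ]]; then
    # Binary responses such as screenshots are written to a file.
    curl "${args[@]}" -o "$out"
  else
    # JSON responses go to stdout.
    curl "${args[@]}"
  fi
}
```

Keeping the curl arguments in an array preserves the JSON body as a single argument, however many spaces or quotes it contains.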

Choosing an endpoint

| Goal | Endpoint | When to use |
|---|---|---|
| Read page content for analysis | `markdown` | Default choice: clean, token-efficient |
| Extract specific elements | `scrape` | You know the CSS selectors for what you need |
| Extract structured data with AI | `json` | You need typed objects but don't know the exact selectors |
| Get the full rendered DOM | `content` | You need raw HTML for parsing or debugging |
| Discover pages / crawl | `links` | Building a sitemap or finding subpages |
| Visual inspection | `screenshot` | You need to see the page layout or debug visually |
| DOM + visual in one shot | `snapshot` | You need both HTML and a screenshot |

For full endpoint details and parameters, see api.md.

Scraping workflow

Follow this sequence when scraping a site for structured data (e.g. rental listings, product catalogs, job boards):

1. Reconnaissance — understand the page

Start with `markdown` to see what content is on the page and how it is structured:

```bash
cfbr.sh markdown '{"url":"https://target-site.com/listings", "gotoOptions":{"waitUntil":"networkidle0"}}'
```

If the page is an SPA or loads content dynamically, `networkidle0` waits until there have been no network connections for at least 500 ms, which usually means client-side JavaScript has finished rendering. If you know a specific element that signals the content is ready, use `waitForSelector` instead; it is faster because it returns as soon as that element appears:

```json
{"url":"...", "waitForSelector": ".listing-card"}
```
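To switch between the two waiting strategies without hand-editing JSON, a small builder can help. This sketch assumes `jq` is installed and mirrors the body shapes shown above; `wait_body` is a hypothetical name. `jq --arg` handles quoting, so a URL or selector containing quotes cannot break the JSON:

```bash
# Hypothetical helper (requires jq): wait_body URL [SELECTOR]
wait_body() {
  local url=$1 selector=${2:-}
  if [[ -n $selector ]]; then
    # Faster: render as soon as the marker element exists.
    jq -cn --arg url "$url" --arg sel "$selector" '{url: $url, waitForSelector: $sel}'
  else
    # Generic fallback: wait until the network has gone idle.
    jq -cn --arg url "$url" '{url: $url, gotoOptions: {waitUntil: "networkidle0"}}'
  fi
}
```

For example: `cfbr.sh markdown "$(wait_body https://target-site.com/listings .listing-card)"`.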

2. Discover structure — find the selectors

From the markdown (or the raw HTML from `content`, which shows the actual class names), identify repeating patterns (listing cards, table rows, etc.) and their CSS selectors. If the structure is unclear from the markdown alone, use `screenshot` to inspect the page visually:

```bash
cfbr.sh screenshot '{"url":"https://target-site.com/listings", "screenshotOptions":{"fullPage":true}, "gotoOptions":{"waitUntil":"networkidle0"}}' listings.png
```
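Because the screenshot is written straight to a file, a failed request could leave something other than an image there (for example a JSON error body; how `cfbr.sh` handles errors is an assumption here). A quick sketch that checks the standard 8-byte PNG signature; `is_png` is a hypothetical helper:

```bash
# Hypothetical check: a PNG file starts with the bytes 89 50 4E 47 0D 0A 1A 0A.
is_png() {
  local sig
  # Dump the first 8 bytes as lowercase hex with no spaces or newlines.
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [[ $sig == 89504e470d0a1a0a ]]
}
```

For example: `is_png listings.png || echo "listings.png is not a PNG; inspect it for an error message" >&2`.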

3. Extract — pull structured data

Option A: CSS selectors (when you know the DOM structure)

```bash
cfbr.sh scrape '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "elements": [
    {"selector": ".listing-card .title"},
    {"selector": ".listing-card .price"},
    {"selector": ".listing-card .address"},
    {"selector": ".listing-card a"}
  ]
}'
```

The `scrape` endpoint returns `text`, `html`, `attributes` (including `href`), and position/dimensions for each match. Correlate results across selectors by index (the first title matches the first price, and so on). This only works when every card contains every selected element; if some cards lack, say, an address, the arrays fall out of step.
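A sketch of the index-based correlation using `jq` (assumed to be installed). It assumes the response is shaped like `{"result": [{"selector": ..., "results": [{"text": ..., "attributes": [{"name": ..., "value": ...}]}]}]}`; check that against api.md before relying on it. `scrape_to_tsv` is a hypothetical name. For each selector it takes the element's `href` when it has one and its text otherwise, then `transpose` turns per-selector columns into per-listing rows:

```bash
# Hypothetical helper (requires jq): one TSV row per listing from a saved scrape response.
scrape_to_tsv() {
  jq -r '
    [ .result[]
      | [ .results[]
          | (((.attributes // []) | map(select(.name == "href")) | .[0].value) // .text) ]
    ]
    | transpose
    | .[]
    | @tsv
  ' "$1"
}
```

For example, save the response with `cfbr.sh scrape '...' > listings.json`, then run `scrape_to_tsv listings.json`; columns follow the order of the `elements` array.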
Option B: AI extraction (when the structure is complex or unknown)

```bash
cfbr.sh json '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "prompt": "Extract all rental listings with title, price, address, bedrooms, and link",
  "response_format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "listings": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "address": {"type": "string"},
              "bedrooms": {"type": "string"},
              "url": {"type": "string"}
            },
            "required": ["title", "price"]
          }
        }
      }
    }
  }
}'
```

Prefer `scrape` when the selectors are clear: it is deterministic and incurs no AI charges. Use `json` when the page structure is messy or you need semantic interpretation (it incurs Workers AI charges).
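Since AI extraction is not deterministic, it can be worth checking the result before using it. A sketch assuming `jq` is installed and a response shaped like `{"success": true, "result": {"listings": [...]}}` (an assumption; see api.md); `check_listings` is a hypothetical name. It fails unless every listing has the fields the schema marks as required:

```bash
# Hypothetical check (requires jq): exit 0 only if every listing has title and price.
check_listings() {
  jq -e '.success == true
         and (.result.listings | type == "array")
         and (.result.listings | all(has("title") and has("price")))' "$1" >/dev/null
}
```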

4. Paginate — get all results

Use `links` to find pagination URLs:

```bash
cfbr.sh links '{"url":"https://target-site.com/listings"}'
```

Look for `?page=2`, `next`, or load-more patterns, then repeat the extraction for each page.
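A sketch for narrowing a `links` response down to likely pagination URLs, assuming `jq` is installed and the response is shaped like `{"success": true, "result": ["https://...", ...]}` (an assumption; see api.md). The URL patterns are illustrative and `pagination_links` is a hypothetical name:

```bash
# Hypothetical filter (requires jq): keep URLs that look like pagination, deduplicated.
pagination_links() {
  jq -r '.result[]' "$1" \
    | grep -E '[?&](page|offset)=[0-9]+|/page/[0-9]+' \
    | sort -u
}
```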
Infinite-scroll pages are a limitation. The API is stateless (one request = one browser session), so there is no way to scroll, wait for new content to load, and then extract in a single call. For these pages, look for an underlying API or URL parameters (e.g. `?page=2`, `?offset=20`) that serve paginated data directly.
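Once such a parameter is found, the page URLs can be generated and fed back into the extraction step. In this sketch the parameter name `offset` and the page size are assumptions that must match what the site actually uses; `offset_urls` is a hypothetical name:

```bash
# Hypothetical generator: offset_urls BASE_URL PAGE_SIZE PAGE_COUNT
offset_urls() {
  local base=$1 size=$2 count=$3 sep='?' i
  # Use "&" if the base URL already carries a query string.
  if [[ $base == *\?* ]]; then
    sep='&'
  fi
  for (( i = 0; i < count; i++ )); do
    printf '%s%soffset=%d\n' "$base" "$sep" "$(( i * size ))"
  done
}
```

For example, `offset_urls https://target-site.com/listings 20 3` prints the URLs for offsets 0, 20, and 40.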

5. Handle obstacles

SPA / empty results: add `"gotoOptions": {"waitUntil": "networkidle0"}` or `"waitForSelector": "<selector>"`.

Slow pages: increase the navigation timeout (in milliseconds), e.g. `"gotoOptions": {"timeout": 60000}`.

Heavy pages: strip unnecessary resources:

```json
{"rejectResourceTypes": ["image", "stylesheet", "font", "media"]}
```

Auth-gated pages: pass session cookies:

```json
{"cookies": [{"name": "session", "value": "abc123", "domain": "target-site.com", "path": "/"}]}
```

Bot detection: Cloudflare Browser Rendering is always identified as a bot. The `userAgent` field changes what the site sees but will not bypass bot protection. If a site blocks the request, there is no workaround via this API.
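When several of these fixes apply to one page, they can be combined into a single request body. A sketch assuming `jq` is installed; `robust_body` is a hypothetical name and the cookie name `session` is a placeholder for whatever the site actually sets. Building the body with `jq --arg` keeps cookie values that contain quotes or backslashes from breaking the JSON:

```bash
# Hypothetical builder (requires jq): robust_body URL COOKIE_DOMAIN SESSION_VALUE
robust_body() {
  local url=$1 domain=$2 session=$3
  jq -cn --arg url "$url" --arg domain "$domain" --arg session "$session" '{
    url: $url,
    gotoOptions: {waitUntil: "networkidle0", timeout: 60000},
    rejectResourceTypes: ["image", "stylesheet", "font", "media"],
    cookies: [{name: "session", value: $session, domain: $domain, path: "/"}]
  }'
}
```

For example: `cfbr.sh markdown "$(robust_body https://target-site.com/account target-site.com "$SESSION")"`, where `SESSION` is a hypothetical variable holding the cookie value.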

Tips

  • `markdown` is the best default for content extraction: it is clean, compact, and LLM-ready.
  • Always use `networkidle0` or `waitForSelector` on any modern, JavaScript-heavy site; without it you may get incomplete content.
  • `rejectResourceTypes` can substantially speed up text-only operations. Strip images, fonts, and stylesheets whenever you only need text (but not when you need a screenshot).
  • `scrape` results are ordered by DOM position; correlate across selectors by array index.
  • For large scraping jobs, process pages sequentially to stay within rate limits.
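A sketch of sequential processing, assuming `cfbr.sh` is on the `PATH`. The two-second pause is an arbitrary placeholder, not a documented limit, so check your account's actual limits. `fetch_sequentially`, `CFBR`, and `DELAY` are hypothetical names, and because the URL is spliced into the JSON body with `printf`, it must not contain double quotes or backslashes:

```bash
# Hypothetical sketch: fetch pages one at a time, pausing between requests.
CFBR=${CFBR:-cfbr.sh}   # command to call (overridable, e.g. for testing)
DELAY=${DELAY:-2}       # seconds between requests (placeholder value)

# fetch_sequentially OUTPUT_DIR URL...
fetch_sequentially() {
  local outdir=$1 url n=0
  shift
  mkdir -p "$outdir"
  for url in "$@"; do
    n=$(( n + 1 ))
    # One request at a time; assumes the URL contains no double quotes or backslashes.
    "$CFBR" markdown "$(printf '{"url":"%s"}' "$url")" > "$outdir/page-$n.json"
    sleep "$DELAY"
  done
}
```

For example, `fetch_sequentially pages 'https://target-site.com/listings?page=1' 'https://target-site.com/listings?page=2'` writes `pages/page-1.json` and `pages/page-2.json`.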