cf-browser
Cloudflare Browser Rendering
Browse and scrape the web via Cloudflare's Browser Rendering REST API. Every call is a single POST request — no browser setup, no Puppeteer scripts.
Prerequisites
Requires two env vars (confirm they're set before making calls):
- `CF_ACCOUNT_ID`: Cloudflare account ID
- `CF_API_TOKEN`: API token with the "Browser Rendering - Edit" permission
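
One way to confirm both variables before making calls is a small pre-flight function. This is a minimal sketch; the name `check_cf_env` is hypothetical and not part of the skill.

```bash
# Hypothetical pre-flight check (not part of cfbr.sh).
# Returns 0 when both variables are set and non-empty, 1 otherwise.
check_cf_env() {
  missing=0
  [ -n "${CF_ACCOUNT_ID:-}" ] || { echo "CF_ACCOUNT_ID is not set" >&2; missing=1; }
  [ -n "${CF_API_TOKEN:-}" ] || { echo "CF_API_TOKEN is not set" >&2; missing=1; }
  return "$missing"
}
```

A caller could run `check_cf_env || exit 1` at the top of a scraping script so that a missing credential fails early rather than as an opaque API error.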
Helper script
Use cfbr.sh for all API calls. It handles auth headers and the base URL:
```bash
# JSON endpoints
cfbr.sh <endpoint> '<json_body>'

# Screenshot (binary): optional third arg for output filename
cfbr.sh screenshot '<json_body>' output.png
```
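
The contents of `cfbr.sh` are not reproduced in this document. As a rough sketch of what a helper that handles the auth headers and base URL might do, the following assumes the public REST path `/client/v4/accounts/<account_id>/browser-rendering/<endpoint>` and bearer-token auth. The names `cfbr_url` and `cfbr_call` are illustrative, and the real script may differ.

```bash
# Sketch only: an assumption about how a helper like cfbr.sh could work.
CF_API_BASE="https://api.cloudflare.com/client/v4"

# Build the REST URL for an endpoint such as "markdown" or "scrape".
cfbr_url() {
  printf '%s/accounts/%s/browser-rendering/%s\n' "$CF_API_BASE" "$CF_ACCOUNT_ID" "$1"
}

# POST a JSON body to an endpoint.
# With a third argument, the response is written to that file (e.g. a PNG).
cfbr_call() {
  endpoint="$1"; body="$2"; outfile="${3:-}"
  # Reuse the positional parameters as the curl argument list.
  set -- -sS -X POST "$(cfbr_url "$endpoint")" \
    -H "Authorization: Bearer $CF_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body"
  if [ -n "$outfile" ]; then set -- "$@" -o "$outfile"; fi
  curl "$@"
}
```

Under these assumptions, `cfbr_call markdown '{"url":"https://target-site.com/listings"}'` would be equivalent to the `cfbr.sh markdown` form used throughout this document.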
Choosing an endpoint
| Goal | Endpoint | When to use |
|---|---|---|
| Read page content for analysis | `markdown` | Default choice: clean, token-efficient |
| Extract specific elements | `scrape` | You know the CSS selectors for what you need |
| Extract structured data with AI | `json` | You need typed objects but don't know exact selectors |
| Get full rendered DOM | `content` | You need raw HTML for parsing or debugging |
| Discover pages / crawl | `links` | Building a sitemap or finding subpages |
| Visual inspection | `screenshot` | You need to see the page layout or debug visually |
| DOM + visual in one shot | `snapshot` | You need both HTML and a screenshot |
For full endpoint details and parameters, see api.md.
Scraping workflow
Follow this sequence when scraping a site for structured data (e.g. rental listings, product catalogs, job boards):
1. Reconnaissance: understand the page
Start with `markdown` to see what content is on the page and how it's structured:
```bash
cfbr.sh markdown '{"url":"https://target-site.com/listings", "gotoOptions":{"waitUntil":"networkidle0"}}'
```
If the page is an SPA or loads content dynamically, `networkidle0` waits for network activity to stop, so JS has finished loading content. If you know a specific element that signals content is ready, use `waitForSelector` instead, which is faster:
```json
{"url":"...", "waitForSelector": ".listing-card"}
```
2. Discover structure: find the selectors
From the markdown/HTML, identify repeating patterns (listing cards, table rows, etc.) and their CSS selectors. If the structure is unclear from markdown alone, use `screenshot` to inspect the page visually:
```bash
cfbr.sh screenshot '{"url":"https://target-site.com/listings", "screenshotOptions":{"fullPage":true}, "gotoOptions":{"waitUntil":"networkidle0"}}' listings.png
```
3. Extract: pull structured data
Option A: CSS selectors (when you know the DOM structure)
```bash
cfbr.sh scrape '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "elements": [
    {"selector": ".listing-card .title"},
    {"selector": ".listing-card .price"},
    {"selector": ".listing-card .address"},
    {"selector": ".listing-card a"}
  ]
}'
```
The `scrape` endpoint returns `text`, `html`, `attributes` (including `href`), and position/dimensions for each match. Correlate results across selectors by index (the first title matches the first price, and so on).
Option B: AI extraction (when structure is complex or unknown)
```bash
cfbr.sh json '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "prompt": "Extract all rental listings with title, price, address, bedrooms, and link",
  "response_format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "listings": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "address": {"type": "string"},
              "bedrooms": {"type": "string"},
              "url": {"type": "string"}
            },
            "required": ["title", "price"]
          }
        }
      }
    }
  }
}'
```
Prefer `scrape` when selectors are clear, because it's deterministic and free. Use `json` when the page structure is messy or you need semantic interpretation (it incurs Workers AI charges).
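
For Option A, the index-based correlation can be sketched with `jq`. This assumes `jq` is installed and that the scrape response has the shape `{"result":[{"selector":"…","results":[{"text":"…"}, …]}, …]}` with selectors in the order they were requested. Both the shape and the function name `correlate_listings` are assumptions to verify against a real response.

```bash
# Hypothetical post-processing step: reads a scrape response on stdin and
# prints one compact JSON object per listing.
# Assumes the first three selectors were title, price, and address, in that order.
correlate_listings() {
  jq -c '
    [.result[] | [.results[].text]]
    | transpose[]
    | {title: .[0], price: .[1], address: .[2]}
  '
}
```

Usage would look like `cfbr.sh scrape '<json_body>' | correlate_listings`. Note that `transpose` pads shorter arrays with `null`, so if one card lacks a price, every later price shifts up by one position. Comparing the match counts per selector before trusting the pairing is a cheap safeguard.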
4. Paginate: get all results
Use `links` to find pagination URLs:
```bash
cfbr.sh links '{"url":"https://target-site.com/listings"}'
```
Look for `?page=2`, `next`, or load-more patterns. Repeat extraction for each page.
Infinite-scroll pages are a limitation: the API is stateless (one request = one browser session), so there's no way to scroll, wait for new content to load, and then extract in a single call. For these pages, look for an underlying API or URL parameters (e.g. `?page=2`, `?offset=20`) that serve paginated data directly.
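
Once the pagination pattern is known, the per-page extraction can be run as a sequential loop. The sketch below assumes the site uses a `?page=N` query parameter and that the last page number is already known. The function name `scrape_pages` and the `CFBR` and `PAGE_DELAY` variables are hypothetical.

```bash
# Hypothetical pagination loop; ?page=N and the two variables are assumptions.
CFBR="${CFBR:-cfbr.sh}"        # helper script to invoke
PAGE_DELAY="${PAGE_DELAY:-2}"  # seconds to pause between requests

# scrape_pages BASE_URL LAST_PAGE
# Calls the scrape endpoint once per page, one page at a time.
scrape_pages() {
  base_url="$1"; last_page="$2"; page=1
  while [ "$page" -le "$last_page" ]; do
    body=$(printf '{"url":"%s?page=%d","elements":[{"selector":".listing-card .title"}]}' "$base_url" "$page")
    "$CFBR" scrape "$body"
    # Pause between pages, but not after the final one.
    if [ "$page" -lt "$last_page" ]; then sleep "$PAGE_DELAY"; fi
    page=$((page + 1))
  done
}
```

For example, `scrape_pages "https://target-site.com/listings" 5 > pages.jsonl` would collect five responses in order. The fixed delay is a simple way to stay within rate limits; the right value depends on the account's limits.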
5. Handle obstacles
SPA / empty results: Add `"gotoOptions": {"waitUntil": "networkidle0"}` or `"waitForSelector": "<selector>"`.

Slow pages: Increase the timeout (in milliseconds) with `"gotoOptions": {"timeout": 60000}`.

Heavy pages: Strip unnecessary resources:
```json
{"rejectResourceTypes": ["image", "stylesheet", "font", "media"]}
```
Auth-gated pages: Pass session cookies:
```json
{"cookies": [{"name": "session", "value": "abc123", "domain": "target-site.com", "path": "/"}]}
```
Bot detection: Cloudflare Browser Rendering is always identified as a bot. The `userAgent` field changes the User-Agent the site sees but will not bypass bot protection. If a site blocks the request, there is no workaround via this API.
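
Several of these options can be combined in one request body. The following is a minimal sketch for text-only extraction; the function name `hardened_body` is hypothetical, and it assumes the URL contains no characters that need JSON escaping.

```bash
# Hypothetical helper: builds a request body that waits for the network to go
# idle, raises the navigation timeout, and skips resources not needed for text.
# hardened_body URL [TIMEOUT_MS]
hardened_body() {
  url="$1"; timeout_ms="${2:-60000}"
  printf '{"url":"%s","gotoOptions":{"waitUntil":"networkidle0","timeout":%d},"rejectResourceTypes":["image","stylesheet","font","media"]}\n' "$url" "$timeout_ms"
}
```

It could be used as `cfbr.sh markdown "$(hardened_body https://target-site.com/listings)"`. Because it drops stylesheets and images, this body suits `markdown` or `scrape` but not `screenshot`, where the visual output matters.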
Tips
- `markdown` is the best default for content extraction: it's clean, compact, and LLM-ready.
- Always use `networkidle0` or `waitForSelector` on any modern site. Without one of them, dynamically loaded content is likely to be missing.
- `rejectResourceTypes` dramatically speeds up text-only operations. Always strip images/fonts/stylesheets when you only need text.
- `scrape` results are ordered by DOM position, so correlate across selectors by array index.
- For large scraping jobs, process pages sequentially to stay within rate limits.