Read `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` from the environment first. If either is missing, look for them in a project `.env` file, then `.env.local`, then `~/.env`. Treat an empty assignment (e.g. `CLOUDFLARE_ACCOUNT_ID=` with nothing after the `=`) the same as undefined.
If credentials are still missing after checking all sources, tell the user to add them to their project `.env` file:

```
CLOUDFLARE_ACCOUNT_ID=
CLOUDFLARE_API_TOKEN=
```

The API token needs "Browser Rendering - Edit" permission. Create one at [Cloudflare Dashboard > API Tokens](https://dash.cloudflare.com/profile/api-tokens).
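The credential lookup order can be sketched in Python (a minimal sketch: `.env` parsing is simplified to plain `KEY=VALUE` lines, and surrounding quotes are stripped naively):

```python
import os
from pathlib import Path


def load_credentials():
    """Resolve Cloudflare credentials: environment first, then .env files."""
    keys = ('CLOUDFLARE_ACCOUNT_ID', 'CLOUDFLARE_API_TOKEN')
    creds = {k: os.environ.get(k) for k in keys}
    # Fall back to .env, .env.local, then ~/.env; first non-empty value wins.
    for envfile in (Path('.env'), Path('.env.local'), Path.home() / '.env'):
        if not envfile.is_file():
            continue
        for line in envfile.read_text().splitlines():
            line = line.strip()
            if '=' not in line or line.startswith('#'):
                continue
            k, _, v = line.partition('=')
            k, v = k.strip(), v.strip().strip('"\'')
            # An empty value (KEY= with nothing after) counts as undefined.
            if k in keys and not creds[k] and v:
                creds[k] = v
    missing = [k for k, v in creds.items() if not v]
    return creds, missing
```

If `missing` is non-empty, prompt the user to fill in the `.env` template above.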
Start a crawl, excluding sections you don't need via `excludePatterns`:

```bash
curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "<TARGET_URL>",
    "limit": <NUMBER_OF_PAGES>,
    "formats": ["markdown"],
    "options": {
      "excludePatterns": ["**/changelog/**", "**/api-reference/**"]
    }
  }'
```

For an incremental crawl, add `modifiedSince` so only pages changed after that time are fetched:

```bash
curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "<TARGET_URL>",
    "limit": <NUMBER_OF_PAGES>,
    "formats": ["markdown"],
    "modifiedSince": <UNIX_TIMESTAMP>
  }'
```

To convert a `--since` date to a Unix timestamp, use `date -d "2026-03-10" +%s` (GNU/Linux) or `date -j -f "%Y-%m-%d" "2026-03-10" +%s` (macOS/BSD).

A successful request returns the job ID:

```json
{"success": true, "result": "job-uuid-here"}
```

Check job progress:

```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?limit=1" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Status: {d[\"result\"][\"status\"]} | Finished: {d[\"result\"][\"finished\"]}/{d[\"result\"][\"total\"]}')"
```

Possible status values: `running`, `completed`, `cancelled_due_to_timeout`, `cancelled_due_to_limits`, `errored`.
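For unattended runs, the status check can be wrapped in a polling loop. A sketch (the endpoint and response shape follow the status call above; the function names and the 5-second interval are this sketch's own choices):

```python
import json
import time
import urllib.request

# Terminal states, after which polling should stop.
TERMINAL = {'completed', 'cancelled_due_to_timeout',
            'cancelled_due_to_limits', 'errored'}


def summarize(payload: dict) -> tuple[str, bool]:
    """Return (progress line, is_terminal) from a crawl-status response."""
    r = payload['result']
    line = f"Status: {r['status']} | Finished: {r['finished']}/{r['total']}"
    return line, r['status'] in TERMINAL


def poll(account_id: str, api_token: str, job_id: str,
         interval: float = 5.0) -> str:
    """Print progress until the job reaches a terminal state."""
    url = (f'https://api.cloudflare.com/client/v4/accounts/{account_id}'
           f'/browser-rendering/crawl/{job_id}?limit=1')
    while True:
        req = urllib.request.Request(
            url, headers={'Authorization': f'Bearer {api_token}'})
        with urllib.request.urlopen(req) as resp:
            line, done = summarize(json.load(resp))
        print(line)
        if done:
            return line
        time.sleep(interval)
```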
Fetch all completed records using pagination (cursor-based):

```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
```

If the response contains a `cursor`, request the next page with it:

```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50&cursor=<CURSOR>" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
```
To save each page's markdown to its own file, follow the cursor until it is exhausted:

```python
import json
import os
import re
import urllib.request

account_id = os.environ['CLOUDFLARE_ACCOUNT_ID']
api_token = os.environ['CLOUDFLARE_API_TOKEN']
job_id = '<JOB_ID>'
outdir = '.crawl-output'
os.makedirs(outdir, exist_ok=True)

base = (f'https://api.cloudflare.com/client/v4/accounts/{account_id}'
        f'/browser-rendering/crawl/{job_id}?status=completed&limit=50')
cursor = None
total_saved = 0
while True:
    url = base if cursor is None else f'{base}&cursor={cursor}'
    req = urllib.request.Request(url, headers={
        'Authorization': f'Bearer {api_token}'
    })
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    records = data.get('result', {}).get('records', [])
    if not records:
        break
    for rec in records:
        page_url = rec.get('url', '')
        md = rec.get('markdown', '')
        if not md:
            continue
        # Convert URL to filename
        name = re.sub(r'https?://', '', page_url)
        name = re.sub(r'[^a-zA-Z0-9]', '_', name).strip('_')[:120]
        filepath = os.path.join(outdir, f'{name}.md')
        with open(filepath, 'w') as f:
            f.write(f'<!-- Source: {page_url} -->\n\n')
            f.write(md)
        total_saved += 1
    cursor = data.get('result', {}).get('cursor')
    if cursor is None:
        break
print(f'Saved {total_saved} pages to {outdir}')
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | (required) | Starting URL to crawl |
| `limit` | number | 10 | Max pages to crawl (up to 100,000) |
| `depth` | number | 100,000 | Max link depth from the starting URL |
| `formats` | array | `["html"]` | Output formats: `html`, `markdown` |
| `render` | boolean | true | Render pages in a headless browser |
| `source` | string | `"all"` | Page discovery: `sitemaps`, `links`, or `all` |
| `cacheTTL` | number | 86400 | Cache validity in seconds (max 604800) |
| `modifiedSince` | number | - | Unix timestamp; only crawl pages modified after this time |
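The parameters above can be combined into a single request body. A sketch (the target URL, limit, and timestamp are placeholder values, and only parameters already shown in this document are used):

```python
import json

# Hypothetical example values; docs.example.com is a placeholder target.
body = {
    "url": "https://docs.example.com",
    "limit": 200,                      # max pages to crawl
    "formats": ["markdown"],           # instead of the default ["html"]
    "modifiedSince": 1767225600,       # Unix timestamp: only newer pages
    "options": {
        "excludePatterns": ["**/changelog/**"],  # skip changelog pages
    },
}
payload = json.dumps(body)
```

`payload` is what goes after `-d` in the curl examples above.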
| Parameter | Type | Default | Description |
|---|---|---|---|
| `includePatterns` | array | [] | Wildcard patterns to include (supports `*` and `**`) |
| `excludePatterns` | array | [] | Wildcard patterns to exclude (higher priority) |
| | boolean | false | Follow links to subdomains |
| | boolean | false | Follow external links |
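The include/exclude precedence can be illustrated with Python's `fnmatch` (an approximation only: `fnmatch` does not distinguish `*` from `**`, and the `allowed` helper is this sketch's own, not part of the API):

```python
from fnmatch import fnmatch


def allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Exclude patterns take priority; an empty include list allows everything."""
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    return not include or any(fnmatch(path, pat) for pat in include)
```

For example, a page matching both an include and an exclude pattern is rejected, because exclude wins.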
| Parameter | Type | Description |
|---|---|---|
| | object | AI-powered structured extraction (`prompt`, `response_format`) |
| `authenticate` | object | HTTP basic auth (`username`, `password`) |
| `setExtraHTTPHeaders` | object | Custom headers for requests |
| `rejectResourceTypes` | array | Resource types to skip: `image`, `media`, `font`, `stylesheet` |
| `userAgent` | string | Custom user agent string |
| `cookies` | array | Custom cookies for requests |
Usage examples:

```bash
# Basic crawl
/cf-crawl https://docs.example.com --limit 50

# Restrict to specific sections
/cf-crawl https://docs.example.com --limit 100 --include "/guides/**,/api/**" --exclude "/changelog/**"

# Incremental crawl: pages unchanged since the date come back with status=skipped
/cf-crawl https://docs.example.com --limit 50 --since 2026-03-10

# Faster crawl without browser rendering
/cf-crawl https://docs.example.com --no-render --limit 200

# Merge all pages into one output file
/cf-crawl https://docs.example.com --limit 50 --merge
```

Flags accepted by `/cf-crawl`:

- `--limit N` / `-l N`
- `--depth N` / `-d N`
- `--include "pattern1,pattern2"`
- `--exclude "pattern1,pattern2"`
- `--no-render`
- `--merge`
- `--output DIR` / `-o DIR` (default: `.crawl-output`)
- `--source sitemaps|links|all`
- `--since DATE` (e.g. `2026-03-10`; mapped to `modifiedSince`)

Notes:

- Pages blocked from crawling are returned with `"status": "disallowed"`.
- `--no-render` sets `render: false` in the request.
- Wildcard patterns support `*` and `**`.
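A cross-platform alternative to the GNU/BSD `date` one-liners for computing the `--since` timestamp (a sketch; note it uses UTC midnight, while the shell commands use the local timezone):

```python
from datetime import datetime, timezone


def since_to_timestamp(date_str: str) -> int:
    """Convert a --since date like 2026-03-10 to a Unix timestamp (UTC midnight)."""
    dt = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

The result is what goes into the request body's `modifiedSince` field.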