Bright Data APIs
Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.
Choosing the Right API
| Use Case | API | Why |
|---|---|---|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |
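The decision table above can be read as a priority order: interaction needs come first, then search engines, then platforms with pre-built scrapers, with Web Unlocker as the default. A minimal sketch of that ordering (`choose_api` is a hypothetical helper, not part of any Bright Data SDK):

```python
def choose_api(needs_interaction=False, is_search_engine=False, has_prebuilt_scraper=False):
    """Pick the most specific Bright Data API, per the table above."""
    if needs_interaction:          # click, scroll, forms, JS, XHR interception
        return "Browser API"
    if is_search_engine:           # Google / Bing / Yandex SERPs
        return "SERP API"
    if has_prebuilt_scraper:       # Amazon, LinkedIn, Instagram, TikTok, ...
        return "Web Scraper API"
    return "Web Unlocker"          # plain URL fetch: cheapest option
```

A plain product-page fetch with no interaction, for example, needs only Web Unlocker.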
Authentication Pattern (All APIs)
All APIs share the same authentication model:

```bash
export BRIGHTDATA_API_KEY="your-api-key"                  # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name"               # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name"              # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD"  # Browser API credentials
```

REST API authentication header for Web Unlocker and SERP API:

```
Authorization: Bearer YOUR_API_KEY
```
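In Python, these variables can be read once at startup and validated early. A minimal sketch (the helper name and dict layout are illustrative, not part of any SDK):

```python
import os

def load_brightdata_config():
    """Read the environment variables exported above; fail fast if the API key is missing."""
    api_key = os.environ.get("BRIGHTDATA_API_KEY")
    if not api_key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return {
        "api_key": api_key,
        "unlocker_zone": os.environ.get("BRIGHTDATA_UNLOCKER_ZONE"),
        "serp_zone": os.environ.get("BRIGHTDATA_SERP_ZONE"),
        "browser_auth": os.environ.get("BROWSER_AUTH"),
        # Every REST call below reuses the same header:
        # {"Authorization": f"Bearer {api_key}"}
    }
```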
Web Unlocker API
HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
Endpoint:

```
POST https://api.brightdata.com/request
```

```python
import requests

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_ZONE_NAME",
        "url": "https://example.com/product/123",
        "format": "raw"
    }
)
html = response.text
```

Key Parameters
| Parameter | Type | Description |
|---|---|---|
| `zone` | string | Zone name (required) |
| `url` | string | Target URL, including the `http://` or `https://` protocol |
| `format` | string | Response format: `raw` or `json` |
| `method` | string | HTTP verb, default `GET` |
| `country` | string | 2-letter ISO code for geo-targeting (e.g., `de`) |
| `data_format` | string | Transform: `markdown` or `screenshot` |
| `async` | boolean | `true` to process the request asynchronously |
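Taken together, these parameters compose a single request body. A small builder sketch (`build_unlocker_payload` is a hypothetical convenience helper; the parameter names themselves are the API's, as used in the examples in this section):

```python
def build_unlocker_payload(zone, url, country=None, data_format=None, use_async=False):
    """Assemble a Web Unlocker request body from the documented parameters."""
    payload = {"zone": zone, "url": url, "format": "raw"}
    if country:
        payload["country"] = country          # 2-letter ISO code, e.g. "de"
    if data_format:
        payload["data_format"] = data_format  # "markdown" or "screenshot"
    if use_async:
        payload["async"] = True               # submit now, poll for the result later
    return payload

# German exit node, markdown output for LLM ingestion:
body = build_unlocker_payload("my_zone", "https://example.com", country="de", data_format="markdown")
```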
Quick Patterns
Get markdown (best for LLM input)

```python
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)
```

Geo-targeted request

```python
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}
```

Screenshot for debugging

```python
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}
```

Async for bulk processing

```python
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
```

**Critical rule:** Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.

See **[references/web-unlocker.md](references/web-unlocker.md)** for complete reference including proxy interface, special headers, async flow, features, and billing.
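Bulk scraping inevitably hits transient failures, so production callers usually wrap the request in a retry loop. A minimal sketch with an injected transport for testability; the backoff policy here is an assumption, not a documented recommendation:

```python
import time

def fetch_with_retry(post, payload, attempts=3):
    """post: any callable returning an object with .status_code and .text,
    e.g. functools.partial(requests.post, "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"})."""
    last_status = None
    for attempt in range(attempts):
        resp = post(json=payload)
        if resp.status_code == 200:
            return resp.text
        last_status = resp.status_code
        time.sleep(2 ** attempt)  # simple exponential backoff: 1s, 2s, 4s...
    raise RuntimeError(f"failed after {attempts} attempts (last status {last_status})")
```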
---
SERP API
Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
Endpoint: (same as Web Unlocker)

```
POST https://api.brightdata.com/request
```

```python
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_SERP_ZONE",
        "url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
        "format": "raw"
    }
)
data = response.json()
for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])
```

Essential Google URL Parameters
| Parameter | Description | Example |
|---|---|---|
| `q` | Search query | `q=python+web+scraping` |
| `brd_json` | Parsed JSON output | `brd_json=1` |
| `gl` | Country for search | `gl=us` |
| `hl` | Language | `hl=en` |
| `start` | Pagination offset | `start=10` |
| `tbm` | Search type | |
| `brd_mobile` | Device | |
| `brd_browser` | Browser | |
| `brd_ai_overview` | Trigger AI Overview | |
| `uule` | Encoded geo location | for precise location targeting |

Note: the `num` parameter is deprecated as of September 2025. Use `start` for pagination.
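These parameters compose into the search URL passed as the `url` field of the request body. A small builder sketch, assuming Google's standard `q`/`gl`/`hl`/`start` parameters plus Bright Data's `brd_json` flag (`google_serp_url` is an illustrative helper):

```python
from urllib.parse import urlencode

def google_serp_url(query, country="us", lang="en", start=0, parsed=True):
    """Build a Google search URL for the SERP API request body."""
    params = {"q": query, "gl": country, "hl": lang}
    if start:
        params["start"] = start   # pagination offset (num is deprecated)
    if parsed:
        params["brd_json"] = 1    # ask Bright Data to return parsed JSON
    return "https://www.google.com/search?" + urlencode(params)

url = google_serp_url("python web scraping", start=10)
```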
Parsed JSON Response Structure

```json
{
  "organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
  "paid": [],
  "people_also_ask": [],
  "knowledge_graph": {},
  "related_searches": [],
  "general": {"results_cnt": 1240000000, "query": "..."}
}
```

Bing Key Parameters
| Parameter | Description |
|---|---|
| `q` | Search query |
| `setLang` | Language (prefer 4-letter: `en-US`) |
| `cc` | Country code |
| `first` | Pagination (increment by 10: 1, 11, 21...) |
| `brd_mobile` | Device type |
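The same URL-building pattern works for Bing; note the `first` offset starts at 1 and increments by 10. A sketch assuming Bing's standard `q`, `cc`, and `first` parameters (helper name illustrative):

```python
from urllib.parse import urlencode

def bing_serp_url(query, country_code="us", page=0):
    """Build a Bing search URL; page 0 -> first=1, page 1 -> first=11, etc."""
    params = {"q": query, "cc": country_code, "first": 1 + 10 * page}
    return "https://www.bing.com/search?" + urlencode(params)

url = bing_serp_url("web scraping", page=2)  # first=21
```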
Async for Bulk SERP
Submit

```python
response = requests.post(
    "https://api.brightdata.com/request",
    params={"async": "1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")
```

Retrieve (retrieve calls are NOT billed)

```python
result = requests.get(
    "https://api.brightdata.com/serp/get_result",
    params={"response_id": response_id},
    headers={"Authorization": f"Bearer {API_KEY}"}
)
```

**Billing:** Pay per 1,000 successful requests only. Async retrieve calls are not billed.

See **[references/serp-api.md](references/serp-api.md)** for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.

---
Web Scraper API
Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
Sync Endpoint:

```
POST https://api.brightdata.com/datasets/v3/scrape
```

Async Endpoint:

```
POST https://api.brightdata.com/datasets/v3/trigger
```

Sync (up to 20 URLs, returns immediately)

```python
response = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",
    params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)
if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| `dataset_id` | string | Scraper identifier from the Scraper Library (required) |
| `format` | string | Output format, e.g. `json` or `csv` |
| `custom_output_fields` | string | Pipe-separated fields to include in the output |
| `include_errors` | boolean | Include error info in results |
Request Body
```json
{
  "input": [
    { "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
    { "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
  ]
}
```

Poll for Async Results
Trigger

```python
import time
import requests

snapshot_id = requests.post(
    "https://api.brightdata.com/datasets/v3/trigger",
    params={"dataset_id": DATASET_ID, "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]
```

Poll

```python
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]
    if status == "ready":
        break
    if status == "failed":
        raise Exception("Job failed")
    time.sleep(10)
```

Download
```python
data = requests.get(
    f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
    params={"format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()
```

**Progress status values:** `starting` → `running` → `ready` | `failed`

**Data retention:** 30 days.

**Billing:** Per delivered record. Invalid input URLs that fail are still billable.

See **[references/web-scraper-api.md](references/web-scraper-api.md)** for complete reference including scraper types, output formats, delivery options, and billing details.

---

Browser API (Scraping Browser)
Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
Connection:

- Playwright/Puppeteer: `wss://${AUTH}@brd.superproxy.io:9222`
- Selenium: `https://${AUTH}@brd.superproxy.io:9515`
```javascript
const { chromium } = require("playwright-core");

const AUTH = process.env.BROWSER_AUTH;
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
const page = await browser.newPage();
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
const html = await page.content();
await browser.close();
```

```python
from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
    page = await browser.new_page()
    page.set_default_navigation_timeout(120000)
    await page.goto("https://example.com", wait_until="domcontentloaded")
    html = await page.content()
    await browser.close()
```

Custom CDP Functions
| Function | Purpose |
|---|---|
| `Captcha.solve` | Manually trigger CAPTCHA solving |
| `Captcha.setAutoSolve` | Enable/disable auto CAPTCHA solving |
| `Proxy.setLocation` | Set precise geo location (call BEFORE goto) |
| | Maintain same IP across sessions |
| `Emulation.setDevice` | Apply device profile (iPhone 14, etc.) |
| | List available device profiles |
| | Block ads to save bandwidth |
| | Re-enable ads |
| | Fast text input for bulk form filling |
| | Install client SSL cert for session |
| | Get DevTools debug URL for live session |

```javascript
// CDP session pattern for custom functions
const client = await page.target().createCDPSession();

// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });

// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
  latitude: 37.7749,
  longitude: -122.4194,
  distance: 10,
  strict: true
});

// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });

// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });
```

Session Rules
- One initial navigation per session — new URL = new session
- Idle timeout: 5 minutes
- Max duration: 30 minutes
Geolocation
- Country-level: append `-country-us` to the credentials username
- EU-wide: append `-country-eu` (routes through 29+ European countries)
- Precise: use the `Proxy.setLocation` CDP command (before navigation)
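Country routing is just a string transformation on the credentials. A minimal sketch (`with_country` is a hypothetical helper; the `brd-customer-ID-zone-NAME:PASSWORD` shape matches the `BROWSER_AUTH` variable above):

```python
def with_country(auth, country):
    """Append a Bright Data country suffix to the username half of
    'brd-customer-ID-zone-NAME:PASSWORD' credentials."""
    user, _, password = auth.partition(":")
    return f"{user}-country-{country}:{password}"

# Then connect as usual:
# wss://{with_country(AUTH, "us")}@brd.superproxy.io:9222
```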
Error Codes
| Issue | Fix |
|---|---|
| Wrong port | Playwright/Puppeteer → `9222`; Selenium → `9515` |
| Bad auth | Check credentials format and zone type |
| Service scaling | Wait 1 minute, reconnect |

**Billing:** Traffic-based only. Block images/CSS/fonts to reduce costs.

See **[references/browser-api.md](references/browser-api.md)** for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
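Since billing is traffic-based, aborting heavy resource types before navigation directly cuts costs. A Playwright-style sketch; the exact set of blocked types is a judgment call, not a documented list:

```python
BLOCKED_TYPES = {"image", "stylesheet", "font", "media"}

def should_block(resource_type):
    """Decide whether a request's resource type should be aborted to save bandwidth."""
    return resource_type in BLOCKED_TYPES

# With Playwright's async API, register before page.goto():
# await page.route("**/*", lambda route: route.abort()
#                  if should_block(route.request.resource_type)
#                  else route.continue_())
```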
Detailed References
- references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
- references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
- references/web-scraper-api.md — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
- references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes