bright-data-best-practices

Bright Data APIs

Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.

Choosing the Right API

| Use Case | API | Why |
|---|---|---|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |

Authentication Pattern (All APIs)

All APIs share the same authentication model:
bash
export BRIGHTDATA_API_KEY="your-api-key"         # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name"       # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name"      # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD"  # Browser API credentials
REST API authentication header for Web Unlocker and SERP API:
Authorization: Bearer YOUR_API_KEY
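
As a rough illustration, the variables above could be loaded in Python and turned into the REST header like this. The helper names `load_config` and `auth_headers` are made up for this sketch and are not part of any Bright Data SDK.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header used by Web Unlocker and SERP API."""
    if not api_key:
        raise ValueError("API key is empty; set BRIGHTDATA_API_KEY")
    return {"Authorization": f"Bearer {api_key}"}


def load_config(env=os.environ) -> dict:
    """Collect the variables exported above; missing ones come back as None."""
    names = (
        "BRIGHTDATA_API_KEY",
        "BRIGHTDATA_UNLOCKER_ZONE",
        "BRIGHTDATA_SERP_ZONE",
        "BROWSER_AUTH",
    )
    return {name: env.get(name) for name in names}
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real environment.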

Web Unlocker API

HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
Endpoint:
POST https://api.brightdata.com/request
python
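import os
import requests

API_KEY = os.environ["BRIGHTDATA_API_KEY"]  # exported in the authentication step

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_ZONE_NAME",
        "url": "https://example.com/product/123",
        "format": "raw"
    }
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error page
html = response.text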

Key Parameters

| Parameter | Type | Description |
|---|---|---|
| `zone` | string | Zone name (required) |
| `url` | string | Target URL with `http://` or `https://` (required) |
| `format` | string | `"raw"` (HTML) or `"json"` (structured wrapper) (required) |
| `method` | string | HTTP verb, default `"GET"` |
| `country` | string | 2-letter ISO code for geo-targeting (e.g., `"us"`, `"de"`) |
| `data_format` | string | Transform: `"markdown"` or `"screenshot"` |
| `async` | boolean | `true` for async mode |
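
The parameter rules in the table can be checked before a request is sent. The following is a minimal sketch, assuming client-side validation is wanted; `build_unlocker_payload` is a hypothetical helper, not a Bright Data function.

```python
ALLOWED_FORMATS = {"raw", "json"}
OPTIONAL_KEYS = {"method", "country", "data_format", "async"}


def build_unlocker_payload(zone: str, url: str, fmt: str = "raw", **options) -> dict:
    """Assemble a request body from the parameters listed in the table."""
    if not zone:
        raise ValueError("zone is required")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"zone": zone, "url": url, "format": fmt}
    payload.update(options)
    return payload
```

Because `async` is a reserved word in Python, it has to be passed by unpacking a dict, for example `build_unlocker_payload(zone, url, **{"async": True})`.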

Quick Patterns

python
# Get markdown (best for LLM input)
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)

# The variants below change only the json= argument of the same requests.post call.

# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}

# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}

# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}

**Critical rule:** Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.

See **[references/web-unlocker.md](references/web-unlocker.md)** for complete reference including proxy interface, special headers, async flow, features, and billing.

---

SERP API

Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
Endpoint:
POST https://api.brightdata.com/request
(same as Web Unlocker)
python
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_SERP_ZONE",
        "url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
        "format": "raw"
    }
)
data = response.json()
for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])

Essential Google URL Parameters

| Parameter | Description | Example |
|---|---|---|
| `q` | Search query | `q=python+web+scraping` |
| `brd_json` | Parsed JSON output | `brd_json=1` (always use for data pipelines) |
| `gl` | Country for search | `gl=us` |
| `hl` | Language | `hl=en` |
| `start` | Pagination offset | `start=10` (page 2), `start=20` (page 3) |
| `tbm` | Search type | `tbm=nws` (news), `tbm=isch` (images), `tbm=vid` (videos) |
| `brd_mobile` | Device | `brd_mobile=1` (mobile), `brd_mobile=ios` |
| `brd_browser` | Browser | `brd_browser=chrome` |
| `brd_ai_overview` | Trigger AI Overview | `brd_ai_overview=2` |
| `uule` | Encoded geo location | For precise location targeting |

Note: The `num` parameter is deprecated as of September 2025. Use `start` for pagination.
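
To avoid hand-assembling query strings, the Google parameters above can be built with the standard library. This is a sketch only; `google_search_url` is a hypothetical helper, and the 10-results-per-page step simply mirrors the `start=10` / `start=20` examples in the table.

```python
from urllib.parse import urlencode


def google_search_url(query: str, page: int = 1, country: str = "us",
                      language: str = "en", **extra) -> str:
    """Build a Google search URL that requests parsed JSON (brd_json=1)."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {"q": query, "brd_json": 1, "gl": country, "hl": language}
    if page > 1:
        params["start"] = (page - 1) * 10  # page 2 -> 10, page 3 -> 20
    params.update(extra)  # e.g. tbm="nws" or brd_mobile=1
    return "https://www.google.com/search?" + urlencode(params)
```

The returned string is what goes into the `url` field of the request body sent to the SERP zone.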

Parsed JSON Response Structure

json
{
  "organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
  "paid": [],
  "people_also_ask": [],
  "knowledge_graph": {},
  "related_searches": [],
  "general": {"results_cnt": 1240000000, "query": "..."}
}
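
A short sketch of reading this structure defensively, since any section may be missing or empty for a given query. The function name `top_organic` and the sample data are illustrative only.

```python
def top_organic(serp: dict, limit: int = 5) -> list:
    """Return (rank, title, link) tuples for the first `limit` organic results."""
    results = sorted(serp.get("organic", []), key=lambda r: r.get("rank", 0))
    return [(r.get("rank"), r.get("title"), r.get("link")) for r in results[:limit]]


# Hand-written sample shaped like the response above (not real API output).
sample = {
    "organic": [
        {"rank": 2, "title": "B", "link": "https://b.example"},
        {"rank": 1, "title": "A", "link": "https://a.example"},
    ],
    "general": {"results_cnt": 2, "query": "demo"},
}
rows = top_organic(sample)
```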

Bing Key Parameters

| Parameter | Description |
|---|---|
| `q` | Search query |
| `setLang` | Language (prefer 4-letter: `en-US`) |
| `cc` | Country code |
| `first` | Pagination (increment by 10: 1, 11, 21...) |
| `safesearch` | `off`, `moderate`, `strict` |
| `brd_mobile` | Device type |
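
Bing pagination differs from Google's in that `first` is 1-based. A minimal sketch of the arithmetic, assuming the usual `https://www.bing.com/search` base URL; `bing_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SAFESEARCH_VALUES = {"off", "moderate", "strict"}


def bing_search_url(query: str, page: int = 1, language: str = "en-US",
                    country: str = "us", safesearch: str = None) -> str:
    """Build a Bing search URL; `first` goes 1, 11, 21... as pages advance."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {
        "q": query,
        "setLang": language,
        "cc": country,
        "first": 1 + (page - 1) * 10,
    }
    if safesearch is not None:
        if safesearch not in SAFESEARCH_VALUES:
            raise ValueError(f"safesearch must be one of {sorted(SAFESEARCH_VALUES)}")
        params["safesearch"] = safesearch
    return "https://www.bing.com/search?" + urlencode(params)
```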

Async for Bulk SERP

python
# Submit
response = requests.post(
    "https://api.brightdata.com/request",
    params={"async": "1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")

# Retrieve (retrieve calls are NOT billed)
result = requests.get(
    "https://api.brightdata.com/serp/get_result",
    params={"response_id": response_id},
    headers={"Authorization": f"Bearer {API_KEY}"}
)

**Billing:** Pay per 1,000 successful requests only. Async retrieve calls are not billed.

See **[references/serp-api.md](references/serp-api.md)** for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.

---

Web Scraper API

Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
Sync Endpoint:
POST https://api.brightdata.com/datasets/v3/scrape
Async Endpoint:
POST https://api.brightdata.com/datasets/v3/trigger
python
# Sync (up to 20 URLs, returns immediately)
response = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",
    params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)

if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion

Parameters

| Parameter | Type | Description |
|---|---|---|
| `dataset_id` | string | Scraper identifier from the Scraper Library (required) |
| `format` | string | `json` (default), `ndjson`, `jsonl`, `csv` |
| `custom_output_fields` | string | Pipe-separated fields: `url\|title\|price` |
| `include_errors` | boolean | Include error info in results |
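
These query parameters can be assembled from ordinary Python values. The sketch below is illustrative: `scraper_params` is a hypothetical helper, and sending the boolean as the string `"true"` is an assumption about query-string encoding rather than something the table specifies.

```python
SCRAPER_FORMATS = {"json", "ndjson", "jsonl", "csv"}


def scraper_params(dataset_id: str, fmt: str = "json", fields=None,
                   include_errors: bool = False) -> dict:
    """Build the query parameters for a Web Scraper API call."""
    if not dataset_id:
        raise ValueError("dataset_id is required")
    if fmt not in SCRAPER_FORMATS:
        raise ValueError(f"format must be one of {sorted(SCRAPER_FORMATS)}")
    params = {"dataset_id": dataset_id, "format": fmt}
    if fields:
        # The API expects one pipe-separated string, e.g. "url|title|price".
        params["custom_output_fields"] = "|".join(fields)
    if include_errors:
        params["include_errors"] = "true"  # assumed encoding for the boolean
    return params
```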

Request Body

json
{
  "input": [
    { "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
    { "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
  ]
}
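
Since the sync endpoint takes up to 20 URLs per call, a longer list has to be split into several request bodies of this shape. A minimal sketch, with `batch_inputs` as a hypothetical helper name:

```python
def batch_inputs(urls, batch_size: int = 20):
    """Yield request bodies of the form {"input": [{"url": ...}, ...]}."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        chunk = urls[start:start + batch_size]
        yield {"input": [{"url": u} for u in chunk]}


# Placeholder URLs purely for illustration.
demo_urls = [f"https://example.com/p/{i}" for i in range(45)]
bodies = list(batch_inputs(demo_urls))
```

For larger jobs the async trigger endpoint may be the better fit; batching is mainly useful when immediate results are needed.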

Poll for Async Results

python
import time

# Trigger
snapshot_id = requests.post(
    "https://api.brightdata.com/datasets/v3/trigger",
    params={"dataset_id": DATASET_ID, "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]

# Poll
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]
    if status == "ready":
        break
    if status == "failed":
        raise Exception("Job failed")
    time.sleep(10)

# Download
data = requests.get(
    f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
    params={"format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()

**Progress status values:** `starting` → `running` → `ready` | `failed`
**Data retention:** 30 days.
**Billing:** Per delivered record. Invalid input URLs that fail are still billable.

See **[references/web-scraper-api.md](references/web-scraper-api.md)** for complete reference including scraper types, output formats, delivery options, and billing details.
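
A polling loop with no upper bound can hang if a job never leaves `running`. One possible variation adds a timeout and takes the status lookup as a callable so it can be tested without network access. `wait_until_ready` and its default timeout are assumptions of this sketch, not documented behavior.

```python
import time


def wait_until_ready(get_status, interval: float = 10.0, timeout: float = 600.0,
                     sleep=time.sleep, clock=time.monotonic) -> None:
    """Poll get_status() until it returns "ready"; raise on "failed" or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("Job failed")
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(interval)
```

In real use, `get_status` would wrap the `progress/{snapshot_id}` request and return its `status` field.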

---

Browser API (Scraping Browser)

Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
Connection:
  • Playwright/Puppeteer: `wss://${AUTH}@brd.superproxy.io:9222`
  • Selenium: `https://${AUTH}@brd.superproxy.io:9515`
javascript
const { chromium } = require("playwright-core");

const AUTH = process.env.BROWSER_AUTH;

// Top-level await is not available in CommonJS, so run inside an async function
(async () => {
  const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
  const page = await browser.newPage();
  page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes

  await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
  const html = await page.content();
  await browser.close();
})();
python
import asyncio
import os
from playwright.async_api import async_playwright

AUTH = os.environ["BROWSER_AUTH"]

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
        page = await browser.new_page()
        page.set_default_navigation_timeout(120000)  # Always set to 2 minutes
        await page.goto("https://example.com", wait_until="domcontentloaded")
        html = await page.content()
        await browser.close()

asyncio.run(main())
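
Since a wrong port is one of the listed connection errors, it can help to derive the endpoint from the tool name instead of typing it by hand. A small sketch using the two connection strings above; `browser_endpoint` is a hypothetical helper.

```python
def browser_endpoint(auth: str, tool: str) -> str:
    """Return the Browser API endpoint for the given automation tool."""
    name = tool.lower()
    if name in {"playwright", "puppeteer"}:
        return f"wss://{auth}@brd.superproxy.io:9222"  # CDP over WebSocket
    if name == "selenium":
        return f"https://{auth}@brd.superproxy.io:9515"  # WebDriver over HTTPS
    raise ValueError(f"unsupported tool: {tool}")
```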

Custom CDP Functions

| Function | Purpose |
|---|---|
| `Captcha.solve` | Manually trigger CAPTCHA solving |
| `Captcha.setAutoSolve` | Enable/disable auto CAPTCHA solving |
| `Proxy.setLocation` | Set precise geo location (call BEFORE goto) |
| `Proxy.useSession` | Maintain same IP across sessions |
| `Emulation.setDevice` | Apply device profile (iPhone 14, etc.) |
| `Emulation.getSupportedDevices` | List available device profiles |
| `Unblocker.enableAdBlock` | Block ads to save bandwidth |
| `Unblocker.disableAdBlock` | Re-enable ads |
| `Input.type` | Fast text input for bulk form filling |
| `Browser.addCertificate` | Install client SSL cert for session |
| `Page.inspect` | Get DevTools debug URL for live session |
javascript
// CDP session pattern for custom functions (Puppeteer API shown;
// in Playwright use: await page.context().newCDPSession(page))
const client = await page.target().createCDPSession();

// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });

// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
  latitude: 37.7749,
  longitude: -122.4194,
  distance: 10,
  strict: true
});

// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });

// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });

Session Rules

  • One initial navigation per session; navigating to a new URL requires a new session
  • Idle timeout: 5 minutes
  • Max duration: 30 minutes

Geolocation

  • Country-level: append `-country-us` to the credentials username
  • EU-wide: append `-country-eu` (routes through 29+ European countries)
  • Precise: use the `Proxy.setLocation` CDP command (before navigation)
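
The country suffix goes on the username part of the credentials, before the colon that separates the password. A minimal sketch of that string manipulation, assuming credentials in the `USERNAME:PASSWORD` form shown for `BROWSER_AUTH`; `with_country` is a hypothetical helper.

```python
def with_country(auth: str, country: str) -> str:
    """Append -country-XX to the username half of USERNAME:PASSWORD credentials."""
    username, sep, password = auth.partition(":")
    if not sep or not username:
        raise ValueError("expected credentials in USERNAME:PASSWORD form")
    return f"{username}-country-{country.lower()}:{password}"
```

Passing `"eu"` as the country yields the EU-wide variant described in the list.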

Error Codes

| Code | Issue | Fix |
|---|---|---|
| `407` | Wrong port | Playwright/Puppeteer → `9222`, Selenium → `9515` |
| `403` | Bad auth | Check credentials format and zone type |
| `503` | Service scaling | Wait 1 minute, reconnect |

**Billing:** Traffic-based only. Block images/CSS/fonts to reduce costs.

See **[references/browser-api.md](references/browser-api.md)** for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
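
The error table above could be encoded so that connection code decides between retrying and failing fast. This is a sketch under the assumption that only `503` is transient; the names `describe_error` and `retry_delay_seconds` are hypothetical.

```python
ERROR_FIXES = {
    407: "Wrong port: use 9222 for Playwright/Puppeteer, 9515 for Selenium",
    403: "Bad auth: check credentials format and zone type",
    503: "Service scaling: wait 1 minute, then reconnect",
}


def describe_error(status_code: int) -> str:
    """Return the suggested fix for a known code, or a generic message."""
    return ERROR_FIXES.get(status_code, f"Unexpected status {status_code}")


def retry_delay_seconds(status_code: int):
    """Return how long to wait before reconnecting, or None if a retry will not help."""
    return 60 if status_code == 503 else None
```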

Detailed References

  • references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
  • references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
  • references/web-scraper-api.md — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
  • references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes