data-feeds
Bright Data - Structured Data Feeds
Extract structured data from major websites with automatic parsing. No scraping logic needed - just provide a URL and get clean JSON data.
Setup
Environment Variables (Required)
```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
Optional
```bash
export BRIGHTDATA_POLLING_TIMEOUT=600  # Max seconds to wait (default: 600)
```
Get your API key from the Bright Data Dashboard.
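Before calling the scripts, it can help to confirm that the environment is configured. The following is a minimal sketch and not part of the shipped scripts: the function name `check_brightdata_env` is hypothetical. It fails when `BRIGHTDATA_API_KEY` is unset or empty, and it falls back to the documented default of 600 seconds when `BRIGHTDATA_POLLING_TIMEOUT` is not set.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of scripts/datasets.sh.
check_brightdata_env() {
  # The API key is required, so stop early if it is missing or empty.
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi
  # The timeout is optional; 600 is the default stated above.
  echo "timeout=${BRIGHTDATA_POLLING_TIMEOUT:-600}"
}
```

One way to use it is as a guard, for example `check_brightdata_env && bash scripts/datasets.sh <dataset_type> <url>`.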
Usage
```bash
bash scripts/datasets.sh <dataset_type> <url> [additional_params...]
```
Available Datasets
E-Commerce
| Dataset | Command | Description |
|---|---|---|
| Amazon Product | | Product details, pricing, ratings |
| Amazon Reviews | | Customer reviews for a product |
| Amazon Search | | Search results |
| Walmart Product | | Product details from Walmart |
| Walmart Seller | | Seller information |
| eBay Product | | eBay listing details |
| Home Depot | | Home Depot product data |
| Zara | | Zara product details |
| Etsy | | Etsy listing data |
| Best Buy | | Best Buy product info |
Professional Networks
| Dataset | Command | Description |
|---|---|---|
| LinkedIn Person | | Profile data (experience, skills) |
| LinkedIn Company | | Company page data |
| LinkedIn Jobs | | Job posting details |
| LinkedIn Posts | | Post content and engagement |
| LinkedIn Search | | Find people |
| Crunchbase | | Company funding, employees |
| ZoomInfo | | Company profile data |
Instagram
| Dataset | Command | Description |
|---|---|---|
| Profiles | | Bio, followers, following |
| Posts | | Post details, likes, captions |
| Reels | | Reel data and metrics |
| Comments | | Post comments |
Facebook
| Dataset | Command | Description |
|---|---|---|
| Posts | | Post content and reactions |
| Marketplace | | Listing details |
| Reviews | | Company reviews |
| Events | | Event details |
TikTok
| Dataset | Command | Description |
|---|---|---|
| Profiles | | Creator profile data |
| Posts | | Video details and metrics |
| Shop | | TikTok Shop product data |
| Comments | | Video comments |
YouTube
| Dataset | Command | Description |
|---|---|---|
| Profiles | | Channel data |
| Videos | | Video details and stats |
| Comments | | Video comments (default: 10) |
Other Social
| Dataset | Command | Description |
|---|---|---|
| X (Twitter) | | Tweet data |
| | | Post and comment data |
Google Services
| Dataset | Command | Description |
|---|---|---|
| Maps Reviews | | Business reviews (default: 3 days) |
| Shopping | | Product comparison data |
| Play Store | | App details and reviews |
Other
| Dataset | Command | Description |
|---|---|---|
| Apple App Store | | iOS app data |
| Reuters News | | News article content |
| GitHub | | Repository file data |
| Yahoo Finance | | Stock and company data |
| Zillow | | Property listing details |
| Booking.com | | Hotel listing data |
Examples
Get LinkedIn Profile
```bash
bash scripts/datasets.sh linkedin_person_profile "https://www.linkedin.com/in/satyanadella/"
```
Get Amazon Product
```bash
bash scripts/datasets.sh amazon_product "https://www.amazon.com/dp/B09V3KXJPB"
```
Get Instagram Profile
```bash
bash scripts/datasets.sh instagram_profiles "https://www.instagram.com/natgeo/"
```
Get YouTube Comments
```bash
bash scripts/datasets.sh youtube_comments "https://www.youtube.com/watch?v=dQw4w9WgXcQ" 20
```
Search Amazon
```bash
bash scripts/datasets.sh amazon_product_search "wireless headphones" "https://www.amazon.com"
```
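To run one dataset type over several URLs, the single-URL commands above could be wrapped in a loop. This is a sketch under stated assumptions rather than a feature of the scripts: `run_batch` and the `DATASETS_CMD` override are hypothetical names introduced here, and the loop simply calls the documented `scripts/datasets.sh` once per URL.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one dataset type over many URLs read from stdin.
# DATASETS_CMD is an assumed override hook; it defaults to the documented script.
run_batch() {
  local dataset_type=$1
  local url
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    if [ -z "$url" ]; then
      continue  # skip blank lines
    fi
    # Left unquoted on purpose so "bash scripts/datasets.sh" splits into words.
    # stdin is redirected so the called command cannot consume the URL list.
    ${DATASETS_CMD:-bash scripts/datasets.sh} "$dataset_type" "$url" < /dev/null
  done
}
```

For example, `run_batch amazon_product < urls.txt` would process a file with one product URL per line, where `urls.txt` is a placeholder file name.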
Output Format
Returns structured JSON with website-specific fields. Example for a LinkedIn profile:
```json
{
  "name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Greater Seattle Area",
  "connections": "500+",
  "experience": [...],
  "education": [...],
  "skills": [...]
}
```
How It Works
- Trigger: Sends URL to Bright Data's Web Data API
- Poll: Waits for data collection to complete (checks every second)
- Return: Outputs structured JSON when ready
The polling mechanism handles rate limits and ensures data quality by waiting for full extraction.
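The trigger, poll, and return steps above amount to a loop with a deadline. The sketch below only illustrates that pattern: `poll_until_ready` and the check command passed to it are hypothetical stand-ins, the real scripts may be structured differently, and no actual Bright Data endpoint is shown.

```bash
#!/usr/bin/env bash
# Illustrative polling loop; function and argument names are assumptions.
# Usage: poll_until_ready <timeout_seconds> <interval_seconds> <check_cmd...>
# <check_cmd> should exit 0 once the data is ready. Both numbers must be integers.
poll_until_ready() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if "$@"; then
      return 0  # ready: the caller can now output the JSON
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s" >&2
  return 1
}
```

A call such as `poll_until_ready "${BRIGHTDATA_POLLING_TIMEOUT:-600}" 1 my_status_check` would mirror the documented behavior of checking every second up to the configured timeout, where `my_status_check` is a placeholder for whatever command reports completion.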
Advanced: Direct Fetch
For custom dataset IDs or advanced use cases:
```bash
bash scripts/fetch.sh <dataset_id> '<json_input>'
```
Example:
```bash
bash scripts/fetch.sh gd_l1viktl72bvl7bjuj0 '{"url":"https://linkedin.com/in/someone"}'
```
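Because `<json_input>` is passed as a single argument, a URL held in a shell variable has to be embedded in the JSON safely. The helper below is a minimal sketch: `url_json` is a hypothetical name, and it only escapes backslashes and double quotes, which is an assumption about what typical URLs need rather than a full JSON encoder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a URL in the {"url": ...} shape used in the example above.
# Only backslashes and double quotes are escaped; control characters are not handled.
url_json() {
  local url=$1
  url=${url//\\/\\\\}  # escape backslashes first
  url=${url//\"/\\\"}  # then escape double quotes
  printf '{"url":"%s"}\n' "$url"
}
```

It could then be combined with the direct fetch script, for example `bash scripts/fetch.sh gd_l1viktl72bvl7bjuj0 "$(url_json "$profile_url")"`, where `profile_url` is a placeholder variable.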