Scraping Skill

Progressive Escalation

  1. Direct fetch — curl/fetch with standard headers
  2. Browser rendering — Headless browser for JS-heavy sites
  3. Proxy rotation — For rate-limited or geo-restricted content
  4. Specialized APIs — Platform-specific scrapers
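
The escalation order above can be sketched as a loop that tries each tier in turn and stops at the first one that returns a usable page. This is a minimal sketch, not a definitive implementation: the names `direct_fetch` and `fetch_with_escalation`, the header values, and the "non-empty body means success" rule are all assumptions made for illustration. Only tier 1 is implemented here (with the standard library); the browser, proxy, and platform-API tiers would be supplied by the caller as ordinary callables.

```python
import urllib.request
from typing import Callable, List, Sequence, Tuple

# A strategy takes a URL and returns the page body, or raises on failure.
Strategy = Callable[[str], str]


def direct_fetch(url: str, timeout: float = 10.0) -> str:
    """Tier 1: plain HTTP GET with ordinary request headers."""
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical header values, chosen only for the example.
            "User-Agent": "example-scraper/0.1",
            "Accept": "text/html,application/xhtml+xml",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Strategy]]
) -> Tuple[str, str]:
    """Try each (name, strategy) pair in order, cheapest first.

    Returns (tier_name, body) for the first tier that yields a non-empty
    body. Raises RuntimeError listing every failure if all tiers fail.
    """
    errors: List[str] = []
    for name, strategy in tiers:
        try:
            body = strategy(url)
        except Exception as exc:  # any failure means "escalate"
            errors.append(f"{name}: {exc}")
            continue
        if body.strip():
            return name, body
        errors.append(f"{name}: empty response")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

A caller might pass `[("direct", direct_fetch), ("browser", render_page), ("proxy", fetch_via_proxy), ("api", platform_api)]`, where the last three are hypothetical functions wrapping whatever headless browser, proxy pool, or platform client is in use. Keeping the tiers as plain callables means the expensive options only run when the cheaper ones have already failed.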

Rules

  • Respect robots.txt
  • Rate limit requests
  • Handle pagination
  • Extract structured data
  • Store raw + processed versions
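
As a rough illustration of how these rules might fit together, the sketch below checks robots.txt before each request, enforces a minimum delay between requests, follows "next page" links, and writes both the raw response and the extracted records to disk. Everything named here (`USER_AGENT`, `allowed_by_robots`, `RateLimiter`, `crawl`, the file naming scheme, and the `fetch`/`parse` callables) is hypothetical. The robots.txt text is passed in as a string so the example stays offline; a real crawler would download it first.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical agent name


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Rule: respect robots.txt (parsed from text already in hand)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class RateLimiter:
    """Rule: rate limit requests by enforcing a minimum gap between calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# parse(raw) returns (structured records, next page URL or None).
ParseFn = Callable[[str], Tuple[List[Dict], Optional[str]]]


def crawl(
    start_url: str,
    robots_txt: str,
    fetch: Callable[[str], str],
    parse: ParseFn,
    out_dir: Path,
    limiter: RateLimiter,
    max_pages: int = 50,
) -> int:
    """Follow pagination from start_url; return the number of pages saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    saved = 0
    url: Optional[str] = start_url
    while url and url not in seen and saved < max_pages:
        if not allowed_by_robots(robots_txt, url):
            break  # stop rather than fetch a disallowed page
        seen.add(url)
        limiter.wait()
        raw = fetch(url)
        records, next_url = parse(raw)  # rule: extract structured data
        stem = out_dir / f"page_{saved:04d}"
        # Rule: store raw + processed versions side by side.
        stem.with_suffix(".raw.html").write_text(raw, encoding="utf-8")
        stem.with_suffix(".json").write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        saved += 1
        url = next_url  # rule: handle pagination
    return saved
```

Keeping the raw response next to the processed JSON means the extraction step can be rerun later without fetching the site again, which also helps stay within the rate limit.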