scraping
Scraping Skill
Progressive Escalation
- Direct fetch — curl/fetch with standard headers
- Browser rendering — Headless browser for JS-heavy sites
- Proxy rotation — For rate-limited or geo-restricted content
- Specialized APIs — Platform-specific scrapers
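
The escalation order above can be sketched as a loop that tries the cheapest tier first and moves up only when a tier returns nothing. This is a minimal Python sketch under stated assumptions, not a definitive implementation: `escalate`, `direct_fetch`, the `Fetcher` type, and the User-Agent string are hypothetical names introduced here, and the browser, proxy, and API tiers are left as callables the caller would supply.

```python
import urllib.request
from typing import Callable, Optional, Sequence, Tuple

# A tier is any callable that takes a URL and returns the page body,
# or None when that tier could not retrieve it (hypothetical convention).
Fetcher = Callable[[str], Optional[str]]

# Assumed "standard headers"; the values are placeholders.
DEFAULT_HEADERS = {
    "User-Agent": "example-scraper/0.1",
    "Accept": "text/html,application/xhtml+xml",
}


def direct_fetch(url: str, timeout: float = 10.0) -> Optional[str]:
    """Tier 1: plain HTTP GET with standard headers."""
    try:
        request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None


def escalate(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, body) from the first success."""
    for name, fetch in tiers:
        body = fetch(url)
        if body:  # None or an empty body counts as a failure
            return name, body
    raise RuntimeError(f"all tiers failed for {url}")
```

A caller would pass the tiers in escalation order, for example `escalate(url, [("direct", direct_fetch), ("browser", render_with_browser)])`, where `render_with_browser` stands in for whatever headless-browser wrapper is available. Keeping each tier behind the same callable signature means later tiers run only when earlier ones fail.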
Rules
- Respect robots.txt
- Rate limit requests
- Handle pagination
- Extract structured data
- Store raw + processed versions
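
The rules above can be combined into one small crawl loop. The following Python sketch is illustrative only: `allowed_by_robots`, `RateLimiter`, `crawl`, the `raw/` and `processed/` directory layout, and the file names are assumptions made for this example. Fetching and parsing are passed in as callables, and the robots.txt text is assumed to have been downloaded already. The only library call relied on for robots.txt handling is the standard `urllib.robotparser.RobotFileParser`.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional, Tuple
from urllib.robotparser import RobotFileParser

# parse_page returns (items, next_url); next_url is None on the last page.
ParsePage = Callable[[str], Tuple[List[dict], Optional[str]]]


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Respect robots.txt: check a URL against already-downloaded rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Rate limit requests: enforce a minimum interval between calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def crawl(start_url: str, robots_txt: str, user_agent: str,
          fetch_page: Callable[[str], str], parse_page: ParsePage,
          limiter: RateLimiter, out_dir: Path, max_pages: int = 100) -> List[dict]:
    raw_dir = out_dir / "raw"
    processed_dir = out_dir / "processed"
    raw_dir.mkdir(parents=True, exist_ok=True)
    processed_dir.mkdir(parents=True, exist_ok=True)

    records: List[dict] = []
    url: Optional[str] = start_url
    page_no = 0
    # Handle pagination: follow next_url until it runs out or the cap is hit.
    while url is not None and page_no < max_pages:
        if not allowed_by_robots(robots_txt, user_agent, url):
            break  # stop rather than fetch a disallowed page
        limiter.wait()
        raw = fetch_page(url)
        page_no += 1
        # Store the raw response before any parsing touches it.
        (raw_dir / f"page_{page_no:04d}.raw").write_text(raw, encoding="utf-8")
        # Extract structured data and the pointer to the next page.
        items, url = parse_page(raw)
        records.extend(items)

    # Store the processed version alongside the raw pages.
    (processed_dir / "items.json").write_text(
        json.dumps(records, indent=2), encoding="utf-8")
    return records
```

Writing the raw response to disk before parsing means a parser bug can be fixed and the data re-extracted without fetching the pages again.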