firecrawl-build-scrape


Firecrawl Build Scrape

Use this when the application already has the URL and needs content from one page.

Use This When

  • the feature starts from a known URL
  • you need page content for retrieval, summarization, enrichment, or monitoring
  • you want the default extraction primitive before considering /interact

Default Recommendations

  • Start with /scrape, not /crawl.
  • Return markdown unless the feature truly needs another format.
  • Use onlyMainContent for article-like pages where navigation menus and other page chrome add noise.
  • Add waits or other rendering options only when the page needs them.
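
A minimal sketch of these defaults as a request body, in Python using only the standard library. It assumes the hosted endpoint https://api.firecrawl.dev/v2/scrape with Bearer-token auth. It also assumes the JSON field names url, formats, onlyMainContent, and waitFor (milliseconds). Confirm the version path and field names against the Firecrawl API reference before relying on them. The helper names are hypothetical, and the request is built but never sent.

```python
import json
import urllib.request
from typing import Optional, Sequence

# Assumption: hosted v2 endpoint; verify the path in the Firecrawl API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_payload(
    url: str,
    formats: Sequence[str] = ("markdown",),
    only_main_content: bool = True,
    wait_for_ms: Optional[int] = None,
) -> dict:
    """Build a single-page scrape body (hypothetical helper; field names assumed)."""
    payload = {
        "url": url,
        # Markdown by default; add formats only when a consumer needs them.
        "formats": list(formats),
        # Drop navigation and page chrome on article-like pages.
        "onlyMainContent": only_main_content,
    }
    if wait_for_ms is not None:
        # Rendering wait in milliseconds, added only for pages that need it.
        payload["waitFor"] = wait_for_ms
    return payload


def build_scrape_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the body in a POST request; sending it is left to the caller."""
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = build_scrape_payload("https://example.com/blog/post")
    print(body)
    # {'url': 'https://example.com/blog/post', 'formats': ['markdown'], 'onlyMainContent': True}
    # "fc-YOUR-API-KEY" is a placeholder, not a real key.
    request = build_scrape_request(body, api_key="fc-YOUR-API-KEY")
    # urllib.request.urlopen(request) would send it; not done here.
```

Keeping waitFor out of the body unless explicitly requested mirrors the rule above: rendering options are opt-in per page, not a global default.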

Common Product Patterns

  • knowledge ingestion from known URLs
  • enrichment from a company, product, or docs page
  • pricing, changelog, and documentation extraction
  • page-level quality checks or monitoring

Escalation Rules

  • If you do not have the URL yet, start with firecrawl-build-search.
  • If content requires clicks, typing, or multi-step navigation, escalate to firecrawl-build-interact.
  • If you need many pages from the same site, consider firecrawl-build-crawl or firecrawl-build-map.
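
The escalation rules above can be read as a small routing function. This Python sketch is illustrative only, and the function and enum names are hypothetical. It checks for interaction before multi-page needs, but that order is an assumption: the rules themselves do not say which takes precedence.

```python
from enum import Enum


class Skill(str, Enum):
    """Skill names from the escalation rules (the enum itself is hypothetical)."""
    SEARCH = "firecrawl-build-search"
    SCRAPE = "firecrawl-build-scrape"
    INTERACT = "firecrawl-build-interact"
    CRAWL_OR_MAP = "firecrawl-build-crawl or firecrawl-build-map"


def choose_skill(
    has_url: bool,
    needs_interaction: bool = False,
    many_pages_same_site: bool = False,
) -> Skill:
    """Pick a starting skill; assumed precedence: URL, then interaction, then scale."""
    if not has_url:
        # No URL yet: find one first.
        return Skill.SEARCH
    if needs_interaction:
        # Content behind clicks, typing, or multi-step navigation.
        return Skill.INTERACT
    if many_pages_same_site:
        # Many pages from one site.
        return Skill.CRAWL_OR_MAP
    # Known URL, one page, no interaction: the default primitive.
    return Skill.SCRAPE


if __name__ == "__main__":
    print(choose_skill(has_url=True).value)   # firecrawl-build-scrape
    print(choose_skill(has_url=False).value)  # firecrawl-build-search
```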

Implementation Notes

  • Keep the integration narrow: one feature, one URL, one extraction contract.
  • Treat /scrape as the default primitive for downstream LLM or indexing pipelines.
  • Request richer formats only when the consumer needs them, such as links, screenshots, or branding data.
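
One way to keep the extraction contract narrow is to reduce the scrape response to a small typed record at the integration boundary. The boundary should fail loudly when the expected content is missing. This Python sketch assumes a response body shaped like {"success": true, "data": {"markdown": ..., "metadata": {"title": ...}}}. Check that shape against the response schema in the docs. PageContent, ScrapeContractError, and parse_scrape_response are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


class ScrapeContractError(ValueError):
    """Raised when a scrape response does not satisfy the contract."""


@dataclass(frozen=True)
class PageContent:
    """The contract: one requested URL in, markdown plus an optional title out."""
    url: str
    markdown: str
    title: Optional[str] = None


def parse_scrape_response(requested_url: str, body: Mapping[str, Any]) -> PageContent:
    """Validate an assumed response shape and keep only what the feature uses."""
    if not body.get("success"):
        raise ScrapeContractError(f"scrape failed for {requested_url}")
    data = body.get("data") or {}
    markdown = data.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        # Treat missing or blank markdown as a contract violation.
        raise ScrapeContractError(f"no markdown returned for {requested_url}")
    metadata = data.get("metadata") or {}
    return PageContent(url=requested_url, markdown=markdown, title=metadata.get("title"))


if __name__ == "__main__":
    # Illustrative input only, not a captured API response.
    sample = {
        "success": True,
        "data": {"markdown": "# Pricing\n\nExample plan details.", "metadata": {"title": "Pricing"}},
    }
    page = parse_scrape_response("https://example.com/pricing", sample)
    print(page.title)  # Pricing
```

Downstream code then depends on PageContent rather than the raw response. Requesting an extra format later means a deliberate change to the record, not a silent widening of the contract.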

Docs (Source of Truth)

Before writing integration code, read the Firecrawl docs for request/response schemas, parameters, and SDK examples.

See Also

  • firecrawl-build
  • firecrawl-build-search
  • firecrawl-build-interact