firecrawl-crawl

firecrawl crawl

Bulk extract content from a website. Crawls pages following links up to a depth/limit.

When to use

  • You need content from many pages on a site (e.g., all /docs/ pages)
  • You want to extract an entire site section
  • Step 4 in the workflow escalation pattern: search → scrape → map → crawl → browser

Quick start

Crawl a docs section

```shell
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
```

Full crawl with depth limit

```shell
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json
```

Check status of a running crawl

```shell
firecrawl crawl <job-id>
```
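When a crawl is started without `--wait`, the CLI returns a job ID for async polling (see Tips below). A minimal polling loop might look like the following sketch; the exact wording of the status output is an assumption, so adjust the pattern match to what your CLI version actually prints:

```shell
# Hedged sketch: poll a crawl job until it finishes.
# ASSUMPTION: `firecrawl crawl <job-id>` prints a status string containing
# "completed" once the job is done; check your CLI's actual output format.
poll_crawl() {
  job_id="$1"
  while true; do
    status="$(firecrawl crawl "$job_id" 2>/dev/null || true)"
    case "$status" in
      *completed*) echo "done"; return 0 ;;
    esac
    sleep 5   # back off between polls instead of hammering the API
  done
}
```

Usage: `poll_crawl <job-id>` blocks until the job reports completion.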

Options

| Option | Description |
| --- | --- |
| `--wait` | Wait for crawl to complete before returning |
| `--progress` | Show progress while waiting |
| `--limit <n>` | Max pages to crawl |
| `--max-depth <n>` | Max link depth to follow |
| `--include-paths <paths>` | Only crawl URLs matching these paths |
| `--exclude-paths <paths>` | Skip URLs matching these paths |
| `--delay <ms>` | Delay between requests |
| `--max-concurrency <n>` | Max parallel crawl workers |
| `--pretty` | Pretty-print JSON output |
| `-o, --output <path>` | Output file path |
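To illustrate how these options combine, here is a hedged sketch of a scoped, rate-limited crawl wrapped in a small helper function. The paths, limits, and delay values are illustrative placeholders, not recommendations:

```shell
# Hedged sketch: a polite, scoped crawl combining the options above.
# The include/exclude paths and numeric limits are illustrative placeholders.
polite_crawl() {
  firecrawl crawl "$1" \
    --include-paths /docs \
    --exclude-paths /docs/archive \
    --limit 100 \
    --max-depth 2 \
    --delay 500 \
    --max-concurrency 2 \
    --wait --pretty \
    -o .firecrawl/crawl.json
}
```

Lower `--max-concurrency` plus a `--delay` keeps load on the target site modest, at the cost of a slower crawl.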

Tips

  • Always use `--wait` when you need the results immediately. Without it, crawl returns a job ID for async polling.
  • Use `--include-paths` to scope the crawl — don't crawl an entire site when you only need one section.
  • Crawl consumes credits per page. Check `firecrawl credit-usage` before large crawls.
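Once a crawl has been saved with `-o`, the JSON file can be inspected locally. The sketch below assumes the output contains a top-level `data` array whose entries carry a `metadata.url` field — an assumption about the output shape, so verify it against your own `crawl.json` first:

```shell
# Hedged sketch: list the URLs of crawled pages from a saved crawl file.
# ASSUMPTION: the JSON has the shape {"data": [{"metadata": {"url": ...}}, ...]}.
list_crawled_urls() {
  python3 - "$1" <<'PY'
import json, sys

with open(sys.argv[1]) as f:
    doc = json.load(f)

# Print one URL per crawled page, skipping entries without one.
for page in doc.get("data", []):
    url = page.get("metadata", {}).get("url")
    if url:
        print(url)
PY
}
```

Usage: `list_crawled_urls .firecrawl/crawl.json` prints one URL per line, which is handy for spot-checking what `--include-paths` actually matched.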

See also

  • firecrawl-scrape — scrape individual pages
  • firecrawl-map — discover URLs before deciding to crawl
  • firecrawl-download — download site to local files (uses map + scrape)