firecrawl-scrape
firecrawl scrape
Scrape one or more URLs. Returns clean, LLM-optimized markdown. Multiple URLs are scraped concurrently.
When to use
- You have a specific URL and want its content
- The page is static or JS-rendered (SPA)
- Step 2 in the workflow escalation pattern: search → scrape → map → crawl → browser
Quick start
```bash
# Basic markdown extraction
firecrawl scrape "<url>" -o .firecrawl/page.md

# Main content only, no nav/footer
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

# Wait for JS to render, then scrape
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

# Multiple URLs (each saved to .firecrawl/)
firecrawl scrape "<url-1>" "<url-2>" "<url-3>"

# Get markdown and links together
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json
```
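The quick-start commands can be combined into a small fallback: try a plain scrape first, and retry with `--wait-for` only if the output file comes back empty. This is a minimal sketch rather than anything the firecrawl CLI provides. The function name `scrape_with_fallback`, the 3000 ms default, and the heuristic that an empty file means the page had not finished rendering are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the firecrawl CLI).
# Scrapes a URL and retries with --wait-for if nothing was written.
# Uses only flags shown in the quick start; the empty-file heuristic is an assumption.
scrape_with_fallback() {
  local url="$1" out="$2" wait_ms="${3:-3000}"

  # Make sure the output directory (e.g. .firecrawl/) exists.
  mkdir -p "$(dirname "$out")"

  # First attempt: plain scrape, main content only.
  firecrawl scrape "$url" --only-main-content -o "$out"

  # If the file is missing or empty, assume the page needed time to render JS.
  if [ ! -s "$out" ]; then
    firecrawl scrape "$url" --only-main-content --wait-for "$wait_ms" -o "$out"
  fi

  # Report success only if some content was written.
  [ -s "$out" ]
}
```

A call might look like `scrape_with_fallback "https://example.com/docs?page=2" .firecrawl/example.com-docs.md`. The URL is quoted so the shell does not interpret `?`.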
Options
| Option | Description |
|---|---|
| `--format` | Output formats: markdown, html, rawHtml, links, screenshot, json |
|  | Include HTTP headers in output |
| `--only-main-content` | Strip nav, footer, and sidebar, keeping main content only |
| `--wait-for` | Wait for JS rendering before scraping |
|  | Only include these HTML tags |
|  | Exclude these HTML tags |
| `-o` | Output file path |
Tips
- Try scrape before browser. Scrape handles static pages and JS-rendered SPAs. Only escalate to browser when you need interaction (clicks, form fills, pagination).
- Multiple URLs are scraped concurrently. Check `firecrawl --status` for your concurrency limit.
- Single format outputs raw content. Multiple formats (e.g., `--format markdown,links`) output JSON.
- Always quote URLs: the shell interprets `?` and `&` as special characters.
- Naming convention: `.firecrawl/{site}-{path}.md`
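The naming convention above can be applied mechanically with a small shell function. The sketch below is one possible reading of `{site}` and `{path}`: it assumes `{site}` is the hostname, that the query string and fragment are dropped, and that slashes in the path become dashes. The function name `firecrawl_out_path` is hypothetical, and none of these slug rules come from firecrawl itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a URL to .firecrawl/{site}-{path}.md.
# Assumed slug rules: drop scheme, query, and fragment; turn "/" into "-".
firecrawl_out_path() {
  local url="$1"
  local rest="${url#*://}"        # strip the scheme, if any
  rest="${rest%%[?#]*}"           # strip query string and fragment
  local site="${rest%%/*}"        # hostname
  local path=""
  case "$rest" in
    */*) path="${rest#*/}" ;;     # everything after the first slash
  esac
  path="${path%/}"                # drop a trailing slash
  path="${path//\//-}"            # remaining slashes become dashes
  if [ -n "$path" ]; then
    printf '.firecrawl/%s-%s.md\n' "$site" "$path"
  else
    printf '.firecrawl/%s.md\n' "$site"
  fi
}
```

It could then feed the `-o` flag, for example `firecrawl scrape "$url" -o "$(firecrawl_out_path "$url")"`, with `$url` quoted so that `?` and `&` reach the CLI intact.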
See also
- firecrawl-search — find pages when you don't have a URL
- firecrawl-browser — when scrape can't get the content (interaction needed)
- firecrawl-download — bulk download an entire site to local files