firecrawl-scraper
Firecrawl Web Scraper Skill
Status: Production Ready ✅
Last Updated: 2025-11-21
Official Docs: https://docs.firecrawl.dev
API Version: v2.5
What is Firecrawl?
Firecrawl is a Web Data API for AI that turns entire websites into LLM-ready markdown or structured data. It handles:
- JavaScript rendering - Executes client-side JavaScript to capture dynamic content
- Anti-bot bypass - Gets past CAPTCHA and bot detection systems
- Format conversion - Outputs as markdown, JSON, or structured data
- Screenshot capture - Saves visual representations of pages
- Browser automation - Full headless browser capabilities
API Endpoints
1. /v2/scrape - Single Page Scraping
Scrapes a single webpage and returns clean, structured content.
Use Cases:
- Extract article content
- Get product details
- Scrape specific pages
- Convert HTML to markdown
Key Options:
- `formats`: ["markdown", "html", "screenshot"]
- `onlyMainContent`: true/false (removes nav, footer, ads)
- `waitFor`: milliseconds to wait before scraping
- `actions`: browser automation actions (click, scroll, etc.)
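As a rough illustration of how these options fit together, the sketch below builds a scrape request with only the Python standard library. The endpoint URL and Bearer header follow the REST call shown in the Cloudflare Workers section of this document. The helper name `build_scrape_request` and the placeholder key are hypothetical, and the exact request schema should be confirmed against the official docs.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_request(url, api_key, formats=("markdown",),
                         only_main_content=True, wait_for=None):
    """Build (but do not send) a POST request for the scrape endpoint."""
    payload = {
        "url": url,
        "formats": list(formats),
        "onlyMainContent": only_main_content,
    }
    if wait_for is not None:
        payload["waitFor"] = wait_for  # milliseconds to wait before scraping
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder key; nothing is sent over the network here.
request = build_scrape_request("https://example.com", "fc-your-api-key-here", wait_for=2000)
print(json.loads(request.data))
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect before any credits are spent.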
2. /v2/crawl - Full Site Crawling
Crawls all accessible pages from a starting URL.
Use Cases:
- Index entire documentation sites
- Archive website content
- Build knowledge bases
- Scrape multi-page content
Key Options:
- `limit`: max pages to crawl
- `maxDepth`: how many links deep to follow
- `allowedDomains`: restrict to specific domains
- `excludePaths`: skip certain URL patterns
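To make the interaction between a page limit, a depth limit, and excluded paths concrete, here is a toy breadth-first walk over a hypothetical in-memory link graph. It is not Firecrawl's implementation. The `SITE` data, the `plan_crawl` helper, and the use of glob-style patterns for exclusions are all assumptions made for illustration.

```python
from collections import deque
from fnmatch import fnmatch

# Hypothetical in-memory site: each path maps to the paths it links to.
SITE = {
    "/": ["/docs", "/blog", "/admin"],
    "/docs": ["/docs/intro", "/docs/api"],
    "/docs/intro": ["/docs/api"],
    "/docs/api": [],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/admin": ["/admin/users"],
    "/admin/users": [],
}


def plan_crawl(start, limit, max_depth, exclude_paths=()):
    """Breadth-first walk that visits at most `limit` pages, `max_depth` links deep."""
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < limit:
        path, depth = queue.popleft()
        visited.append(path)
        if depth == max_depth:
            continue  # do not follow links any deeper than max_depth
        for link in SITE.get(path, []):
            excluded = any(fnmatch(link, pattern) for pattern in exclude_paths)
            if link not in seen and not excluded:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


pages = plan_crawl("/", limit=4, max_depth=2, exclude_paths=["/admin*"])
print(pages)
```

In this toy model the page limit caps the total work, the depth limit bounds how far from the start the walk strays, and the exclusion patterns prune whole branches before they are queued.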
3. /v2/map - URL Discovery
Maps all URLs on a website without scraping content.
Use Cases:
- Find sitemap
- Discover all pages
- Plan crawling strategy
- Audit website structure
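One way to use a URL list when planning a crawl is to count pages per top-level section and decide what to include or skip. The sketch below assumes a plain list of URLs is already in hand. The sample URLs and the `count_by_section` helper are hypothetical, and no Firecrawl call is made.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical URLs, standing in for what a map call might return.
urls = [
    "https://example.com/",
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
    "https://example.com/blog/post-1",
    "https://example.com/docs/guides/setup",
]


def count_by_section(urls):
    """Count URLs by their first path segment ('' stands for the site root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts[segments[0] if segments else ""] += 1
    return counts


sections = count_by_section(urls)
print(sections.most_common())
```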
4. /v2/extract - Structured Data Extraction
Uses AI to extract specific data fields from pages.
Use Cases:
- Extract product prices and names
- Parse contact information
- Build structured datasets
- Custom data schemas
Key Options:
- `schema`: Zod or JSON schema defining desired structure
- `systemPrompt`: guide AI extraction behavior
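A minimal sketch of what a JSON schema for product extraction could look like, together with a payload that carries the two options listed above. The `urls` key, the helper names, and the sample prompt are assumptions for illustration. The exact request shape should be checked against the official docs.

```python
import json

# Hypothetical JSON schema describing the fields to pull from a product page.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}


def build_extract_payload(urls, schema, system_prompt=None):
    """Assemble an extract request body; the `urls` key is an assumption."""
    payload = {"urls": list(urls), "schema": schema}
    if system_prompt:
        payload["systemPrompt"] = system_prompt
    return payload


def missing_fields(record, schema):
    """Return the required schema fields that are absent from an extracted record."""
    return [field for field in schema.get("required", []) if field not in record]


payload = build_extract_payload(
    ["https://example.com/product"],
    product_schema,
    system_prompt="Extract the product name and price.",
)
print(json.dumps(payload, indent=2))
print(missing_fields({"name": "Widget"}, product_schema))
```

Checking extracted records against the schema's `required` list is a cheap guard before the data is stored or passed downstream.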
Authentication
Firecrawl requires an API key for all requests.
Get API Key
- Sign up at https://www.firecrawl.dev
- Go to dashboard → API Keys
- Copy your API key (starts with `fc-`)
Store Securely
NEVER hardcode API keys in code!
```bash
# .env file
FIRECRAWL_API_KEY=fc-your-api-key-here
```

```bash
# .env.local (for local development)
FIRECRAWL_API_KEY=fc-your-api-key-here
```
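A small sketch of reading the key at startup rather than hardcoding it. It assumes `FIRECRAWL_API_KEY` has already been exported into the process environment, for example by the shell or a dotenv loader, since Python does not read `.env` files on its own. The `load_api_key` helper is hypothetical.

```python
import os


def load_api_key(env=os.environ):
    """Read FIRECRAWL_API_KEY from the environment and sanity-check its prefix."""
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    if not key.startswith("fc-"):
        raise RuntimeError("FIRECRAWL_API_KEY does not start with 'fc-'")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(load_api_key({"FIRECRAWL_API_KEY": "fc-your-api-key-here"}))
```

Failing fast with a clear message tends to be easier to debug than an authentication error surfacing later in a request.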
---
SDK Quick Start
Python
```bash
pip install firecrawl-py  # v4.5.0+
```

```python
import os

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
result = app.scrape_url("https://example.com", params={"formats": ["markdown"], "onlyMainContent": True})
print(result.get("markdown"))
```

TypeScript/Node.js
```bash
bun add @mendable/firecrawl-js  # v4.4.1+
```

```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const result = await app.scrapeUrl('https://example.com', { formats: ['markdown'], onlyMainContent: true });
console.log(result.markdown);
```

See: templates/ for crawl, extract, and advanced examples.
Common Use Cases
| Use Case | Endpoint | Key Options |
|---|---|---|
| Documentation scraping | | |
| Product data extraction | | Zod schema + |
| News article scraping | | |
| URL discovery | | Find all pages before crawling |
See: references/common-patterns.md for complete examples.
Error Handling
```python
# Python
try:
    result = app.scrape_url("https://example.com")
except FirecrawlException as e:
    print(f"Firecrawl error: {e}")
```

```typescript
// TypeScript
try {
  const result = await app.scrapeUrl('https://example.com');
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  console.error('Error:', error instanceof Error ? error.message : error);
}
```

Rate Limits & Best Practices
| Best Practice | Why |
|---|---|
| Use | Reduces credits, cleaner output |
| Set reasonable | Avoid excessive costs |
| Use | Plan crawling strategy |
| Cache results | Avoid re-scraping |
| Batch extract calls | More efficient for multiple URLs |
Credits: the free tier includes 500 per month; paid tiers include more.
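The "cache results" practice can be sketched as a small in-memory cache with a time-to-live, wrapped around whatever function performs the scrape. The `ScrapeCache` class and the `fake_scrape` stand-in are hypothetical. A real setup would pass in a function that calls Firecrawl and might prefer a persistent store.

```python
import time


class ScrapeCache:
    """Cache scrape results per URL for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable that performs the real scrape
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # url -> (expires_at, result)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no new scrape, no credits used
        result = self._fetch(url)
        self._entries[url] = (now + self._ttl, result)
        return result


calls = []


def fake_scrape(url):
    """Stand-in for a real scrape call; records how often it is invoked."""
    calls.append(url)
    return {"markdown": f"# Content of {url}"}


cache = ScrapeCache(fake_scrape, ttl_seconds=60)
cache.get("https://example.com")
cache.get("https://example.com")
print(len(calls))
```

The second lookup is served from memory, so the underlying scrape function runs only once within the TTL window.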
Cloudflare Workers Integration
⚠️ The SDK cannot run in Workers because it depends on Node.js. Use the REST API directly:

```typescript
// Runs inside a Worker handler: `env` holds the Worker's bindings and `url` is the page to scrape.
const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url, formats: ['markdown'], onlyMainContent: true })
});
```

See: references/common-patterns.md for a complete Workers example with caching.
When to Use This Skill
| ✅ Use Firecrawl | ❌ Don't Use |
|---|---|
| Modern JS-rendered sites | Simple static HTML (use cheerio) |
| Clean markdown for LLMs | Existing Puppeteer setup works |
| RAG/chatbot content | Direct API available |
| Structured data extraction | Budget constraints |
| Bot protection bypass | |
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| "Invalid API Key" | Key not set | Check |
| "Rate limit exceeded" | Monthly credits used | Check dashboard, upgrade plan |
| "Timeout error" | Page slow to load | Add |
| "Content is empty" | JS loads late | Add |
Advanced Features
| Feature | Usage |
|---|---|
| Browser actions | |
| Custom headers | |
| Webhooks | |
| Screenshots | |
See: references/endpoints.md for the complete API reference.
When to Load References
| Reference | Load When... |
|---|---|
| | Need complete API endpoint documentation |
| | Cloudflare Workers, caching, batch processing, error handling |
Package Versions
| Package | Version |
|---|---|
| firecrawl-py | 4.5.0+ |
| @mendable/firecrawl-js | 4.4.1+ |
| API | v2 |
Note: The Node.js SDK requires Node.js >=22.0.0 and cannot run in Workers.
Official Docs: https://docs.firecrawl.dev | GitHub: https://github.com/mendableai/firecrawl
Token Savings: ~60% | Production Ready: ✅