# Firecrawl Web Scraper Skill

**Status**: Production Ready ✅
**Last Updated**: 2025-10-24
**Official Docs**: https://docs.firecrawl.dev
**API Version**: v2
## What is Firecrawl?

Firecrawl is a Web Data API for AI that turns entire websites into LLM-ready markdown or structured data. It handles:

- **JavaScript rendering** - Executes client-side JavaScript to capture dynamic content
- **Anti-bot bypass** - Gets past CAPTCHA and bot detection systems
- **Format conversion** - Outputs as markdown, JSON, or structured data
- **Screenshot capture** - Saves visual representations of pages
- **Browser automation** - Full headless browser capabilities
## API Endpoints

### 1. `/v2/scrape` - Single Page Scraping

Scrapes a single webpage and returns clean, structured content.
Use Cases:
- Extract article content
- Get product details
- Scrape specific pages
- Convert HTML to markdown
Key Options:
- `formats`: ["markdown", "html", "screenshot"]
- `onlyMainContent`: true/false (removes nav, footer, ads)
- `waitFor`: milliseconds to wait before scraping
- `actions`: browser automation actions (click, scroll, etc.)
### 2. `/v2/crawl` - Full Site Crawling

Crawls all accessible pages from a starting URL.
Use Cases:
- Index entire documentation sites
- Archive website content
- Build knowledge bases
- Scrape multi-page content
Key Options:
- : max pages to crawl
limit - : how many links deep to follow
maxDepth - : restrict to specific domains
allowedDomains - : skip certain URL patterns
excludePaths
### 3. `/v2/map` - URL Discovery

Maps all URLs on a website without scraping content.
Use Cases:
- Find sitemap
- Discover all pages
- Plan crawling strategy
- Audit website structure
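Unlike the other endpoints, `map` has no SDK snippet later in this document, so here is a minimal sketch calling the REST endpoint directly with the standard library. The response field name (`links`) is an assumption modeled on the scrape response; verify against the API reference before relying on it.

```python
import json
import os
import urllib.request


def section_urls(links, prefix):
    """Narrow a mapped link list to one section of the site (e.g. to plan a crawl)."""
    return sorted(link for link in links if link.startswith(prefix))


def map_site(url):
    """POST to /v2/map directly; returns the list of discovered URLs.

    NOTE: the "links" response field is an assumption - check the API reference.
    """
    req = urllib.request.Request(
        "https://api.firecrawl.dev/v2/map",
        data=json.dumps({"url": url}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("links", [])


if __name__ == "__main__":
    links = map_site("https://docs.example.com")
    print(section_urls(links, "https://docs.example.com/api/"))
```

Mapping first and filtering the result is a cheap way to decide `limit`/`excludePaths` before spending crawl credits.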
### 4. `/v2/extract` - Structured Data Extraction

Uses AI to extract specific data fields from pages.
Use Cases:
- Extract product prices and names
- Parse contact information
- Build structured datasets
- Custom data schemas
Key Options:
- `schema`: Zod or JSON schema defining the desired structure
- `systemPrompt`: guides AI extraction behavior
## Authentication
Firecrawl requires an API key for all requests.
### Get API Key
- Sign up at https://www.firecrawl.dev
- Go to dashboard → API Keys
- Copy your API key (starts with `fc-`)
### Store Securely
**NEVER hardcode API keys in code!**

```bash
# .env file
FIRECRAWL_API_KEY=fc-your-api-key-here

# .env.local (for local development)
FIRECRAWL_API_KEY=fc-your-api-key-here
```
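A tiny helper (not part of the SDK - just a convention) makes a missing key fail fast with a clear message instead of surfacing later as an authentication error:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value


# api_key = require_env("FIRECRAWL_API_KEY")
```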
---

## Python SDK Usage
### Installation
```bash
pip install firecrawl-py
```

**Latest Version**: `firecrawl-py v4.5.0+`

### Basic Scrape
```python
import os
from firecrawl import FirecrawlApp

# Initialize client
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Scrape a single page
result = app.scrape_url(
    url="https://example.com/article",
    params={
        "formats": ["markdown", "html"],
        "onlyMainContent": True
    }
)

# Access markdown content
markdown = result.get("markdown")
print(markdown)
```

### Crawl Multiple Pages
```python
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Start crawl
crawl_result = app.crawl_url(
    url="https://docs.example.com",
    params={
        "limit": 100,
        "scrapeOptions": {
            "formats": ["markdown"]
        }
    },
    poll_interval=5  # Check status every 5 seconds
)

# Process results
for page in crawl_result.get("data", []):
    url = page.get("url")
    markdown = page.get("markdown")
    print(f"Scraped: {url}")
```

### Extract Structured Data
```python
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

# Define schema
schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "product_price": {"type": "number"},
        "availability": {"type": "string"}
    },
    "required": ["company_name", "product_price"]
}

# Extract data
result = app.extract(
    urls=["https://example.com/product"],
    params={
        "schema": schema,
        "systemPrompt": "Extract product information from the page"
    }
)
print(result)
```

---

## TypeScript/Node.js SDK Usage
### Installation
```bash
npm install @mendable/firecrawl-js
```

or

```bash
pnpm add @mendable/firecrawl-js
```

or use the unscoped package:

```bash
npm install firecrawl
```

**Latest Version**: `@mendable/firecrawl-js v4.4.1+` (or `firecrawl v4.4.1+`)

### Basic Scrape
```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

// Initialize client
const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Scrape a single page
const result = await app.scrapeUrl('https://example.com/article', {
  formats: ['markdown', 'html'],
  onlyMainContent: true
});

// Access markdown content
const markdown = result.markdown;
console.log(markdown);
```

### Crawl Multiple Pages
```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Start crawl
const crawlResult = await app.crawlUrl('https://docs.example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown']
  }
});

// Process results
for (const page of crawlResult.data) {
  console.log(`Scraped: ${page.url}`);
  console.log(page.markdown);
}
```

### Extract Structured Data with Zod
```typescript
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

// Define schema with Zod
const schema = z.object({
  company_name: z.string(),
  product_price: z.number(),
  availability: z.string()
});

// Extract data
const result = await app.extract({
  urls: ['https://example.com/product'],
  schema: schema,
  systemPrompt: 'Extract product information from the page'
});
console.log(result);
```

## Common Use Cases
### 1. Documentation Scraping

**Scenario**: Convert an entire documentation site to markdown for a RAG system or chatbot

```python
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

docs = app.crawl_url(
    url="https://docs.myapi.com",
    params={
        "limit": 500,
        "scrapeOptions": {
            "formats": ["markdown"],
            "onlyMainContent": True
        },
        "allowedDomains": ["docs.myapi.com"]
    }
)

# Save to files
for page in docs.get("data", []):
    filename = page["url"].replace("https://", "").replace("/", "_") + ".md"
    with open(f"docs/{filename}", "w") as f:
        f.write(page["markdown"])
```

### 2. Product Data Extraction
**Scenario**: Extract structured product data for e-commerce

```typescript
const schema = z.object({
  title: z.string(),
  price: z.number(),
  description: z.string(),
  images: z.array(z.string()),
  in_stock: z.boolean()
});

const products = await app.extract({
  urls: productUrls,
  schema: schema,
  systemPrompt: 'Extract all product details including price and availability'
});
```

### 3. News Article Scraping
**Scenario**: Extract clean article content without ads/navigation

```python
article = app.scrape_url(
    url="https://news.com/article",
    params={
        "formats": ["markdown"],
        "onlyMainContent": True,
        "removeBase64Images": True
    }
)

# Get clean markdown
content = article.get("markdown")
```

---

## Error Handling
### Python

```python
import os
from firecrawl import FirecrawlApp
from firecrawl.exceptions import FirecrawlException

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

try:
    result = app.scrape_url("https://example.com")
except FirecrawlException as e:
    print(f"Firecrawl error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

### TypeScript
```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY
});

try {
  const result = await app.scrapeUrl('https://example.com');
} catch (error) {
  if (error.response) {
    // API error
    console.error('API Error:', error.response.data);
  } else {
    // Network or other error
    console.error('Error:', error.message);
  }
}
```

## Rate Limits & Best Practices
### Rate Limits
- Free tier: 500 credits/month
- Paid tiers: Higher limits based on plan
- Credits consumed vary by endpoint and options
### Best Practices
- Use `onlyMainContent: true` to reduce credits and get cleaner data
- Set reasonable limits on crawls to avoid excessive costs
- Handle retries with exponential backoff for transient errors
- Cache results locally to avoid re-scraping the same content
- Use the `map` endpoint first to plan crawling strategy
- Batch extract calls when processing multiple URLs
- Monitor credit usage in the dashboard
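The retry advice above can be sketched as a small wrapper. This is illustrative, not part of the Firecrawl API: the SDK's exception types vary by version, so this version retries on any exception and re-raises once attempts are exhausted.

```python
import random
import time


def with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts - surface the last error
            # base_delay, 2x, 4x, ... scaled by up to 2x jitter to spread out retries
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))


# Usage sketch (app as in the SDK examples above):
# result = with_retries(lambda: app.scrape_url("https://example.com"))
```

In production you would narrow the `except` clause to the transient errors you actually see (rate limits, timeouts) so that permanent failures fail immediately.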
## Cloudflare Workers Integration

### ⚠️ Important: SDK Compatibility
The Firecrawl SDK cannot run in Cloudflare Workers due to Node.js dependencies (specifically `axios`, which uses the Node.js `http` module). Workers require Web Standard APIs.

✅ Use the direct REST API with `fetch` instead (see example below).

**Alternative**: Self-host with workers-firecrawl - a Workers-native implementation (requires the Workers Paid Plan and only implements the `/search` endpoint).

### Workers Example: Direct REST API
This example uses the `fetch` API to call Firecrawl directly - it works in Cloudflare Workers:

```typescript
interface Env {
  FIRECRAWL_API_KEY: string;
  SCRAPED_CACHE?: KVNamespace; // Optional: for caching results
}

interface FirecrawlScrapeResponse {
  success: boolean;
  data: {
    markdown?: string;
    html?: string;
    metadata: {
      title?: string;
      description?: string;
      language?: string;
      sourceURL: string;
    };
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return Response.json({ error: 'Method not allowed' }, { status: 405 });
    }
    try {
      const { url } = await request.json<{ url: string }>();
      if (!url) {
        return Response.json({ error: 'URL is required' }, { status: 400 });
      }
      // Check cache (optional)
      if (env.SCRAPED_CACHE) {
        const cached = await env.SCRAPED_CACHE.get(url, 'json');
        if (cached) {
          return Response.json({ cached: true, data: cached });
        }
      }
      // Call Firecrawl API directly using fetch
      const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: url,
          formats: ['markdown'],
          onlyMainContent: true,
          removeBase64Images: true
        })
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`Firecrawl API error (${response.status}): ${errorText}`);
      }
      const result = await response.json<FirecrawlScrapeResponse>();
      // Cache for 1 hour (optional)
      if (env.SCRAPED_CACHE && result.success) {
        await env.SCRAPED_CACHE.put(
          url,
          JSON.stringify(result.data),
          { expirationTtl: 3600 }
        );
      }
      return Response.json({
        cached: false,
        data: result.data
      });
    } catch (error) {
      console.error('Scraping error:', error);
      return Response.json(
        { error: error instanceof Error ? error.message : 'Unknown error' },
        { status: 500 }
      );
    }
  }
};
```

**Environment Setup**: Add `FIRECRAWL_API_KEY` in Wrangler secrets:

```bash
npx wrangler secret put FIRECRAWL_API_KEY
```

**Optional KV Binding** (for caching - add to `wrangler.jsonc`):

```jsonc
{
  "kv_namespaces": [
    {
      "binding": "SCRAPED_CACHE",
      "id": "your-kv-namespace-id"
    }
  ]
}
```

See `templates/firecrawl-worker-fetch.ts` for a complete production-ready example.

## When to Use This Skill
✅ Use Firecrawl when:
- Scraping modern websites with JavaScript
- Need clean markdown output for LLMs
- Building RAG systems from web content
- Extracting structured data at scale
- Dealing with bot protection
- Need reliable, production-ready scraping
❌ Don't use Firecrawl when:
- Scraping simple static HTML (use cheerio/beautifulsoup)
- Have existing Puppeteer/Playwright setup working well
- Working with APIs (use direct API calls instead)
- Budget constraints (free tier has limits)
## Common Issues & Solutions
### Issue: "Invalid API Key"

**Cause**: API key not set or incorrect
**Fix**:

```bash
# Check env variable is set
echo $FIRECRAWL_API_KEY

# Verify key format (should start with fc-)
```

### Issue: "Rate limit exceeded"
**Cause**: Exceeded monthly credits
**Fix**:
- Check usage in the dashboard
- Upgrade plan or wait for the reset
- Use `onlyMainContent: true` to reduce credits
### Issue: "Timeout error"

**Cause**: Page takes too long to load
**Fix**:

```python
result = app.scrape_url(url, params={"waitFor": 10000})  # Wait 10s
```

### Issue: "Content is empty"
**Cause**: Content loaded via JavaScript after the initial render
**Fix**:

```python
result = app.scrape_url(url, params={
    "waitFor": 5000,
    "actions": [{"type": "wait", "milliseconds": 3000}]
})
```

## Advanced Features
### Browser Actions

Perform interactions before scraping:

```python
result = app.scrape_url(
    url="https://example.com",
    params={
        "actions": [
            {"type": "click", "selector": "button.load-more"},
            {"type": "wait", "milliseconds": 2000},
            {"type": "scroll", "direction": "down"}
        ]
    }
)
```

### Custom Headers
```python
result = app.scrape_url(
    url="https://example.com",
    params={
        "headers": {
            "User-Agent": "Custom Bot 1.0",
            "Accept-Language": "en-US"
        }
    }
)
```

### Webhooks for Long Crawls
Instead of polling, receive results via a webhook:

```python
crawl = app.crawl_url(
    url="https://docs.example.com",
    params={
        "limit": 1000,
        "webhook": "https://your-domain.com/webhook"
    }
)
```
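On the receiving side, the webhook URL must be an HTTP endpoint that accepts Firecrawl's POSTs. The payload shape below (a top-level `data` list of scraped pages) is an assumption modeled on the crawl response; check the webhook docs for the exact event format. A minimal standard-library sketch:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def pages_from_event(event):
    """Extract (url, markdown) pairs from a webhook payload.

    ASSUMPTION: payload mirrors crawl results ({"data": [{"url": ..., "markdown": ...}]});
    verify against the webhook documentation before relying on it.
    """
    return [(p.get("url"), p.get("markdown")) for p in event.get("data", [])]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        for url, markdown in pages_from_event(event):
            print(f"Received {url}: {len(markdown or '')} chars")
        self.send_response(200)  # acknowledge quickly so the sender doesn't retry
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Returning 200 immediately and doing heavy processing elsewhere keeps the endpoint from timing out during large crawls.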
## Package Versions
| Package | Version | Last Checked |
|---|---|---|
| firecrawl-py | 4.5.0+ | 2025-10-20 |
| @mendable/firecrawl-js (or firecrawl) | 4.4.1+ | 2025-10-24 |
| API Version | v2 | Current |
Note: The Node.js SDK requires Node.js >=22.0.0 and cannot run in Cloudflare Workers. Use direct REST API calls in Workers (see Cloudflare Workers Integration section).
## Official Documentation
- Docs: https://docs.firecrawl.dev
- Python SDK: https://docs.firecrawl.dev/sdks/python
- Node.js SDK: https://docs.firecrawl.dev/sdks/node
- API Reference: https://docs.firecrawl.dev/api-reference
- GitHub: https://github.com/mendableai/firecrawl
- Dashboard: https://www.firecrawl.dev/app
## Next Steps After Using This Skill
- Store scraped data: Use Cloudflare D1, R2, or KV to persist results
- Build RAG system: Combine with Vectorize for semantic search
- Add scheduling: Use Cloudflare Queues for recurring scrapes
- Process content: Use Workers AI to analyze scraped data
Token Savings: ~60% vs manual integration
Error Prevention: API authentication, rate limiting, format handling
Production Ready: ✅