firecrawl-knowledge-base
Firecrawl Knowledge Base
Use this to turn URLs or topics into organized LLM-ready content.
Onboarding Interview
Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.
Ask at most three concise questions, and only if blocked: for example the source URL or topic, whether the output is for reference, RAG, training, or a docs mirror, or the training format when training data is requested.
Firecrawl Collection Plan
Use Firecrawl map to discover pages on documentation sites and Firecrawl search to build topic-based corpora. Scrape the pages into markdown, preserving code examples and tables.
For files, follow the Firecrawl download-style convention:
.firecrawl/
  <hostname>/
    <path>/
      index.md
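The layout above can be sketched as a small URL-to-path helper. This is a minimal illustration, not Firecrawl's own implementation: the function name `local_path_for` is hypothetical, and dropping the query string and fragment is an assumption the convention does not spell out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = ".firecrawl") -> str:
    """Map a page URL to <root>/<hostname>/<path>/index.md."""
    parts = urlsplit(url)
    # Assumption: only the hostname and path are kept; query and fragment are dropped.
    segments = [segment for segment in parts.path.split("/") if segment]
    host = parts.hostname or "unknown-host"
    return str(PurePosixPath(root, host, *segments, "index.md"))


print(local_path_for("https://docs.example.com/guides/quickstart/"))
# .firecrawl/docs.example.com/guides/quickstart/index.md
```

Using `PurePosixPath` keeps the separators stable across operating systems, which matters if the paths are later recorded in a manifest.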
Parallel Work
If appropriate, use sub-agents or equivalent parallel task runners, splitting the work along one of these lines:
- by docs section: one section per researcher
- by source type: official docs, tutorials, community discussions, and references
- by pipeline stage: source scraping, chunk generation, and manifest generation
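As a rough sketch of the first split (one docs section per worker), a thread pool is one possible "equivalent parallel task runner". The `collect_section` worker is a hypothetical stand-in; the real mapping, scraping, and file writing would replace its body.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_section(section: str) -> dict:
    """Hypothetical worker: stands in for one researcher handling one docs section."""
    # Real work (map, scrape, write markdown) would happen here.
    return {"section": section, "status": "done"}


def collect_in_parallel(sections: list[str], max_workers: int = 4) -> list[dict]:
    # pool.map preserves input order, so results line up with the sections list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(collect_section, sections))


results = collect_in_parallel(["getting-started", "api-reference", "guides"])
print([result["section"] for result in results])
# ['getting-started', 'api-reference', 'guides']
```

Threads suit this kind of work because scraping is dominated by network waits rather than CPU time.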
Output Modes
- Reference: markdown files, index.md, and sources.json.
- RAG: markdown files plus chunk files and manifest.json.
- Training: scraped source files plus training-data.jsonl and training-metadata.json.
- Docs mirror: complete markdown mirror with a table of contents.
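One way to check a finished run against its mode is a lookup of the named files each mode is expected to produce. This is a sketch under assumptions: the file names come from the list above and the mode keys from the rerun inputs, but placing the files at the top of the output directory is a guess, and `missing_artifacts` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumption: each named file sits directly in the output directory.
REQUIRED_FILES = {
    "reference": ["index.md", "sources.json"],
    "rag": ["manifest.json"],
    "train": ["training-data.jsonl", "training-metadata.json"],
    "docs": [],  # the docs mirror names no fixed file beyond the mirrored pages
}


def missing_artifacts(output_dir: str, goal: str) -> list[str]:
    """Return the expected files for this goal that are not present yet."""
    root = Path(output_dir)
    return [name for name in REQUIRED_FILES[goal] if not (root / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.md").write_text("# Index\n", encoding="utf-8")
    print(missing_artifacts(tmp, "reference"))
    # ['sources.json']
```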
Final Deliverable
Knowledge Base: [Source]
Summary
[What was collected and why]
Output Structure
[Files/directories created]
Coverage
[Sections, source types, counts]
Usage Notes
[How to use in RAG, docs, training, or agent context]
Sources
[URLs collected]
Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
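If the rerun inputs are produced by a script rather than typed by hand, a small validated record keeps the allowed values honest. The following is only a sketch: the `RerunInputs` class and its `render` method are hypothetical names, while the keys and allowed values are copied from the block above.

```python
from dataclasses import dataclass

GOALS = {"reference", "rag", "train", "docs"}
DEPTHS = {"quick", "thorough", "exhaustive"}


@dataclass(frozen=True)
class RerunInputs:
    source: str
    goal: str
    depth: str
    output_dir: str = ".firecrawl/"
    workflow: str = "firecrawl-knowledge-base"

    def __post_init__(self) -> None:
        # Reject values outside the sets listed in the template.
        if self.goal not in GOALS:
            raise ValueError(f"unknown goal: {self.goal!r}")
        if self.depth not in DEPTHS:
            raise ValueError(f"unknown depth: {self.depth!r}")

    def render(self) -> str:
        """Render the keys in the same order as the template."""
        return "\n".join([
            f"workflow: {self.workflow}",
            f"source: {self.source}",
            f"goal: {self.goal}",
            f"depth: {self.depth}",
            f"output_dir: {self.output_dir}",
        ])


print(RerunInputs(source="https://docs.example.com", goal="rag", depth="quick").render())
```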
Quality Bar
- Preserve code examples and formatting.
- Remove boilerplate navigation where possible.
- Include source URLs in frontmatter or metadata.
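To illustrate the last point, a scraped page can be written with a small frontmatter header that records where it came from. This is a minimal sketch: the helper name `with_frontmatter` and the field names `title` and `source_url` are assumptions, not a schema this workflow prescribes.

```python
import json


def with_frontmatter(markdown_body: str, source_url: str, title: str) -> str:
    """Prefix a markdown page with a YAML frontmatter block naming its source."""
    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    header = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"source_url: {json.dumps(source_url)}",
        "---",
    ])
    return f"{header}\n\n{markdown_body.lstrip()}"


page = with_frontmatter(
    "# Quickstart\n\nInstall the package first.\n",
    source_url="https://docs.example.com/quickstart",
    title="Quickstart",
)
print(page)
```

Quoting the values avoids surprises when a title contains a colon or a URL contains a `#`.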