firecrawl-knowledge-base


Firecrawl Knowledge Base

Use this to turn URLs or topics into organized LLM-ready content.

Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.
Ask at most three concise questions, and only if blocked: for example the source URL or topic, whether the output is for reference, RAG, training, or docs, or the training format if training is requested.

Firecrawl Collection Plan

Use Firecrawl map to discover pages on documentation sites and Firecrawl search to build topic-based corpora. Scrape pages into markdown, preserving code examples and tables.
For files, follow the Firecrawl download-style convention:
text
.firecrawl/
  <hostname>/
    <path>/
      index.md
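
As a rough illustration of this layout, the sketch below maps a page URL to a local file path. The helper name `local_path_for`, the `unknown-host` fallback, and the choice to drop query strings and `..` segments are assumptions made for this example, not Firecrawl behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url: str, root: str = ".firecrawl") -> PurePosixPath:
    """Map a page URL to <root>/<hostname>/<path>/index.md (hypothetical helper)."""
    parsed = urlparse(url)
    # Drop empty, "." and ".." segments so the result stays inside the root.
    segments = [s for s in parsed.path.split("/") if s not in ("", ".", "..")]
    # Assumption: fall back to a fixed name when the URL has no hostname.
    host = parsed.hostname or "unknown-host"
    return PurePosixPath(root, host, *segments, "index.md")


if __name__ == "__main__":
    print(local_path_for("https://docs.example.com/api/scrape/"))
```

Using `PurePosixPath` keeps the result identical across operating systems; convert it to a concrete `Path` when writing files.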

Parallel Work

If appropriate, split the work across sub-agents or equivalent parallel task runners, for example:
  • by docs section: one section per researcher
  • by source type: official docs, tutorials, community discussions, and references
  • by stage: source scraping vs chunk generation vs manifest generation
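
One way to picture the per-section split is the sketch below, which groups URLs by their first path segment and hands each group to a worker in a thread pool. Treating the first path segment as the docs section is an assumption, and `group_by_section`, `run_in_parallel`, and `count_pages` are hypothetical names; a real worker would scrape the pages rather than count them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.parse import urlparse


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by first path segment, treated here as the docs section."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "(root)"].append(url)
    return dict(groups)


def run_in_parallel(
    groups: dict[str, list[str]],
    worker: Callable[[str, list[str]], int],
    max_workers: int = 4,
) -> dict[str, int]:
    """Run one worker call per section and collect the results by section name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(worker, name, urls) for name, urls in groups.items()
        }
        # result() re-raises any exception the worker raised.
        return {name: future.result() for name, future in futures.items()}


def count_pages(section: str, urls: list[str]) -> int:
    """Placeholder worker: a real one would scrape each URL in the section."""
    return len(urls)
```

The same `run_in_parallel` shape works for the other splits: key the groups by source type or by pipeline stage instead of by section.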

Output Modes

  • Reference: markdown files, index.md, and sources.json.
  • RAG: markdown files plus chunk files and manifest.json.
  • Training: scraped source files plus training-data.jsonl and training-metadata.json.
  • Docs mirror: complete markdown mirror with a table of contents.
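
The fixed-name files per mode can be checked with a small helper like the sketch below. It assumes these files sit at the top level of the output directory, and it reuses the goal values `reference`, `rag`, `train`, and `docs` as mode keys; `REQUIRED_FILES` and `missing_files` are hypothetical names.

```python
from pathlib import Path

# Fixed-name files each mode is expected to produce, taken from the list above.
# Markdown, chunk, and scraped source files have no fixed names, so they are
# not checked here.
REQUIRED_FILES: dict[str, list[str]] = {
    "reference": ["index.md", "sources.json"],
    "rag": ["manifest.json"],
    "train": ["training-data.jsonl", "training-metadata.json"],
    "docs": [],
}


def missing_files(mode: str, output_dir: str) -> list[str]:
    """Return the required files for `mode` that are absent from `output_dir`."""
    if mode not in REQUIRED_FILES:
        raise ValueError(f"unknown output mode: {mode!r}")
    root = Path(output_dir)
    # Assumption: required files live directly under the output directory.
    return [name for name in REQUIRED_FILES[mode] if not (root / name).is_file()]
```

An empty result means every fixed-name file for that mode is present; it says nothing about the quality of the contents.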

Final Deliverable

markdown

Knowledge Base: [Source]

Summary

[What was collected and why]

Output Structure

[Files/directories created]

Coverage

[Sections, source types, counts]

Usage Notes

[How to use in RAG, docs, training, or agent context]

Sources

[URLs collected]

Rerun Inputs

workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
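
If the rerun inputs need to be read back by a script, a parser along the following lines may help. It assumes the inputs are written one `key: value` pair per line with the placeholders filled in; `RerunInputs` and `parse_rerun_inputs` are hypothetical names, and the allowed values are taken from the placeholders.

```python
from dataclasses import dataclass

GOALS = {"reference", "rag", "train", "docs"}
DEPTHS = {"quick", "thorough", "exhaustive"}


@dataclass(frozen=True)
class RerunInputs:
    workflow: str
    source: str
    goal: str
    depth: str
    output_dir: str


def parse_rerun_inputs(text: str) -> RerunInputs:
    """Parse one 'key: value' pair per line (hypothetical helper)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so URLs in the value survive intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        fields[key.strip()] = value.strip()
    # Missing or unexpected keys raise TypeError from the dataclass constructor.
    inputs = RerunInputs(**fields)
    if inputs.goal not in GOALS:
        raise ValueError(f"goal must be one of {sorted(GOALS)}, got {inputs.goal!r}")
    if inputs.depth not in DEPTHS:
        raise ValueError(f"depth must be one of {sorted(DEPTHS)}, got {inputs.depth!r}")
    return inputs
```

Validating `goal` and `depth` up front means a mistyped value fails before any scraping starts.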

Quality Bar

  • Preserve code examples and formatting.
  • Remove boilerplate navigation where possible.
  • Include source URLs in frontmatter or metadata.
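
For the source URL requirement, one possible approach is to prepend a small frontmatter block to each scraped markdown file, as sketched below. The field name `source_url` and the helper `with_frontmatter` are assumptions for illustration; the markdown body is passed through unchanged, so code examples and formatting are preserved.

```python
import json


def with_frontmatter(markdown: str, source_url: str) -> str:
    """Prepend a frontmatter block that records the page's source URL."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    header = f"---\nsource_url: {json.dumps(source_url)}\n---\n\n"
    return header + markdown
```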