firecrawl-knowledge-ingest
Firecrawl Knowledge Ingest
Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.
Onboarding Interview
Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately.
Ask at most three concise questions, and only if blocked, such as the portal URL, whether authentication is required, or the desired output format.
Firecrawl Collection Plan
Use Firecrawl browser to:
- open the portal and inspect navigation
- identify sections, categories, sidebar links, and article URLs
- follow sidebar navigation, next links, pagination, load-more controls, or search
- scrape article content as markdown
- extract metadata such as title, section, last updated date, author, and tags
Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.
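The collection steps above amount to a breadth-first walk over discovered links with a page cap and a record of failures. The following is a minimal sketch under stated assumptions, not Firecrawl's API: fetch_page is a hypothetical callable standing in for whatever browser or scrape call is actually used, and it is assumed to return a page's markdown together with the links found on that page.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: url -> (markdown, outgoing links). In a real run this
# would wrap a browser or scrape call; it is injected here so the loop is testable.
FetchPage = Callable[[str], Tuple[str, List[str]]]


def collect_pages(
    start_url: str, fetch_page: FetchPage, max_pages: int
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Walk links breadth-first until max_pages pages have been extracted.

    Returns (pages, failures): pages maps URL to markdown, failures maps
    URL to the error message raised while fetching it.
    """
    pages: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}  # avoids revisiting pages linked from several places

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            markdown, links = fetch_page(url)
        except Exception as exc:  # auth walls, timeouts, rendering errors
            failures[url] = str(exc)
            continue
        pages[url] = markdown
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return pages, failures
```

Keeping failures in their own mapping makes it straightforward to fill the failed-or-restricted part of the final report without re-crawling.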
Final Deliverable
Knowledge Ingest: [Portal]
Summary
[Pages extracted, sections covered, limitations]
Output
[JSON/markdown/merged file path or content]
Sections
[Section names and article counts]
Failed Or Restricted Pages
[Any access/loading issues]
Sources
[URLs extracted]
Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
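To make a rerun reproducible, the four key/value lines can be parsed and checked before any crawling starts. This is a minimal sketch: the RerunInputs class and parse_rerun_inputs function are hypothetical names introduced for illustration, and the rule that max_pages must be a positive integer is an assumption.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"json", "markdown", "merged"}
REQUIRED_KEYS = {"workflow", "url", "format", "max_pages"}


@dataclass
class RerunInputs:
    workflow: str
    url: str
    format: str
    max_pages: int


def parse_rerun_inputs(text: str) -> RerunInputs:
    """Parse 'key: value' lines into a validated RerunInputs."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so URLs keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        fields[key.strip()] = value.strip()

    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if fields["format"] not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fields['format']!r}")
    max_pages = int(fields["max_pages"])
    if max_pages < 1:  # assumption: a rerun must allow at least one page
        raise ValueError("max_pages must be a positive integer")

    return RerunInputs(
        workflow=fields["workflow"],
        url=fields["url"],
        format=fields["format"],
        max_pages=max_pages,
    )
```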
JSON Shape
Use top-level fields source, url, extractedAt, totalArticles, and sections[], with each article carrying title, url, section, content, and metadata.
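One possible reading of this shape, as a sketch: top-level keys source, url, extractedAt, totalArticles, and sections, with each article holding title, url, section, content, and metadata. The build_ingest_document helper is a hypothetical name, and the name and articles keys inside each section entry, along with the ISO 8601 UTC timestamp, are assumptions that the source does not spell out.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_ingest_document(
    source: str,
    url: str,
    articles: List[dict],
    extracted_at: Optional[str] = None,
) -> dict:
    """Group articles by their 'section' value and wrap them in the output shape."""
    grouped: Dict[str, List[dict]] = {}
    for article in articles:
        grouped.setdefault(article["section"], []).append(article)

    return {
        "source": source,
        "url": url,
        # Assumption: timestamps are ISO 8601 strings in UTC.
        "extractedAt": extracted_at or datetime.now(timezone.utc).isoformat(),
        "totalArticles": len(articles),
        # Assumption: each section entry has a name and its list of articles.
        "sections": [
            {"name": name, "articles": items} for name, items in grouped.items()
        ],
    }


if __name__ == "__main__":
    example = build_ingest_document(
        source="Example Docs",  # placeholder portal name
        url="https://docs.example.com",  # placeholder URL
        articles=[
            {
                "title": "Getting started",
                "url": "https://docs.example.com/start",
                "section": "Guides",
                "content": "# Getting started\n...",
                "metadata": {"tags": ["intro"]},
            }
        ],
    )
    print(json.dumps(example, indent=2))
```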
Quality Bar
- Preserve code examples, tables, and formatting.
- Strip nav chrome, headers, and footers.
- Track extraction progress and page failures.
- Respect authentication boundaries.
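One way to strip navigation chrome while leaving code examples alone is to drop lines that repeat across most pages, skipping anything inside a fenced code block. This is a swapped-in heuristic, not something the source prescribes: strip_repeated_chrome is a hypothetical helper, and the 80% threshold is an assumption that would need tuning per portal.

```python
from collections import Counter
from typing import Dict, Iterator, Tuple

FENCE = "`" * 3  # markdown code fence marker


def _tag_lines(markdown: str) -> Iterator[Tuple[str, bool]]:
    """Yield (line, protected); fence lines and fenced code are protected."""
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code
            yield line, True
        else:
            yield line, in_code


def strip_repeated_chrome(
    pages: Dict[str, str], min_share: float = 0.8
) -> Dict[str, str]:
    """Remove non-code lines that appear on at least min_share of the pages."""
    if len(pages) < 2:
        return dict(pages)  # nothing to compare against

    counts: Counter = Counter()
    for markdown in pages.values():
        # Count each distinct line once per page.
        counts.update(
            {
                line.strip()
                for line, protected in _tag_lines(markdown)
                if not protected and line.strip()
            }
        )

    threshold = min_share * len(pages)
    chrome = {line for line, n in counts.items() if n >= threshold}

    cleaned = {}
    for url, markdown in pages.items():
        kept = [
            line
            for line, protected in _tag_lines(markdown)
            if protected or line.strip() not in chrome
        ]
        cleaned[url] = "\n".join(kept)
    return cleaned
```

A frequency heuristic like this can also remove legitimate lines that recur on every article, so spot-checking a few cleaned pages against the originals is worthwhile.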