media-ingest


Media Ingest Skill


Ingest video, audio, PDF, book, screenshot, and GitHub repo content into the brain.
Filing rule: Read skills/_brain-filing-rules.md before creating any new page.

Contract

This skill guarantees:
  • Every ingested media item has a brain page with analysis (not just a transcript dump)
  • Transcripts (video/audio) saved in raw and human-readable formats
  • Entity extraction: every person and company mentioned gets back-linked
  • Raw source files preserved via gbrain files upload-raw
  • Filing by primary subject, not by media format
Convention: See skills/conventions/quality.md for Iron Law back-linking.
Every mention of a person or company with a brain page MUST create a back-link.

Phases

Phase 1: Identify format and fetch

Format              Action
YouTube/video URL   Fetch transcript (Whisper, transcription service, or captions)
Audio file          Transcribe with available STT service
PDF                 Extract text (OCR if needed)
Book PDF            Extract text, identify chapters/sections
Screenshot/image    OCR via vision model, extract text and entities
GitHub repo         Clone, read README + key files, summarize architecture
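
The format check above can be sketched as a small dispatcher. This is a minimal illustration, not part of the skill: the function name classify_source, the extension sets, and the returned labels are all assumptions. Telling a book PDF from an ordinary PDF needs a look at the content, so both map to "pdf" here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension sets; extend them to match the tooling you have.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def classify_source(source: str) -> str:
    """Return a coarse format label: video, audio, pdf, image, repo, or unknown."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()

    # URL-based formats are recognized by host name first.
    if host in {"youtube.com", "www.youtube.com", "youtu.be"}:
        return "video"
    if host in {"github.com", "www.github.com"}:
        return "repo"

    # Everything else falls back to the file extension.
    ext = PurePosixPath(parsed.path).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext == ".pdf":
        return "pdf"  # book vs. plain PDF is decided later, from the content
    if ext in IMAGE_EXTS:
        return "image"
    return "unknown"
```

An "unknown" result is a signal to stop and ask rather than guess, since the later phases depend on the format.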

Phase 2: Upload raw source

Save the original file for provenance:
gbrain files upload-raw <file> --page <slug>
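
If the upload is driven from a script, the command above can be wrapped as follows. This is a sketch under assumptions: the helper names upload_raw_command and upload_raw are hypothetical, and only the command shape shown above is used (no other gbrain flags are assumed).

```python
import subprocess


def upload_raw_command(file_path: str, slug: str) -> list[str]:
    """Build the argv for `gbrain files upload-raw <file> --page <slug>`."""
    return ["gbrain", "files", "upload-raw", file_path, "--page", slug]


def upload_raw(file_path: str, slug: str) -> None:
    """Run the upload; raises CalledProcessError if gbrain exits non-zero."""
    # Passing a list (not a shell string) avoids quoting problems in file names.
    subprocess.run(upload_raw_command(file_path, slug), check=True)
```

Keeping the argv builder separate from the call makes the command easy to log or inspect before anything is executed.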

Phase 3: Create brain page

File by primary subject (not format). Use this template:

{Title}

Source: {URL or file path} Format: {video/audio/PDF/book/screenshot/repo} Created: {date}

Summary

{Key points, not a transcript dump}

Key Segments / Highlights

{For video/audio: timestamped highlights. For books: chapter summaries.}

People Mentioned

{List with links to brain pages}

Companies Mentioned

{List with links to brain pages}
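
As an illustration, the template can be filled in programmatically. The sketch below is hypothetical: render_media_page and its parameters are not part of the skill, the "#" and "##" heading markers are an assumption about how the template is written in markdown, and link strings are passed in already formatted because the brain's link syntax is not specified here.

```python
def render_media_page(title, source, fmt, created, summary,
                      highlights, people, companies):
    """Fill the media page template and return it as one markdown string."""

    def bullets(items):
        # Render a list as markdown bullets; mark empty sections explicitly.
        return "\n".join(f"- {item}" for item in items) if items else "(none)"

    sections = [
        f"# {title}",
        f"Source: {source} Format: {fmt} Created: {created}",
        f"## Summary\n\n{summary}",
        f"## Key Segments / Highlights\n\n{bullets(highlights)}",
        f"## People Mentioned\n\n{bullets(people)}",
        f"## Companies Mentioned\n\n{bullets(companies)}",
    ]
    return "\n\n".join(sections) + "\n"


page = render_media_page(
    title="Demo Talk",
    source="talk.mp4",
    fmt="video",
    created="2024-01-01",
    summary="Key points go here, not a transcript dump.",
    highlights=["00:05 opening claim"],
    people=["Jane Doe"],
    companies=[],
)
```

All values in the usage example are placeholders.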

Phase 4: Entity extraction and propagation

For every person and company mentioned:
  1. Check brain for existing page
  2. Create/enrich if needed (delegate to enrich skill)
  3. Add back-link from entity page to this media page
  4. Add timeline entry on entity page
A media item is NOT fully ingested until entity propagation is complete.
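
The four steps above can be sketched against an in-memory stand-in for the brain. Everything here is an assumption for illustration: the EntityPage structure, the slugify rule, and the propagate_entities function are hypothetical, and page creation is a placeholder for delegating to the enrich skill.

```python
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    name: str
    backlinks: list = field(default_factory=list)  # slugs of pages that mention this entity
    timeline: list = field(default_factory=list)   # (date, note) tuples


def slugify(name: str) -> str:
    # Hypothetical slug rule; the real brain may use a different one.
    return name.strip().lower().replace(" ", "-")


def propagate_entities(brain: dict, media_slug: str, date: str, names: list) -> int:
    """Back-link each entity to the media page; return how many pages changed."""
    updated = 0
    for name in names:
        slug = slugify(name)
        page = brain.get(slug)                      # 1. check for an existing page
        if page is None:                            # 2. create (stand-in for enrich)
            page = EntityPage(name=name)
            brain[slug] = page
        if media_slug not in page.backlinks:
            page.backlinks.append(media_slug)       # 3. back-link to the media page
            page.timeline.append((date, f"Mentioned in {media_slug}"))  # 4. timeline
            updated += 1
    return updated
```

The membership check makes the function safe to re-run: ingesting the same media item twice does not duplicate back-links or timeline entries.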

Phase 5: Sync

Run gbrain sync to update the index.

Output Format

Brain page created with summary, highlights, and entity cross-links. Report to user: "Ingested {title}: {N} entities detected, {N} pages updated."
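
The report line can be produced with a one-line formatter. The function name ingest_report and its parameter names are hypothetical; the two counts are kept as separate arguments because the entity count and the updated-page count can differ.

```python
def ingest_report(title: str, entities_detected: int, pages_updated: int) -> str:
    """Format the user-facing summary in the wording given above."""
    return (f"Ingested {title}: {entities_detected} entities detected, "
            f"{pages_updated} pages updated.")
```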

Anti-Patterns

  • Dumping raw transcripts without analysis
  • Skipping entity extraction ("I'll do that separately")
  • Filing raw ingest by format (all videos in media/videos/) instead of by subject. Note: format-prefixed paths under media/<format>/<slug> ARE sanctioned for synthesized one-of-one output like book-mirror's media/books/<slug>-personalized.md. The anti-pattern is for raw ingest, not for sui generis synthesis. See skills/_brain-filing-rules.md, "Sanctioned exception: synthesis output is sui generis."
  • Not preserving raw source files
  • Creating stub pages without meaningful content