media-ingest

Media Ingest Skill

Ingest video, audio, PDF, book, screenshot, and GitHub repo content into the brain.

Filing rule: Read skills/_brain-filing-rules.md before creating any new page.


Contract

This skill guarantees:

Every ingested media item has a brain page with analysis (not just a transcript dump)
Transcripts (video/audio) saved in raw and human-readable formats
Entity extraction: every person and company mentioned gets back-linked
Raw source files preserved via `gbrain files upload-raw`
Filing by primary subject, not by media format

Convention: See skills/conventions/quality.md for Iron Law back-linking.

Every mention of a person or company with a brain page MUST create a back-link.


Phases

Phase 1: Identify format and fetch

| Format | Action |
| --- | --- |
| YouTube/video URL | Fetch transcript (Whisper, transcription service, or captions) |
| Audio file | Transcribe with available STT service |
| PDF | Extract text (OCR if needed) |
| Book PDF | Extract text, identify chapters/sections |
| Screenshot/image | OCR via vision model, extract text and entities |
| GitHub repo | Clone, read README + key files, summarize architecture |
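
The format-to-action mapping above could be driven by a small classifier. The following is only a sketch: the function name `detect_format`, the extension sets, and the host list are illustrative assumptions, not part of the skill. It also cannot tell a book PDF from an ordinary PDF, which would need a look at the content.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed extension and host sets; extend to match what you actually ingest.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}
REPO_HOSTS = {"github.com", "www.github.com"}


def detect_format(source: str) -> str:
    """Classify a URL or file path into one of the Phase 1 formats."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if host in VIDEO_HOSTS:
            return "video"
        if host in REPO_HOSTS:
            return "repo"
        # Other URLs: fall back to the extension of the URL path.
        suffix = Path(parsed.path).suffix.lower()
    else:
        suffix = Path(source).suffix.lower()

    if suffix in VIDEO_EXTS:
        return "video"
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in IMAGE_EXTS:
        return "screenshot"
    if suffix == ".pdf":
        return "pdf"  # book vs. plain PDF is not decidable from the name
    return "unknown"
```

An `unknown` result is a signal to ask the user rather than guess at an action.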

Phase 2: Upload raw source

Save the original file for provenance:

gbrain files upload-raw <file> --page <slug>
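
If the upload is scripted rather than typed by hand, one option is to build the command as an argument list so that file names with spaces survive intact. This is a sketch only: the helper name `upload_raw_command` is hypothetical, and the only CLI surface it uses is the `gbrain files upload-raw <file> --page <slug>` form shown above.

```python
import shlex


def upload_raw_command(file_path: str, slug: str) -> list[str]:
    """Build the argv for the upload-raw command shown above."""
    return ["gbrain", "files", "upload-raw", file_path, "--page", slug]


# Example with hypothetical values: a printable, shell-quoted command.
argv = upload_raw_command("my talk.mp4", "demo-talk")
printable = shlex.join(argv)
# To execute it for real: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` without `shell=True` avoids quoting problems altogether; `shlex.join` is only there for logging.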

Phase 3: Create brain page

File by primary subject (not format). Use this template:

{Title}

Source: {URL or file path} Format: {video/audio/PDF/book/screenshot/repo} Created: {date}

Summary

{Key points, not a transcript dump}

Key Segments / Highlights

{For video/audio: timestamped highlights. For books: chapter summaries.}

People Mentioned

{List with links to brain pages}

Companies Mentioned

{List with links to brain pages}
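
Filling the template programmatically might look like the sketch below. The function name `render_media_page`, the `#`/`##` heading levels, and the bullet style are assumptions made for illustration; the section names and field labels are taken from the template above.

```python
def render_media_page(title, source, fmt, created, summary,
                      highlights, people, companies):
    """Render the Phase 3 template as markdown text (assumed heading levels)."""

    def bullets(items):
        # One bullet per item; an explicit marker when the list is empty.
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    lines = [
        f"# {title}", "",
        f"Source: {source} Format: {fmt} Created: {created}", "",
        "## Summary", "", summary, "",
        "## Key Segments / Highlights", "", bullets(highlights), "",
        "## People Mentioned", "", bullets(people), "",
        "## Companies Mentioned", "", bullets(companies), "",
    ]
    return "\n".join(lines)


# Example with hypothetical values.
page = render_media_page(
    title="Demo Talk",
    source="talk.mp4",
    fmt="video",
    created="2024-01-01",
    summary="Key points go here.",
    highlights=["00:42 opening claim"],
    people=["people/jane-doe"],
    companies=[],
)
```

The link syntax for people and companies is left to the caller, since the template only says the lists should link to brain pages.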

Phase 4: Entity extraction and propagation

For every person and company mentioned:

Check brain for existing page
Create/enrich if needed (delegate to enrich skill)
Add back-link from entity page to this media page
Add timeline entry on entity page

A media item is NOT fully ingested until entity propagation is complete.
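
The four steps above can be sketched against an in-memory stand-in for the brain. Everything here is hypothetical: `EntityPage`, `slugify`, and `propagate_entities` are illustrative names, a plain dict replaces the real page store, and creating a bare page stands in for delegating to the enrich skill.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    name: str
    backlinks: list = field(default_factory=list)
    timeline: list = field(default_factory=list)


def slugify(name: str) -> str:
    """Lowercase, hyphen-separated slug (assumed convention)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def propagate_entities(brain: dict, entities: list, media_slug: str, date: str) -> int:
    """Apply the Phase 4 steps; return how many entity pages were updated."""
    updated = 0
    for name in entities:
        slug = slugify(name)
        page = brain.get(slug)                 # 1. check for an existing page
        if page is None:
            page = EntityPage(name=name)       # 2. create (stand-in for enrich)
            brain[slug] = page
        if media_slug not in page.backlinks:   # keeps re-runs idempotent
            page.backlinks.append(media_slug)  # 3. back-link to the media page
            page.timeline.append((date, f"Mentioned in {media_slug}"))  # 4. timeline
            updated += 1
    return updated
```

Guarding on the existing back-link means re-ingesting the same item does not duplicate links or timeline entries.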

Phase 5: Sync

Run `gbrain sync` to update the index.

Output Format

Brain page created with summary, highlights, and entity cross-links. Report to user: "Ingested {title}: {N} entities detected, {N} pages updated."

Anti-Patterns

Dumping raw transcripts without analysis
Skipping entity extraction ("I'll do that separately")
Filing raw ingest by format (all videos in `media/videos/`) instead of by subject. Note: format-prefixed paths under `media/<format>/<slug>` ARE sanctioned for synthesized one-of-one output like book-mirror's `media/books/<slug>-personalized.md`. The anti-pattern is for raw ingest, not for sui generis synthesis. See `skills/_brain-filing-rules.md`, "Sanctioned exception: synthesis output is sui generis."
Not preserving raw source files
Creating stub pages without meaningful content
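
The filing anti-pattern and its sanctioned exception can be expressed as a simple path check. This is a sketch under assumptions: the function name is hypothetical, only `videos` and `books` appear as format directories in the text above (the other entries in `FORMAT_DIRS` are guesses), and whether a page is synthesis output has to be supplied by the caller.

```python
from pathlib import PurePosixPath

# "videos" and "books" come from the examples above; the rest are assumed.
FORMAT_DIRS = {"videos", "books", "audio", "pdfs", "screenshots", "repos"}


def violates_filing_rule(page_path: str, is_synthesis: bool) -> bool:
    """True when raw ingest is filed under media/<format>/ instead of by subject."""
    parts = PurePosixPath(page_path).parts
    format_first = (
        len(parts) >= 2 and parts[0] == "media" and parts[1] in FORMAT_DIRS
    )
    # Format-prefixed paths are acceptable only for synthesized output.
    return format_first and not is_synthesis
```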
