media-ingest
Media Ingest Skill
Ingest video, audio, PDF, book, screenshot, and GitHub repo content into the brain.
Filing rule: Read skills/_brain-filing-rules.md before creating any new page.
Contract
This skill guarantees:
- Every ingested media item has a brain page with analysis (not just a transcript dump)
- Transcripts (video/audio) saved in raw and human-readable formats
- Entity extraction: every person and company mentioned gets back-linked
- Raw source files preserved via gbrain files upload-raw
- Filing by primary subject, not by media format
Convention: See skills/conventions/quality.md for Iron Law back-linking.
Every mention of a person or company with a brain page MUST create a back-link.
Phases
Phase 1: Identify format and fetch
| Format | Action |
|---|---|
| YouTube/video URL | Fetch transcript (Whisper, transcription service, or captions) |
| Audio file | Transcribe with available STT service |
| PDF | Extract text (OCR if needed) |
| Book PDF | Extract text, identify chapters/sections |
| Screenshot/image | OCR via vision model, extract text and entities |
| GitHub repo | Clone, read README + key files, summarize architecture |
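The table above amounts to a dispatch on source type. A minimal sketch of that dispatch follows, assuming the format can be guessed from the URL host or file extension. The extension sets, host lists, and the `classify_source` helper are hypothetical and not part of the skill; a plain PDF and a book PDF cannot be told apart by extension, so the caller passes a hint.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension and host sets; adjust to the sources you actually ingest.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}
REPO_HOSTS = {"github.com", "www.github.com"}

# Actions copied from the Phase 1 table.
ACTIONS = {
    "video": "Fetch transcript (Whisper, transcription service, or captions)",
    "audio": "Transcribe with available STT service",
    "pdf": "Extract text (OCR if needed)",
    "book": "Extract text, identify chapters/sections",
    "screenshot": "OCR via vision model, extract text and entities",
    "repo": "Clone, read README + key files, summarize architecture",
}


def classify_source(source: str, is_book: bool = False) -> str:
    """Guess the media format of a URL or file path."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    ext = PurePosixPath(parsed.path).suffix.lower()

    # Host checks come first so that repo URLs ending in ".git" still map to "repo".
    if host in VIDEO_HOSTS:
        return "video"
    if host in REPO_HOSTS:
        return "repo"
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in IMAGE_EXTS:
        return "screenshot"
    if ext == ".pdf":
        return "book" if is_book else "pdf"
    raise ValueError(f"unrecognized media source: {source!r}")
```

For example, `ACTIONS[classify_source("interviews/founder-call.m4a")]` selects the audio row of the table.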
Phase 2: Upload raw source
Save the original file for provenance:
gbrain files upload-raw <file> --page <slug>
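When the upload is driven from a script, the command above can be assembled as an argument list so that file paths containing spaces are passed safely. This is a sketch only: `upload_raw_command` is a hypothetical helper, the file path and slug are made-up examples, and the only CLI form assumed is the one shown above.

```python
import shlex


def upload_raw_command(file_path: str, slug: str) -> list[str]:
    """Build the argv for `gbrain files upload-raw <file> --page <slug>`."""
    if not file_path or not slug:
        raise ValueError("both file_path and slug are required")
    return ["gbrain", "files", "upload-raw", file_path, "--page", slug]


# Example values are illustrative, not taken from the skill.
cmd = upload_raw_command("talks/keynote 2024.mp4", "keynote-2024")
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

The list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems entirely.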
Phase 3: Create brain page
File by primary subject (not format). Use this template:
{Title}
Source: {URL or file path}
Format: {video/audio/PDF/book/screenshot/repo}
Created: {date}
Summary
{Key points, not a transcript dump}
Key Segments / Highlights
{For video/audio: timestamped highlights. For books: chapter summaries.}
People Mentioned
{List with links to brain pages}
Companies Mentioned
{List with links to brain pages}
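The template above can be filled programmatically. The sketch below is one possible way to do it, not the skill's own implementation: the `MediaPage` class and `render` function are hypothetical, and the output mirrors the template's field order without adding heading or link markup, since the template as shown does not specify any.

```python
from dataclasses import dataclass, field


@dataclass
class MediaPage:
    """Hypothetical container for the fields named in the template."""
    title: str
    source: str
    fmt: str
    created: str
    summary: str
    highlights: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)     # links to brain pages
    companies: list[str] = field(default_factory=list)  # links to brain pages


def render(page: MediaPage) -> str:
    """Render the page in the same section order as the template."""
    def bullets(items: list[str]) -> list[str]:
        return [f"- {item}" for item in items]

    lines = [
        page.title,
        f"Source: {page.source}",
        f"Format: {page.fmt}",
        f"Created: {page.created}",
        "Summary",
        page.summary,
        "Key Segments / Highlights",
        *bullets(page.highlights),
        "People Mentioned",
        *bullets(page.people),
        "Companies Mentioned",
        *bullets(page.companies),
    ]
    return "\n".join(lines)
```

Keeping the summary as a required field while the lists default to empty reflects the contract: a page may mention no companies, but it should never be a bare transcript without analysis.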
Phase 4: Entity extraction and propagation
For every person and company mentioned:
- Check brain for existing page
- Create/enrich if needed (delegate to enrich skill)
- Add back-link from entity page to this media page
- Add timeline entry on entity page
A media item is NOT fully ingested until entity propagation is complete.
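The propagation steps above can be sketched as a loop over the extracted entities. Everything here is illustrative: the brain is modeled as an in-memory dict, `EntityPage` and `propagate_entities` are hypothetical names, and creating an empty page stands in for delegating to the enrich skill. The final helper formats the user report described under Output Format.

```python
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    """Hypothetical stand-in for a person or company brain page."""
    name: str
    backlinks: list[str] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)


def propagate_entities(brain: dict[str, EntityPage], entities: list[str],
                       media_slug: str, date: str, title: str) -> int:
    """Back-link every entity to the media page; return the number of pages updated."""
    updated = 0
    for name in entities:
        page = brain.get(name)              # check the brain for an existing page
        if page is None:
            page = EntityPage(name=name)    # placeholder for the enrich skill
            brain[name] = page
        if media_slug not in page.backlinks:  # keeps re-ingestion idempotent
            page.backlinks.append(media_slug)
            page.timeline.append(f"{date}: mentioned in {title}")
            updated += 1
    return updated


def report(title: str, detected: int, updated: int) -> str:
    """Format the message reported to the user after ingestion."""
    return f"Ingested {title}: {detected} entities detected, {updated} pages updated."
```

Checking for an existing back-link before appending means running the ingest twice on the same media item does not duplicate links or timeline entries.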
Phase 5: Sync
Run gbrain sync to update the index.
Output Format
Brain page created with summary, highlights, and entity cross-links. Report to user:
"Ingested {title}: {N} entities detected, {N} pages updated."
Anti-Patterns
- Dumping raw transcripts without analysis
- Skipping entity extraction ("I'll do that separately")
- Filing raw ingest by format (all videos in media/videos/) instead of by subject. Note: format-prefixed paths under media/<format>/<slug> ARE sanctioned for synthesized one-of-one output like book-mirror's media/books/<slug>-personalized.md. The anti-pattern is for raw ingest, not for sui generis synthesis. See skills/_brain-filing-rules.md, "Sanctioned exception: synthesis output is sui generis."
- Not preserving raw source files
- Creating stub pages without meaningful content