ingest

Ingest Skill

Ingest meetings, articles, media, documents, and conversations into the brain.

Filing rule: Read `skills/_brain-filing-rules.md` before creating any new page.

Contract

Every fact written to a brain page carries an inline `[Source: ...]` citation with date and provenance.
Every entity mention creates a back-link from the entity's page to the page mentioning them (Iron Law).
Raw sources are preserved for provenance via `gbrain files upload-raw` with automatic size routing.
State sections are rewritten with current best understanding, never appended to.
Entity detection fires on every inbound message; notable entities get pages or updates.

Convention: See `skills/conventions/quality.md` for Iron Law back-linking.

Every mention of a person or company with a brain page MUST create a back-link FROM that entity's page TO the page mentioning them. An unlinked mention is a broken brain. See `skills/_brain-filing-rules.md` for format.

Citation Requirements (MANDATORY)

Every fact written to a brain page must carry an inline `[Source: ...]` citation.

User's statements: `[Source: User, {context}, YYYY-MM-DD]`
Meeting data: `[Source: Meeting "{title}", YYYY-MM-DD]`
Email/message: `[Source: email from {name} re: {subject}, YYYY-MM-DD]`
Web content: `[Source: {publication}, {URL}, YYYY-MM-DD]`
Social media: `[Source: X/@handle, YYYY-MM-DD](URL)` (include link)
Synthesis: `[Source: compiled from {sources}]`
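The citation formats above can be generated from templates rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper `format_citation` and hypothetical keys such as `"meeting"` and `"web"`; neither is part of gbrain.

```python
from datetime import date
from typing import Optional

# Templates mirror the citation formats listed above. The dictionary keys
# are hypothetical labels chosen for this sketch, not gbrain identifiers.
CITATION_TEMPLATES = {
    "user": "[Source: User, {context}, {date}]",
    "meeting": '[Source: Meeting "{title}", {date}]',
    "email": "[Source: email from {name} re: {subject}, {date}]",
    "web": "[Source: {publication}, {url}, {date}]",
    "social": "[Source: X/@{handle}, {date}]({url})",
    "synthesis": "[Source: compiled from {sources}]",
}


def format_citation(kind: str, when: Optional[date] = None, **fields: str) -> str:
    """Fill the template for `kind`; dates are rendered as YYYY-MM-DD."""
    template = CITATION_TEMPLATES[kind]
    if when is not None:
        fields["date"] = when.isoformat()
    # str.format raises KeyError if a required field (including the date) is missing.
    return template.format(**fields)


print(format_citation("meeting", date(2026, 4, 11), title="Q2 planning"))
```

Leaving out a required field raises `KeyError`, so an incomplete citation fails loudly instead of being written to a page.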

Phases

Router note: This skill is a router. For specialized ingestion, see: idea-ingest, media-ingest, meeting-ingestion.

Parse the source. Extract people, companies, dates, and events from the input.
For each entity mentioned:
- Read the entity's page from gbrain to check if it exists
- If exists: update compiled_truth (rewrite State section with new info, don't append)
- If new: check notability gate, then store the page in gbrain with the appropriate type and slug
Append to timeline. Add a timeline entry in gbrain for each event, with date, summary, and source citation.
Create cross-reference links. Link entities in gbrain for every entity pair mentioned together, using the appropriate relationship type.
Back-link all entities. Update EVERY mentioned entity's page with a back-link to this page (Iron Law).
Timeline merge. The same event appears on ALL mentioned entities' timelines. If Alice met Bob at Acme Corp, the event goes on Alice's page, Bob's page, and Acme Corp's page.
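The back-link and timeline-merge steps can be pictured with a small in-memory model. This is only a sketch: the dictionary layout, the function `propagate_event`, and the slugs are assumptions for illustration, and a real agent would go through the gbrain tools instead.

```python
from collections import defaultdict


def propagate_event(brain, page_slug, entity_slugs, event):
    """Put one event on every mentioned entity's timeline and back-link each entity to page_slug."""
    for slug in entity_slugs:
        page = brain[slug]
        if event not in page["timeline"]:
            page["timeline"].append(event)
        if page_slug not in page["backlinks"]:
            page["backlinks"].append(page_slug)


# Hypothetical in-memory stand-in for brain pages, keyed by slug.
brain = defaultdict(lambda: {"timeline": [], "backlinks": []})
event = {
    "date": "2026-04-11",
    "summary": "Alice met Bob at Acme Corp",
    "source": '[Source: Meeting "Intro", 2026-04-11]',
}
mentioned = ["people/alice", "people/bob", "companies/acme-corp"]
propagate_event(brain, "meetings/2026-04-11-intro", mentioned, event)
```

Running the function a second time with the same arguments changes nothing, which keeps re-ingestion from duplicating timeline entries or back-links.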

Entity Detection on Every Message

Production agents should detect entity mentions on EVERY inbound message. This is the signal detection loop that makes the brain compound over time.

Protocol

Scan the message for entity mentions: people, companies, concepts, original thinking. Fire on every message (no exceptions unless purely operational).
For each entity detected:
- `gbrain search "name"` -- does a page already exist?
- If yes: load context with `gbrain get <slug>`. Use the compiled truth to inform your response. Update the page if the message contains new information.
- If no: assess notability (see `skills/_brain-filing-rules.md`). If the entity is worth tracking, create a new page with `gbrain put <type/slug>` and populate it with what you know.
After creating or updating pages, sync to gbrain:
```bash
gbrain sync --no-pull --no-embed
```
Don't block the conversation. Entity detection and enrichment should happen alongside the response, not before it. The user shouldn't wait for brain writes to get an answer.
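One way to keep brain writes off the critical path is to schedule them as a background task and return the reply immediately. The sketch below uses `asyncio` from the Python standard library; `enrich_brain` and `handle_message` are hypothetical names, and the sleep stands in for the search, get, put, and sync steps.

```python
import asyncio


async def enrich_brain(entities, log):
    """Stand-in for the search/get/put/sync steps; a real agent would call gbrain here."""
    await asyncio.sleep(0.01)
    log.extend(f"updated {entity}" for entity in entities)


async def handle_message(text, entities, log):
    """Start the brain writes in the background, then answer without waiting for them."""
    task = asyncio.create_task(enrich_brain(entities, log))
    reply = f"reply to: {text}"
    return reply, task


async def main():
    log = []
    reply, task = await handle_message(
        "Met Alice from Acme", ["people/alice", "companies/acme"], log
    )
    still_pending_at_reply = not task.done()  # the reply exists before the writes finish
    await task  # drain background work before shutting down
    return reply, still_pending_at_reply, log
```

The agent should still await outstanding tasks before exiting, otherwise pending writes are cancelled when the event loop closes.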

What counts as notable

People the user interacts with or discusses (not random mentions)
Companies relevant to the user's work or interests
Concepts or frameworks the user references or creates
The user's own original thinking (ideas, theses, observations) -- highest value
See `skills/_brain-filing-rules.md` for the full notability gate

What to capture from the user's own thinking

Original thinking is the most valuable signal. Capture exact phrasing -- the user's language IS the insight. Don't paraphrase.

Novel observations or theses
Frameworks, mental models, heuristics
Connections between ideas that others miss
Contrarian positions with reasoning
Strong reactions to external stimuli (what triggered it and why)

Media Workflows

Content the user encounters should be captured in the brain. File by PRIMARY SUBJECT, not by format (see `skills/_brain-filing-rules.md`).

Articles & Web Content

Input: URL shared by user, or article mentioned in conversation.

Process:

Fetch content (`web_fetch` or equivalent)
Extract: title, author, publication, date, full text
Summarize: executive summary + key arguments (not a rehash)
Extract entities: people, companies, concepts mentioned
Save raw source for provenance (see Raw Source Preservation below)
Analyze for the user: don't just summarize. What's interesting given what you know about them? Flag connections, contradictions, content opportunities.

Write to: appropriate directory per filing rules (about a person -> `people/`, about a company -> `companies/`, reusable framework -> `concepts/`, raw data -> `sources/`)
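Filing by primary subject rather than by format reduces to a lookup on what the content is about. A minimal sketch follows, assuming hypothetical subject labels (`"person"`, `"company"`, `"framework"`, `"raw_data"`) and a hypothetical helper `filing_path`; the authoritative rules live in `skills/_brain-filing-rules.md`.

```python
# Directory names come from the "Write to" rule above; the keys are
# hypothetical labels for the primary subject of a piece of content.
DIRECTORY_BY_SUBJECT = {
    "person": "people/",
    "company": "companies/",
    "framework": "concepts/",
    "raw_data": "sources/",
}


def filing_path(primary_subject: str, slug: str) -> str:
    """Return the page path for content filed by its primary subject."""
    try:
        directory = DIRECTORY_BY_SUBJECT[primary_subject]
    except KeyError:
        raise ValueError(f"no filing rule for subject {primary_subject!r}") from None
    return f"{directory}{slug}"


# An article that is mainly about a person is filed under people/, not under an articles folder.
print(filing_path("person", "alice-example"))
```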

Videos & Podcasts

Input: URL (YouTube, podcast, etc.) or local audio/video file.

Process:

Get transcript -- speaker-diarized if possible (services like Diarize.io provide speaker-labeled, word-level timing)
Save raw transcript (both JSON and human-readable TXT)
Analyze: executive summary, key ideas, key quotes with speaker attribution, notable stories/anecdotes, people and companies mentioned
Extract and cross-reference all entities mentioned
HARD RULE: every video/podcast brain page MUST link to the raw diarized transcript. A page without transcript links is incomplete.

Write to: `media/videos/` or `media/podcasts/`, with back-links to all entities.

Quality bar:

Compelling headline (not "This video discusses...")
Executive summary that makes you want to watch/listen
Key Ideas as actual insights, not topic labels
Verbatim quotes with real speaker names (not "speaker_0")
All entities extracted with context and back-linked

PDFs & Documents

Input: File path or URL.

Process:

Extract text (OCR if scanned/image PDF)
Save raw source for provenance
Summarize: executive summary + key sections + notable data
Extract entities
Cross-reference from entity pages

Write to: per filing rules (file by primary subject, not format).

Screenshots & Images

Input: Image file.

Process:

Analyze content (OCR for text-heavy images, description for photos)
If tweet screenshot: extract text, author, date, route to social media workflow
If article screenshot: extract text, route to article workflow
If data/chart: extract data points, describe findings

Write to: depends on content -- route to the appropriate workflow above.

Meeting Transcripts

Input: Transcript from meeting recording service, or manual notes.

Process:

Pull full transcript (source of truth -- AI summaries are medium-low trust)
Save raw transcript for provenance
Write meeting page with YOUR analysis above the line, raw transcript below
Entity propagation (MANDATORY): for each attendee and company discussed:
- Update their brain page State section if new info surfaced
- Append to their Timeline with link to the meeting page
- Create page if person/company is notable and has no page yet
A meeting is NOT fully ingested until all entity pages are updated

Write to: `meetings/YYYY-MM-DD-short-description.md`
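The meeting page path can be derived from the meeting date and a short description. The slug rule below (lowercase, runs of non-alphanumeric characters collapsed to hyphens) is an assumption made for this sketch, and `meeting_page_path` is a hypothetical helper.

```python
import re
from datetime import date


def meeting_page_path(when: date, description: str) -> str:
    """Build meetings/YYYY-MM-DD-short-description.md from a date and a free-text description."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"meetings/{when.isoformat()}-{slug}.md"


print(meeting_page_path(date(2026, 4, 11), "Acme Q2 Planning!"))
```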

What makes a good meeting page:

Reveals the real crux, not a bullet dump
Connects to existing brain pages (people, companies, deals)
Flags what changed (status, decisions, new info)
Names tension or what was left unsaid
Captures actual dynamic, not performative summary

Social Media Content

Input: Tweet, thread, or social media post.

Process:

Fetch full content (thread, quote tweets, context)
If images present: OCR via vision model for full text extraction
Summarize: what's being said, why it matters, who's involved
Extract entities and update brain pages
Include direct link to the original post (MANDATORY for citations)

Write to: `media/x/` for daily aggregation, or entity-specific directories if the post is primarily about a person/company.

Raw Source Preservation

Every ingested item must have its raw source preserved for provenance.

Use `gbrain files upload-raw` for automatic size routing:

```bash
gbrain files upload-raw <file> --page <page-slug> --type <type>
```

Text/PDF under 100 MB: stays in git (brain repo `.raw/` sidecar directories)
100 MB or larger, OR media (video, audio, images): uploaded to cloud storage via TUS resumable upload, with a `.redirect.yaml` pointer left in the brain repo
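The routing rule can be written as a single predicate. This sketch is not the actual `gbrain` implementation: `route_raw_source` is a hypothetical function, media is detected by MIME prefix as an assumption, and "100 MB" is read as 100 × 1024 × 1024 bytes because the pointer example in this section lists 524288000 bytes as 500 MB.

```python
# Assumption: "100 MB" means 100 * 1024 * 1024 bytes, consistent with the
# pointer example where 524288000 bytes is shown as 500 MB.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024

# Assumption: media files are recognised by their MIME type prefix.
MEDIA_PREFIXES = ("video/", "audio/", "image/")


def route_raw_source(size_bytes: int, mime: str) -> str:
    """Return "cloud" for media or large files, otherwise "git"."""
    if mime.startswith(MEDIA_PREFIXES) or size_bytes >= SIZE_LIMIT_BYTES:
        return "cloud"
    return "git"


print(route_raw_source(2_000_000, "application/pdf"))
print(route_raw_source(1024, "video/mp4"))
```

Media goes to cloud storage regardless of size, so a small audio clip is routed the same way as a large video.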

The `.redirect.yaml` pointer format:

```yaml
target: supabase://brain-files/page-slug/filename.mp4
bucket: brain-files
storage_path: page-slug/filename.mp4
size: 524288000
size_human: 500 MB
hash: sha256:abc123...
mime: video/mp4
uploaded: 2026-04-11T...
type: transcript
```

Accessing stored files:

`gbrain files signed-url <storage-path>` -- generate 1-hour signed URL for viewing/sharing
`gbrain files restore <dir>` -- download back to local from cloud storage

Use `put_raw_data` in gbrain to store raw API responses and metadata (JSON, not binary).

Test Before Bulk

When processing multiple items (batch video ingestion, bulk meeting processing, etc.):

Test on 3-5 items first. Run in test mode if available.
Read the actual output. Is the quality good? Are titles compelling (not "This video discusses...")? Are entities extracted and back-linked? Is the format clean?
Fix what's wrong in the approach/skill, not via one-off patches.
Only then: bulk execute with throttling, commits every 5-10 items.

The marginal cost of testing 3 items first is near zero. The cost of cleaning up 100 bad pages is enormous.
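The sample-first workflow can be enforced in code so that a bulk run cannot start until a small sample has been reviewed. The sketch below assumes the caller supplies `ingest`, `review`, and `commit` callables; all names and default sizes are hypothetical.

```python
import time


def ingest_in_bulk(items, ingest, review, commit,
                   sample_size=3, batch_size=5, pause_seconds=0.0):
    """Ingest a small sample, stop if review rejects it, then process the rest in committed batches."""
    sample, rest = items[:sample_size], items[sample_size:]
    sample_pages = [ingest(item) for item in sample]
    if not review(sample_pages):
        raise RuntimeError("sample failed review: fix the approach before the bulk run")
    commit(sample_pages)
    for start in range(0, len(rest), batch_size):
        batch = rest[start:start + batch_size]
        commit([ingest(item) for item in batch])
        time.sleep(pause_seconds)  # crude throttle between batches


commits = []
ingest_in_bulk(list(range(13)), lambda item: item * 2, lambda pages: True, commits.append)
```

If `review` returns false, nothing is committed and the error points back at the approach rather than at individual pages.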

Quality Rules

Executive summary in compiled_truth must be updated, not just timeline appended
State section is REWRITTEN, not appended to. Current best understanding only.
Timeline entries are reverse-chronological (newest first)
Every person/company mentioned gets a page if notable (see filing rules)
Link types: knows, works_at, invested_in, founded, met_at, discussed
Source attribution: every timeline entry includes [Source: ...] citation
Back-links: every entity mention creates a back-link (Iron Law)
Filing: file by primary subject, not format or source (see filing rules)
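Two of these rules are easy to check mechanically: timeline ordering and source attribution. A minimal sketch, assuming each timeline entry is a dictionary with an ISO `date` string and a `text` field; the entry shape and the sample entries are hypothetical.

```python
def sort_timeline(entries):
    """Return entries newest first; ISO YYYY-MM-DD strings sort correctly as text."""
    return sorted(entries, key=lambda entry: entry["date"], reverse=True)


def missing_citations(entries):
    """Return the entries whose text carries no [Source: ...] citation."""
    return [entry for entry in entries if "[Source:" not in entry["text"]]


# Sample entries for illustration only.
timeline = [
    {"date": "2026-03-01", "text": 'Intro call [Source: Meeting "Intro", 2026-03-01]'},
    {"date": "2026-04-11", "text": "Follow-up scheduled"},
]
ordered = sort_timeline(timeline)
uncited = missing_citations(timeline)
```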

Anti-Patterns

Appending to State sections. State is rewritten with the current best understanding on every update. Append-only State sections grow stale and contradictory.
Ingesting without back-links. An unlinked mention is a broken brain. Every entity mentioned must have a back-link from their page to the page mentioning them.
Skipping raw source preservation. Every ingested item must have its raw source preserved. A brain page without provenance is unverifiable.
Bulk processing without sample test. Test on 3-5 items first. Fix quality issues in the approach, not via one-off patches.
Paraphrasing the user's original thinking. The user's exact language IS the insight. Capture verbatim phrasing for ideas, theses, and frameworks.

Output Format

INGESTED: [title]
==================

Page: [slug]
Type: [person / company / meeting / media / concept]
Source: [source description]

Entities detected: N
- [entity] -> [created / updated] ([slug])

Back-links created: N
Timeline entries: N
Raw source: [preserved at path / uploaded to cloud]
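The report can be rendered from structured results so the counts always match the listed entities. This is a sketch with a hypothetical function `render_report`; the field order follows the template, and the argument values are sample data.

```python
def render_report(title, page, page_type, source, entities,
                  backlinks, timeline_entries, raw_source):
    """Render the ingest report; `entities` is a list of (name, action, slug) tuples."""
    lines = [
        f"INGESTED: {title}",
        "==================",
        "",
        f"Page: {page}",
        f"Type: {page_type}",
        f"Source: {source}",
        "",
        f"Entities detected: {len(entities)}",
    ]
    lines += [f"- {name} -> {action} ({slug})" for name, action, slug in entities]
    lines += [
        "",
        f"Back-links created: {backlinks}",
        f"Timeline entries: {timeline_entries}",
        f"Raw source: {raw_source}",
    ]
    return "\n".join(lines)


report = render_report(
    "Intro call", "meetings/2026-04-11-intro-call", "meeting", "meeting transcript",
    [("Alice", "created", "people/alice"), ("Acme Corp", "updated", "companies/acme-corp")],
    backlinks=2, timeline_entries=2, raw_source="uploaded to cloud",
)
```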

Tools Used

Read a page from gbrain (get_page)
Store/update a page in gbrain (put_page)
Add a timeline entry in gbrain (add_timeline_entry)
Link entities in gbrain (add_link)
List tags for a page (get_tags)
Tag a page in gbrain (add_tag)
Store raw data in gbrain (put_raw_data)
Check backlinks in gbrain (get_backlinks)
