ai-rag-patterns

RAG Patterns — Retrieval-Augmented Generation

Use When

Use when building features that answer questions from private data, documents, policies, or time-sensitive information — RAG architecture, chunking strategies, hybrid search, re-ranking, vector databases, evaluation, agentic RAG, multimodal RAG...
The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

Do Not Use When

The task is unrelated to `ai-rag-patterns` or would be better handled by a more specific companion skill.
The request only needs a trivial answer and none of this skill's constraints or references materially help.

Required Inputs

Gather relevant project context, constraints, and the concrete problem to solve; load `references` only as needed.
Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

Workflow

Read this `SKILL.md` first, then load only the referenced deep-dive files that are necessary for the task.
Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

Quality Standards

Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

Anti-Patterns

Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
Loading every reference file by default instead of using progressive disclosure.

Outputs

A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
References used, companion skills, or follow-up actions when they materially improve execution.

Evidence Produced

| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | RAG retrieval evaluation report | Markdown doc covering recall / precision / answer quality on a fixed eval set | `docs/ai/rag-eval-2026-04-16.md` |
| Data safety | Index ingestion + tenancy isolation note | Markdown doc covering chunking, source filtering, and per-tenant index segregation | `docs/ai/rag-tenancy-note.md` |
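For the correctness report, the retrieval half of the metrics takes only a few lines of Python. A minimal sketch, assuming each eval case records the chunk IDs a reviewer judged relevant and the ranked IDs the retriever returned. `EvalCase`, `summarize`, and the sample data are hypothetical; answer-quality scoring, which usually needs an LLM or human judge, is not covered.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]    # chunk IDs a reviewer judged relevant (hypothetical schema)
    retrieved_ids: list[str]  # chunk IDs returned by the retriever, best first


def recall_at_k(case: EvalCase, k: int) -> float:
    """Share of the relevant chunks that show up in the top k results."""
    if not case.relevant_ids:
        return 0.0
    hits = len(case.relevant_ids & set(case.retrieved_ids[:k]))
    return hits / len(case.relevant_ids)


def precision_at_k(case: EvalCase, k: int) -> float:
    """Share of the top k slots holding a relevant chunk (missing slots count as misses)."""
    hits = sum(1 for cid in case.retrieved_ids[:k] if cid in case.relevant_ids)
    return hits / k


def summarize(cases: list[EvalCase], k: int) -> dict[str, float]:
    """Mean recall@k and precision@k over the whole fixed eval set."""
    if not cases or k <= 0:
        raise ValueError("need at least one case and k > 0")
    n = len(cases)
    return {
        f"recall@{k}": sum(recall_at_k(c, k) for c in cases) / n,
        f"precision@{k}": sum(precision_at_k(c, k) for c in cases) / n,
    }


# Hypothetical two-question eval set.
cases = [
    EvalCase("What is the refund window?", {"policy#3"}, ["policy#3", "faq#9", "policy#1"]),
    EvalCase("Which invoices list VAT?", {"inv#2", "inv#7"}, ["inv#7", "menu#4", "inv#1"]),
]
report = summarize(cases, k=3)
print(report)  # recall@3 = 0.75, precision@3 ≈ 0.333
```

Keeping the eval set fixed is what makes two reports comparable: a change in these numbers then reflects the retriever, not the questions.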

References

Use the `references/` directory for deep detail after reading the core workflow below.

Overview

RAG addresses a core limitation of LLMs: they only know what they were trained on. Use RAG to inject private data (invoices, menus, policies, reports) into AI responses at query time.

Core principle: RAG = look up a database + LLM synthesises the results. The LLM never needs to "know" your data.
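A minimal sketch of that principle, assuming an in-memory list of chunks and a toy term-overlap score standing in for real vector or hybrid search. `Chunk`, `retrieve`, and `build_prompt` are hypothetical names, and the LLM call itself is left out because it depends on your provider. The tenant filter runs before scoring, so one tenant's private data never reaches another tenant's prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str     # traceable source ID, e.g. "acme/refunds.md#1" (hypothetical format)
    tenant_id: str  # data owner; used as a hard filter
    text: str


def _terms(text: str) -> set[str]:
    """Lower-cased words with surrounding punctuation removed."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(store: list[Chunk], query: str, tenant_id: str, k: int = 3) -> list[Chunk]:
    """Toy lookup: rank this tenant's chunks by shared terms with the query."""
    q = _terms(query)
    candidates = [c for c in store if c.tenant_id == tenant_id]  # never score other tenants' data
    scored = [(len(q & _terms(c.text)), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]                # drop chunks with no overlap
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    """Ground the LLM in the retrieved text and ask it to cite source IDs."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in hits)
    return (
        "Answer using only the context below. Cite sources in [brackets]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


store = [
    Chunk("acme/refunds.md#1", "acme", "Refunds are accepted within 30 days of purchase."),
    Chunk("acme/lunch-menu.md#1", "acme", "The lunch menu changes every Monday."),
    Chunk("globex/refunds.md#1", "globex", "Refunds are accepted within 14 days of purchase."),
]
question = "Within how many days are refunds accepted?"
hits = retrieve(store, question, tenant_id="acme")
prompt = build_prompt(question, hits)
print(prompt)  # send this to the LLM of your choice
```

Each context line carries its source ID, so the model's answer can cite the chunk it came from.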

When to Use RAG

| Condition | Action |
|---|---|
| Knowledge base < 200K tokens (~500 pages) | Include everything in context; no RAG needed |
| Knowledge base > 200K tokens | Use RAG |
| Data changes frequently (menus, prices, stock) | RAG (update documents, not model) |
| Data is private/confidential | RAG (keeps data out of training pipelines) |
| Need source citations | RAG (chunks are traceable to source) |
| Model needs brand voice / domain jargon | Fine-tune instead |
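Of these rows, only the size check is a hard number. A minimal sketch of it, assuming roughly 4 characters per token (a common rule of thumb for English text, not exact) and a hypothetical `reserve_tokens` allowance for instructions, the question, and the answer. The threshold only makes sense relative to the context window of the model you actually use; the other rows (change frequency, privacy, citations, brand voice) remain judgment calls and are not encoded here.

```python
CONTEXT_THRESHOLD_TOKENS = 200_000  # the table's cut-off; tie it to your model's context window
CHARS_PER_TOKEN = 4.0               # rough rule of thumb for English text (assumption)


def estimate_tokens(documents: list[str], chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate token count; use the target model's tokenizer for real numbers."""
    return round(sum(len(d) for d in documents) / chars_per_token)


def choose_strategy(
    documents: list[str],
    reserve_tokens: int = 8_000,  # hypothetical headroom for instructions, question, and answer
    threshold: int = CONTEXT_THRESHOLD_TOKENS,
) -> str:
    """'full-context' if the whole knowledge base fits under the threshold, else 'rag'."""
    if estimate_tokens(documents) + reserve_tokens < threshold:
        return "full-context"
    return "rag"


small_kb = ["x" * 40_000] * 10   # ~100K estimated tokens
large_kb = ["x" * 400_000] * 10  # ~1M estimated tokens
print(choose_strategy(small_kb), choose_strategy(large_kb))  # full-context rag
```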

RAG vs Fine-Tuning

| Factor | RAG | Fine-Tuning |
|---|---|---|
| Up-to-date content | ✅ Yes (add docs anytime) | ❌ Stale until retrained |
| Hallucinations | ✅ Lower (document-grounded) | ❌ Higher |
| Source citations | ✅ Yes | ❌ No |
| Brand voice control | ❌ Weak | ✅ Strong |
| Domain jargon | ❌ Weak | ✅ Strong |
| Up-front cost | ✅ Lower | ❌ High |

Default: start with RAG. Fine-tune only when RAG + prompt engineering cannot deliver the required tone or vocabulary.

Additional Guidance

Guidance is split across two reference files so this entrypoint stays compact.

`references/skill-deep-dive.md` (architecture, chunking, retrieval, schema):

`Pipeline Architecture`
`Chunking Strategies`
`Embedding Model Selection`
`Vector Database Selection`
`Retrieval Algorithms`
`Re-Ranking`
`Full RAG Query Algorithm`
`Query Rewriting (Multi-Turn)`
`RAG Schema (Multi-Tenant)`
`Evaluation Framework`
`Production Patterns`
`Agentic RAG`
`Multimodal RAG`
`Edge Cases`
`Cost Optimisation`
`Sources`
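`Chunking Strategies` is the first topic in the deep-dive file. As a point of reference only, here is the simplest baseline: fixed-size windows with overlap. This is a sketch, not the file's recommended strategy; sizes are counted in whitespace-separated words as a stand-in for tokens, and the defaults (`size=200`, `overlap=40`) are illustrative.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; neighbouring windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end; avoid a redundant tail chunk
            break
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk; splitting on document structure (headings, paragraphs) is a common refinement for formatted sources.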

`references/production-rag.md` (the progression from draft to production and the gates before shipping):

`RAG Maturity Model`: Naive → Advanced → Modular
`Query Transformation`: HyDE, Multi-Query, Step-Back
`Contextual Compression`
`Self-RAG`
`RAGAS Evaluation`: 4 metrics with production thresholds
`Embedding Pipeline`: batching, upserts, re-embed triggers, $/1M-token table
`Cost Management Decision Tree`: concrete dollar figures per branch
`Failure Mode Playbook`: empty, irrelevant, hallucinated, stale
`Gates Before Shipping`

Load the production file when building a RAG system that has to pass evaluation gates, survive multi-tenant review, or hit a cost budget under load.
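The production file's `Embedding Pipeline` topic lists batching, upserts, and re-embed triggers. A minimal sketch of how those pieces can fit together, assuming each chunk has a stable ID and that a change in the chunk's content hash is the re-embed trigger (one common choice; the reference file may describe others). `embed_batch` is a hypothetical stand-in for your provider's embedding call, and the `index` dict stands in for a vector store's upsert and delete operations.

```python
import hashlib
from typing import Callable, Iterator

Vector = list[float]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]


def sync_embeddings(
    chunks: dict[str, str],                            # chunk_id -> current text
    index: dict[str, tuple[str, Vector]],              # chunk_id -> (text hash, vector); stands in for a vector DB
    embed_batch: Callable[[list[str]], list[Vector]],  # hypothetical embedding-provider call
    batch_size: int = 64,
) -> list[str]:
    """Re-embed new or changed chunks in batches, upsert them, and drop deleted ones."""
    stale = [cid for cid, text in chunks.items()
             if cid not in index or index[cid][0] != content_hash(text)]
    for batch in batched(stale, batch_size):
        vectors = embed_batch([chunks[cid] for cid in batch])
        for cid, vec in zip(batch, vectors):
            index[cid] = (content_hash(chunks[cid]), vec)  # upsert
    for cid in [cid for cid in index if cid not in chunks]:
        del index[cid]  # source chunk no longer exists
    return stale


# Usage with a fake embedder that records batch sizes.
calls: list[int] = []


def fake_embed(texts: list[str]) -> list[Vector]:
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


index: dict[str, tuple[str, Vector]] = {}
first = sync_embeddings({"a": "alpha", "b": "beta", "c": "gamma"}, index, fake_embed, batch_size=2)
second = sync_embeddings({"a": "alpha", "b": "beta v2"}, index, fake_embed, batch_size=2)
print(first, second, calls)  # ['a', 'b', 'c'] ['b'] [2, 1, 1]
```

On the second sync only the edited chunk is re-embedded and the removed chunk is deleted, which keeps embedding spend proportional to what actually changed.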
