ai-rag-patterns
RAG Patterns: Retrieval-Augmented Generation
<!-- dual-compat-start -->
Use When
- Use when building features that answer questions from private data, documents, policies, or time-sensitive information — RAG architecture, chunking strategies, hybrid search, re-ranking, vector databases, evaluation, agentic RAG, multimodal RAG...
- The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.
Do Not Use When
- The task is unrelated to `ai-rag-patterns` or would be better handled by a more specific companion skill.
- The request only needs a trivial answer and none of this skill's constraints or references materially help.
Required Inputs
- Gather relevant project context, constraints, and the concrete problem to solve; load `references` only as needed.
- Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.
Workflow
- Read this `SKILL.md` first, then load only the referenced deep-dive files that are necessary for the task.
- Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
- Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.
Quality Standards
- Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
- Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
- Prefer deterministic, reviewable steps over vague advice or tool-specific magic.
Anti-Patterns
- Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
- Loading every reference file by default instead of using progressive disclosure.
Outputs
- A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
- Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
- References used, companion skills, or follow-up actions when they materially improve execution.
Evidence Produced
| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | RAG retrieval evaluation report | Markdown doc covering recall / precision / answer-quality on a fixed eval set | |
| Data safety | Index ingestion + tenancy isolation note | Markdown doc covering chunking, source filtering, and per-tenant index segregation | |
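The retrieval metrics named in the Correctness row could be computed along the following lines. This is a minimal sketch, not the skill's own evaluation harness: the `EVAL_SET` structure, the document IDs, and the `precision_recall_at_k` helper are all hypothetical, and it assumes each retrieved list contains unique chunk IDs.

```python
from statistics import mean


def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    Assumes `retrieved` is ranked best-first and contains unique IDs.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical fixed eval set: each case pairs the chunks a human marked as
# relevant with the ranked chunks the retriever actually returned.
EVAL_SET = [
    {"relevant": {"doc-1", "doc-4"}, "retrieved": ["doc-1", "doc-2", "doc-4"]},
    {"relevant": {"doc-3"}, "retrieved": ["doc-7", "doc-3"]},
]

K = 2
scores = [precision_recall_at_k(case["retrieved"], case["relevant"], K) for case in EVAL_SET]
mean_precision = mean(p for p, _ in scores)
mean_recall = mean(r for _, r in scores)

print(f"precision@{K}={mean_precision:.2f} recall@{K}={mean_recall:.2f}")
```

Keeping the eval set fixed between runs is what makes the numbers comparable when chunking or retrieval settings change. Answer quality needs a separate judgement step that this sketch does not cover.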
References
- Use the `references/` directory for deep detail after reading the core workflow below.
Overview
RAG addresses a core limitation of LLMs: they only know what they were trained on. Use RAG to retrieve private data (invoices, menus, policies, reports) at query time and supply it to the model as context for each response.
Core principle: RAG = look up a database + LLM synthesises the results. The LLM never needs to "know" your data.
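The "look up, then synthesise" principle can be illustrated with a small, dependency-free sketch. It makes several assumptions: a bag-of-words cosine score stands in for a real embedding model and vector database, the sample `CHUNKS` are invented, and the final LLM call is left out because it depends on the provider. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1, look up: rank stored chunks against the query, keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Step 2, synthesise: hand the retrieved chunks to the LLM as numbered context."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Invented sample data standing in for a private knowledge base.
CHUNKS = [
    "Refund policy: refunds are accepted within 30 days of purchase.",
    "Lunch menu: soup of the day and grilled fish.",
    "Invoice 1042 falls due on 15 March.",
]

question = "What is the refund policy?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string would be sent to whichever LLM the project uses
```

The model only ever sees the retrieved chunks inside the prompt, which is why updating the documents is enough to update the answers.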
When to Use RAG
| Condition | Action |
|---|---|
| Knowledge base < 200K tokens (~500 pages) | Include everything in context — no RAG needed |
| Knowledge base > 200K tokens | Use RAG |
| Data changes frequently (menus, prices, stock) | RAG (update documents, not model) |
| Data is private/confidential | RAG (keeps data out of training pipelines) |
| Need source citations | RAG (chunks are traceable to source) |
| Model needs brand voice / domain jargon | Fine-tune instead |
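The table above can be read as a simple rule set. The sketch below encodes each row as a check and returns every action that matches, without imposing a precedence the table does not state. The `KnowledgeBase` type, the field names, and the action labels are hypothetical, and treating a knowledge base of exactly 200K tokens as a RAG case is an assumption, since the table leaves that boundary open.

```python
from dataclasses import dataclass

CONTEXT_BUDGET_TOKENS = 200_000  # threshold from the table (~500 pages)


@dataclass
class KnowledgeBase:
    tokens: int
    changes_frequently: bool = False
    is_private: bool = False
    needs_citations: bool = False
    needs_brand_voice: bool = False


def recommend(kb: KnowledgeBase) -> list[str]:
    """Return every action from the table whose condition matches, in table order."""
    actions: list[str] = []
    if kb.tokens < CONTEXT_BUDGET_TOKENS:
        actions.append("full-context")
    else:
        # Assumption: exactly 200K tokens is treated like the "> 200K" row.
        actions.append("rag")
    if kb.changes_frequently or kb.is_private or kb.needs_citations:
        if "rag" not in actions:
            actions.append("rag")
    if kb.needs_brand_voice:
        actions.append("fine-tune")
    return actions


print(recommend(KnowledgeBase(tokens=50_000)))
print(recommend(KnowledgeBase(tokens=1_000_000, needs_brand_voice=True)))
```

When more than one action comes back (for example a small but private knowledge base), the rows disagree and the choice needs a human decision based on the project's constraints.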
RAG vs Fine-Tuning
| Factor | RAG | Fine-Tuning |
|---|---|---|
| Up-to-date content | ✅ Yes (add docs anytime) | ❌ Stale until retrained |
| Hallucinations | ✅ Lower (document-grounded) | ❌ Higher |
| Source citations | ✅ Yes | ❌ No |
| Brand voice control | ❌ Weak | ✅ Strong |
| Domain jargon | ❌ Weak | ✅ Strong |
| Up-front cost | ✅ Lower | ❌ High |
Default: start with RAG. Fine-tune only when RAG + prompt engineering cannot deliver the required tone or vocabulary.
Additional Guidance
Guidance is split across two reference files so this entrypoint stays compact.
references/skill-deep-dive.md — architecture, chunking, retrieval, schema:
- Pipeline Architecture, Chunking Strategies, Embedding Model Selection, Vector Database Selection, Retrieval Algorithms, Re-Ranking, Full RAG Query Algorithm, Query Rewriting (Multi-Turn), RAG Schema (Multi-Tenant), Evaluation Framework, Production Patterns, Agentic RAG
- Multimodal RAG, Edge Cases, Cost Optimisation, Sources
references/production-rag.md — the progression from draft to production and the gates before shipping:
- RAG Maturity Model: Naive → Advanced → Modular
- Query Transformation: HyDE, Multi-Query, Step-Back
- Contextual Compression
- Self-RAG
- RAGAS Evaluation: 4 metrics with production thresholds
- Embedding Pipeline: batching, upserts, re-embed triggers, $/1M-token table
- Cost Management Decision Tree: concrete dollar figures per branch
- Failure Mode Playbook: empty, irrelevant, hallucinated, stale
- Gates Before Shipping
Load the production file when building a RAG system that has to pass evaluation gates, survive multi-tenant review, or hit a cost budget under load.