ai-rag-patterns

RAG Patterns — Retrieval-Augmented Generation

<!-- dual-compat-start -->
<!-- dual-compat-start -->

Use When

  • Use when building features that answer questions from private data, documents, policies, or time-sensitive information — RAG architecture, chunking strategies, hybrid search, re-ranking, vector databases, evaluation, agentic RAG, multimodal RAG...
  • The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

Do Not Use When

  • The task is unrelated to ai-rag-patterns or would be better handled by a more specific companion skill.
  • The request only needs a trivial answer and none of this skill's constraints or references materially help.

Required Inputs

  • Gather relevant project context, constraints, and the concrete problem to solve; load references only as needed.
  • Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

Workflow

  • Read this SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.
  • Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
  • Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

Quality Standards

  • Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
  • Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
  • Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

Anti-Patterns

  • Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
  • Loading every reference file by default instead of using progressive disclosure.

Outputs

  • A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
  • Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
  • References used, companion skills, or follow-up actions when they materially improve execution.

Evidence Produced

| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | RAG retrieval evaluation report | Markdown doc covering recall / precision / answer-quality on a fixed eval set | docs/ai/rag-eval-2026-04-16.md |
| Data safety | Index ingestion + tenancy isolation note | Markdown doc covering chunking, source filtering, and per-tenant index segregation | docs/ai/rag-tenancy-note.md |
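
The retrieval half of the evaluation report can be computed with a few lines of code. The sketch below is one possible way to score a retriever on a fixed eval set, not a prescribed method: the names `precision_recall_at_k` and `evaluate`, the `(question, relevant_ids)` eval-set shape, and the `retriever` callable are all assumptions made for illustration. It covers precision and recall only; answer quality needs a separate judgment step.

```python
from typing import Callable

def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Score one query. Assumes `retrieved` holds unique chunk/doc ids in rank order."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def evaluate(
    eval_set: list[tuple[str, set[str]]],
    retriever: Callable[[str], list[str]],  # hypothetical: question -> ranked ids
    k: int = 5,
) -> dict[str, float]:
    """Average precision@k and recall@k over a fixed, non-empty eval set."""
    if not eval_set:
        raise ValueError("eval_set must contain at least one question")
    scores = [
        precision_recall_at_k(retriever(question), relevant, k)
        for question, relevant in eval_set
    ]
    n = len(scores)
    return {
        f"precision@{k}": sum(p for p, _ in scores) / n,
        f"recall@{k}": sum(r for _, r in scores) / n,
    }
```

Keeping the eval set fixed between runs is what makes two reports comparable; a change in the numbers then reflects a change in the pipeline rather than in the questions.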

References

  • Use the references/ directory for deep detail after reading the core workflow below.
<!-- dual-compat-end -->
<!-- dual-compat-end -->

Overview

RAG addresses a core LLM limitation: a model only knows what it was trained on. Use RAG to inject private data (invoices, menus, policies, reports) into the prompt at query time, so each response is grounded in your documents.
Core principle: RAG = look up a database, then have the LLM synthesise the results. The LLM never needs to "know" your data.
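
The "look up, then synthesise" principle can be sketched in a few functions. This is a toy illustration under stated assumptions, not a production design: a bag-of-words cosine similarity stands in for a real embedding model and vector database, and `llm` is a hypothetical callable that takes a prompt string and returns text. All function names here are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable

def tokenize(text: str) -> Counter:
    """Lowercased bag of words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1: look up the k chunks most similar to the query."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt, numbered so they can be cited."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

def answer(query: str, chunks: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Step 2: the LLM synthesises an answer from the retrieved chunks."""
    return llm(build_prompt(query, retrieve(query, chunks, k)))
```

The model only ever sees the retrieved chunks inside the prompt, which is why updating the documents is enough to update the answers.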

When to Use RAG

| Condition | Action |
|---|---|
| Knowledge base < 200K tokens (~500 pages) | Include everything in context; no RAG needed |
| Knowledge base > 200K tokens | Use RAG |
| Data changes frequently (menus, prices, stock) | RAG (update documents, not model) |
| Data is private/confidential | RAG (keeps data out of training pipelines) |
| Need source citations | RAG (chunks are traceable to source) |
| Model needs brand voice / domain jargon | Fine-tune instead |
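
The table above can be read as a small decision function. The sketch below is one possible encoding, with several assumptions that the table does not state: the `Requirements` fields and the `recommend` name are hypothetical, any RAG-favouring condition overrides the "fits in context" row, a knowledge base of exactly 200K tokens is treated as fitting in context, and fine-tuning is appended as an extra step rather than replacing the retrieval choice.

```python
from dataclasses import dataclass

CONTEXT_BUDGET_TOKENS = 200_000  # threshold taken from the table

@dataclass
class Requirements:
    kb_tokens: int
    changes_frequently: bool = False
    is_private: bool = False
    needs_citations: bool = False
    needs_brand_voice: bool = False

def recommend(req: Requirements) -> list[str]:
    """Return an ordered plan; precedence between rows is an assumption."""
    needs_rag = (
        req.kb_tokens > CONTEXT_BUDGET_TOKENS
        or req.changes_frequently
        or req.is_private
        or req.needs_citations
    )
    plan = ["rag"] if needs_rag else ["full-context"]
    if req.needs_brand_voice:
        # Tone and jargon are the one row where the table points to fine-tuning.
        plan.append("fine-tune")
    return plan
```

For example, `recommend(Requirements(kb_tokens=50_000, needs_citations=True))` returns `["rag"]` even though the knowledge base would fit in context, because citations need chunks that trace back to a source.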

RAG vs Fine-Tuning

| Factor | RAG | Fine-Tuning |
|---|---|---|
| Up-to-date content | ✅ Yes (add docs anytime) | ❌ Stale until retrained |
| Hallucinations | ✅ Lower (document-grounded) | ❌ Higher |
| Source citations | ✅ Yes | ❌ No |
| Brand voice control | ❌ Weak | ✅ Strong |
| Domain jargon | ❌ Weak | ✅ Strong |
| Up-front cost | ✅ Lower | ❌ High |

Default: start with RAG. Fine-tune only when RAG + prompt engineering cannot deliver the required tone or vocabulary.

Additional Guidance

Guidance is split across two reference files so this entrypoint stays compact.
references/skill-deep-dive.md — architecture, chunking, retrieval, schema:
  • Pipeline Architecture
  • Chunking Strategies
  • Embedding Model Selection
  • Vector Database Selection
  • Retrieval Algorithms
  • Re-Ranking
  • Full RAG Query Algorithm
  • Query Rewriting (Multi-Turn)
  • RAG Schema (Multi-Tenant)
  • Evaluation Framework
  • Production Patterns
  • Agentic RAG
  • Multimodal RAG, Edge Cases, Cost Optimisation, Sources
references/production-rag.md — the progression from draft to production and the gates before shipping:
  • RAG Maturity Model: Naive → Advanced → Modular
  • Query Transformation: HyDE, Multi-Query, Step-Back
  • Contextual Compression
  • Self-RAG
  • RAGAS Evaluation: 4 metrics with production thresholds
  • Embedding Pipeline: batching, upserts, re-embed triggers, $/1M-token table
  • Cost Management Decision Tree: concrete dollar figures per branch
  • Failure Mode Playbook: empty, irrelevant, hallucinated, stale
  • Gates Before Shipping
Load the production file when building a RAG system that has to pass evaluation gates, survive multi-tenant review, or hit a cost budget under load.