ai-rag


RAG & Search Engineering — Complete Reference


Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.
This skill covers:
  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
  • Search: BM25, vector search, hybrid fusion, ranking pipelines
  • Evaluation: recall@k, nDCG, MRR, groundedness metrics
Modern Best Practices (Jan 2026):
Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.
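The evaluation metrics named above (recall@k, nDCG, MRR) can be sketched in a few lines. These are minimal reference implementations with binary relevance; function names and signatures are illustrative, not from any particular evaluation library.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance nDCG: DCG of this ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

Graded (non-binary) relevance changes the nDCG gain term; the binary form above is enough for regression gates on a labeled test set.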

Quick Reference


| Task | Tool/Framework | Command/Pattern | When to Use |
| --- | --- | --- | --- |
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
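The fusion step in the Retrieval row can be sketched with Reciprocal Rank Fusion (RRF), which merges ranked lists by rank alone, so sparse and dense scores never need calibration against each other. The `k=60` constant is the value commonly cited from the original RRF paper; tune it per corpus.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked ID lists into one list by summed RRF score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher total score first; ties broken by dict insertion order.
    return sorted(scores, key=scores.get, reverse=True)

# Example: "b" ranks high in both lists, so it wins after fusion.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Because RRF only looks at positions, it is robust to the very different score distributions of BM25 and cosine similarity, which is why it is a common default before adding a learned reranker.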

Decision Tree: RAG Architecture Selection


Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
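The baseline branch of the tree (fixed-size chunking with overlap) can be sketched as follows. This version splits on whitespace tokens for simplicity; a real pipeline would count tokens with the embedding model's own tokenizer, and the parameter defaults are illustrative.

```python
def chunk_fixed(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks of roughly chunk_size tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

The overlap preserves context that straddles chunk boundaries; structure-aware chunkers replace the fixed window with page, section, or symbol boundaries but keep the same interface.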

Core Concepts (Vendor-Agnostic)


  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
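The stage sequence above can be wired as a deterministic pipeline with one telemetry record per stage. This is only a structural sketch: stage bodies are stubs, and the `events` list stands in for a real tracing backend.

```python
STAGES = ["ingest", "chunk", "embed", "index", "retrieve",
          "rerank", "pack_context", "generate", "verify"]

def run_pipeline(state, stage_fns, events):
    """Apply each stage function in the canonical order, logging its name."""
    for name in STAGES:
        state = stage_fns[name](state)
        events.append(name)  # telemetry: one event per completed stage
    return state

# Toy usage: identity stages, just to show ordering and instrumentation.
events = []
result = run_pipeline({"query": "q"},
                      {name: (lambda s: s) for name in STAGES},
                      events)
```

Keeping the order fixed and instrumenting every hop is what makes the two evaluation planes separable: retrieval stages can be measured in isolation from generation stages.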

Implementation Practices (Tooling Examples)


  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
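The retrieval API contract above can be sketched with dataclasses. The field names follow the bullet (query, filters, top_k, trace_id, evidence IDs); this is a suggested shape, not any vendor's schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict = field(default_factory=dict)  # e.g. {"tenant": "acme"}
    trace_id: str = ""                           # propagated for telemetry

@dataclass(frozen=True)
class Evidence:
    doc_id: str    # stable ID, reused later for citations
    chunk_id: str
    score: float
    text: str

@dataclass(frozen=True)
class RetrievalResponse:
    request_trace_id: str
    evidence: tuple  # tuple of Evidence, ordered by rank
```

Freezing the dataclasses keeps responses immutable downstream, and echoing the trace_id in the response lets retrieval latency be joined against generation spans.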

Do / Avoid


Do
  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
  • Do enforce document-level ACLs at retrieval time (not only at generation time).
  • Do include citations with stable IDs and verify citation coverage in tests.
Avoid
  • Avoid shipping RAG without a test set and regression gate.
  • Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
  • Avoid mixing corpora without metadata and tenant isolation.
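The citation-coverage check from the Do list can be sketched as a regression test helper: count the fraction of answer sentences that carry at least one citation resolving to retrieved evidence. The `[doc:...]` marker syntax is an assumed convention for this sketch, not a standard.

```python
import re

CITE = re.compile(r"\[doc:([A-Za-z0-9_-]+)\]")

def citation_coverage(answer_sentences, evidence_ids):
    """Return the fraction of sentences citing at least one known evidence ID."""
    if not answer_sentences:
        return 0.0
    known = set(evidence_ids)
    covered = 0
    for sentence in answer_sentences:
        cited = set(CITE.findall(sentence))
        # Count only sentences whose citations all resolve to real evidence.
        if cited and cited <= known:
            covered += 1
    return covered / len(answer_sentences)
```

A regression gate can then assert a minimum coverage (e.g. 0.9) over the test set, which catches both dropped citations and citations pointing at non-retrieved IDs.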

When to Use This Skill


Use this skill when the user asks:
  • "Help me design a RAG pipeline."
  • "How should I chunk this document?"
  • "Optimize retrieval for my use case."
  • "My RAG system is hallucinating — fix it."
  • "Choose the right vector database / index type."
  • "Create a RAG evaluation framework."
  • "Debug why retrieval gives irrelevant results."

Tool/Model Recommendation Protocol


When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers


  • "What's the best vector database for [use case]?"
  • "What should I use for [chunking/embedding/reranking]?"
  • "What's the latest in RAG development?"
  • "Current best practices for [retrieval/grounding/evaluation]?"
  • "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
  • "[Vector DB A] vs [Vector DB B]?"
  • "Best embedding model for [use case]?"
  • "What RAG framework should I use?"

Required Checks


  1. Read data/sources.json and start from sources with "add_as_web_search": true.
  2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
  3. If browsing isn't available, state assumptions and give a verification checklist.
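Step 1 can be sketched as a small loader. The exact schema of data/sources.json is assumed here to be a JSON list of objects carrying the "add_as_web_search" boolean; adjust the filter if the real file nests sources differently.

```python
import json

def web_search_sources(path="data/sources.json"):
    """Load the curated source list and keep entries flagged for web search."""
    with open(path, encoding="utf-8") as f:
        sources = json.load(f)
    return [s for s in sources if s.get("add_as_web_search") is True]
```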

What to Report


After checking, provide:
  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
  • Deprecated/declining: Approaches or tools losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)


  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills


For adjacent topics, reference these skills:
  • ai-llm - Prompting, fine-tuning, instruction datasets
  • ai-agents - Agentic RAG workflows and tool routing
  • ai-llm-inference - Serving performance, quantization, batching
  • ai-mlops - Deployment, monitoring, security, privacy, and governance
  • ai-prompt-engineering - Prompt patterns for RAG generation phase

Templates


System Design (Start Here)


  • RAG System Design

Chunking & Ingestion


  • Basic Chunking
  • Code Chunking
  • Long Document Chunking

Embedding & Indexing


  • Index Configuration
  • Metadata Schema

Retrieval & Reranking


  • Retrieval Pipeline
  • Hybrid Search
  • Reranking
  • Ranking Pipeline
  • Reranker

Context Packaging & Grounding


  • Context Packing
  • Grounding

Evaluation


  • RAG Evaluation
  • RAG Test Set
  • Search Evaluation
  • Search Test Set

Search Configuration


  • BM25 Configuration
  • HNSW Configuration
  • IVF Configuration
  • Hybrid Configuration

Query Rewriting


  • Query Rewrite

Navigation


Resources
  • references/advanced-rag-patterns.md
  • references/agentic-rag-patterns.md
  • references/bm25-tuning.md
  • references/chunking-patterns.md
  • references/chunking-strategies.md
  • references/rag-evaluation-guide.md
  • references/rag-troubleshooting.md
  • references/contextual-retrieval-guide.md
  • references/distributed-search-slos.md
  • references/grounding-checklists.md
  • references/hybrid-fusion-patterns.md
  • references/index-selection-guide.md
  • references/multilingual-domain-patterns.md
  • references/pipeline-architecture.md
  • references/query-rewriting-patterns.md
  • references/ranking-pipeline-guide.md
  • references/retrieval-patterns.md
  • references/search-debugging.md
  • references/search-evaluation-guide.md
  • references/user-feedback-learning.md
  • references/vector-search-patterns.md
Templates
  • assets/context/template-context-packing.md
  • assets/context/template-grounding.md
  • assets/design/rag-system-design.md
  • assets/chunking/template-basic-chunking.md
  • assets/chunking/template-code-chunking.md
  • assets/chunking/template-long-doc-chunking.md
  • assets/retrieval/template-retrieval-pipeline.md
  • assets/retrieval/template-hybrid-search.md
  • assets/retrieval/template-reranking.md
  • assets/eval/template-rag-eval.md
  • assets/eval/template-rag-testset.jsonl
  • assets/eval/template-search-eval.md
  • assets/eval/template-search-testset.jsonl
  • assets/indexing/template-index-config.md
  • assets/indexing/template-metadata-schema.md
  • assets/query/template-query-rewrite.md
  • assets/ranking/template-ranking-pipeline.md
  • assets/ranking/template-reranker.md
  • assets/search/template-bm25-config.md
  • assets/search/template-hnsw-config.md
  • assets/search/template-ivf-config.md
  • assets/search/template-hybrid-config.md
Data
  • data/sources.json — Curated external references
Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.