ai-rag


RAG & Search Engineering — Complete Reference


Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.
This skill covers:
  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
  • Search: BM25, vector search, hybrid fusion, ranking pipelines
  • Evaluation: recall@k, nDCG, MRR, groundedness metrics
Modern Best Practices (Jan 2026):
Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.
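The evaluation metrics named above (recall@k, nDCG, MRR) can be sketched in a few lines. These are minimal reference implementations with binary relevance; function names and signatures are illustrative, not from any particular evaluation library.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance nDCG: DCG of this ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

Graded (non-binary) relevance changes the nDCG gain term; the binary form above is enough for regression gates on a labeled test set.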

Quick Reference


| Task | Tool/Framework | Command/Pattern | When to Use |
| --- | --- | --- | --- |
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
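The fusion step in the Retrieval row can be sketched with Reciprocal Rank Fusion (RRF), which merges ranked lists by rank alone, so sparse and dense scores never need calibration against each other. The `k=60` constant is the value commonly cited from the original RRF paper; tune it per corpus.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked ID lists into one list by summed RRF score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher total score first; ties broken by dict insertion order.
    return sorted(scores, key=scores.get, reverse=True)

# Example: "b" ranks high in both lists, so it wins after fusion.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Because RRF only looks at positions, it is robust to the very different score distributions of BM25 and cosine similarity, which is why it is a common default before adding a learned reranker.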

Decision Tree: RAG Architecture Selection


Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
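The baseline branch of the tree (fixed-size chunking with overlap) can be sketched as follows. This version splits on whitespace tokens for simplicity; a real pipeline would count tokens with the embedding model's own tokenizer, and the parameter defaults are illustrative.

```python
def chunk_fixed(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks of roughly chunk_size tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

The overlap preserves context that straddles chunk boundaries; structure-aware chunkers replace the fixed window with page, section, or symbol boundaries but keep the same interface.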

Core Concepts (Vendor-Agnostic)


  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
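The stage sequence above can be wired as a deterministic pipeline with one telemetry record per stage. This is only a structural sketch: stage bodies are stubs, and the `events` list stands in for a real tracing backend.

```python
STAGES = ["ingest", "chunk", "embed", "index", "retrieve",
          "rerank", "pack_context", "generate", "verify"]

def run_pipeline(state, stage_fns, events):
    """Apply each stage function in the canonical order, logging its name."""
    for name in STAGES:
        state = stage_fns[name](state)
        events.append(name)  # telemetry: one event per completed stage
    return state

# Toy usage: identity stages, just to show ordering and instrumentation.
events = []
result = run_pipeline({"query": "q"},
                      {name: (lambda s: s) for name in STAGES},
                      events)
```

Keeping the order fixed and instrumenting every hop is what makes the two evaluation planes separable: retrieval stages can be measured in isolation from generation stages.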

Implementation Practices (Tooling Examples)


  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
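The retrieval API contract above can be sketched with dataclasses. The field names follow the bullet (query, filters, top_k, trace_id, evidence IDs); this is a suggested shape, not any vendor's schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict = field(default_factory=dict)  # e.g. {"tenant": "acme"}
    trace_id: str = ""                           # propagated for telemetry

@dataclass(frozen=True)
class Evidence:
    doc_id: str    # stable ID, reused later for citations
    chunk_id: str
    score: float
    text: str

@dataclass(frozen=True)
class RetrievalResponse:
    request_trace_id: str
    evidence: tuple  # tuple of Evidence, ordered by rank
```

Freezing the dataclasses keeps responses immutable downstream, and echoing the trace_id in the response lets retrieval latency be joined against generation spans.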

Do / Avoid


Do
  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
  • Do enforce document-level ACLs at retrieval time (not only at generation time).
  • Do include citations with stable IDs and verify citation coverage in tests.
Avoid
  • Avoid shipping RAG without a test set and regression gate.
  • Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
  • Avoid mixing corpora without metadata and tenant isolation.
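The citation-coverage check from the Do list can be sketched as a regression test helper: count the fraction of answer sentences that carry at least one citation resolving to retrieved evidence. The `[doc:...]` marker syntax is an assumed convention for this sketch, not a standard.

```python
import re

CITE = re.compile(r"\[doc:([A-Za-z0-9_-]+)\]")

def citation_coverage(answer_sentences, evidence_ids):
    """Return the fraction of sentences citing at least one known evidence ID."""
    if not answer_sentences:
        return 0.0
    known = set(evidence_ids)
    covered = 0
    for sentence in answer_sentences:
        cited = set(CITE.findall(sentence))
        # Count only sentences whose citations all resolve to real evidence.
        if cited and cited <= known:
            covered += 1
    return covered / len(answer_sentences)
```

A regression gate can then assert a minimum coverage (e.g. 0.9) over the test set, which catches both dropped citations and citations pointing at non-retrieved IDs.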

When to Use This Skill


Use this skill when the user asks:
  • "Help me design a RAG pipeline."
  • "How should I chunk this document?"
  • "Optimize retrieval for my use case."
  • "My RAG system is hallucinating — fix it."
  • "Choose the right vector database / index type."
  • "Create a RAG evaluation framework."
  • "Debug why retrieval gives irrelevant results."

Tool/Model Recommendation Protocol


When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers


  • "What's the best vector database for [use case]?"
  • "What should I use for [chunking/embedding/reranking]?"
  • "What's the latest in RAG development?"
  • "Current best practices for [retrieval/grounding/evaluation]?"
  • "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
  • "[Vector DB A] vs [Vector DB B]?"
  • "Best embedding model for [use case]?"
  • "What RAG framework should I use?"

Required Checks


  1. Read data/sources.json and start from sources with "add_as_web_search": true.
  2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
  3. If browsing isn't available, state assumptions and give a verification checklist.
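Step 1 can be sketched as a small loader. The exact schema of data/sources.json is assumed here to be a JSON list of objects carrying the "add_as_web_search" boolean; adjust the filter if the real file nests sources differently.

```python
import json

def web_search_sources(path="data/sources.json"):
    """Load the curated source list and keep entries flagged for web search."""
    with open(path, encoding="utf-8") as f:
        sources = json.load(f)
    return [s for s in sources if s.get("add_as_web_search") is True]
```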

What to Report


After checking, provide:
  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
  • Deprecated/declining: Approaches or tools losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)


  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills


For adjacent topics, reference these skills:
  • ai-llm - Prompting, fine-tuning, instruction datasets
  • ai-agents - Agentic RAG workflows and tool routing
  • ai-llm-inference - Serving performance, quantization, batching
  • ai-mlops - Deployment, monitoring, security, privacy, and governance
  • ai-prompt-engineering - Prompt patterns for RAG generation phase

Templates


System Design (Start Here)


  • RAG System Design

Chunking & Ingestion


  • Basic Chunking
  • Code Chunking
  • Long Document Chunking

Embedding & Indexing


  • Index Configuration
  • Metadata Schema

Retrieval & Reranking


  • Retrieval Pipeline
  • Hybrid Search
  • Reranking
  • Ranking Pipeline
  • Reranker

Context Packaging & Grounding


  • Context Packing
  • Grounding

Evaluation


  • RAG Evaluation
  • RAG Test Set
  • Search Evaluation
  • Search Test Set

Search Configuration


  • BM25 Configuration
  • HNSW Configuration
  • IVF Configuration
  • Hybrid Configuration

Query Rewriting


  • Query Rewrite

Navigation


Resources
  • references/advanced-rag-patterns.md
  • references/agentic-rag-patterns.md
  • references/bm25-tuning.md
  • references/chunking-patterns.md
  • references/chunking-strategies.md
  • references/rag-evaluation-guide.md
  • references/rag-troubleshooting.md
  • references/contextual-retrieval-guide.md
  • references/distributed-search-slos.md
  • references/grounding-checklists.md
  • references/hybrid-fusion-patterns.md
  • references/index-selection-guide.md
  • references/multilingual-domain-patterns.md
  • references/pipeline-architecture.md
  • references/query-rewriting-patterns.md
  • references/ranking-pipeline-guide.md
  • references/retrieval-patterns.md
  • references/search-debugging.md
  • references/search-evaluation-guide.md
  • references/user-feedback-learning.md
  • references/vector-search-patterns.md
Templates
  • assets/context/template-context-packing.md
  • assets/context/template-grounding.md
  • assets/design/rag-system-design.md
  • assets/chunking/template-basic-chunking.md
  • assets/chunking/template-code-chunking.md
  • assets/chunking/template-long-doc-chunking.md
  • assets/retrieval/template-retrieval-pipeline.md
  • assets/retrieval/template-hybrid-search.md
  • assets/retrieval/template-reranking.md
  • assets/eval/template-rag-eval.md
  • assets/eval/template-rag-testset.jsonl
  • assets/eval/template-search-eval.md
  • assets/eval/template-search-testset.jsonl
  • assets/indexing/template-index-config.md
  • assets/indexing/template-metadata-schema.md
  • assets/query/template-query-rewrite.md
  • assets/ranking/template-ranking-pipeline.md
  • assets/ranking/template-reranker.md
  • assets/search/template-bm25-config.md
  • assets/search/template-hnsw-config.md
  • assets/search/template-ivf-config.md
  • assets/search/template-hybrid-config.md
Data
  • data/sources.json — Curated external references
Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.