ai-rag
RAG & Search Engineering — Complete Reference
Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.
This skill covers:
- RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
- Search: BM25, vector search, hybrid fusion, ranking pipelines
- Evaluation: recall@k, nDCG, MRR, groundedness metrics
Modern Best Practices (Jan 2026):
- Separate retrieval quality from answer quality; evaluate both (RAG: https://arxiv.org/abs/2005.11401).
- Default to hybrid retrieval (sparse + dense) with reranking when precision matters (DPR: https://arxiv.org/abs/2004.04906).
- Use a failure taxonomy to debug systematically (Seven Failure Points in RAG: https://arxiv.org/abs/2401.05856).
- Treat freshness/invalidation as first-class; staleness is a correctness bug, not a UX issue.
- Add grounding gates: answerability checks, citation coverage checks, and refusal-on-missing-context defaults.
- Threat-model RAG: retrieved text is untrusted input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.
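The evaluation metrics listed above (recall@k, MRR, nDCG) can be sketched in a few lines. This is a minimal reference implementation over ranked doc-id lists, not any specific eval framework's API:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevance, k):
    """nDCG with graded relevance; relevance maps doc_id -> gain."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Computing these on a fixed test set per release is the simplest regression gate for the retrieval plane.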
Quick Reference
| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
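The RRF fusion named in the Retrieval row can be sketched as follows. `k=60` is the commonly cited damping constant; the function name and shapes are illustrative:

```python
def rrf_fuse(rankings, k=60, top_k=10):
    """Reciprocal Rank Fusion: merge ranked lists from sparse and dense
    retrievers without calibrating their incompatible scores.

    rankings: list of ranked doc-id lists, best first.
    k: damping constant; larger k flattens the rank contribution.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_k]
```

Because RRF only uses ranks, it is robust to BM25 and cosine scores living on different scales, which is why it is a common default fusion step.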
Decision Tree: RAG Architecture Selection
Building RAG system: [Architecture Path]
├─ Document type?
│ ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
│ ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
│ └─ Simple content? → Fixed-size token chunking with overlap (baseline)
│
├─ Retrieval accuracy low?
│ ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
│ ├─ Noisy results? → Add reranker + better metadata filters
│ └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
│
├─ Dataset size?
│ ├─ <100k chunks? → Flat index (exact search)
│ ├─ 100k-10M? → HNSW (low latency)
│ └─ >10M? → IVF/ScaNN/DiskANN (scalable)
│
└─ Production quality?
└─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
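The baseline branch of the tree (fixed-size token chunking with overlap) can be sketched as follows; it assumes the document is already tokenized, and the default sizes are illustrative, not tuned values:

```python
def chunk_fixed(tokens, size=512, overlap=64):
    """Baseline chunker: fixed-size windows with overlap so that
    sentences split at a boundary still appear whole in one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Start with this baseline and only move to structure-aware chunking when eval results show boundary-related retrieval failures for a given doc type.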
Core Concepts (Vendor-Agnostic)
- Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
- Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
- Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
- Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
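The stage order above can be sketched as a minimal skeleton. The `Evidence` shape and the injected retriever/reranker/generator callables are assumptions for illustration, not a specific framework's API; the refusal branch is the grounding gate:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    doc_id: str
    text: str
    score: float

def answer(query, retriever, reranker, generator, top_k=20, keep=5):
    """Minimal sketch of retrieve -> rerank -> pack -> generate -> verify."""
    candidates = retriever(query, top_k=top_k)            # retrieve
    ranked = reranker(query, candidates)[:keep]           # rerank, keep best
    if not ranked:                                        # refuse on missing evidence
        return "No supporting evidence found.", []
    context = "\n\n".join(f"[{e.doc_id}] {e.text}" for e in ranked)  # pack context
    return generator(query, context), [e.doc_id for e in ranked]     # generate + cite
```

Returning the evidence IDs alongside the answer is what makes citation-coverage checks and audit trails possible downstream.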
Implementation Practices (Tooling Examples)
- Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
- Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
- Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
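A retrieval API contract like the one described might look as follows; the field names are hypothetical, not any vendor's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalRequest:
    """Hypothetical request contract; field names are illustrative."""
    query: str
    filters: dict     # metadata filters, e.g. {"tenant": "acme"}
    top_k: int
    trace_id: str     # propagated through every stage for telemetry

@dataclass(frozen=True)
class RetrievalResponse:
    trace_id: str
    evidence_ids: list   # stable IDs the generator must cite
    latency_ms: float
```

Freezing the request makes it usable as a retrieval-cache key component (together with a canonical serialization of `filters`), and carrying `trace_id` end to end is what lets per-stage metrics join up.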
Do / Avoid
Do
- Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
- Do enforce document-level ACLs at retrieval time (not only at generation time).
- Do include citations with stable IDs and verify citation coverage in tests.
Avoid
- Avoid shipping RAG without a test set and regression gate.
- Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
- Avoid mixing corpora without metadata and tenant isolation.
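The citation-coverage check called for in the Do list can be sketched as a small test helper. It assumes a `[doc_id]` bracket-citation convention, which is an illustrative choice, not a standard:

```python
import re

def citation_coverage(answer, evidence_ids):
    """Return (share of sentences carrying at least one [doc_id] citation,
    set of cited IDs that are NOT in the retrieved evidence)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s]
    cited = [bool(re.search(r"\[[^\]]+\]", s)) for s in sentences]
    used_ids = set(re.findall(r"\[([^\]]+)\]", answer))
    unknown = used_ids - set(evidence_ids)
    coverage = sum(cited) / len(sentences) if sentences else 0.0
    return coverage, unknown
```

A regression gate can then assert a minimum coverage and that `unknown` is empty, catching both uncited claims and hallucinated citations.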
When to Use This Skill
Use this skill when the user asks:
- "Help me design a RAG pipeline."
- "How should I chunk this document?"
- "Optimize retrieval for my use case."
- "My RAG system is hallucinating — fix it."
- "Choose the right vector database / index type."
- "Create a RAG evaluation framework."
- "Debug why retrieval gives irrelevant results."
Tool/Model Recommendation Protocol
When users ask for vendor/model/framework recommendations, validate claims against current primary sources.
Triggers
- "What's the best vector database for [use case]?"
- "What should I use for [chunking/embedding/reranking]?"
- "What's the latest in RAG development?"
- "Current best practices for [retrieval/grounding/evaluation]?"
- "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
- "[Vector DB A] vs [Vector DB B]?"
- "Best embedding model for [use case]?"
- "What RAG framework should I use?"
Required Checks
- Read data/sources.json and start from sources marked "add_as_web_search": true.
- Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
- If browsing isn't available, state assumptions and give a verification checklist.
What to Report
After checking, provide:
- Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
- Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
- Deprecated/declining: Approaches or tools losing relevance
- Recommendation: Based on fresh data, not just static knowledge
Example Topics (verify with current sources)
- Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
- Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
- Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
- RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
- Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
- Evaluation (RAGAS, TruLens, DeepEval, BEIR)
Related Skills
For adjacent topics, reference these skills:
- ai-llm - Prompting, fine-tuning, instruction datasets
- ai-agents - Agentic RAG workflows and tool routing
- ai-llm-inference - Serving performance, quantization, batching
- ai-mlops - Deployment, monitoring, security, privacy, and governance
- ai-prompt-engineering - Prompt patterns for RAG generation phase
Templates
System Design (Start Here)
- RAG System Design
Chunking & Ingestion
- Basic Chunking
- Code Chunking
- Long Document Chunking
Embedding & Indexing
- Index Configuration
- Metadata Schema
Retrieval & Reranking
- Retrieval Pipeline
- Hybrid Search
- Reranking
- Ranking Pipeline
- Reranker
Context Packaging & Grounding
- Context Packing
- Grounding
Evaluation
- RAG Evaluation
- RAG Test Set
- Search Evaluation
- Search Test Set
Search Configuration
- BM25 Configuration
- HNSW Configuration
- IVF Configuration
- Hybrid Configuration
Query Rewriting
- Query Rewrite
Navigation
Resources
- references/advanced-rag-patterns.md
- references/agentic-rag-patterns.md
- references/bm25-tuning.md
- references/chunking-patterns.md
- references/chunking-strategies.md
- references/rag-evaluation-guide.md
- references/rag-troubleshooting.md
- references/contextual-retrieval-guide.md
- references/distributed-search-slos.md
- references/grounding-checklists.md
- references/hybrid-fusion-patterns.md
- references/index-selection-guide.md
- references/multilingual-domain-patterns.md
- references/pipeline-architecture.md
- references/query-rewriting-patterns.md
- references/ranking-pipeline-guide.md
- references/retrieval-patterns.md
- references/search-debugging.md
- references/search-evaluation-guide.md
- references/user-feedback-learning.md
- references/vector-search-patterns.md
Templates
- assets/context/template-context-packing.md
- assets/context/template-grounding.md
- assets/design/rag-system-design.md
- assets/chunking/template-basic-chunking.md
- assets/chunking/template-code-chunking.md
- assets/chunking/template-long-doc-chunking.md
- assets/retrieval/template-retrieval-pipeline.md
- assets/retrieval/template-hybrid-search.md
- assets/retrieval/template-reranking.md
- assets/eval/template-rag-eval.md
- assets/eval/template-rag-testset.jsonl
- assets/eval/template-search-eval.md
- assets/eval/template-search-testset.jsonl
- assets/indexing/template-index-config.md
- assets/indexing/template-metadata-schema.md
- assets/query/template-query-rewrite.md
- assets/ranking/template-ranking-pipeline.md
- assets/ranking/template-reranker.md
- assets/search/template-bm25-config.md
- assets/search/template-hnsw-config.md
- assets/search/template-ivf-config.md
- assets/search/template-hybrid-config.md
Data
- data/sources.json — Curated external references
Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.