rag-retrieval

RAG Retrieval
Combine vector search with LLM generation for accurate, grounded responses.
Basic RAG Pattern
```python
async def rag_query(question: str, top_k: int = 5) -> str:
    """Basic RAG: retrieve then generate."""
    # 1. Retrieve relevant documents
    docs = await vector_db.search(question, limit=top_k)

    # 2. Construct context
    context = "\n\n".join([
        f"[{i+1}] {doc.text}"
        for i, doc in enumerate(docs)
    ])

    # 3. Generate with context
    response = await llm.chat([
        {"role": "system", "content":
            "Answer using ONLY the provided context. "
            "If not in context, say 'I don't have that information.'"},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ])
    return response.content
```
RAG with Citations
```python
async def rag_with_citations(question: str) -> dict:
    """RAG with inline citations [1], [2], etc."""
    docs = await vector_db.search(question, limit=5)
    context = "\n\n".join([
        f"[{i+1}] {doc.text}\nSource: {doc.metadata['source']}"
        for i, doc in enumerate(docs)
    ])
    response = await llm.chat([
        {"role": "system", "content":
            "Answer with inline citations like [1], [2]. "
            "End with a Sources section."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ])
    return {
        "answer": response.content,
        "sources": [doc.metadata['source'] for doc in docs]
    }
```
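A small helper can check that the citation markers the model emits actually map back to retrieved sources. This is a sketch, not part of the pattern above: the function name, regex, and return shape are illustrative assumptions.

```python
import re

def extract_citations(answer: str, sources: list[str]) -> dict[int, str]:
    """Map inline markers like [1], [2] in the answer to their sources.

    Markers that point past the end of the source list are dropped,
    which also surfaces hallucinated citations.
    """
    cited = {}
    for match in re.finditer(r"\[(\d+)\]", answer):
        idx = int(match.group(1))
        if 1 <= idx <= len(sources):
            cited[idx] = sources[idx - 1]
    return cited

# Marker [3] has no backing source, so it is dropped.
answer = "Vector search scales well [1], and RRF improves recall [2][3]."
sources = ["docs/vector.md", "docs/rrf.md"]
print(extract_citations(answer, sources))
# {1: 'docs/vector.md', 2: 'docs/rrf.md'}
```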
Hybrid Search (Semantic + Keyword)
```python
def reciprocal_rank_fusion(
    semantic_results: list,
    keyword_results: list,
    k: int = 60
) -> list:
    """Combine semantic and keyword search with RRF."""
    scores = {}
    for rank, doc in enumerate(semantic_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    for rank, doc in enumerate(keyword_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    # Sort by combined score
    ranked_ids = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)
    return [get_doc(id) for id in ranked_ids]
```
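To make the fusion behavior concrete, here is a self-contained toy run of the same scoring, with documents as simple named tuples and ids returned directly instead of the `get_doc` lookup:

```python
from collections import namedtuple

Doc = namedtuple("Doc", ["id"])

def rrf_scores(semantic_results: list, keyword_results: list, k: int = 60) -> list:
    """Same RRF scoring as above, returning ids sorted by fused score."""
    scores = {}
    for rank, doc in enumerate(semantic_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    for rank, doc in enumerate(keyword_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" is only mid-ranked in the semantic list but appears in both
# lists, so it outranks every single-list document -- the point of RRF.
semantic = [Doc("a"), Doc("b"), Doc("c")]
keyword = [Doc("b"), Doc("d")]
print(rrf_scores(semantic, keyword))
# ['b', 'a', 'd', 'c']
```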
Context Window Management
```python
def fit_context(docs: list, max_tokens: int = 6000) -> list:
    """Truncate context to fit token budget."""
    total_tokens = 0
    selected = []
    for doc in docs:
        doc_tokens = count_tokens(doc.text)
        if total_tokens + doc_tokens > max_tokens:
            break
        selected.append(doc)
        total_tokens += doc_tokens
    return selected
```

Guidelines:
- Keep context under 75% of the model limit
- Reserve tokens for the system prompt and response
- Prioritize the highest-relevance documents
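The 75% guideline translates into a concrete `max_tokens` value for `fit_context`. A minimal sketch of that arithmetic, with the function name and parameters as illustrative assumptions:

```python
def context_budget(model_limit: int,
                   system_tokens: int,
                   response_reserve: int,
                   headroom: float = 0.75) -> int:
    """Token budget for retrieved context under the 75% guideline.

    Cap total usage at `headroom` of the model limit, then subtract
    what the system prompt and the expected response need.
    """
    usable = int(model_limit * headroom)
    return max(0, usable - system_tokens - response_reserve)

# e.g. an 8K-context model, a 200-token system prompt,
# and 1000 tokens reserved for the answer:
print(context_budget(8192, 200, 1000))
# 4944
```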
Context Sufficiency Check (2026 Best Practice)
```python
from pydantic import BaseModel

class SufficiencyCheck(BaseModel):
    """Pre-generation context validation."""
    is_sufficient: bool
    confidence: float  # 0.0-1.0
    missing_info: str | None = None

async def rag_with_sufficiency(question: str, top_k: int = 5) -> str:
    """RAG with hallucination prevention via sufficiency check.

    Based on Google Research ICLR 2025: adding a sufficiency check
    before generation reduces hallucinations from insufficient context.
    """
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join([f"[{i+1}] {doc.text}" for i, doc in enumerate(docs)])

    # Pre-generation sufficiency check (prevents hallucination)
    check = await llm.with_structured_output(SufficiencyCheck).ainvoke(
        f"""Does this context contain sufficient information to answer the question?

Question: {question}

Context:
{context}

Evaluate:
- is_sufficient: Can the question be fully answered from context?
- confidence: How confident are you? (0.0-1.0)
- missing_info: What's missing if not sufficient?"""
    )

    # Abstain if context is insufficient (high confidence)
    if not check.is_sufficient and check.confidence > 0.7:
        return f"I don't have enough information to answer this question. Missing: {check.missing_info}"

    # Low confidence -> retrieve more context
    if not check.is_sufficient and check.confidence <= 0.7:
        more_docs = await vector_db.search(question, limit=top_k * 2)
        context = "\n\n".join([f"[{i+1}] {doc.text}" for i, doc in enumerate(more_docs)])

    # Generate only with sufficient context
    response = await llm.chat([
        {"role": "system", "content":
            "Answer using ONLY the provided context. "
            "If information is missing, say so rather than guessing."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ])
    return response.content
```

Why this matters (Google Research 2025):
- RAG paradoxically increases hallucinations when context is insufficient
- Additional context increases model confidence, making hallucination more likely
- A sufficiency check allows abstention when information is missing
Key Decisions
| Decision | Recommendation |
|---|---|
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Hybrid ratio | 50/50 semantic/keyword |
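One way to keep the defaults from the decision table in a single place is a small config object. The field names here are illustrative assumptions, not from any particular framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RAGConfig:
    """Defaults drawn from the decision table above."""
    top_k: int = 5                # 3-10 documents
    temperature: float = 0.2      # 0.1-0.3 for factual answers
    context_budget: int = 6000    # 4K-8K tokens
    semantic_weight: float = 0.5  # 50/50 semantic/keyword split

    def __post_init__(self):
        # Catch out-of-range values at construction, not at query time.
        if not (0.0 <= self.semantic_weight <= 1.0):
            raise ValueError("semantic_weight must be in [0, 1]")

config = RAGConfig(top_k=8)
print(config.temperature)
# 0.2
```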
Common Mistakes
- No citation tracking (unverifiable answers)
- Context too large (dilutes relevance)
- Temperature too high (hallucinations)
- Single retrieval method (misses keyword matches)
Advanced Patterns
See references/advanced-rag.md for:
- HyDE Integration: Hypothetical document embeddings for vocabulary mismatch
- Agentic RAG: Multi-step retrieval with tool use
- Self-RAG: LLM decides when to retrieve and validates outputs
- Corrective RAG: Evaluate retrieval quality and correct if needed
- Pipeline Composition: Combine HyDE + Hybrid + Rerank
Related Skills
- embeddings - Creating vectors for retrieval
- hyde-retrieval - Hypothetical document embeddings
- query-decomposition - Multi-concept query handling
- reranking-patterns - Cross-encoder and LLM reranking
- contextual-retrieval - Anthropic's context-prepending technique
- langgraph-functional - Building agentic RAG workflows
Capability Details
retrieval-patterns
Keywords: retrieval, context, chunks, relevance
Solves:
- Retrieve relevant context for LLM
- Implement RAG pipeline
- Optimize retrieval quality
hybrid-search
Keywords: hybrid, bm25, vector, fusion
Solves:
- Combine keyword and semantic search
- Implement reciprocal rank fusion
- Balance precision and recall
chatbot-example
Keywords: chatbot, rag, example, typescript
Solves:
- Build RAG chatbot example
- TypeScript implementation
- End-to-end RAG pipeline
pipeline-template
Keywords: pipeline, template, implementation, starter
Solves:
- RAG pipeline starter template
- Production-ready code
- Copy-paste implementation