
Hybrid Retrieval for RAG


Combine dense (semantic) and sparse (keyword) retrieval for superior results.

When to Use


  • Vector search misses exact keyword matches
  • Domain-specific terminology needs exact matching
  • Users search with both natural language and specific terms
  • Need to balance semantic understanding with precision

The Problem with Vector-Only Search


Query: "Error code E-4521 troubleshooting"

Vector search returns:
- "Common error handling patterns" (semantically similar)
- "Debugging techniques for applications" (related topic)

Missing:
- "E-4521: Database connection timeout" (exact match needed!)
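The sparse half of a hybrid system catches exactly this case. A toy sketch (the three-document corpus is invented to mirror the example above) showing that a literal term scan, the degenerate form of keyword retrieval, surfaces the document an embedding search tends to rank poorly:

```python
# Toy corpus mirroring the example above (contents invented)
corpus = {
    "doc1": "Common error handling patterns",
    "doc2": "Debugging techniques for applications",
    "doc3": "E-4521: Database connection timeout",
}

# A literal term scan (the simplest form of sparse retrieval) finds the
# rare identifier immediately; an embedding model has no guarantee of
# placing "E-4521" near the query in vector space.
hits = [doc_id for doc_id, text in corpus.items() if "E-4521" in text]
```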

Hybrid Architecture


┌─────────────────────────────────────────────────┐
│                   User Query                     │
└─────────────────────┬───────────────────────────┘
         ┌────────────┴────────────┐
         │                         │
         ▼                         ▼
┌─────────────────┐      ┌─────────────────┐
│  Dense Search   │      │  Sparse Search  │
│  (Embeddings)   │      │  (BM25/TF-IDF)  │
└────────┬────────┘      └────────┬────────┘
         │                         │
         └────────────┬────────────┘
              ┌───────────────┐
              │    Fusion     │
              │  (RRF/Linear) │
              └───────┬───────┘
              ┌───────────────┐
              │   Reranker    │
              │  (Optional)   │
              └───────┬───────┘
              ┌───────────────┐
              │ Final Results │
              └───────────────┘
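The Fusion stage in the diagram admits two common strategies: reciprocal rank fusion (implemented later in this document) and a weighted linear combination of scores. A minimal sketch of the linear variant, assuming each retriever returns `(doc_id, score)` pairs; the min-max normalization step is needed because embedding similarities and BM25 scores live on different scales:

```python
def linear_fusion(dense: list, sparse: list, alpha: float = 0.5) -> list:
    """Weighted linear fusion of two (doc_id, score) result lists.

    Each list is min-max normalized so the two score scales are
    comparable; alpha weights the dense contribution.
    """
    def normalize(results):
        if not results:
            return {}
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {doc_id: (s - lo) / span for doc_id, s in results}

    d, s = normalize(dense), normalize(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
```

Documents found by both retrievers accumulate both contributions, which is why hybrid search tends to promote results that are plausible on both axes.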

Implementation


Basic Hybrid with LangChain


```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Dense retriever (vector search)
vectorstore = Chroma.from_documents(docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Sparse retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 10

# Combine with ensemble
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, bm25_retriever],
    weights=[0.5, 0.5]  # Adjust based on your data
)

results = hybrid_retriever.invoke("Error code E-4521")
```

Reciprocal Rank Fusion (RRF)


```python
def reciprocal_rank_fusion(results_list: list, k: int = 60) -> list:
    """
    Combine multiple ranked lists using RRF.
    k=60 is the standard constant from the original paper.
    """
    fused_scores = {}

    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc.metadata.get("id", hash(doc.page_content))
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    reranked = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in reranked]
```

Usage

```python
dense_results = dense_retriever.invoke(query)
sparse_results = bm25_retriever.invoke(query)
final_results = reciprocal_rank_fusion([dense_results, sparse_results])
```

With Pinecone (Native Hybrid)


```python
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder
from pinecone_text.hybrid import hybrid_convex_scale

# Initialize
pc = Pinecone(api_key="...")
index = pc.Index("hybrid-index")

# Sparse encoder
bm25 = BM25Encoder()
bm25.fit(corpus)

# Query with both dense and sparse
def hybrid_query(query: str, alpha: float = 0.5):
    # Dense vector
    dense_vec = embeddings.embed_query(query)
    # Sparse vector
    sparse_vec = bm25.encode_queries([query])[0]

    # Weight the two vectors client-side (0 = sparse only, 1 = dense only);
    # Pinecone's query API has no alpha parameter, so scaling happens here
    hdense, hsparse = hybrid_convex_scale(dense_vec, sparse_vec, alpha)

    # Hybrid search
    results = index.query(
        vector=hdense,
        sparse_vector=hsparse,
        top_k=10,
        include_metadata=True
    )
    return results
```

With Weaviate (Native Hybrid)


```python
import weaviate

client = weaviate.Client("http://localhost:8080")

result = client.query.get(
    "Document",
    ["content", "title"]
).with_hybrid(
    query="Error code E-4521",
    alpha=0.5,  # Balance between vector and keyword
    fusion_type="rankedFusion"
).with_limit(10).do()
```

Adding a Reranker


```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_results(query: str, docs: list, top_k: int = 5) -> list:
    """Rerank documents using cross-encoder."""
    pairs = [[query, doc.page_content] for doc in docs]
    scores = reranker.predict(pairs)

    # Sort by reranker scores
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]
```

Full pipeline

```python
hybrid_results = hybrid_retriever.invoke(query)  # Get 20 results
final_results = rerank_results(query, hybrid_results, top_k=5)  # Rerank to top 5
```

Weight Tuning Guidelines


| Data Type      | Dense Weight | Sparse Weight | Notes                   |
|----------------|--------------|---------------|-------------------------|
| General text   | 0.5          | 0.5           | Balanced default        |
| Technical docs | 0.4          | 0.6           | Keywords matter more    |
| Conversational | 0.7          | 0.3           | Semantic matters more   |
| Code/APIs      | 0.3          | 0.7           | Exact matches critical  |
| Legal/Medical  | 0.4          | 0.6           | Terminology precision   |
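The table gives starting points; the actual optimum depends on your corpus and query mix. A minimal grid-search sketch for picking the pair empirically (the `evaluate` callback is a placeholder for whatever quality metric you track, e.g. recall@5 from an evaluation harness):

```python
def best_weights(evaluate, steps: int = 10) -> tuple:
    """Grid-search the dense weight over steps + 1 points in [0, 1];
    the sparse weight is always its complement.

    `evaluate` maps (dense_weight, sparse_weight) to a quality score
    where higher is better; returns the best (weight, score) pair.
    """
    candidates = [round(i / steps, 2) for i in range(steps + 1)]
    scored = [(w, evaluate(w, round(1 - w, 2))) for w in candidates]
    return max(scored, key=lambda x: x[1])
```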

Evaluation


```python
def evaluate_retrieval(queries: list, ground_truth: dict, retriever) -> dict:
    """Calculate retrieval metrics."""
    metrics = {"mrr": 0, "recall@5": 0, "precision@5": 0}

    for query in queries:
        results = retriever.invoke(query)
        result_ids = [doc.metadata["id"] for doc in results[:5]]
        relevant_ids = ground_truth[query]

        # MRR
        for i, rid in enumerate(result_ids):
            if rid in relevant_ids:
                metrics["mrr"] += 1 / (i + 1)
                break

        # Recall & Precision
        hits = len(set(result_ids) & set(relevant_ids))
        metrics["recall@5"] += hits / len(relevant_ids)
        metrics["precision@5"] += hits / 5

    # Average
    n = len(queries)
    return {k: v / n for k, v in metrics.items()}
```
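To sanity-check the metric definitions, the arithmetic can be worked by hand on invented ids: with the first relevant hit at rank 2, MRR contributes 1/2; with 1 of 2 relevant documents retrieved, recall@5 is 0.5 and precision@5 is 0.2:

```python
result_ids = ["d2", "d1", "d9", "d4", "d5"]  # toy retriever output
relevant_ids = ["d1", "d6"]                  # toy ground truth

# MRR: reciprocal rank of the first relevant hit (d1 at rank 2)
mrr = next(
    (1 / (i + 1) for i, rid in enumerate(result_ids) if rid in relevant_ids),
    0.0,
)

# Recall@5 and precision@5 over the top 5 results
hits = len(set(result_ids[:5]) & set(relevant_ids))
recall_at_5 = hits / len(relevant_ids)
precision_at_5 = hits / 5
```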

Best Practices


  1. Start with 50/50 weights - then tune based on evaluation
  2. Always add a reranker - significant quality improvement
  3. Index sparse vectors - BM25 on raw text, not chunks
  4. Use native hybrid - when available (Pinecone, Weaviate, Qdrant)
  5. Monitor both paths - log which retriever contributed to final results
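Practice 5 can start as simply as tagging each fused result with the retrieval path(s) that surfaced it; logged over time, the tags show whether one retriever is carrying all the weight. A minimal sketch, assuming each retriever exposes its result ids:

```python
def tag_sources(dense_ids: list, sparse_ids: list, fused_ids: list) -> dict:
    """Map each fused doc id to the retrieval path(s) that produced it."""
    sources = {}
    for doc_id in fused_ids:
        paths = []
        if doc_id in dense_ids:
            paths.append("dense")
        if doc_id in sparse_ids:
            paths.append("sparse")
        sources[doc_id] = paths or ["unknown"]
    return sources
```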