
Hybrid Retrieval for RAG


Combine dense (semantic) and sparse (keyword) retrieval for superior results.

When to Use


  • Vector search misses exact keyword matches
  • Domain-specific terminology needs exact matching
  • Users search with both natural language and specific terms
  • Need to balance semantic understanding with precision

The Problem with Vector-Only Search


Query: "Error code E-4521 troubleshooting"

Vector search returns:
- "Common error handling patterns" (semantically similar)
- "Debugging techniques for applications" (related topic)

Missing:
- "E-4521: Database connection timeout" (exact match needed!)
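The sparse half of a hybrid system catches exactly this case. A toy sketch (the three-document corpus is invented to mirror the example above) showing that a literal term scan, the degenerate form of keyword retrieval, surfaces the document an embedding search tends to rank poorly:

```python
# Toy corpus mirroring the example above (contents invented)
corpus = {
    "doc1": "Common error handling patterns",
    "doc2": "Debugging techniques for applications",
    "doc3": "E-4521: Database connection timeout",
}

# A literal term scan (the simplest form of sparse retrieval) finds the
# rare identifier immediately; an embedding model has no guarantee of
# placing "E-4521" near the query in vector space.
hits = [doc_id for doc_id, text in corpus.items() if "E-4521" in text]
```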

Hybrid Architecture


┌─────────────────────────────────────────────────┐
│                   User Query                     │
└─────────────────────┬───────────────────────────┘
         ┌────────────┴────────────┐
         │                         │
         ▼                         ▼
┌─────────────────┐      ┌─────────────────┐
│  Dense Search   │      │  Sparse Search  │
│  (Embeddings)   │      │  (BM25/TF-IDF)  │
└────────┬────────┘      └────────┬────────┘
         │                         │
         └────────────┬────────────┘
              ┌───────────────┐
              │    Fusion     │
              │  (RRF/Linear) │
              └───────┬───────┘
              ┌───────────────┐
              │   Reranker    │
              │  (Optional)   │
              └───────┬───────┘
              ┌───────────────┐
              │ Final Results │
              └───────────────┘
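The Fusion stage in the diagram admits two common strategies: reciprocal rank fusion (implemented later in this document) and a weighted linear combination of scores. A minimal sketch of the linear variant, assuming each retriever returns `(doc_id, score)` pairs; the min-max normalization step is needed because embedding similarities and BM25 scores live on different scales:

```python
def linear_fusion(dense: list, sparse: list, alpha: float = 0.5) -> list:
    """Weighted linear fusion of two (doc_id, score) result lists.

    Each list is min-max normalized so the two score scales are
    comparable; alpha weights the dense contribution.
    """
    def normalize(results):
        if not results:
            return {}
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {doc_id: (s - lo) / span for doc_id, s in results}

    d, s = normalize(dense), normalize(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
```

Documents found by both retrievers accumulate both contributions, which is why hybrid search tends to promote results that are plausible on both axes.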

Implementation


Basic Hybrid with LangChain


```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Dense retriever (vector search)
vectorstore = Chroma.from_documents(docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Sparse retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 10

# Combine with ensemble
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, bm25_retriever],
    weights=[0.5, 0.5]  # Adjust based on your data
)

results = hybrid_retriever.invoke("Error code E-4521")
```

Reciprocal Rank Fusion (RRF)


```python
def reciprocal_rank_fusion(results_list: list, k: int = 60) -> list:
    """
    Combine multiple ranked lists using RRF.
    k=60 is the standard constant from the original paper.
    """
    fused_scores = {}

    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc.metadata.get("id", hash(doc.page_content))
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    reranked = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in reranked]
```

Usage

```python
dense_results = dense_retriever.invoke(query)
sparse_results = bm25_retriever.invoke(query)
final_results = reciprocal_rank_fusion([dense_results, sparse_results])
```

With Pinecone (Native Hybrid)


```python
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder
from pinecone_text.hybrid import hybrid_convex_scale

# Initialize
pc = Pinecone(api_key="...")
index = pc.Index("hybrid-index")

# Sparse encoder
bm25 = BM25Encoder()
bm25.fit(corpus)

# Query with both dense and sparse
def hybrid_query(query: str, alpha: float = 0.5):
    # Dense vector
    dense_vec = embeddings.embed_query(query)
    # Sparse vector
    sparse_vec = bm25.encode_queries([query])[0]

    # Weight the two vectors client-side (0 = sparse only, 1 = dense only);
    # Pinecone's query API has no alpha parameter, so scaling happens here
    hdense, hsparse = hybrid_convex_scale(dense_vec, sparse_vec, alpha)

    # Hybrid search
    results = index.query(
        vector=hdense,
        sparse_vector=hsparse,
        top_k=10,
        include_metadata=True
    )
    return results
```

With Weaviate (Native Hybrid)


```python
import weaviate

client = weaviate.Client("http://localhost:8080")

result = client.query.get(
    "Document",
    ["content", "title"]
).with_hybrid(
    query="Error code E-4521",
    alpha=0.5,  # Balance between vector and keyword
    fusion_type="rankedFusion"
).with_limit(10).do()
```

Adding a Reranker


```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_results(query: str, docs: list, top_k: int = 5) -> list:
    """Rerank documents using cross-encoder."""
    pairs = [[query, doc.page_content] for doc in docs]
    scores = reranker.predict(pairs)

    # Sort by reranker scores
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]
```

Full pipeline

```python
hybrid_results = hybrid_retriever.invoke(query)  # Get 20 results
final_results = rerank_results(query, hybrid_results, top_k=5)  # Rerank to top 5
```

Weight Tuning Guidelines


| Data Type      | Dense Weight | Sparse Weight | Notes                   |
|----------------|--------------|---------------|-------------------------|
| General text   | 0.5          | 0.5           | Balanced default        |
| Technical docs | 0.4          | 0.6           | Keywords matter more    |
| Conversational | 0.7          | 0.3           | Semantic matters more   |
| Code/APIs      | 0.3          | 0.7           | Exact matches critical  |
| Legal/Medical  | 0.4          | 0.6           | Terminology precision   |
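The table gives starting points; the actual optimum depends on your corpus and query mix. A minimal grid-search sketch for picking the pair empirically (the `evaluate` callback is a placeholder for whatever quality metric you track, e.g. recall@5 from an evaluation harness):

```python
def best_weights(evaluate, steps: int = 10) -> tuple:
    """Grid-search the dense weight over steps + 1 points in [0, 1];
    the sparse weight is always its complement.

    `evaluate` maps (dense_weight, sparse_weight) to a quality score
    where higher is better; returns the best (weight, score) pair.
    """
    candidates = [round(i / steps, 2) for i in range(steps + 1)]
    scored = [(w, evaluate(w, round(1 - w, 2))) for w in candidates]
    return max(scored, key=lambda x: x[1])
```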

Evaluation


```python
def evaluate_retrieval(queries: list, ground_truth: dict, retriever) -> dict:
    """Calculate retrieval metrics."""
    metrics = {"mrr": 0, "recall@5": 0, "precision@5": 0}

    for query in queries:
        results = retriever.invoke(query)
        result_ids = [doc.metadata["id"] for doc in results[:5]]
        relevant_ids = ground_truth[query]

        # MRR
        for i, rid in enumerate(result_ids):
            if rid in relevant_ids:
                metrics["mrr"] += 1 / (i + 1)
                break

        # Recall & Precision
        hits = len(set(result_ids) & set(relevant_ids))
        metrics["recall@5"] += hits / len(relevant_ids)
        metrics["precision@5"] += hits / 5

    # Average
    n = len(queries)
    return {k: v / n for k, v in metrics.items()}
```
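To sanity-check the metric definitions, the arithmetic can be worked by hand on invented ids: with the first relevant hit at rank 2, MRR contributes 1/2; with 1 of 2 relevant documents retrieved, recall@5 is 0.5 and precision@5 is 0.2:

```python
result_ids = ["d2", "d1", "d9", "d4", "d5"]  # toy retriever output
relevant_ids = ["d1", "d6"]                  # toy ground truth

# MRR: reciprocal rank of the first relevant hit (d1 at rank 2)
mrr = next(
    (1 / (i + 1) for i, rid in enumerate(result_ids) if rid in relevant_ids),
    0.0,
)

# Recall@5 and precision@5 over the top 5 results
hits = len(set(result_ids[:5]) & set(relevant_ids))
recall_at_5 = hits / len(relevant_ids)
precision_at_5 = hits / 5
```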

Best Practices


  1. Start with 50/50 weights - then tune based on evaluation
  2. Always add a reranker - significant quality improvement
  3. Index sparse vectors - BM25 on raw text, not chunks
  4. Use native hybrid - when available (Pinecone, Weaviate, Qdrant)
  5. Monitor both paths - log which retriever contributed to final results
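Practice 5 can start as simply as tagging each fused result with the retrieval path(s) that surfaced it; logged over time, the tags show whether one retriever is carrying all the weight. A minimal sketch, assuming each retriever exposes its result ids:

```python
def tag_sources(dense_ids: list, sparse_ids: list, fused_ids: list) -> dict:
    """Map each fused doc id to the retrieval path(s) that produced it."""
    sources = {}
    for doc_id in fused_ids:
        paths = []
        if doc_id in dense_ids:
            paths.append("dense")
        if doc_id in sparse_ids:
            paths.append("sparse")
        sources[doc_id] = paths or ["unknown"]
    return sources
```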