redis-vector-search

Redis Vector Search

Guidance for storing and searching embeddings in Redis. Covers index configuration, algorithm selection, hybrid filtering, and the RAG retrieval pattern with RedisVL.

When to apply

  • Defining a `VECTOR` field in `FT.CREATE` (raw RQE) or a RedisVL `IndexSchema`.
  • Choosing HNSW vs FLAT and tuning HNSW parameters.
  • Adding category, date, or tenant filters to a vector query.
  • Building a retrieval-augmented generation (RAG) pipeline on top of Redis.

This skill builds on the `redis-query-engine` skill: vector fields live inside RQE indexes and share the same `FT.CREATE` / `FT.SEARCH` machinery.

1. Configure the vector index properly

Three settings must match the embedding model:
  • `DIM`: the model's output dimensionality (e.g. 1536 for OpenAI `text-embedding-3-small`). A mismatch produces silent garbage.
  • `DISTANCE_METRIC`: `COSINE` for normalized text embeddings (the common case), `IP` for unnormalized inner-product, `L2` for raw Euclidean.
  • `TYPE` / `datatype`: usually `FLOAT32`. Use `FLOAT16` or quantized variants only when memory cost is a hard constraint.

Raw RQE:
```
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
    SCHEMA
        content TEXT
        embedding VECTOR HNSW 6
            TYPE FLOAT32
            DIM 1536
            DISTANCE_METRIC COSINE
```

RedisVL:
```python
from redisvl.schema import IndexSchema

# The vector attrs mirror the raw FT.CREATE arguments: DIM, algorithm, TYPE, DISTANCE_METRIC.
schema = IndexSchema.from_dict({
    "index": {"name": "idx:docs", "prefix": "doc:"},
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "embedding", "type": "vector", "attrs": {
            "dims": 1536, "algorithm": "HNSW",
            "datatype": "FLOAT32", "distance_metric": "COSINE",
        }},
    ]
})
```

See references/index-creation.md for redis-py and RedisVL variants.
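
Because a `DIM` or datatype mismatch is easy to miss, it can help to validate each embedding before it is written. The sketch below is one possible guard, not part of RedisVL or redis-py: `to_float32_blob` is a hypothetical helper name, and it assumes the index uses `FLOAT32` and that vectors are stored as little-endian bytes (what `numpy`'s `tobytes()` yields on typical hosts).

```python
import struct


def to_float32_blob(vector, expected_dim):
    """Pack a sequence of floats into FLOAT32 bytes, checking DIM first.

    Hypothetical helper for illustration; expected_dim should equal the
    DIM / dims value used when the index was created.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected_dim}"
        )
    # "<" = little-endian, "f" = 4-byte float; one "f" per dimension.
    return struct.pack(f"<{expected_dim}f", *vector)


blob = to_float32_blob([0.1, 0.2, 0.3, 0.4], expected_dim=4)
print(len(blob))  # 4 dimensions * 4 bytes each = 16
```

For a 1536-dimensional `FLOAT32` field, every stored blob should therefore be 6144 bytes long.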

2. HNSW vs FLAT

| Algorithm | Speed | Accuracy | Memory | Best for |
|---|---|---|---|---|
| HNSW | Fast (approximate) | ~95%+ recall (tunable) | Higher | Large datasets (>10k vectors), latency-sensitive |
| FLAT | Slow (exact) | 100% | Lower | Small datasets (<10k), accuracy-critical |

Default to HNSW for any production-scale workload. Tuning levers:
  • `M`: connections per node (16–64). Higher = better recall, more memory.
  • `EF_CONSTRUCTION`: build-time graph quality (100–500). Higher = better index, slower build.
  • `EF_RUNTIME`: query-time candidate-list size. Higher = better recall, slower queries.

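
When these levers are added to a raw `FT.CREATE`, the number after `HNSW` has to equal the count of attribute tokens that follow it (6 in the earlier example, more once tuning parameters are included). The following is a minimal sketch of building that clause programmatically; `hnsw_vector_args` is a hypothetical helper, and the parameter values shown are illustrative rather than recommendations.

```python
def hnsw_vector_args(field, dim, metric="COSINE", m=16,
                     ef_construction=200, ef_runtime=10):
    """Build the VECTOR clause of FT.CREATE as a list of tokens.

    Hypothetical helper; the default values here are illustrative only.
    """
    attrs = [
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", metric,
        "M", str(m),
        "EF_CONSTRUCTION", str(ef_construction),
        "EF_RUNTIME", str(ef_runtime),
    ]
    # The number after HNSW is the count of attribute tokens that follow it.
    return [field, "VECTOR", "HNSW", str(len(attrs)), *attrs]


clause = hnsw_vector_args("embedding", 1536, m=32,
                          ef_construction=300, ef_runtime=50)
print(" ".join(clause))
```

Deriving the count from the list avoids the off-by-two errors that tend to appear when a parameter is added by hand.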
Use FLAT when the corpus is small and you need exact results (e.g. semantic dedup over a few thousand items).
See references/algorithm-choice.md.
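
To make the trade-off concrete, FLAT behaves like an exhaustive scan: every stored vector is compared with the query, which is why it is exact but slows down as the corpus grows. The sketch below illustrates that idea in plain Python. It is not how Redis implements FLAT internally, the function names are made up for illustration, and it assumes non-zero vectors.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def flat_knn(query, vectors, k):
    """Exact k-nearest-neighbour search: score every vector, keep the k closest."""
    scored = [(cosine_distance(query, vec), key) for key, vec in vectors.items()]
    return sorted(scored)[:k]


corpus = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.0, 1.0],
    "doc:3": [1.0, 1.0],
}
for distance, key in flat_knn([1.0, 0.1], corpus, k=2):
    print(key, round(distance, 3))
```

The cost is one distance computation per stored vector per query, which is acceptable for a few thousand items and is the work HNSW avoids by walking a graph instead.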

3. Hybrid search — filter before vector

Apply attribute filters (TAG / NUMERIC) so the engine narrows the search space before the vector comparison. Don't fetch a wide result set and then filter client-side — that's slower and less accurate.
```python
from redisvl.query import VectorQuery
from redisvl.query.filter import Num, Tag

# Combine a TAG filter and a NUMERIC filter; both are applied inside the engine.
filters = (Tag("category") == "technology") & (Num("date") >= 2024)

# query_embedding and index are assumed to be defined elsewhere.
query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["content", "category", "date"],
    num_results=10,
    filter_expression=filters,
)
results = index.query(query)
```

For text + vector fusion (BM25-weighted text scoring combined with vector similarity), use `HybridQuery` on Redis ≥ 8.4 with redis-py ≥ 7.1, or `AggregateHybridQuery` on older Redis. That's a different "hybrid" from filtered vector search above.
See references/hybrid-search.md.
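
For readers working with raw RQE rather than RedisVL, a filtered vector search of this kind is written as a filter clause followed by `=>` and a `KNN` clause. The sketch below only composes that query string; `filtered_knn_query` is a hypothetical helper, the `vector_distance` alias and `vec` parameter name are illustrative choices, and tag values with special characters would need escaping that is not handled here.

```python
def filtered_knn_query(filter_clause, k, vector_field="embedding", param="vec"):
    """Compose an RQE pre-filtered KNN query string.

    Hypothetical helper; it builds the string only and does not talk to Redis.
    """
    # The filter in parentheses selects candidates; "=>" hands them to the KNN clause.
    return f"({filter_clause})=>[KNN {k} @{vector_field} ${param} AS vector_distance]"


query = filtered_knn_query("@category:{technology} @date:[2024 +inf]", k=10)
print(query)
# (@category:{technology} @date:[2024 +inf])=>[KNN 10 @embedding $vec AS vector_distance]
```

The resulting string would be passed to `FT.SEARCH` together with `PARAMS 2 vec <blob>` and `DIALECT 2`, where the blob is the query embedding in the same byte format as the indexed vectors.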

4. RAG pattern

Standard pipeline: embed the user query → vector search Redis → pass top-K context to the LLM.
```python
from redisvl.query import VectorQuery

# Index documents with embeddings.
# Note: list-valued vectors suit JSON storage; with HASH storage, store float32 bytes instead.
records = [
    {
        "content": doc.content,
        "embedding": embed_model.encode(doc.content).tolist(),
        "source": doc.source,
    }
    for doc in documents
]
index.load(records)

# Retrieve relevant context for a user question
# (.tolist() assumes encode() returns an array, as in the indexing step above).
q_emb = embed_model.encode(user_question).tolist()
results = index.query(VectorQuery(
    vector=q_emb,
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5,
))

# Generate with retrieved context
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
```

Practical tips:

- **Match metric to model.** Most modern text embedding models pair best with `COSINE`.
- **Chunk long documents** before indexing — retrieval over 200–500-token chunks usually beats indexing whole pages.
- **Batch inserts** with `index.load([...])` instead of one call per record.
- **Pre-filter with attributes** (tenant, recency, document type) before the vector search.

See [references/rag-pattern.md](references/rag-pattern.md).
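
As an illustration of the chunking tip, the sketch below splits text into overlapping windows before embedding. It is a simplification under stated assumptions: whitespace-separated words stand in for model tokens, `chunk_words` is a hypothetical helper, and the 300-word window with a 50-word overlap is an arbitrary starting point rather than a recommendation from the source.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping word windows.

    Hypothetical helper; words approximate tokens, so real token counts
    per chunk will differ depending on the embedding model's tokenizer.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk would then become its own record (with its own embedding and a pointer back to the source document) in the list passed to `index.load`.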

References
