redis-vector-search


Redis Vector Search

Guidance for storing and searching embeddings in Redis. Covers index configuration, algorithm selection, hybrid filtering, and the RAG retrieval pattern with RedisVL.

When to apply

- Defining a `VECTOR` field in `FT.CREATE` (raw RQE) or a RedisVL `IndexSchema`.
- Choosing HNSW vs FLAT and tuning HNSW parameters.
- Adding category, date, or tenant filters to a vector query.
- Building a retrieval-augmented generation (RAG) pipeline on top of Redis.

This skill builds on the `redis-query-engine` skill: vector fields live inside RQE indexes and share the same `FT.CREATE` / `FT.SEARCH` machinery.

1. Configure the vector index properly

Three settings must match the embedding model:

- `DIM`: the model's output dimensionality (e.g. 1536 for OpenAI `text-embedding-3-small`). A mismatch produces silent garbage.
- `DISTANCE_METRIC`: `COSINE` for normalized text embeddings (the common case), `IP` for unnormalized inner product, `L2` for raw Euclidean distance.
- `TYPE` / `datatype`: usually `FLOAT32`. Use `FLOAT16` or quantized variants only when memory cost is a hard constraint.
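To make the metric choice concrete, here is a small pure-Python sketch that needs no Redis. It assumes the distance conventions as Redis documents them: `COSINE` distance is `1 - cosine similarity` and `IP` distance is `1 - dot product`. For unit-length vectors the three metrics rank neighbors identically. For unnormalized vectors, `IP` favors longer vectors and `COSINE` does not, which is why the metric has to match how the model was trained.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    # 1 - cosine similarity (assumed COSINE convention)
    return 1.0 - dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def ip_distance(a, b):
    # 1 - inner product (assumed IP convention)
    return 1.0 - dot(a, b)

def l2_distance(a, b):
    # plain Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, docs, metric):
    """Document indices ordered nearest-first under `metric`."""
    return sorted(range(len(docs)), key=lambda i: metric(query, docs[i]))

# Normalized toy vectors: all three metrics agree on the ranking.
docs = [normalize(v) for v in ([1, 0, 0], [0.8, 0.6, 0], [0, 0, 1])]
query = normalize([0.9, 0.2, 0.1])
for metric in (cosine_distance, ip_distance, l2_distance):
    print(metric.__name__, rank(query, docs, metric))  # [0, 1, 2] for each

# Unnormalized vectors: IP rewards the longer vector, COSINE does not.
raw_docs, raw_query = [[1, 0, 0], [2, 2, 0]], [1, 0.1, 0]
print(rank(raw_query, raw_docs, cosine_distance))  # [0, 1]
print(rank(raw_query, raw_docs, ip_distance))      # [1, 0]
```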

Raw RQE:

```
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
    SCHEMA
        content TEXT
        embedding VECTOR HNSW 6
            TYPE FLOAT32
            DIM 1536
            DISTANCE_METRIC COSINE
```
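The `6` after `HNSW` is the number of attribute tokens that follow (three name/value pairs). It is easy to get wrong when you add `M` or `EF_CONSTRUCTION`. The minimal sketch below computes the count for you. It only builds the argument list. The helper names are hypothetical, and the commented-out `execute_command` call assumes a redis-py client named `r`.

```python
def vector_field_args(name, algorithm, **attrs):
    """Tokens for one VECTOR field, with the attribute count filled in automatically."""
    tokens = []
    for key, value in attrs.items():  # keyword order is preserved
        tokens += [key.upper(), str(value)]
    return [name, "VECTOR", algorithm, str(len(tokens)), *tokens]

def ft_create_args(index, prefix, fields):
    """Full FT.CREATE argument list for a HASH index over keys starting with `prefix`."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for field in fields:
        args += field
    return args

args = ft_create_args(
    "idx:docs",
    "doc:",
    [
        ["content", "TEXT"],
        vector_field_args("embedding", "HNSW", type="FLOAT32", dim=1536, distance_metric="COSINE"),
    ],
)
print(" ".join(args))
# r.execute_command(*args)  # `r` is a hypothetical redis-py client
```

Adding `m=32, ef_construction=200` to `vector_field_args` raises the count to 10 without any manual bookkeeping.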

RedisVL:

```python
from redisvl.schema import IndexSchema

schema = IndexSchema.from_dict({
    "index": {"name": "idx:docs", "prefix": "doc:"},
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "embedding", "type": "vector", "attrs": {
            "dims": 1536, "algorithm": "HNSW",
            "datatype": "FLOAT32", "distance_metric": "COSINE",
        }},
    ]
})
```

See references/index-creation.md for redis-py and RedisVL variants.
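With HASH storage, each vector is stored as a raw byte blob. Its length must equal `DIM` times the bytes per element, so checking dimensionality on the client turns a silent mismatch into a loud error. The stdlib-only sketch below assumes little-endian packing, which is what NumPy's `float32` `.tobytes()` produces on typical x86/ARM hosts. The helper name `vector_blob` is hypothetical. The sketch also shows the memory trade-off: 1536 dimensions take 6144 bytes as `FLOAT32` and 3072 as `FLOAT16`.

```python
import struct

def vector_blob(vec, dim, dtype="FLOAT32"):
    """Pack `vec` into the raw bytes a HASH VECTOR field stores."""
    if len(vec) != dim:
        # Catch a DIM mismatch here instead of letting it reach Redis.
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
    code = {"FLOAT32": "f", "FLOAT16": "e"}[dtype]  # 4-byte and 2-byte floats
    # '<' = little-endian with no padding
    return struct.pack(f"<{dim}{code}", *vec)

print(len(vector_blob([0.1] * 1536, 1536)))             # 6144
print(len(vector_blob([0.1] * 1536, 1536, "FLOAT16")))  # 3072
```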

2. HNSW vs FLAT

| Algorithm | Speed | Accuracy | Memory | Best for |
|---|---|---|---|---|
| HNSW | Fast (approximate) | ~95%+ recall (tunable) | Higher | Large datasets (>10k vectors), latency-sensitive |
| FLAT | Slow (exact) | 100% | Lower | Small datasets (<10k vectors), accuracy-critical |
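The recall column can be measured. Run the same queries against an exact, brute-force search (what a FLAT index does), then count how many of the true top-k the approximate index also returned. The pure-Python sketch below shows this. The `approx_ids` input stands in for whatever an HNSW query returned and is hypothetical here.

```python
import heapq
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def exact_top_k(query, vectors, k):
    """Brute-force k-NN: score every vector and keep the k nearest."""
    scored = ((cosine_distance(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.5, 0.5]]
exact = exact_top_k([1, 0], vectors, k=2)  # [0, 1]
print(recall_at_k([0, 3], exact))          # 0.5
```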

Default to HNSW for any production-scale workload. Tuning levers:

- `M`: connections per node (16–64). Higher means better recall but more memory.
- `EF_CONSTRUCTION`: build-time graph quality (100–500). Higher means a better index but a slower build.
- `EF_RUNTIME`: query-time candidate-list size. Higher means better recall but slower queries.
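`M` and `EF_CONSTRUCTION` are fixed when the index is built. `EF_RUNTIME` can also be raised per query, which makes it the cheapest lever for trading latency against recall. The sketch below builds raw `FT.SEARCH` arguments with an `EF_RUNTIME` query attribute. It assumes query dialect 2 and the `idx:docs` / `embedding` names from section 1. `knn_search_args` is a hypothetical helper, and the redis-py call is left commented out.

```python
import struct

def knn_search_args(index, field, blob, k=10, ef_runtime=100, return_fields=("content",)):
    """Raw FT.SEARCH arguments for a KNN query that overrides EF_RUNTIME for this call only."""
    query = f"*=>[KNN {k} @{field} $vec EF_RUNTIME $ef AS score]"
    return [
        "FT.SEARCH", index, query,
        "PARAMS", "4", "vec", blob, "ef", str(ef_runtime),
        "SORTBY", "score",
        "RETURN", str(len(return_fields)), *return_fields,
        "DIALECT", "2",
    ]

# Placeholder 1536-d FLOAT32 query vector, just to show the shape of the call.
blob = struct.pack("<1536f", *([0.1] * 1536))
args = knn_search_args("idx:docs", "embedding", blob, k=10, ef_runtime=200)
# results = r.execute_command(*args)  # `r` is a hypothetical redis-py client
```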

Use FLAT when the corpus is small and you need exact results (e.g. semantic dedup over a few thousand items).
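Semantic dedup shows why exactness matters. Every item must be compared against everything already kept, and an approximate index could miss a near-duplicate. The pure-Python sketch below performs that exhaustive comparison. The 0.95 similarity threshold is an arbitrary illustration, not a recommended value.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def semantic_dedup(vectors, threshold=0.95):
    """Greedy exact dedup: keep an item only if it is not too similar to any kept item."""
    kept = []
    for i, v in enumerate(vectors):
        if all(cosine_similarity(v, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept

print(semantic_dedup([[1, 0], [0.99, 0.01], [0, 1]]))  # [0, 2]
```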

See references/algorithm-choice.md.

3. Hybrid search — filter before vector

Apply attribute filters (TAG / NUMERIC) so the engine narrows the search space before the vector comparison. Don't fetch a wide result set and filter it client-side. That is slower, and if most of the nearest vectors fail the filter, you end up with fewer results than you asked for.

```python
from redisvl.query import VectorQuery
from redisvl.query.filter import Num, Tag

# Assumes `index` is a RedisVL SearchIndex and `query_embedding` is a list of floats.
filters = (Tag("category") == "technology") & (Num("date") >= 2024)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["content", "category", "date"],
    num_results=10,
    filter_expression=filters,
)
results = index.query(query)
```
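Post-filtering loses accuracy in a specific way. If you take the k nearest vectors and then drop the ones that don't match, you can end up with fewer than k results, or none, even though matching documents exist further down the ranking. The toy pure-Python illustration below uses hypothetical data. In Redis, the filter expression does the pre-filtering on the server.

```python
def post_filter(ranked, k, keep):
    """Fetch the global top-k first, then filter on the client."""
    return [doc for doc in ranked[:k] if keep(doc)]

def pre_filter(ranked, k, keep):
    """Filter first, then take the top-k of what remains."""
    return [doc for doc in ranked if keep(doc)][:k]

def is_tech(doc):
    return doc["category"] == "technology"

# Documents already sorted nearest-first by vector distance (toy data).
ranked = [
    {"id": 1, "category": "sports"},
    {"id": 2, "category": "sports"},
    {"id": 3, "category": "technology"},
    {"id": 4, "category": "sports"},
    {"id": 5, "category": "technology"},
]
print([d["id"] for d in post_filter(ranked, 2, is_tech)])  # []
print([d["id"] for d in pre_filter(ranked, 2, is_tech)])   # [3, 5]
```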

For text + vector fusion (BM25-weighted text scoring combined with vector similarity), use `HybridQuery` on Redis ≥ 8.4 with redis-py ≥ 7.1, or `AggregateHybridQuery` on older Redis. That's a different "hybrid" from the filtered vector search above.
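Text + vector fusion comes down to combining two scores per document. The sketch below is a generic illustration: a weighted sum of min-max-normalized BM25 and vector-similarity scores, with a hypothetical `alpha` weight. It is not necessarily the formula `HybridQuery` or `AggregateHybridQuery` implement.

```python
def min_max(scores):
    """Rescale a {doc: score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def linear_fusion(text_scores, vector_sims, alpha=0.7):
    """Rank docs by alpha * vector + (1 - alpha) * text, both normalized.

    A document missing from one list gets 0 for that component.
    """
    text_n, vec_n = min_max(text_scores), min_max(vector_sims)
    fused = {
        d: alpha * vec_n.get(d, 0.0) + (1 - alpha) * text_n.get(d, 0.0)
        for d in set(text_n) | set(vec_n)
    }
    return sorted(fused, key=fused.get, reverse=True)

text_scores = {"a": 12.0, "b": 3.0, "c": 7.5}   # e.g. BM25
vector_sims = {"a": 0.70, "b": 0.95, "c": 0.90}  # e.g. cosine similarity
print(linear_fusion(text_scores, vector_sims))   # ['c', 'b', 'a']
```

Document `c` wins because it scores reasonably on both signals, while `a` and `b` each excel at only one.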

See references/hybrid-search.md.

4. RAG pattern

Standard pipeline: embed the user query → vector search Redis → pass top-K context to the LLM.

```python
import numpy as np
from redisvl.query import VectorQuery

# Assumed to exist: `index` (a RedisVL SearchIndex using HASH storage), `embed_model`
# (encode() returns a NumPy array), `documents`, `user_question`, and a placeholder `llm` client.

# Index documents with embeddings
records = [
    {
        "content": doc.content,
        # HASH storage expects raw FLOAT32 bytes; with JSON storage, use .tolist() instead.
        "embedding": embed_model.encode(doc.content).astype(np.float32).tobytes(),
        "source": doc.source,
    }
    for doc in documents
]
index.load(records)

# Retrieve relevant context for a user question
q_emb = embed_model.encode(user_question).tolist()
results = index.query(VectorQuery(
    vector=q_emb,
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5,
))

# Generate with retrieved context
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
```
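When passing top-K context to the LLM, it often helps to keep each chunk's source and to cap the total size. The sketch below assumes `results` is the list of dicts returned by `index.query` above, with `content` and `source` keys. `build_prompt` and its character budget are hypothetical; a real budget would be counted in the model's tokens.

```python
def build_prompt(question, results, max_chars=4000):
    """Join retrieved chunks (nearest first) with their sources, stopping at a size budget."""
    parts, used = [], 0
    for r in results:
        chunk = f"[{r['source']}] {r['content']}"
        if used + len(chunk) > max_chars:
            break  # drop this and all lower-ranked chunks
        parts.append(chunk)
        used += len(chunk) + 1  # +1 for the joining newline
    context = "\n".join(parts)
    return f"Context:\n{context}\n\nQuestion: {question}"

demo = [{"source": "a.md", "content": "x" * 10}, {"source": "b.md", "content": "y" * 10}]
print(build_prompt("What is x?", demo, max_chars=20))
```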


Practical tips:

- **Match metric to model.** Most modern text embedding models pair best with `COSINE`.
- **Chunk long documents** before indexing — retrieval over 200–500-token chunks usually beats indexing whole pages.
- **Batch inserts** with `index.load([...])` instead of one call per record.
- **Pre-filter with attributes** (tenant, recency, document type) before the vector search.

See [references/rag-pattern.md](references/rag-pattern.md).
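The chunking tip can be sketched as overlapping word windows, so that a sentence cut at one boundary still appears whole in the neighboring chunk. Words serve here as a rough stand-in for tokens; for exact counts, use the embedding model's tokenizer. `chunk_words` and its defaults are hypothetical. Each returned chunk would become one record (content, embedding, source) in the indexing step above.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

print(chunk_words(" ".join(str(i) for i in range(10)), chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```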
