qdrant-vector-search

Qdrant - Vector Similarity Search Engine

High-performance vector database written in Rust for production RAG and semantic search.

When to use Qdrant

Use Qdrant when:
  • Building production RAG systems requiring low latency
  • Need hybrid search (vectors + metadata filtering)
  • Require horizontal scaling with sharding/replication
  • Want on-premise deployment with full data control
  • Need multi-vector storage per record (dense + sparse)
  • Building real-time recommendation systems
Key features:
  • Rust-powered: Memory-safe, high performance
  • Rich filtering: Filter by any payload field during search
  • Multiple vectors: Dense, sparse, multi-dense per point
  • Quantization: Scalar, product, binary for memory efficiency
  • Distributed: Raft consensus, sharding, replication
  • REST + gRPC: Both APIs with full feature parity
Use alternatives instead:
  • Chroma: Simpler setup, embedded use cases
  • FAISS: Maximum raw speed, research/batch processing
  • Pinecone: Fully managed, zero ops preferred
  • Weaviate: GraphQL preference, built-in vectorizers

Quick start

Installation

    # Python client
    pip install qdrant-client

    # Docker (recommended for development)
    docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

    # Docker with persistent storage
    docker run -p 6333:6333 -p 6334:6334 \
        -v $(pwd)/qdrant_storage:/qdrant/storage \
        qdrant/qdrant

Basic usage

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams, PointStruct

    # Connect to Qdrant
    client = QdrantClient(host="localhost", port=6333)

    # Create collection
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

    # Insert vectors with payload
    client.upsert(
        collection_name="documents",
        points=[
            PointStruct(
                id=1,
                vector=[0.1, 0.2, ...],  # 384-dim vector
                payload={"title": "Doc 1", "category": "tech"}
            ),
            PointStruct(
                id=2,
                vector=[0.3, 0.4, ...],
                payload={"title": "Doc 2", "category": "science"}
            )
        ]
    )

    # Search with filtering
    results = client.search(
        collection_name="documents",
        query_vector=[0.15, 0.25, ...],
        query_filter={
            "must": [{"key": "category", "match": {"value": "tech"}}]
        },
        limit=10
    )

    for point in results:
        print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")

Core concepts

Points - Basic data unit

    from qdrant_client.models import PointStruct

    # Point = ID + Vector(s) + Payload
    point = PointStruct(
        id=123,                       # Integer or UUID string
        vector=[0.1, 0.2, 0.3, ...],  # Dense vector
        payload={                     # Arbitrary JSON metadata
            "title": "Document title",
            "category": "tech",
            "timestamp": 1699900000,
            "tags": ["python", "ml"]
        }
    )

    # Batch upsert (recommended)
    client.upsert(
        collection_name="documents",
        points=[point1, point2, point3],
        wait=True  # Block until the write has been applied
    )
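When the point list is too large for a single request, one common approach is to split it into fixed-size batches and upsert each batch in turn. The sketch below shows only the batching step in plain Python; the helper name `batched` and the batch size in the comment are illustrative assumptions, not part of the Qdrant client.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Plain integers stand in for PointStruct objects here.
point_ids = list(range(1, 8))
batches = list(batched(point_ids, 3))
print(batches)

# With a real client, each batch would be sent separately, for example:
# for batch in batched(points, 256):  # 256 is an arbitrary batch size
#     client.upsert(collection_name="documents", points=batch)
```

The last batch may be shorter than the others, so no padding is needed.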

Collections - Vector containers

    from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

    # Create with HNSW configuration
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(
            size=384,                  # Vector dimensions
            distance=Distance.COSINE   # COSINE, EUCLID, DOT, MANHATTAN
        ),
        hnsw_config=HnswConfigDiff(
            m=16,                      # Connections per node (default 16)
            ef_construct=100,          # Build-time accuracy (default 100)
            full_scan_threshold=10000  # Switch to brute force below this
        ),
        on_disk_payload=True           # Store payload on disk
    )

    # Collection info
    info = client.get_collection("documents")
    print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")

Distance metrics

| Metric | Use case | Range |
|--------|----------|-------|
| COSINE | Text embeddings, normalized vectors | 0 to 2 |
| EUCLID | Spatial data, image features | 0 to ∞ |
| DOT | Recommendations, unnormalized vectors | -∞ to ∞ |
| MANHATTAN | Sparse features, discrete data | 0 to ∞ |
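To make the table concrete, here is a minimal pure-Python sketch of how the four metrics are defined mathematically. It is not Qdrant's implementation, and the function names are hypothetical. Cosine is written as a distance (1 minus cosine similarity) so that its range matches the "0 to 2" entry in the table.

```python
import math

def dot(a, b):
    """Dot product: unbounded, larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 2 for opposite direction."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / (norm_a * norm_b)

def euclid(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [1.0, 0.0]
b = [0.0, 1.0]
print(dot(a, b), cosine_distance(a, b), euclid(a, b), manhattan(a, b))
```

For normalized vectors, the dot product and cosine similarity produce the same ranking, which is one reason COSINE is a common default for text embeddings.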

Search operations

Basic search

    # Simple nearest neighbor search
    results = client.search(
        collection_name="documents",
        query_vector=[0.1, 0.2, ...],
        limit=10,
        with_payload=True,
        with_vectors=False  # Don't return vectors (faster)
    )
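Conceptually, a nearest neighbor search scores every candidate against the query and returns the `limit` best matches in order. The sketch below does this by exhaustive scan over a toy dictionary, assuming cosine similarity as the score. It only illustrates the result shape; it does not use an HNSW index, and `brute_force_search` is a hypothetical name.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(points, query_vector, limit):
    """points maps id -> vector. Returns [(id, score)], best score first."""
    scored = (
        (point_id, cosine_similarity(vector, query_vector))
        for point_id, vector in points.items()
    )
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])

toy_points = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
top = brute_force_search(toy_points, [1.0, 0.1], limit=2)
print(top)
```

An exhaustive scan like this is exact but grows linearly with collection size, which is why an approximate index is used for larger collections.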

Filtered search

    from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

    # Complex filtering
    results = client.search(
        collection_name="documents",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="tech")),
                FieldCondition(key="timestamp", range=Range(gte=1699000000))
            ],
            must_not=[
                FieldCondition(key="status", match=MatchValue(value="archived"))
            ]
        ),
        limit=10
    )

    # Shorthand filter syntax
    results = client.search(
        collection_name="documents",
        query_vector=query_embedding,
        query_filter={
            "must": [
                {"key": "category", "match": {"value": "tech"}},
                {"key": "price", "range": {"gte": 10, "lte": 100}}
            ]
        },
        limit=10
    )
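The `must` and `must_not` clauses combine as a logical AND and a logical NOT-ANY. As a rough illustration of those semantics, the sketch below evaluates the shorthand dictionary form against a plain payload dict. It is an assumption-laden toy: it handles only scalar payload values, only `match` and `range` with `gte`/`lte`, and the helper names are hypothetical rather than anything the client exposes.

```python
def matches_condition(payload, condition):
    """Check one {"key": ..., "match"/"range": ...} condition against a payload."""
    value = payload.get(condition["key"])
    if "match" in condition:
        return value == condition["match"]["value"]
    if "range" in condition:
        if not isinstance(value, (int, float)):
            return False
        bounds = condition["range"]
        if "gte" in bounds and value < bounds["gte"]:
            return False
        if "lte" in bounds and value > bounds["lte"]:
            return False
        return True
    raise ValueError("unsupported condition in this sketch")

def matches_filter(payload, query_filter):
    """All `must` conditions hold and no `must_not` condition holds."""
    must_ok = all(matches_condition(payload, c) for c in query_filter.get("must", []))
    must_not_hit = any(matches_condition(payload, c) for c in query_filter.get("must_not", []))
    return must_ok and not must_not_hit

example_filter = {
    "must": [
        {"key": "category", "match": {"value": "tech"}},
        {"key": "price", "range": {"gte": 10, "lte": 100}},
    ],
    "must_not": [{"key": "status", "match": {"value": "archived"}}],
}
print(matches_filter({"category": "tech", "price": 50, "status": "active"}, example_filter))
```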

Batch search

    from qdrant_client.models import SearchRequest

    # Multiple queries in one request
    results = client.search_batch(
        collection_name="documents",
        requests=[
            SearchRequest(vector=[0.1, ...], limit=5),
            SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
            SearchRequest(vector=[0.3, ...], limit=10)
        ]
    )

RAG integration

With sentence-transformers

    from sentence_transformers import SentenceTransformer
    from qdrant_client import QdrantClient
    from qdrant_client.models import VectorParams, Distance, PointStruct

    # Initialize
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    client = QdrantClient(host="localhost", port=6333)

    # Create collection
    client.create_collection(
        collection_name="knowledge_base",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

    # Index documents
    documents = [
        {"id": 1, "text": "Python is a programming language", "source": "wiki"},
        {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
    ]

    points = [
        PointStruct(
            id=doc["id"],
            vector=encoder.encode(doc["text"]).tolist(),
            payload={"text": doc["text"], "source": doc["source"]}
        )
        for doc in documents
    ]
    client.upsert(collection_name="knowledge_base", points=points)

    # RAG retrieval
    def retrieve(query: str, top_k: int = 5) -> list[dict]:
        query_vector = encoder.encode(query).tolist()
        results = client.search(
            collection_name="knowledge_base",
            query_vector=query_vector,
            limit=top_k
        )
        return [{"text": r.payload["text"], "score": r.score} for r in results]

    # Use in RAG pipeline
    context = retrieve("What is Python?")
    prompt = f"Context: {context}\n\nQuestion: What is Python?"
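Because `retrieve()` returns a list of dictionaries, interpolating it directly into an f-string places the Python repr of that list in the prompt. One possible refinement is to render the hits as numbered passages and cap the context length. The sketch below is a hypothetical helper (`build_prompt` and `max_chars` are not from the original), and the scores in the sample data are made up for illustration.

```python
def build_prompt(question, hits, max_chars=2000):
    """Format hits ({"text": ..., "score": ...}) as numbered context passages."""
    lines = []
    used = 0
    for rank, hit in enumerate(hits, start=1):
        entry = f"[{rank}] {hit['text']}"
        if used + len(entry) > max_chars:
            break  # stop before the context budget is exceeded
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Sample hits shaped like the output of retrieve(); scores are illustrative only.
sample_hits = [
    {"text": "Python is a programming language", "score": 0.82},
    {"text": "Machine learning uses algorithms", "score": 0.31},
]
print(build_prompt("What is Python?", sample_hits))
```

Numbering the passages also makes it easier to ask the model to cite which passage supports its answer.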

With LangChain

    from langchain_community.vectorstores import Qdrant
    from langchain_community.embeddings import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

With LlamaIndex

    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from llama_index.core import VectorStoreIndex, StorageContext

    vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
    query_engine = index.as_query_engine()

Multi-vector support

Named vectors (different embedding models)

    from qdrant_client.models import VectorParams, Distance, PointStruct

    # Collection with multiple vector types
    client.create_collection(
        collection_name="hybrid_search",
        vectors_config={
            "dense": VectorParams(size=384, distance=Distance.COSINE),
            "sparse": VectorParams(size=30000, distance=Distance.DOT)
        }
    )

    # Insert with named vectors
    client.upsert(
        collection_name="hybrid_search",
        points=[
            PointStruct(
                id=1,
                vector={
                    "dense": dense_embedding,
                    "sparse": sparse_embedding
                },
                payload={"text": "document text"}
            )
        ]
    )

    # Search specific vector
    results = client.search(
        collection_name="hybrid_search",
        query_vector=("dense", query_dense),  # Specify which vector
        limit=10
    )

Sparse vectors (BM25, SPLADE)

    from qdrant_client.models import PointStruct, SparseVectorParams, SparseIndexParams, SparseVector

    # Collection with sparse vectors
    client.create_collection(
        collection_name="sparse_search",
        vectors_config={},
        sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
    )

    # Insert sparse vector
    client.upsert(
        collection_name="sparse_search",
        points=[
            PointStruct(
                id=1,
                vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])},
                payload={"text": "document"}
            )
        ]
    )
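A sparse vector stores only the non-zero dimensions as parallel `indices` and `values` lists. The sketch below shows how such a pair could be built from tokens and how two sparse vectors are compared with a dot product. It uses raw term counts as weights purely for illustration; a real pipeline would take weights from BM25 or SPLADE. The vocabulary and helper names are hypothetical.

```python
from collections import Counter

def to_sparse(tokens, vocabulary):
    """Return (indices, values) for tokens found in vocabulary, weighted by term count."""
    counts = Counter(token for token in tokens if token in vocabulary)
    pairs = sorted((vocabulary[token], float(count)) for token, count in counts.items())
    indices = [index for index, _ in pairs]
    values = [value for _, value in pairs]
    return indices, values

def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product over the dimensions that both sparse vectors share."""
    weights_b = dict(zip(indices_b, values_b))
    return sum(value * weights_b.get(index, 0.0) for index, value in zip(indices_a, values_a))

# Hypothetical token -> dimension mapping.
vocabulary = {"python": 1, "rust": 5, "vector": 100}
doc_indices, doc_values = to_sparse("python vector vector search".split(), vocabulary)
query_indices, query_values = to_sparse(["vector"], vocabulary)
print(doc_indices, doc_values)
print(sparse_dot(doc_indices, doc_values, query_indices, query_values))
```

The resulting lists have the same shape as the `indices` and `values` arguments passed to `SparseVector` in the snippet above.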

Quantization (memory optimization)

    from qdrant_client.models import (
        Distance, VectorParams,
        ScalarQuantization, ScalarQuantizationConfig, ScalarType
    )

    # Scalar quantization (4x memory reduction)
    client.create_collection(
        collection_name="quantized",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
        quantization_config=ScalarQuantization(
            scalar=ScalarQuantizationConfig(
                type=ScalarType.INT8,
                quantile=0.99,    # Clip outliers
                always_ram=True   # Keep quantized in RAM
            )
        )
    )

    # Search with rescoring
    results = client.search(
        collection_name="quantized",
        query_vector=query,
        search_params={"quantization": {"rescore": True}},  # Rescore top results
        limit=10
    )
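The 4x figure follows from storing each dimension in 1 byte instead of the 4 bytes of a float32. As a rough illustration of the idea (not Qdrant's actual implementation), the sketch below clips values to a range chosen from a quantile, maps them to 8-bit codes, and shows the storage arithmetic. All helper names, and the exact way the quantile is turned into bounds, are assumptions made for this example.

```python
def fit_bounds(values, quantile=0.99):
    """Pick clipping bounds that keep roughly the central `quantile` share of values."""
    ordered = sorted(values)
    drop = int(len(ordered) * (1.0 - quantile) / 2.0)  # values trimmed from each tail
    return ordered[drop], ordered[len(ordered) - 1 - drop]

def quantize(vector, lo, hi):
    """Clip to [lo, hi] and map each float to an integer code in [0, 255]."""
    scale = (hi - lo) / 255.0
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vector]

def dequantize(codes, lo, hi):
    """Approximate the original floats from the 8-bit codes."""
    scale = (hi - lo) / 255.0
    return [lo + code * scale for code in codes]

vector = [-1.0, -0.5, 0.0, 0.5, 1.0]
lo, hi = fit_bounds(vector, quantile=1.0)
codes = quantize(vector, lo, hi)
restored = dequantize(codes, lo, hi)

# Storage arithmetic: float32 uses 4 bytes per dimension, an 8-bit code uses 1.
dims, count = 384, 1_000_000
float32_bytes = dims * count * 4
int8_bytes = dims * count * 1
print(codes, float32_bytes // int8_bytes)
```

The small reconstruction error is why rescoring the top candidates against the original vectors can recover accuracy lost to quantization.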

Payload indexing

    from qdrant_client.models import PayloadSchemaType

    # Create payload index for faster filtering
    client.create_payload_index(
        collection_name="documents",
        field_name="category",
        field_schema=PayloadSchemaType.KEYWORD
    )

    client.create_payload_index(
        collection_name="documents",
        field_name="timestamp",
        field_schema=PayloadSchemaType.INTEGER
    )

    # Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL

Production deployment

Qdrant Cloud

    from qdrant_client import QdrantClient

    # Connect to Qdrant Cloud
    client = QdrantClient(
        url="https://your-cluster.cloud.qdrant.io",
        api_key="your-api-key"
    )
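In practice the API key is usually kept out of source code. One way to do that is to read the connection settings from environment variables, as sketched below. The variable names `QDRANT_URL` and `QDRANT_API_KEY` and the helper `connection_settings` are assumptions for this example, not names the client reads on its own.

```python
import os

def connection_settings(env=None):
    """Build QdrantClient keyword arguments from environment-style variables."""
    env = os.environ if env is None else env
    settings = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        settings["api_key"] = api_key  # only sent when a key is configured
    return settings

# A plain dict stands in for the real environment in this demonstration.
settings = connection_settings({
    "QDRANT_URL": "https://your-cluster.cloud.qdrant.io",
    "QDRANT_API_KEY": "your-api-key",
})
print(sorted(settings))

# With the client installed, the settings can be passed straight through:
# client = QdrantClient(**settings)
```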

Performance tuning

    from qdrant_client.models import HnswConfigDiff

    # Optimize for search quality (higher recall)
    client.update_collection(
        collection_name="documents",
        hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
    )

    # Optimize for indexing speed (bulk loads)
    client.update_collection(
        collection_name="documents",
        optimizers_config={"indexing_threshold": 20000}
    )

Best practices

  1. Batch operations - Use batch upsert/search for efficiency
  2. Payload indexing - Index fields used in filters
  3. Quantization - Enable for large collections (>1M vectors)
  4. Sharding - Use for collections >10M vectors
  5. On-disk storage - Enable `on_disk_payload` for large payloads
  6. Connection pooling - Reuse client instances

Common issues

**Slow search with filters:**

    # Create payload index for filtered fields
    client.create_payload_index(
        collection_name="docs",
        field_name="category",
        field_schema=PayloadSchemaType.KEYWORD
    )

**Out of memory:**

    # Enable quantization and on-disk storage
    client.create_collection(
        collection_name="large_collection",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
        quantization_config=ScalarQuantization(...),
        on_disk_payload=True
    )

**Connection issues:**

    # Set a request timeout and prefer gRPC
    client = QdrantClient(
        host="localhost",
        port=6333,
        timeout=30,
        prefer_grpc=True  # gRPC for better performance
    )
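The client options above set a timeout but do not retry failed calls. A retry layer can be added around any client call; the sketch below is one generic way to do it with exponential backoff. The helper `with_retries` is hypothetical, and which exception types the Qdrant client raises on connection failure is not covered here, so the exceptions to retry on are passed in as a parameter.

```python
import time

def with_retries(operation, attempts=3, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call operation(); on a listed exception, wait and retry with doubling delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated operation that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"

result = with_retries(flaky, attempts=3, base_delay=0.0)
print(result, calls["count"])

# With a real client, a call could be wrapped like this:
# results = with_retries(lambda: client.search(collection_name="docs", query_vector=query, limit=10))
```

Retries are only safe for operations that can be repeated without side effects, such as searches or upserts with fixed point IDs.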

References

  • Advanced Usage - Distributed mode, hybrid search, recommendations
  • Troubleshooting - Common issues, debugging, performance tuning

Resources
