azure-search-documents-py
Azure AI Search SDK for Python
Full-text, vector, and hybrid search with AI enrichment capabilities.
Installation
bash
pip install azure-search-documents
Environment Variables
bash
AZURE_SEARCH_ENDPOINT=https://<service-name>.search.windows.net
AZURE_SEARCH_API_KEY=<your-api-key>
AZURE_SEARCH_INDEX_NAME=<your-index-name>
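Reading these variables in one place makes a missing value fail early instead of surfacing as an authentication error later. The following is a minimal sketch using only the standard library; the helper name `load_search_settings` and the returned dictionary keys are assumptions for illustration, not part of the SDK.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_INDEX_NAME")


def load_search_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the search settings, raising if a required variable is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "endpoint": env["AZURE_SEARCH_ENDPOINT"].rstrip("/"),
        "index_name": env["AZURE_SEARCH_INDEX_NAME"],
        # Optional: only needed for API key authentication.
        "api_key": env.get("AZURE_SEARCH_API_KEY"),
    }
```

The returned values can then be passed to whichever client constructor you use.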
Authentication
API Key
python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Authenticate with an API key read from the environment.
client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_API_KEY"]),
)
Entra ID (Recommended)
python
import os
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

# DefaultAzureCredential comes from the separate azure-identity package.
client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=DefaultAzureCredential(),
)
Client Types
| Client | Purpose |
|---|---|
| SearchClient | Search and document operations |
| SearchIndexClient | Index management, synonym maps |
| SearchIndexerClient | Indexers, data sources, skillsets |
Create Index with Vector Field
python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
key = os.environ["AZURE_SEARCH_API_KEY"]
index_client = SearchIndexClient(endpoint, AzureKeyCredential(key))

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="content", type=SearchFieldDataType.String),
    # Vector field: dimensions must match the embedding model's output size.
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile",
    ),
]

# The profile ties the vector field to a named HNSW algorithm configuration.
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(name="my-hnsw")
    ],
    profiles=[
        VectorSearchProfile(
            name="my-vector-profile",
            algorithm_configuration_name="my-hnsw",
        )
    ],
)

index = SearchIndex(
    name="my-index",
    fields=fields,
    vector_search=vector_search,
)
index_client.create_or_update_index(index)
Upload Documents
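python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# endpoint and key hold the service URL and API key (see Environment Variables).
client = SearchClient(endpoint, "my-index", AzureKeyCredential(key))
documents = [
    {
        "id": "1",
        "title": "Azure AI Search",
        "content": "Full-text and vector search service",
        "content_vector": [0.1, 0.2, ...],  # placeholder; supply all 1536 dimensions
    }
]
# upload_documents returns one IndexingResult per document.
result = client.upload_documents(documents)
print(f"Uploaded {len(result)} documents")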
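Because the vector field is declared with a fixed number of dimensions, it can help to check each document locally before uploading. This is a minimal sketch under the assumption that the field is named `content_vector` and has 1536 dimensions, as in the example above; the helper names `check_vector` and `validate_documents` are hypothetical and not part of the SDK.

```python
from typing import Sequence

EXPECTED_DIMENSIONS = 1536  # assumed to match vector_search_dimensions on the field


def check_vector(vector: Sequence[float], expected: int = EXPECTED_DIMENSIONS) -> None:
    """Raise ValueError if the vector does not fit the index definition."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    if not all(isinstance(x, (int, float)) for x in vector):
        raise ValueError("vector must contain only numbers")


def validate_documents(documents, vector_field: str = "content_vector") -> list:
    """Return the ids of documents whose vector field is missing or invalid."""
    bad_ids = []
    for doc in documents:
        try:
            check_vector(doc[vector_field])
        except (KeyError, ValueError):
            bad_ids.append(doc.get("id"))
    return bad_ids
```

An empty return value means every document carries a vector of the expected length.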
Keyword Search
python
results = client.search(
    search_text="azure search",
    select=["id", "title", "content"],
    top=10,
)
for result in results:
    print(f"{result['title']}: {result['@search.score']}")
Vector Search
python
from azure.search.documents.models import VectorizedQuery

# Your query embedding (1536 dimensions); get_embedding is your own embedding function.
query_vector = get_embedding("semantic search capabilities")
vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=10,
    fields="content_vector",
)
results = client.search(
    vector_queries=[vector_query],
    select=["id", "title", "content"],
)
for result in results:
    print(f"{result['title']}: {result['@search.score']}")
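To make `k_nearest_neighbors` concrete, here is a brute-force sketch of what a nearest-neighbor query computes. It assumes cosine similarity as the metric and uses a toy two-dimensional index; the service uses an approximate HNSW graph rather than this exhaustive scan, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query, vectors, k):
    """Exhaustive search: score every stored vector and keep the top k ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


toy_index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(k_nearest([1.0, 0.0], toy_index, k=2))  # ['a', 'b']
```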
Hybrid Search (Vector + Keyword)
python
from azure.search.documents.models import VectorizedQuery

# query_vector is the embedding of the query text, as in the vector search example.
vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=10,
    fields="content_vector",
)
# Passing both search_text and vector_queries runs a hybrid query.
results = client.search(
    search_text="azure search",
    vector_queries=[vector_query],
    select=["id", "title", "content"],
    top=10,
)
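Hybrid queries in Azure AI Search merge the keyword and vector rankings with Reciprocal Rank Fusion (RRF). The following is a minimal pure-Python sketch of that idea, not the service's implementation; the function name, the sample document ids, and the constant `k=60` are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists; each hit contributes 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["doc1", "doc2", "doc3"]
vector_hits = ["doc3", "doc1", "doc4"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc1', 'doc3', 'doc2', 'doc4']
```

Documents that rank well in both lists rise to the top, which is why hybrid search tends to be more robust than either ranking alone.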
Semantic Ranking
python
from azure.search.documents.models import QueryType

results = client.search(
    search_text="what is azure search",
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name="my-semantic-config",
    query_caption="extractive",  # request captions so @search.captions is populated
    select=["id", "title", "content"],
    top=10,
)
for result in results:
    print(f"{result['title']}")
    if result.get("@search.captions"):
        print(f"  Caption: {result['@search.captions'][0].text}")
Filters
python
# search_text="*" matches every document; the OData filter then narrows the set.
results = client.search(
    search_text="*",
    filter="category eq 'Technology' and rating gt 4",
    order_by=["rating desc"],
    select=["id", "title", "category", "rating"],
)
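Filter strings use OData syntax, where a single quote inside a string literal is escaped by doubling it. When the filter value comes from user input, a small helper avoids malformed expressions. This is a sketch; `odata_string` and `build_filter` are hypothetical names, and the `category` and `rating` fields are taken from the example above.

```python
def odata_string(value: str) -> str:
    """Quote a string literal for OData, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(category: str, min_rating: float) -> str:
    """Build the filter used above from a category name and a minimum rating."""
    return f"category eq {odata_string(category)} and rating gt {min_rating}"


print(build_filter("Technology", 4))
# category eq 'Technology' and rating gt 4
```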
Facets
python
results = client.search(
    search_text="*",
    facets=["category,count:10", "rating"],
    top=0,  # Only get facets, no documents
)
for facet_name, facet_values in results.get_facets().items():
    print(f"{facet_name}:")
    for facet in facet_values:
        print(f"  {facet['value']}: {facet['count']}")
Autocomplete & Suggest
python
# Autocomplete
results = client.autocomplete(
    search_text="sea",
    suggester_name="my-suggester",
    mode="twoTerms",
)
# Suggest
results = client.suggest(
    search_text="sea",
    suggester_name="my-suggester",
    select=["title"],
)
Indexer with Skillset
python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    EntityRecognitionSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
    SearchIndexerSkillset,
)

indexer_client = SearchIndexerClient(endpoint, AzureKeyCredential(key))

# Create data source (connection_string is your Azure Storage connection string)
data_source = SearchIndexerDataSourceConnection(
    name="my-datasource",
    type="azureblob",
    connection_string=connection_string,
    container=SearchIndexerDataContainer(name="documents"),
)
indexer_client.create_or_update_data_source_connection(data_source)

# Create skillset
skillset = SearchIndexerSkillset(
    name="my-skillset",
    skills=[
        EntityRecognitionSkill(
            inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
            outputs=[OutputFieldMappingEntry(name="organizations", target_name="organizations")],
        )
    ],
)
indexer_client.create_or_update_skillset(skillset)

# Create indexer
indexer = SearchIndexer(
    name="my-indexer",
    data_source_name="my-datasource",
    target_index_name="my-index",
    skillset_name="my-skillset",
)
indexer_client.create_or_update_indexer(indexer)
Best Practices
- Use hybrid search, which combines vector and keyword retrieval, for the best relevance
- Enable semantic ranking for natural language queries
- Index in batches of 100-1000 documents for efficiency
- Use filters to narrow results before ranking
- Configure vector dimensions to match your embedding model
- Use HNSW algorithm for large-scale vector search
- Create suggesters at index creation time (cannot add later)
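The batching advice above can be sketched with the standard library alone. The helper name `batched` and the default batch size of 1000 are assumptions for illustration; each yielded list could then be passed to `client.upload_documents`.

```python
from itertools import islice


def batched(documents, batch_size=1000):
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while batch := list(islice(iterator, batch_size)):
        yield batch


docs = [{"id": str(i)} for i in range(2500)]
print([len(batch) for batch in batched(docs)])  # [1000, 1000, 500]
```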
Reference Files
| File | Contents |
|---|---|
| references/vector-search.md | HNSW configuration, integrated vectorization, multi-vector queries |
| references/semantic-ranking.md | Semantic configuration, captions, answers, hybrid patterns |
| scripts/setup_vector_index.py | CLI script to create vector-enabled search index |
Additional Azure AI Search Patterns
Azure AI Search Python SDK
Write clean, idiomatic Python code for Azure AI Search using azure-search-documents.
Installation
bash
pip install azure-search-documents azure-identity
Environment Variables
bash
AZURE_SEARCH_ENDPOINT=https://<search-service>.search.windows.net
AZURE_SEARCH_INDEX_NAME=<index-name>
# For API key auth (not recommended for production)
AZURE_SEARCH_API_KEY=<api-key>
Authentication
DefaultAzureCredential (preferred):
python
import os
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
index_name = os.environ["AZURE_SEARCH_INDEX_NAME"]
credential = DefaultAzureCredential()
client = SearchClient(endpoint, index_name, credential)
API Key:
python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

api_key = os.environ["AZURE_SEARCH_API_KEY"]
client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
Client Selection
| Client | Purpose |
|---|---|
| SearchClient | Query indexes, upload/update/delete documents |
| SearchIndexClient | Create/manage indexes, knowledge sources, knowledge bases |
| SearchIndexerClient | Manage indexers, skillsets, data sources |
| KnowledgeBaseRetrievalClient | Agentic retrieval with LLM-powered Q&A |
Index Creation Pattern
python
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, VectorSearch, VectorSearchProfile,
    HnswAlgorithmConfiguration, AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters, SemanticSearch,
    SemanticConfiguration, SemanticPrioritizedFields, SemanticField
)

# index_name, aoai_endpoint, embedding_deployment, and embedding_model are
# expected to be set beforehand, e.g. from environment variables.
index = SearchIndex(
    name=index_name,
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        # Dimensions must match the embedding model's output size.
        SearchField(name="embedding", type="Collection(Edm.Single)",
                    searchable=True,
                    vector_search_dimensions=3072,
                    vector_search_profile_name="vector-profile"),
    ],
    vector_search=VectorSearch(
        profiles=[VectorSearchProfile(
            name="vector-profile",
            algorithm_configuration_name="hnsw-algo",
            vectorizer_name="openai-vectorizer"
        )],
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-algo")],
        # The vectorizer lets the service embed query text at search time.
        vectorizers=[AzureOpenAIVectorizer(
            vectorizer_name="openai-vectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=aoai_endpoint,
                deployment_name=embedding_deployment,
                model_name=embedding_model
            )
        )]
    ),
    semantic_search=SemanticSearch(
        default_configuration_name="semantic-config",
        configurations=[SemanticConfiguration(
            name="semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")]
            )
        )]
    )
)

index_client = SearchIndexClient(endpoint, credential)
index_client.create_or_update_index(index)
Document Operations
python
from azure.search.documents import SearchClient, SearchIndexingBufferedSender

# Batch upload with automatic batching
with SearchIndexingBufferedSender(endpoint, index_name, credential) as sender:
    sender.upload_documents(documents)

# Direct operations via SearchClient
search_client = SearchClient(endpoint, index_name, credential)
search_client.upload_documents(documents)           # Add new (replaces a document with the same key)
search_client.merge_documents(documents)            # Update existing
search_client.merge_or_upload_documents(documents)  # Upsert
search_client.delete_documents(documents)           # Remove
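The four operations differ mainly in how they treat a document whose key already exists or is missing. The in-memory model below is only a sketch of those semantics as I understand them, using a plain dictionary in place of an index; `apply_action` and the action names are illustrative and nothing here calls the service.

```python
def apply_action(store, action, doc, key_field="id"):
    """Apply one indexing action to a dict that stands in for the index."""
    key = doc[key_field]
    if action == "upload":
        store[key] = dict(doc)  # insert, or replace the whole document
    elif action == "merge":
        if key not in store:
            raise KeyError(f"cannot merge: no document with key {key!r}")
        store[key].update(doc)  # only the supplied fields change
    elif action == "mergeOrUpload":
        store.setdefault(key, {}).update(doc)  # merge if present, insert otherwise
    elif action == "delete":
        store.pop(key, None)  # removing a missing key is not an error
    else:
        raise ValueError(f"unknown action: {action}")


index = {}
apply_action(index, "upload", {"id": "1", "title": "A", "content": "x"})
apply_action(index, "merge", {"id": "1", "title": "B"})
print(index["1"])  # {'id': '1', 'title': 'B', 'content': 'x'}
```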
Search Patterns
python
from azure.search.documents.models import VectorizedQuery

# Basic search
results = search_client.search(search_text="query")

# Vector search (embedding is the query vector from your embedding model)
results = search_client.search(
    search_text=None,
    vector_queries=[VectorizedQuery(
        vector=embedding,
        k_nearest_neighbors=5,
        fields="embedding"
    )]
)

# Hybrid search (vector + keyword) with semantic reranking
results = search_client.search(
    search_text="query",
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=5, fields="embedding")],
    query_type="semantic",
    semantic_configuration_name="semantic-config"
)

# With filters
results = search_client.search(
    search_text="query",
    filter="category eq 'technology'",
    select=["id", "title", "content"],
    top=10
)
Agentic Retrieval (Knowledge Bases)
For LLM-powered Q&A with answer synthesis, see references/agentic-retrieval.md.
Key concepts:
- Knowledge Source: Points to a search index
- Knowledge Base: Wraps knowledge sources + LLM for query planning and synthesis
- Output modes: EXTRACTIVE_DATA (raw chunks) or ANSWER_SYNTHESIS (LLM-generated answers)
Async Pattern
python
import asyncio
from azure.search.documents.aio import SearchClient

async def main():
    # Use an AzureKeyCredential or an async credential from azure.identity.aio here.
    async with SearchClient(endpoint, index_name, credential) as client:
        results = await client.search(search_text="query")
        async for result in results:
            print(result["title"])

asyncio.run(main())
Best Practices
- Use environment variables for endpoints, keys, and deployment names
- Prefer DefaultAzureCredential over API keys for production
- Use SearchIndexingBufferedSender for batch uploads (handles batching/retries)
- Always define semantic configuration for agentic retrieval indexes
- Use create_or_update_index for idempotent index creation
- Close clients with context managers or explicit close()
Field Types Reference
| EDM Type | Python | Notes |
|---|---|---|
| Edm.String | str | Searchable text |
| Edm.Int32 | int | Integer |
| Edm.Int64 | int | Long integer |
| Edm.Double | float | Floating point |
| Edm.Boolean | bool | True/False |
| Edm.DateTimeOffset | datetime | ISO 8601 |
| Collection(Edm.Single) | List[float] | Vector embeddings |
| Collection(Edm.String) | List[str] | String arrays |
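The mapping in the table can be sketched as a small helper that suggests an EDM type for a sample Python value. This is an illustration only: the function name `edm_type` is hypothetical, and treating every list of floats as a vector field is an assumption that may not hold for your schema.

```python
from datetime import datetime

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def edm_type(value):
    """Return the EDM type name that fits a sample Python value."""
    # bool is a subclass of int, so it has to be tested first.
    if isinstance(value, bool):
        return "Edm.Boolean"
    if isinstance(value, int):
        return "Edm.Int32" if INT32_MIN <= value <= INT32_MAX else "Edm.Int64"
    if isinstance(value, float):
        return "Edm.Double"
    if isinstance(value, str):
        return "Edm.String"
    if isinstance(value, datetime):
        return "Edm.DateTimeOffset"
    if isinstance(value, list) and value:
        if all(isinstance(item, str) for item in value):
            return "Collection(Edm.String)"
        if all(isinstance(item, float) for item in value):
            return "Collection(Edm.Single)"
    raise TypeError(f"no EDM mapping for {value!r}")


sample = {"id": "1", "rating": 5, "embedding": [0.1, 0.2]}
print({name: edm_type(value) for name, value in sample.items()})
```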
Error Handling
python
from azure.core.exceptions import HttpResponseError, ResourceNotFoundError

try:
    result = search_client.get_document(key="123")
except ResourceNotFoundError:
    # Raised when no document has the requested key.
    print("Document not found")
except HttpResponseError as e:
    # Base class for other service errors; keep it after the more specific handler.
    print(f"Search error: {e.message}")