elasticsearch-expert

Elasticsearch Expert

Expert guidance for Elasticsearch, search optimization, ELK stack, and distributed search systems.

Core Concepts

  • Full-text search and inverted indexes
  • Document-oriented storage
  • RESTful API
  • Distributed architecture with sharding
  • ELK stack (Elasticsearch, Logstash, Kibana)
  • Aggregations and analytics
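
To make the inverted-index idea concrete, here is a toy sketch in plain Python. It is not how Lucene stores data (real analyzers also stem, drop stop words, and record positions); the function name `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer; a real analyzer does much more.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elasticsearch guide",
    2: "Guide to distributed search",
}
inverted = build_inverted_index(docs)
print(inverted["guide"])  # ids of the documents that contain "guide"
```

A lookup is then a dictionary access per query term, which is why full-text search stays fast as the number of documents grows.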

Index Management

python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# Create index with mapping
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "content": {"type": "text"},
            "author": {"type": "keyword"},
            "created_at": {"type": "date"},
            "views": {"type": "integer"}
        }
    }
}
es.indices.create(index='articles', body=mapping)

# Index document
doc = {
    "title": "Elasticsearch Guide",
    "content": "Complete guide to Elasticsearch",
    "author": "John Doe",
    "created_at": "2024-01-01",
    "views": 100
}
es.index(index='articles', id=1, body=doc)

# Bulk indexing
from elasticsearch.helpers import bulk

# `documents` is assumed to be an iterable of dicts shaped like `doc` above
actions = [
    {"_index": "articles", "_id": i, "_source": doc}
    for i, doc in enumerate(documents)
]
bulk(es, actions)
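
For large inputs, the list comprehension above holds every action in memory at once. A minimal sketch of a lazy alternative follows, assuming the same action shape; `generate_actions`, `start_id`, and `sample_docs` are hypothetical names introduced here.

```python
def generate_actions(documents, index_name, start_id=0):
    """Yield one bulk action per document so the full list is never built."""
    for offset, source in enumerate(documents):
        yield {
            "_index": index_name,
            "_id": start_id + offset,
            "_source": source,
        }


# Hypothetical sample data; materialized here only to inspect the result.
sample_docs = [{"title": "A"}, {"title": "B"}]
actions = list(generate_actions(sample_docs, "articles"))
print(actions[0])
```

With a live cluster the generator can be passed straight to the helper, for example `bulk(es, generate_actions(sample_docs, "articles"))`, since `bulk` accepts any iterable of actions.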

Search Queries

python

# Full-text search
query = {
    "query": {
        "match": {"content": "elasticsearch guide"}
    }
}
results = es.search(index='articles', body=query)

# Boolean query
bool_query = {
    "query": {
        "bool": {
            # Scored full-text clause
            "must": [{"match": {"content": "elasticsearch"}}],
            # Filter context: no scoring, results can be cached
            "filter": [{"range": {"views": {"gte": 100}}}],
            # `author` is a keyword field, so the term must equal the indexed value exactly
            "should": [{"term": {"author": "John Doe"}}],
            # Assumes a `status` keyword field, which the mapping above does not define
            "must_not": [{"term": {"status": "draft"}}]
        }
    }
}

# Multi-match query
multi_match = {
    "query": {
        "multi_match": {
            "query": "elasticsearch guide",
            "fields": ["title^2", "content"],  # Boost title
            "type": "best_fields"
        }
    }
}

# Fuzzy search
fuzzy = {
    "query": {
        "fuzzy": {
            "title": {
                "value": "elasticseerch",
                "fuzziness": "AUTO"
            }
        }
    }
}
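
The queries above return a response whose matches sit under `hits.hits`. The sketch below shows one way to read them; `summarize_hits` is a hypothetical helper, and `sample_response` is a hand-written stand-in shaped like a search response, not output from a real cluster.

```python
def summarize_hits(response):
    """Return (id, score, title) tuples for each hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in response["hits"]["hits"]
    ]


# Hand-written stand-in for a search response (assumed shape, invented values).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "articles",
                "_id": "1",
                "_score": 1.2,
                "_source": {"title": "Elasticsearch Guide"},
            }
        ],
    }
}
summary = summarize_hits(sample_response)
print(summary)
```

The same function should work on the object returned by `es.search(...)`, since the Python client's responses support dictionary-style access.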

Aggregations

python

# Aggregation query
agg_query = {
    "aggs": {
        "authors": {
            "terms": {"field": "author", "size": 10}
        },
        "avg_views": {
            "avg": {"field": "views"}
        },
        "views_histogram": {
            "histogram": {"field": "views", "interval": 100}
        },
        "date_histogram": {
            "date_histogram": {"field": "created_at", "calendar_interval": "month"}
        }
    }
}
result = es.search(index='articles', body=agg_query)
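
Aggregation results come back under the `aggregations` key, named after the aggregations in the request. Below is a small sketch of reading the `authors` and `avg_views` results; the helper names and the `sample_result` values are invented for illustration.

```python
def author_counts(response):
    """Map each author bucket key to its document count."""
    buckets = response["aggregations"]["authors"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def average_views(response):
    """Return the value of the avg_views metric aggregation."""
    return response["aggregations"]["avg_views"]["value"]


# Hand-written stand-in for an aggregation response (invented values).
sample_result = {
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "John Doe", "doc_count": 3},
                {"key": "Jane Roe", "doc_count": 1},
            ]
        },
        "avg_views": {"value": 125.0},
    }
}
counts = author_counts(sample_result)
avg = average_views(sample_result)
print(counts, avg)
```

Bucket aggregations such as `terms` and `histogram` return a `buckets` list, while metric aggregations such as `avg` return a single `value`.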

Best Practices

  • Design mappings carefully
  • Use appropriate analyzers
  • Implement proper sharding strategy
  • Monitor cluster health
  • Use bulk operations
  • Implement pagination with search_after
  • Cache frequently used queries
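
The `search_after` recommendation can be sketched as a loop that feeds the sort values of the last hit on one page into the next request. In this sketch, `iter_all_hits` and `fake_search` are hypothetical; `fake_search` stands in for a call such as `es.search(index='articles', body=body)` so the loop can run without a cluster.

```python
def iter_all_hits(search, query, sort, page_size=100):
    """Yield every hit, paging with search_after instead of from/size."""
    search_after = None
    while True:
        body = {"query": query, "sort": sort, "size": page_size}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries the sort values that position the next page.
        search_after = hits[-1]["sort"]


def fake_search(body):
    """In-memory stand-in for a search call, sorted on a single numeric key."""
    data = [{"_id": str(i), "sort": [i]} for i in range(5)]
    after = body.get("search_after")
    remaining = [h for h in data if after is None or h["sort"][0] > after[0]]
    return {"hits": {"hits": remaining[: body["size"]]}}


ids = [
    hit["_id"]
    for hit in iter_all_hits(fake_search, {"match_all": {}}, [{"views": "asc"}], page_size=2)
]
print(ids)
```

Against a real index the sort should end with a tiebreaker that is unique per document; otherwise documents that share the same sort values can be skipped or repeated across pages.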

Anti-Patterns

❌ Deep pagination with from/size
❌ Wildcard queries without a fixed prefix (leading wildcards)
❌ No replica shards
❌ Over-sharding
❌ Not using filters for exact matches
❌ Ignoring cluster yellow/red status
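
As one way to avoid the "not using filters for exact matches" anti-pattern, the sketch below keeps the full-text clause in `must` and moves exact-value conditions into `filter`, where they do not affect scoring. The helper name `filtered_match_query` and its parameters are assumptions made for this example.

```python
def filtered_match_query(text_field, text, exact_filters):
    """Build a bool query: full-text match in must, exact terms in filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                "filter": [
                    {"term": {field: value}}
                    for field, value in exact_filters.items()
                ],
            }
        }
    }


body = filtered_match_query("content", "elasticsearch", {"author": "John Doe"})
print(body)
```

The resulting dictionary can be passed to a search call in the same way as the query bodies shown earlier.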

Resources
