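Performance Optimization

1. Mapping Optimization

1.1 Choosing Field Types

Choosing appropriate field types is the foundation of performance tuning.

Core principles:

Use keyword for fields that do not need full-text search
Disable doc_values on fields that are never sorted or aggregated on
Put conditions that do not need scoring into filter context
When a document stores an interval (for example a price band), prefer a range field type

Example: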

PUT /products
{
  "mappings": {
    "properties": {
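      // Wrong: order ID mapped as text
      "order_id": { "type": "text" }

      // Right: order ID mapped as keyword
      "order_id": {
        "type": "keyword",
        "norms": false  // no scoring needed (norms are already off by default on keyword fields)
      },

      // Wrong: enabling fielddata so the description can be aggregated
      "description": {
        "type": "text",
        "fielddata": true  // loads terms onto the heap and can exhaust memory
      }

      // Right: the description is only searched, never aggregated
      "description": {
        "type": "text",
        "index_options": "offsets",  // supports faster highlighting
        "norms": false  // length normalization matters little for descriptions
      },

      // Title needs both search and aggregation
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256  // values longer than 256 characters are skipped by this sub-field (not truncated)
          }
        }
      },

      // Price range stored as a single range field
      "price_range": {
        "type": "integer_range"  // one field instead of separate min/max integer fields
      },

      // Date range
      "promotion_period": {
        "type": "date_range",
        "format": "yyyy-MM-dd"
      },

      // Field that is never searched
      "internal_id": {
        "type": "keyword",
        "index": false,  // no inverted index, saves disk and memory
        "doc_values": true  // keep doc_values for aggregations
      },

      // Display-only field
      "thumbnail": {
        "type": "keyword",
        "index": false,
        "doc_values": false  // no inverted index and no doc_values; the value is only available from _source
      }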
    }
  }
}

1.2 Disabling Unneeded Features

Tuning index_options:

{
  "mappings": {
    "properties": {
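      // Only need to know whether a term is present
      "tags": {
        "type": "text",
        "index_options": "docs"  // stores document IDs only, no term frequencies or positions
      },

      // Need term frequencies (for relevance scoring)
      "title": {
        "type": "text",
        "index_options": "freqs"  // stores document IDs and term frequencies
      },

      // Need phrase queries
      "content": {
        "type": "text",
        "index_options": "positions"  // stores positions (the default for text fields)
      },

      // Need highlighting
      "description": {
        "type": "text",
        "index_options": "offsets"  // stores character offsets, speeds up highlighting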
      }
    }
  }
}

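index_options comparison:

Option	Stored data	Enables	Disk usage
docs	Document IDs	Simple matching	Smallest
freqs	Document IDs + term frequencies	Frequency-based scoring (BM25 by default)	Small
positions	Document IDs + term frequencies + positions	Phrase queries	Medium (default)
offsets	Document IDs + term frequencies + positions + offsets	Fast highlighting	Large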

1.3 Controlling Dynamic Mapping

Problem: dynamic mapping can create fields you never need, which degrades performance.

Solution:

PUT /logs
{
  "mappings": {
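    // Strict mode: reject documents that contain unmapped fields
    "dynamic": "strict",

    // Or allow dynamic fields only under specific objects
    "properties": {
      "message": { "type": "text" },
      "level": { "type": "keyword" },

      // metadata may gain new fields dynamically
      "metadata": {
        "type": "object",
        "dynamic": true
      },

      // labels is not indexed, only kept in _source
      "labels": {
        "type": "object",
        "enabled": false  // not parsed or indexed at all, only present in _source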
      }
    }
  }
}

1.4 Numeric Type Optimization

{
  "mappings": {
    "properties": {
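      // Wrong: long for every number
      "age": { "type": "long" },         // wider than needed
      "count": { "type": "long" },

      // Right: choose the type by value range
      "age": { "type": "byte" },         // -128 to 127
      "count": { "type": "integer" },    // -2^31 to 2^31-1
      "distance": { "type": "float" },   // single precision is enough

      // scaled_float for prices: backed by an integer, which compresses better than a floating-point value
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100  // stored as value × 100, rounded to an integer
      }
      // 19.99 → stored as 1999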
    }
  }
}
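The scaled_float conversion can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and Python's round() is assumed to be close enough to the rounding Elasticsearch applies internally.

```python
def encode_scaled_float(value: float, scaling_factor: int) -> int:
    """Return the integer that represents `value` under the given scaling factor."""
    return round(value * scaling_factor)


def decode_scaled_float(stored: int, scaling_factor: int) -> float:
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor


stored = encode_scaled_float(19.99, 100)
print(stored)                            # 1999
print(decode_scaled_float(stored, 100))  # 19.99
print(encode_scaled_float(19.994, 100))  # 1999: digits beyond the scaling factor are lost
```

The last line shows the trade-off: with a scaling factor of 100, anything finer than two decimal places is rounded away at index time.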

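1.5 Avoiding Field Explosion

Problem: too many fields bloat the mapping.

Limit settings:

PUT /logs/_settings
{
  "index.mapping.total_fields.limit": 1000,      // maximum number of fields (default 1000)
  "index.mapping.depth.limit": 20,               // maximum object nesting depth (default 20)
  "index.mapping.nested_fields.limit": 50,       // maximum number of nested fields (default 50)
  "index.mapping.nested_objects.limit": 10000    // maximum nested objects per document (default 10000)
}

Using the flattened type: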

{
  "mappings": {
    "properties": {
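      // Wrong: dynamic keys cause field explosion
      "labels": {
        "type": "object"  // every new key creates a new mapped field
      }

      // Right: use the flattened type
      "labels": {
        "type": "flattened"  // the whole object is mapped as a single field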
      }
    }
  }
}

// Query a key inside the flattened field
GET /logs/_search
{
  "query": {
    "term": { "labels.env": "production" }
  }
}
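To make the flattened behaviour concrete, here is a rough Python sketch of what the mapping "sees": one mapped field whose leaf values are all treated as keyword-like strings, addressable by dotted path. The helper name flatten_leaves is hypothetical, and the sketch only models the addressing, not the on-disk format.

```python
def flatten_leaves(obj: dict, prefix: str = "") -> dict:
    """Collect every leaf value of a nested dict under its dotted path, as a string."""
    leaves = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-objects; they never become mapped fields of their own.
            leaves.update(flatten_leaves(value, path))
        else:
            # Leaf values are handled as keywords, so numbers become strings too.
            leaves[path] = str(value)
    return leaves


labels = {"env": "production", "team": {"name": "search", "size": 7}}
print(flatten_leaves(labels, "labels"))
# {'labels.env': 'production', 'labels.team.name': 'search', 'labels.team.size': '7'}
```

Because every leaf is handled as a keyword, a term query on labels.env works, while numeric range queries on a key like labels.team.size compare strings rather than numbers.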

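2. Query Optimization

2.1 Filter vs Query

Key difference:

Query context: computes a relevance score; results are not kept in the query cache
Filter context: skips scoring, and frequently reused filters can be cached, so it is usually faster

Recommendation:

// Slow: everything in query context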
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "手机" } },
        { "term": { "status": "published" } },  // no scoring needed
        { "range": { "price": { "lte": 5000 } } }  // no scoring needed
      ]
    }
  }
}

// Fast: exact conditions in filter context
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "手机" } }  // needs scoring
      ],
      "filter": [
        { "term": { "status": "published" } },  // cacheable
        { "range": { "price": { "lte": 5000 } } }  // cacheable
      ]
    }
  }
}

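How filter caching works (simplified; Elasticsearch decides per segment which filters are worth caching, typically those that are reused):

First query:
filter: { "term": { "status": "published" } }
→ Run the filter and build a bitset of matching documents
→ Store the bitset in the Query Cache

Second query:
filter: { "term": { "status": "published" } }
→ Read the bitset straight from the cache
→ Filter documents quickly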
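The idea behind the steps above can be mimicked with a toy model in Python. This is not how Lucene implements its query cache; the in-memory docs list, the bitset_cache dict and the function name are all made up for illustration.

```python
import json

# Toy "segment": the list index plays the role of the document ID.
docs = [
    {"status": "published"},
    {"status": "draft"},
    {"status": "published"},
]
bitset_cache = {}  # canonical filter JSON -> bitset stored in an int


def term_filter_bitset(field: str, value: str) -> int:
    """Return a bitset of matching documents, computing it only on a cache miss."""
    key = json.dumps({"term": {field: value}}, sort_keys=True)
    if key not in bitset_cache:
        bits = 0
        for doc_id, doc in enumerate(docs):
            if doc.get(field) == value:
                bits |= 1 << doc_id  # set the bit for this document
        bitset_cache[key] = bits
    return bitset_cache[key]


first = term_filter_bitset("status", "published")   # computed and cached
second = term_filter_bitset("status", "published")  # served from the cache
print(bin(first))  # 0b101
```

Documents 0 and 2 match, so bits 0 and 2 are set. Combining several cached filters then becomes a cheap bitwise AND over their bitsets.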

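2.2 Deep Pagination

Problem: performance drops sharply as pages get deeper.

Cause:

Fetching the 1001st page (10 hits per page):
from=10000, size=10

What actually happens:
1. Each shard returns its top 10010 documents (from + size)
2. The coordinating node collects 3 shards × 10010 = 30030 documents
3. After a global sort it keeps only hits 10001-10010 and discards the rest (with the default index.max_result_window of 10000, a request with from + size = 10010 is rejected outright)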
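The arithmetic above generalizes easily. The following Python sketch (the function name is hypothetical) computes how many entries the coordinating node has to collect and how many of them are thrown away for a given from/size request.

```python
def deep_page_cost(from_: int, size: int, num_shards: int) -> dict:
    """Estimate the work behind a from/size request across `num_shards` shards."""
    per_shard = from_ + size            # every shard must return its top from+size hits
    collected = per_shard * num_shards  # entries merged on the coordinating node
    return {
        "per_shard": per_shard,
        "collected": collected,
        "discarded": collected - size,  # everything except the requested page
    }


print(deep_page_cost(10000, 10, 3))
# {'per_shard': 10010, 'collected': 30030, 'discarded': 30020}
```

The cost grows linearly with both the offset and the shard count, which is why the same page size becomes more expensive the deeper you go.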

Option 1: Search After (recommended):

// First page
GET /products/_search
{
  "size": 10,
  "sort": [
    { "price": "desc" },
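    { "_id": "asc" }  // a unique tiebreaker is required (sorting on _id may be restricted or costly on recent versions; a dedicated keyword ID field is a common alternative)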
  ]
}

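// Response (abbreviated)
{
  "hits": {
    "hits": [
      {
        "_id": "100",
        "sort": [7999, "100"]  // sort values of the last hit on the page
      }
    ]
  }
}

// Second page: pass the sort values of the previous page's last hit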
GET /products/_search
{
  "size": 10,
  "sort": [
    { "price": "desc" },
    { "_id": "asc" }
  ],
  "search_after": [7999, "100"]  // continue after this position
}
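The two requests above follow a simple loop: run the search, remember the sort values of the last hit, and send them as search_after on the next request. A minimal Python sketch of that loop follows. The search argument is an assumed callable that takes a request body and returns the list of hits; fake_search and CATALOG are in-memory stand-ins so the example runs without a cluster.

```python
from typing import Callable, Iterator


def iterate_pages(search: Callable[[dict], list], sort: list, size: int) -> Iterator[list]:
    """Yield pages of hits until the search returns an empty page."""
    body = {"size": size, "sort": sort}
    while True:
        hits = search(body)
        if not hits:
            return
        yield hits
        # Continue after the sort values of the last hit on this page.
        body = {"size": size, "sort": sort, "search_after": hits[-1]["sort"]}


# In-memory stand-in for a real search call: (id, price) pairs.
CATALOG = [("a", 900), ("b", 800), ("c", 800), ("d", 700), ("e", 600)]


def fake_search(body: dict) -> list:
    """Sort by price desc, then id asc, and honour search_after like the real API would."""
    ordered = sorted(CATALOG, key=lambda d: (-d[1], d[0]))
    after = body.get("search_after")
    if after is not None:
        ordered = [d for d in ordered if (-d[1], d[0]) > (-after[0], after[1])]
    return [{"_id": doc_id, "sort": [price, doc_id]} for doc_id, price in ordered[: body["size"]]]


sort_spec = [{"price": "desc"}, {"_id": "asc"}]
pages = [[hit["_id"] for hit in page] for page in iterate_pages(fake_search, sort_spec, 2)]
print(pages)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Documents "b" and "c" share the same price, and the tiebreaker is what keeps "c" from being skipped or repeated at the page boundary. With a real client, search would wrap the client's search call and return the hits array of the response.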

Option 2: Scroll API (not recommended for new applications):

// Initial request, creates a snapshot of the search context
POST /products/_search?scroll=5m
{
  "size": 100,
  "query": { "match_all": {} }
}

// Response
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WY...",
  "hits": [...]
}

// Subsequent requests
POST /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WY..."
}

// Clear the scroll context
DELETE /_search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WY..."
}

Option 3: Limit the depth:

PUT /products/_settings
{
  "index.max_result_window": 10000  // default 10000, caps from + size
}

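Comparison:

Approach	Use case	Pros	Cons
Search After	Live scrolling, next page	Fast, sees current data	Cannot jump to an arbitrary page
Scroll	Full export, batch jobs	Handles large volumes	Holds resources, not real-time
from/size	Shallow pages (e.g. the first 100)	Simple, supports page jumps	Slow for deep pages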

2.3 Aggregation Optimization

Problem 1: high-cardinality aggregations

// Slow: user_id has very high cardinality (millions of values)
{
  "aggs": {
    "users": {
      "terms": {
        "field": "user_id",
        "size": 10
      }
    }
  }
}

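// Option 1: page through the buckets with a composite aggregation (buckets come back in key order, not by document count)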
{
  "aggs": {
    "users": {
      "composite": {
        "sources": [
          { "user": { "terms": { "field": "user_id" } } }
        ],
        "size": 10
      }
    }
  }
}

// Option 2: sampler aggregation (aggregate over a sample)
{
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000  // aggregate over only the 1000 top-scoring documents per shard
      },
      "aggs": {
        "users": {
          "terms": { "field": "user_id" }
        }
      }
    }
  }
}

Problem 2: deeply nested aggregations

// Slow: 4 levels of nesting
{
  "aggs": {
    "province": {
      "terms": { "field": "province" },
      "aggs": {
        "city": {
          "terms": { "field": "city" },
          "aggs": {
            "district": {
              "terms": { "field": "district" },
              "aggs": {
                "street": {
                  "terms": { "field": "street" }
                }
              }
            }
          }
        }
      }
    }
  }
}

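// Options: precompute the results, or split the work into several queries

Problem 3: global aggregation

// Compute statistics for both the filtered set and the whole index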
{
  "query": {
    "term": { "status": "published" }
  },
  "aggs": {
    "filtered_avg": {
      "avg": { "field": "price" }  // average price of the filtered documents
    },
    "global_stats": {
      "global": {},  // global aggregation, ignores the query
      "aggs": {
        "global_avg": {
          "avg": { "field": "price" }  // average price across all documents
        }
      }
    }
  }
}

2.4 Using Caches

Query Cache (filter cache):

// Filter clauses are eligible for automatic caching
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } }  // cached automatically when reused
      ]
    }
  }
}

// Inspect cache hit and miss counts
GET /_stats/query_cache

// Clear the cache
POST /_cache/clear?query=true

Request Cache (result cache):

// Enable the request cache (by default only size=0 requests are cached)
GET /products/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}

// Index-level setting
PUT /products/_settings
{
  "index.requests.cache.enable": true
}

Field Data Cache:

// Monitor fielddata memory usage
GET /_stats/fielddata?fields=*

// Clear the fielddata cache
POST /_cache/clear?fielddata=true

// Limit fielddata memory
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}

2.5 Query Rewriting

Wildcard optimization:

// Slow: leading wildcard
{
  "query": {
    "wildcard": { "title": "*手机" }
  }
}

// Alternative: index the field with an ngram tokenizer
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}
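Why this helps: with min_gram=2 and max_gram=3 every 2- and 3-character substring becomes an indexed term, so a suffix such as "手机" can be found by an ordinary term lookup instead of scanning the term dictionary for "*手机". The Python sketch below approximates the tokenizer output for a single string; the function name is hypothetical, and details such as token_chars handling are ignored.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list:
    """Return all substrings of length min_gram..max_gram, grouped by start position."""
    grams = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            if start + length <= len(text):
                grams.append(text[start:start + length])
    return grams


print(ngrams("苹果手机", 2, 3))
# ['苹果', '苹果手', '果手', '果手机', '手机']
```

The price is a larger index: each title produces many more terms than a normal analyzer would, so this is usually reserved for fields where substring matching really matters.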

Prefix optimization:

// Average performance: prefix query
{
  "query": {
    "prefix": { "title": "iphone" }
  }
}

// Often a better fit: match_phrase_prefix (the input is analyzed and the last term is treated as a prefix)
{
  "query": {
    "match_phrase_prefix": {
      "title": "iphone 15"
    }
  }
}

// Best for autocomplete: the completion field type
PUT /products
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      }
    }
  }
}

GET /products/_search
{
  "suggest": {
    "product-suggest": {
      "prefix": "iph",
      "completion": { "field": "suggest" }
    }
  }
}

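3. Write Optimization

3.1 Bulk Writes

Single-document writes vs bulk writes:

# Slow: one request per document
for doc in docs:
    es.index(index='products', document=doc)  # `document=` replaces the deprecated `body=` in recent clients
# 100 documents: roughly 10 seconds (indicative; depends on cluster and network)

# Fast: batched writes
from elasticsearch.helpers import bulk

bulk(es, [
    {'_index': 'products', '_source': doc}
    for doc in docs
])
# 100 documents: roughly 0.5 seconds (indicative)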

Bulk API:

POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "title": "iPhone 15", "price": 7999 }
{ "index": { "_index": "products", "_id": "2" } }
{ "title": "MacBook Pro", "price": 15999 }
{ "delete": { "_index": "products", "_id": "3" } }
{ "update": { "_index": "products", "_id": "4" } }
{ "doc": { "price": 8999 } }

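Best practices:

// Batch size
Per batch: 1000-5000 documents
Payload size: 5-15MB

// Concurrency
Concurrent threads: start around the number of CPU cores and adjust by measurement

// Example configuration
batch_size = 5000
concurrent_threads = 8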
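Both limits above (document count and payload size) can be enforced while splitting the input. The following Python sketch is one way to do it; the function name and defaults are assumptions for illustration, and the byte count is approximated from the JSON encoding of each document.

```python
import json
from typing import Iterable, Iterator


def chunk_docs(
    docs: Iterable[dict],
    max_docs: int = 5000,
    max_bytes: int = 15 * 1024 * 1024,
) -> Iterator[list]:
    """Yield batches that stay within both the document-count and byte-size limits."""
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
        # Close the current batch if adding this document would break either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


docs = [{"title": f"item {i}"} for i in range(12)]
print([len(batch) for batch in chunk_docs(docs, max_docs=5)])  # [5, 5, 2]
```

The bulk helpers in the Python client accept chunk_size and max_chunk_bytes arguments that serve the same purpose, so hand-rolled chunking like this is mainly useful when calling the Bulk API directly.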

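3.2 Refresh Tuning

Problem: the default refresh interval is 1 second, and frequent refreshes slow down indexing.

Options:

// Option 1: lengthen the refresh interval
PUT /logs/_settings
{
  "index.refresh_interval": "30s"  // or "60s"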
}

// Option 2: disable refresh during a bulk import
PUT /logs/_settings
{
  "index.refresh_interval": "-1"  // disables automatic refresh
}

// Refresh manually once the import is done
POST /logs/_refresh

// Restore automatic refresh
PUT /logs/_settings
{
  "index.refresh_interval": "1s"
}

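Performance comparison (indicative figures; actual throughput depends on hardware and document size):

refresh_interval=1s:  10000 docs/sec
refresh_interval=30s: 30000 docs/sec
refresh_interval=-1:  50000 docs/sec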

3.3 Replica Tuning

// Set replicas to 0 during a bulk import
PUT /logs/_settings
{
  "index.number_of_replicas": 0
}

// Restore replicas once the import is done
PUT /logs/_settings
{
  "index.number_of_replicas": 1
}
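The refresh and replica changes from sections 3.2 and 3.3 are easy to forget to undo when an import fails halfway. One way to guard against that, sketched here in Python, is a context manager that always restores the normal settings. The put_settings argument is an assumed callable taking an index name and a settings dict; the record function is a stand-in so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def bulk_import_mode(put_settings, index: str, refresh: str = "1s", replicas: int = 1):
    """Disable refresh and replicas for the duration of an import, then restore them."""
    put_settings(index, {"index": {"refresh_interval": "-1", "number_of_replicas": 0}})
    try:
        yield
    finally:
        # Runs even if the import raises, so the index never stays in import mode.
        put_settings(index, {"index": {"refresh_interval": refresh, "number_of_replicas": replicas}})


calls = []


def record(index: str, settings: dict) -> None:
    """Stand-in for a real settings update; just records what would be sent."""
    calls.append((index, settings))


with bulk_import_mode(record, "logs"):
    pass  # run the bulk import here

print(calls[0][1]["index"]["refresh_interval"], calls[1][1]["index"]["refresh_interval"])  # -1 1s
```

With a real client, put_settings would wrap the index settings update call, and a manual refresh after the block makes the imported documents searchable immediately.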

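3.4 Translog Tuning

Default: the translog is fsynced after every request (durability=request), so acknowledged writes are not lost.

Tuning for throughput:

PUT /logs/_settings
{
  // Asynchronous fsync (up to 5 seconds of writes can be lost)
  "index.translog.durability": "async",
  "index.translog.sync_interval": "5s",

  // Raise the flush threshold
  "index.translog.flush_threshold_size": "1gb"  // the default was 512mb in 7.x; check the default for your version
}

Risk: if a node crashes, up to the last 5 seconds of writes may be lost.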

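3.5 Segment Merging

Problem: frequent writes produce many small segments, which slows down queries.

Solutions:

// Adjust the merge policy
PUT /logs/_settings
{
  "index.merge.policy.max_merged_segment": "5gb",  // maximum size of a merged segment
  "index.merge.policy.segments_per_tier": 10       // segments allowed per tier
}

// Force merge manually (use with care, it is I/O intensive)
POST /logs/_forcemerge?max_num_segments=1

// Only expunge deleted documents
POST /logs/_forcemerge?only_expunge_deletes=true

When to use forcemerge:

After a bulk import completes
On static indices that no longer receive writes
During scheduled maintenance windows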

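3.6 Routing

Problem: by default documents are spread across shards by a hash of their ID, so related documents land on different shards and queries cannot exploit locality.

Optimization:

// Route by user ID so that one user's data lives on a single shard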
PUT /orders/_doc/order-123?routing=user-456
{
  "user_id": "user-456",
  "product": "iPhone"
}

// Pass the same routing value at query time so only one shard is searched
GET /orders/_search?routing=user-456
{
  "query": {
    "term": { "user_id": "user-456" }
  }
}
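Conceptually, the target shard is derived from a hash of the routing value modulo the number of primary shards. The Python sketch below illustrates that idea only: zlib.crc32 is a stand-in for the hash Elasticsearch actually uses, the real formula has additional terms, and the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number (simplified, with a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


orders = [("order-123", "user-456"), ("order-124", "user-456"), ("order-200", "user-789")]
for order_id, user_id in orders:
    # Routing by user_id sends every order of the same user to the same shard.
    print(order_id, "-> shard", pick_shard(user_id, 3))
```

Two consequences follow from the modulo: the number of primary shards cannot be changed freely after indexing, and a few very active routing values can make one shard much larger than the others.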

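4. Index Lifecycle Management (ILM)

4.1 ILM Overview

ILM automates the lifecycle of an index, from creation to deletion.

Typical phases:

Hot: heavy writes and queries
Warm: read-only, queried occasionally
Cold: archived, rarely queried
Delete: removed

4.2 Configuring an ILM Policy

Example: policy for log indices

PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",      // roll over when the index exceeds 50GB
            "max_age": "1d",         // or is older than 1 day
            "max_docs": 10000000     // or holds more than 10 million documents
          },
          "set_priority": {
            "priority": 100  // high priority: recovered first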
          }
        }
      },
      "warm": {
        "min_age": "7d",  // enter the warm phase after 7 days
        "actions": {
          "forcemerge": {
            "max_num_segments": 1  // merge down to a single segment
          },
          "shrink": {
            "number_of_shards": 1  // reduce the shard count
          },
          "allocate": {
            "number_of_replicas": 1,
            "require": {
              "box_type": "warm"  // move to warm nodes
            }
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "box_type": "cold"
            }
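          },
          "freeze": {},  // freeze the index to free memory (deprecated in recent versions; verify it is still supported on your cluster)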
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}  // delete the index
        }
      }
    }
  }
}

4.3 Applying an ILM Policy

// 1. Create an index template
PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",     // attach the ILM policy
      "index.lifecycle.rollover_alias": "logs"   // alias used for rollover
    }
  }
}

// 2. Create the initial index
PUT /logs-000001
{
  "aliases": {
    "logs": {
      "is_write_index": true  // marks this index as the write target of the alias
    }
  }
}

// 3. Write data (through the alias)
POST /logs/_doc
{
  "message": "Application started",
  "timestamp": "2024-01-15T10:00:00Z"
}

// 4. Check ILM status
GET /logs-*/_ilm/explain

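4.4 Data Stream (recommended)

A data stream is an abstraction over a set of automatically rolled-over backing indices for append-only time-series data. It works together with ILM and is simpler to operate than managing rollover aliases by hand.

// 1. Create an index template that enables data streams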
PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 3,
      "index.lifecycle.name": "logs_policy"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },  // required field
        "message": { "type": "text" }
      }
    }
  }
}

// 2. Create the data stream (happens automatically on the first write)
POST /logs-nginx/_doc
{
  "@timestamp": "2024-01-15T10:00:00Z",
  "message": "GET /index.html 200"
}

// 3. Query the data stream
GET /logs-nginx/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h"
      }
    }
  }
}

4.5 ILM Best Practices

Hot node configuration:

# elasticsearch.yml
node.attr.box_type: hot
node.roles: [ master, data_hot, ingest ]

Warm node configuration:

node.attr.box_type: warm
node.roles: [ data_warm ]

Cold node configuration:

node.attr.box_type: cold
node.roles: [ data_cold ]

Suggested policies:

Small logging system:
Hot (7 days) → Delete

Medium logging system:
Hot (7 days) → Warm (30 days) → Delete

Large logging system:
Hot (3 days) → Warm (14 days) → Cold (90 days) → Delete
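The three suggestions can be expressed as ILM policy bodies with a small builder. The sketch below is written in Python and rests on assumptions: the function name is hypothetical, each number is read as the age at which the index leaves that phase, and the actions in each phase are a minimal selection taken from the policy in section 4.2.

```python
from typing import Optional


def build_policy(warm_after: Optional[str], cold_after: Optional[str], delete_after: str) -> dict:
    """Build an ILM policy body; phases whose threshold is None are left out."""
    phases = {"hot": {"actions": {"rollover": {"max_age": "1d", "max_size": "50GB"}}}}
    if warm_after:
        phases["warm"] = {"min_age": warm_after, "actions": {"forcemerge": {"max_num_segments": 1}}}
    if cold_after:
        phases["cold"] = {"min_age": cold_after, "actions": {"set_priority": {"priority": 0}}}
    phases["delete"] = {"min_age": delete_after, "actions": {"delete": {}}}
    return {"policy": {"phases": phases}}


small = build_policy(None, None, "7d")     # Hot → Delete
medium = build_policy("7d", None, "30d")   # Hot → Warm → Delete
large = build_policy("3d", "14d", "90d")   # Hot → Warm → Cold → Delete

print(list(small["policy"]["phases"]))  # ['hot', 'delete']
print(list(large["policy"]["phases"]))  # ['hot', 'warm', 'cold', 'delete']
```

Each resulting dict is shaped like the body of the PUT /_ilm/policy request shown in section 4.2. When rollover is used, min_age is measured from the rollover time rather than from index creation.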

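5. Common Interview Questions

What can you do to tune write performance?

Answer:

Bulk writes: use the Bulk API, for example 5000 documents per batch
Longer refresh: raise refresh_interval from 1s to 30s, or disable it during imports
Fewer replicas: set replicas to 0 during an import and restore them afterwards
Async translog: durability=async
Sensible shard count: avoid oversharding
Disable swap: keep the JVM heap from being swapped out

Code example: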

from elasticsearch.helpers import parallel_bulk

# Parallel bulk indexing
for success, info in parallel_bulk(
    es,
    docs,
    thread_count=4,
    chunk_size=5000,
    request_timeout=60
):
    if not success:
        print(f'Failed: {info}')

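Why is deep pagination slow, and how do you fix it?

Cause:

Fetching the 1001st page (from=10000, size=10):
1. Each shard returns its top 10010 documents
2. The coordinating node collects 3 × 10010 = 30030 documents
3. After a global sort it keeps hits 10001-10010
→ Almost everything that was collected is thrown away

Solutions:

Approach	Use case	How
Search After	Next page / infinite scroll	Pass the sort values of the previous page's last hit
Scroll	Bulk export	Create a snapshot and iterate with a cursor
Limit the depth	Hard cap	max_result_window=10000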

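How do you choose refresh_interval?

Guidelines:

Real-time search (search engine):
└─ refresh_interval: 1s (default)

Near-real-time search (e-commerce):
└─ refresh_interval: 5s-10s

Log analytics (delay is acceptable):
└─ refresh_interval: 30s-60s

Bulk import:
└─ refresh_interval: -1 (disabled)

Trade-off:

Shorter: fresher search results, lower indexing throughput
Longer: higher indexing throughput, staler search results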

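What is the Hot/Warm/Cold architecture in ILM?

Answer: data is placed on different classes of hardware according to how often it is accessed.

Architecture:

Hot nodes (heavy reads and writes):
├─ Hardware: SSD + large memory + many CPU cores
├─ Data: last 7 days
└─ Shards: 3 primaries + 1 replica

Warm nodes (read-only queries):
├─ Hardware: SATA + moderate memory
├─ Data: 7-30 days
└─ Shards: shrunk to 1 primary + 1 replica

Cold nodes (archive):
├─ Hardware: high-capacity SATA
├─ Data: 30-90 days
└─ Shards: frozen indices

Delete:
└─ Deleted after 90 days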

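When should you use Scroll, and when Search After?

Aspect	Scroll	Search After
Mechanism	Snapshot + cursor	Live query + last position
Sees new data	No (snapshot)	Yes
Resource cost	High (keeps a search context open)	Low
Sort requirement	None	Needs a unique tiebreaker
Page jumps	Not supported	Not supported
Use case	Full export, batch jobs	Live scrolling, next page

Examples:

// Scroll: export all orders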
POST /orders/_search?scroll=5m
{ "size": 1000 }

// Search After: paging through a product list
{
  "size": 20,
  "sort": [{"price": "desc"}, {"_id": "asc"}],
  "search_after": [7999, "100"]
}

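6. Practical Tips

6.1 Key Metrics to Monitor

# Indexing rate
GET /_stats/indexing
{
  "indexing": {
    "index_total": 1000000,        # total documents indexed
    "index_time_in_millis": 50000, # total time spent indexing
    "index_current": 10            # indexing operations currently in flight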
  }
}

# Search performance
GET /_stats/search
{
  "search": {
    "query_total": 500000,
    "query_time_in_millis": 120000,
    "fetch_total": 50000,
    "fetch_time_in_millis": 30000
  }
}

# Segment statistics
GET /products/_stats/segments
{
  "segments": {
    "count": 50,               # number of segments (too many means merging is needed)
    "memory_in_bytes": 104857600
  }
}

# JVM heap
GET /_nodes/stats/jvm
{
  "jvm": {
    "mem": {
      "heap_used_percent": 75  # investigate if it stays above 85%
    }
  }
}
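The raw counters above become more useful once turned into averages. The short Python sketch below uses the sample numbers from the stats responses shown in this section; the helper name is hypothetical. Because the counters are cumulative since node start, real monitoring would usually compute these from the difference between two samples.

```python
def avg_millis(total_millis: int, operations: int) -> float:
    """Average time per operation in milliseconds; 0.0 when nothing has run yet."""
    return total_millis / operations if operations else 0.0


indexing = {"index_total": 1000000, "index_time_in_millis": 50000}
search = {
    "query_total": 500000,
    "query_time_in_millis": 120000,
    "fetch_total": 50000,
    "fetch_time_in_millis": 30000,
}

print(avg_millis(indexing["index_time_in_millis"], indexing["index_total"]))  # 0.05
print(avg_millis(search["query_time_in_millis"], search["query_total"]))      # 0.24
print(avg_millis(search["fetch_time_in_millis"], search["fetch_total"]))      # 0.6
```

A rising average query or fetch time between samples is often an earlier warning sign than the totals themselves.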

6.2 Slow Logs

PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.query.info": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1s",

  "index.indexing.slowlog.threshold.index.warn": "2s",
  "index.indexing.slowlog.threshold.index.info": "1s"
}

6.3 Hot Threads Analysis

# Show the threads using the most CPU
GET /_nodes/hot_threads

# Sample output
   99.0% [cpu=98.9%, other=0.1%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#5]'
     at org.elasticsearch.index.query.TermQueryBuilder.doToQuery