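Query DSL

1. Query Categories

Elasticsearch runs a query clause in one of two contexts:

Context	Behavior	Typical use	Computes relevance?
Query context	Computes a relevance score (_score)	Full-text search, fuzzy matching	Yes
Filter context	No scoring; frequently used filters can be cached	Exact matching, range filtering	No

Performance comparison:

// Query context: computes a score, so it is slower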
GET /products/_search
{
  "query": {
    "match": { "title": "手机" }  // each hit carries a _score
  }
}

// Filter context: no scoring, faster, and cacheable
GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "published" }},
        { "range": { "price": { "gte": 1000 }}}
      ]
    }
  }
}

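2. Basic Queries

2.1 Match Query (full-text search)

Characteristics:

Analyzes the query text before searching
Supports fuzzy matching through the fuzziness parameter
Computes a relevance score

Examples:

// Basic match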
GET /products/_search
{
  "query": {
    "match": {
      "title": "苹果手机"  // analyzed into ["苹果", "手机"] by a Chinese analyzer such as IK; matching either term is enough
    }
  }
}

// operator: and (all terms must match)
GET /products/_search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and"  // must contain both "苹果" and "手机"
      }
    }
  }
}

// minimum_should_match (at least N terms must match)
GET /products/_search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果 华为 小米",
        "minimum_should_match": 2  // at least 2 of the 3 terms must match
      }
    }
  }
}

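2.2 Term Query (exact match)

Characteristics:

Does not analyze the query value; matches the indexed term exactly
Intended for keyword, numeric, date, and boolean fields
Still scored in query context; place it under filter to skip scoring

Examples:

// Single value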
GET /products/_search
{
  "query": {
    "term": {
      "status": "published"
    }
  }
}

// Multiple values (terms)
GET /products/_search
{
  "query": {
    "terms": {
      "tags": ["5G", "快充"]  // matches any of the values
    }
  }
}

Common pitfall:

// Wrong: term query on a text field
GET /products/_search
{
  "query": {
    "term": { "title": "苹果手机" }  // returns nothing!
  }
}
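// Reason: title is a text field, so the value was indexed as the terms ["苹果", "手机"]
// A term query is not analyzed; it looks for the single term "苹果手机", which is not in the index

// Right: query the title.keyword sub-field (assumes title has a keyword multi-field, as dynamic mapping creates by default)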
GET /products/_search
{
  "query": {
    "term": { "title.keyword": "苹果手机" }  // exact match on the whole value
  }
}
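
The pitfall above comes down to what the inverted index actually stores. The following Python sketch is a toy model only: VOCAB, analyze, build_index, term_query and match_query are hypothetical names, and the greedy dictionary segmenter merely stands in for a real analyzer such as IK. It is meant to show why an unanalyzed lookup misses while an analyzed one hits, and how operator changes the result.

```python
# Toy model: a tiny hand-written dictionary stands in for a real Chinese analyzer.
VOCAB = ["苹果", "手机", "华为"]  # hypothetical dictionary


def analyze(text):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for word in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def build_index(docs):
    """Map every indexed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up as-is, with no analysis (what a term query does)."""
    return set(index.get(value, set()))


def match_query(index, text, operator="or"):
    """Analyze the query text first, then combine the per-term postings."""
    postings = [index.get(token, set()) for token in analyze(text)]
    if not postings:
        return set()
    combine = set.intersection if operator == "and" else set.union
    return combine(*postings)


docs = {1: "苹果手机", 2: "华为手机", 3: "苹果"}
index = build_index(docs)

print(term_query(index, "苹果手机"))          # the whole string was never indexed as one term
print(match_query(index, "苹果手机"))         # either term is enough
print(match_query(index, "苹果手机", "and"))  # both terms are required
```

In this model the term lookup for "苹果手机" finds nothing because only "苹果" and "手机" exist as index entries, which mirrors the behavior described for text fields.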

2.3 Range Query

Operators:

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than

Examples:

// Numeric range
GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

// Date range
GET /logs/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01",
        "lte": "2024-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

// Relative time (date math)
GET /logs/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "now-7d/d",    // start of the day 7 days ago (00:00)
        "lte": "now/d"        // end of today: with lte, /d rounds up to 23:59:59.999
      }
    }
  }
}
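
Date-math rounding depends on which bound it is used with, which is easy to get wrong. The sketch below models only the two bounds used in the example above (gte rounds down to the first millisecond of the day, lte rounds up to the last millisecond of the day), only the day unit, and ignores time zones. resolve_day_math is a hypothetical helper for illustration, not an Elasticsearch API.

```python
from datetime import datetime, timedelta


def resolve_day_math(now, days_back, bound):
    """Resolve `now-<days_back>d/d` for a gte or lte bound (day unit only)."""
    shifted = now - timedelta(days=days_back)
    day_start = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    if bound == "gte":
        return day_start  # rounds down to the first millisecond of the day
    if bound == "lte":
        # rounds up to the last millisecond of the day
        return day_start + timedelta(days=1) - timedelta(milliseconds=1)
    raise ValueError("only gte and lte are modeled in this sketch")


now = datetime(2024, 3, 15, 10, 30)  # assumed current time for the example
print(resolve_day_math(now, 7, "gte"))  # lower bound of "now-7d/d"
print(resolve_day_math(now, 0, "lte"))  # upper bound of "now/d"
```

Under these assumptions the range covers eight calendar days in full: the seven previous days plus all of today.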

2.4 Exists Query (field existence)

// Find documents that have an indexed value for email
GET /users/_search
{
  "query": {
    "exists": {
      "field": "email"
    }
  }
}

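// Find documents with no email value (field missing, null, or an empty array; an empty string "" still counts as existing)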
GET /users/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "email" }
      }
    }
  }
}

2.5 Wildcard/Prefix Query (pattern matching)

// Prefix query (the cheapest of the three)
GET /products/_search
{
  "query": {
    "prefix": {
      "title": "iphone"  // matches terms starting with "iphone" (iphone*)
    }
  }
}

// Wildcard query (slow)
GET /products/_search
{
  "query": {
    "wildcard": {
      "title": "*phone*"  // matches any term containing "phone"
    }
  }
}

// Regexp query (usually the slowest)
GET /products/_search
{
  "query": {
    "regexp": {
      "title": "iphone[0-9]+"
    }
  }
}

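Performance tips:

Avoid leading wildcards such as *phone; they force a scan over many terms in the field
Prefer prefix over wildcard
Reserve regexp queries for small datasets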

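3. Compound Queries

3.1 Bool Query

Clause types:

Clause	Behavior	Affects score	Must be satisfied
must	Must match; contributes to the score	Yes	Yes
filter	Must match; does not affect the score	No	Yes
should	Optional match; contributes to the score	Yes	No (unless required by minimum_should_match)
must_not	Must not match	No	Yes

Example: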

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "手机" }}  // title must contain "手机"; contributes to the score
      ],
      "filter": [
        { "range": { "price": { "gte": 1000 }}},  // price >= 1000; does not affect the score
        { "term": { "status": "published" }}       // status = published
      ],
      "must_not": [
        { "term": { "brand": "华为" }}  // exclude the 华为 brand
      ],
      "should": [
        { "term": { "color": "红色" }},  // 红色 (red) or 黑色 (black); a match raises the score
        { "term": { "color": "黑色" }}
      ],
      "minimum_should_match": 1  // at least 1 should clause must match
    }
  }
}

Best practice:

// Right: exact conditions go in filter, full-text conditions go in must
{
  "bool": {
    "must": [
      { "match": { "title": "手机" }}  // full-text search; needs a score
    ],
    "filter": [
      { "term": { "status": "published" }},  // exact match; no score needed
      { "range": { "price": { "gte": 1000 }}}
    ]
  }
}

// Wrong: every condition in must, so all of them are scored and none can use the filter cache
{
  "bool": {
    "must": [
      { "match": { "title": "手机" }},
      { "term": { "status": "published" }},  // belongs in filter
      { "range": { "price": { "gte": 1000 }}}
    ]
  }
}
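
When request bodies are assembled in application code, the must/filter split can be kept explicit by routing clauses into separate lists. The following is a minimal Python sketch; build_bool_query and its parameter names are hypothetical helpers for illustration and are not part of any Elasticsearch client.

```python
import json


def build_bool_query(scored=(), filters=(), excluded=()):
    """Assemble a bool query body, keeping scored and non-scored clauses apart.

    Hypothetical helper: function and parameter names are illustrative only.
    """
    bool_body = {}
    if scored:
        bool_body["must"] = list(scored)        # query context: contributes to _score
    if filters:
        bool_body["filter"] = list(filters)     # filter context: yes/no match only
    if excluded:
        bool_body["must_not"] = list(excluded)  # also evaluated without scoring
    return {"query": {"bool": bool_body}}


body = build_bool_query(
    scored=[{"match": {"title": "手机"}}],
    filters=[
        {"term": {"status": "published"}},
        {"range": {"price": {"gte": 1000}}},
    ],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed body has the same shape as the "right" example above; empty clause lists are simply left out of the request.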

3.2 Multi Match Query (multi-field search)

// Search across several fields
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "苹果",
      "fields": ["title", "description", "brand"]
    }
  }
}

// Field weights (boost)
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "苹果",
      "fields": ["title^3", "description"]  // the title score is multiplied by 3
    }
  }
}

// Match types
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "苹果 手机",
      "fields": ["title", "description"],
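      "type": "best_fields"         // score from the best-matching field (default)
      // "type": "most_fields"      // combine the scores of all matching fields
      // "type": "cross_fields"     // treat the fields as one combined field (suits name searches)
      // "type": "phrase"           // phrase match on each field, keep the best score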
    }
  }
}

3.3 Nested Query

Scenario: querying arrays of objects

The problem:

// Document structure
{
  "product": "iPhone",
  "comments": [
    { "user": "张三", "rating": 5 },
    { "user": "李四", "rating": 3 }
  ]
}

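// Wrong: without a nested mapping the object array is flattened, so user and rating from different comments are not kept together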
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "comments.user": "张三" }},
        { "term": { "comments.rating": 3 }}  // can be satisfied by 李四's rating, so the document matches even though 张三 rated 5
      ]
    }
  }
}

Solution: map comments as a nested field

// 1. Define the nested type in the mapping
PUT /products
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested",  // the key setting
        "properties": {
          "user": { "type": "keyword" },
          "rating": { "type": "integer" }
        }
      }
    }
  }
}

// 2. Query with nested
GET /products/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "match": { "comments.user": "张三" }},
            { "term": { "comments.rating": 5 }}  // evaluated against the same comment object as user
          ]
        }
      }
    }
  }
}
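
The difference between the two mappings can be illustrated without a cluster. The Python sketch below is a simplified model, not Elasticsearch's implementation: flatten_object_array mimics how a non-nested object array ends up as one multi-valued list per sub-field, while nested_match checks all conditions against one object at a time. All function names are hypothetical.

```python
def flatten_object_array(doc, field):
    """Model a non-nested object array: one multi-valued list per sub-field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def flat_match(doc, field, conditions):
    """Each condition is checked against the flattened lists independently."""
    flat = flatten_object_array(doc, field)
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in conditions.items()
    )


def nested_match(doc, field, conditions):
    """All conditions must hold within the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in doc[field]
    )


doc = {
    "product": "iPhone",
    "comments": [
        {"user": "张三", "rating": 5},
        {"user": "李四", "rating": 3},
    ],
}

print(flatten_object_array(doc, "comments"))
print(flat_match(doc, "comments", {"user": "张三", "rating": 3}))    # cross-object match
print(nested_match(doc, "comments", {"user": "张三", "rating": 3}))  # no single comment fits
```

The flattened form loses the pairing between user and rating, which is exactly the association the nested type preserves.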

4. Aggregations

4.1 Bucket Aggregations

Terms aggregation: group and count

// Count documents per brand
GET /products/_search
{
  "size": 0,  // return no hits, only the aggregation results
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 10  // return the top 10 buckets
      }
    }
  }
}

// Response
{
  "aggregations": {
    "by_brand": {
      "buckets": [
        { "key": "Apple", "doc_count": 120 },
        { "key": "华为", "doc_count": 80 },
        { "key": "小米", "doc_count": 60 }
      ]
    }
  }
}

Range aggregation: group by range

// Count products per price range
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
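          { "to": 1000, "key": "低价" },                // low: price < 1000 (to is exclusive)
          { "from": 1000, "to": 5000, "key": "中价" },  // mid: 1000 <= price < 5000 (from is inclusive)
          { "from": 5000, "key": "高价" }               // high: price >= 5000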
        ]
      }
    }
  }
}
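
Boundary values are a common source of confusion with the range aggregation: from is inclusive and to is exclusive, so a price of exactly 1000 lands in the middle bucket. The sketch below is a small Python model of that assignment rule; bucket_keys and the English bucket labels are hypothetical and chosen only for this illustration.

```python
def bucket_keys(value, ranges):
    """Return the key of every range containing value (from inclusive, to exclusive)."""
    keys = []
    for r in ranges:
        lower_ok = "from" not in r or value >= r["from"]
        upper_ok = "to" not in r or value < r["to"]
        if lower_ok and upper_ok:
            keys.append(r["key"])
    return keys


ranges = [
    {"to": 1000, "key": "low"},
    {"from": 1000, "to": 5000, "key": "mid"},
    {"from": 5000, "key": "high"},
]

for price in (999.99, 1000, 5000):
    print(price, bucket_keys(price, ranges))
```

The function returns a list because ranges may overlap, in which case a value belongs to more than one bucket.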

Date Histogram: time-based histogram

// Count orders per month
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"  // one bucket per calendar month
      }
    }
  }
}

4.2 Metric Aggregations

// Price statistics: count, min, max, avg, and sum (stats), plus standalone avg and max
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    },
    "avg_price": {
      "avg": { "field": "price" }
    },
    "max_price": {
      "max": { "field": "price" }
    }
  }
}

// Response (avg_price and max_price omitted for brevity)
{
  "aggregations": {
    "price_stats": {
      "count": 1000,
      "min": 999.0,
      "max": 12999.0,
      "avg": 4532.5,
      "sum": 4532500.0
    }
  }
}

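4.3 Pipeline Aggregations

Pipeline aggregations take the output of other aggregations as their input.

// Sum sales per month, then find the month with the highest total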
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": { "field": "amount" }
        }
      }
    },
    "max_monthly_sales": {
      "max_bucket": {
        "buckets_path": "sales_per_month>total_sales"  // sibling pipeline aggregation over the monthly sums
      }
    }
  }
}
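
The two stages of the request above can be mirrored in plain Python to show what each one contributes. This is a toy model with made-up sample orders: sales_per_month plays the role of the date_histogram with its sum sub-aggregation, and max_bucket picks the largest bucket afterwards. Unlike a real date_histogram, the model does not emit empty buckets for months without orders.

```python
from collections import defaultdict


def sales_per_month(orders):
    """Stage 1: bucket orders by calendar month and sum the amount per bucket."""
    totals = defaultdict(float)
    for order in orders:
        month = order["created_at"][:7]  # "YYYY-MM" prefix of an ISO date
        totals[month] += order["amount"]
    return dict(sorted(totals.items()))


def max_bucket(buckets):
    """Stage 2: find the highest bucket value and the key(s) that hold it."""
    top = max(buckets.values())
    return {"value": top, "keys": [k for k, v in buckets.items() if v == top]}


# Hypothetical sample data for illustration only
orders = [
    {"created_at": "2024-01-05", "amount": 100.0},
    {"created_at": "2024-01-20", "amount": 50.0},
    {"created_at": "2024-02-03", "amount": 200.0},
    {"created_at": "2024-03-11", "amount": 120.0},
]

monthly = sales_per_month(orders)
print(monthly)
print(max_bucket(monthly))
```

The second stage never touches the orders themselves; it only reads the per-month sums, which is the defining property of a pipeline aggregation.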

4.4 Nested (Sub-)Aggregations

// Group by brand, then count per price range within each brand
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand" },
      "aggs": {
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              { "to": 3000 },
              { "from": 3000, "to": 6000 },
              { "from": 6000 }
            ]
          }
        }
      }
    }
  }
}

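5. Analyzers

5.1 Built-in Analyzers

Analyzer	Behavior	Example
standard	Default; splits on word boundaries and lowercases	"Quick-Brown" → ["quick", "brown"]
simple	Splits on non-letters and lowercases	"Quick-123" → ["quick"]
whitespace	Splits on whitespace; no lowercasing	"Quick Brown" → ["Quick", "Brown"]
keyword	No tokenization; the whole input is one term	"Quick Brown" → ["Quick Brown"]

5.2 IK Chinese Analyzer

Installation (the plugin version must match the Elasticsearch version):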

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.11.0/elasticsearch-analysis-ik-8.11.0.zip

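Two modes:

Mode	Behavior	Example
ik_max_word	Finest granularity; typically used at index time	"中华人民共和国" → ["中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民", "共和国"]
ik_smart	Coarsest granularity; typically used at search time	"中华人民共和国" → ["中华人民共和国"]

Usage example: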

PUT /articles
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",      // fine-grained tokens at index time
        "search_analyzer": "ik_smart"  // coarse-grained tokens at search time
      }
    }
  }
}

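Custom dictionaries:

# Config file: config/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">custom_words.dic</entry>
  <entry key="ext_stopwords">custom_stopwords.dic</entry>
</properties>

# custom_words.dic (one word per line, UTF-8)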
华为Mate60
苹果15Pro

# custom_stopwords.dic (one stopword per line, UTF-8)
的
了
吗

5.3 Testing Analyzers

// Test the standard analyzer
GET _analyze
{
  "analyzer": "standard",
  "text": "Quick Brown Fox"
}

// Test the IK analyzer
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我爱北京天安门"
}

// Test the analyzer configured on a field (here, content in the articles index)
GET /articles/_analyze
{
  "field": "content",
  "text": "华为Mate60发布"
}

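6. Common Interview Questions

What is the difference between match and term?

Aspect	match	term
Analysis	Query text is analyzed	Not analyzed
Suitable fields	text	keyword, numeric, date
Relevance	Computes a score	Scored in query context; usually placed under filter, where no score is computed
Typical use	Full-text search	Exact matching

How do you optimize aggregation performance?

Aggregate on keyword fields:

// Wrong: aggregating on a text field is rejected unless fielddata is enabled, and fielddata is memory-hungry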
{
  "aggs": {
    "by_title": {
      "terms": { "field": "title" }  // text field: needs fielddata, poor performance
    }
  }
}

// Right: aggregate on the keyword sub-field
{
  "aggs": {
    "by_title": {
      "terms": { "field": "title.keyword" }
    }
  }
}

Limit the number of buckets:

{
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 10  // return only the top 10 buckets
      }
    }
  }
}

Narrow the document set with a filter before aggregating:

{
  "query": {
    "bool": {
      "filter": [
        { "range": { "created_at": { "gte": "now-7d" }}}  // filter first, without scoring
      ]
    }
  },
  "aggs": {
    "sales": {
      "sum": { "field": "amount" }
    }
  }
}

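What is the difference between must and filter?

Property	must	filter
Computes a score	Yes	No
Cacheable	No	Yes
Performance	Slower	Faster
Typical use	Relevance-ranked matching	Exact-condition filtering

Example: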

{
  "bool": {
    "must": [
      { "match": { "title": "手机" }}  // needs a score; affects ranking
    ],
    "filter": [
      { "term": { "status": "published" }},  // no score needed; cacheable
      { "range": { "price": { "gte": 1000 }}}
    ]
  }
}

How do you implement search highlighting?

GET /articles/_search
{
  "query": {
    "match": { "content": "elasticsearch" }
  },
  "highlight": {
    "fields": {
      "content": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150,  // fragment length in characters
        "number_of_fragments": 3  // return up to 3 fragments
      }
    }
  }
}

// Response
{
  "hits": {
    "hits": [{
      "_source": { "content": "..." },
      "highlight": {
        "content": [
          "This is an article about <em>elasticsearch</em>..."
        ]
      }
    }]
  }
}

How do you implement pagination?

Option 1: from + size (shallow pagination)

GET /products/_search
{
  "from": 0,
  "size": 20,
  "query": { "match_all": {} }
}

// Limit: from + size must not exceed index.max_result_window (10000 by default)

Option 2: search_after (deep pagination)

// First page
GET /products/_search
{
  "size": 20,
  "query": { "match_all": {} },
  "sort": [
    { "price": "asc" },
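    { "_id": "asc" }  // tiebreaker so sort values are unique; the official docs advise sorting on a doc-values copy of the ID rather than on _id itself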
  ]
}

// Second page: pass the sort values of the previous page's last hit
GET /products/_search
{
  "size": 20,
  "query": { "match_all": {} },
  "search_after": [999, "doc-123"],  // sort values of the last hit on the previous page
  "sort": [
    { "price": "asc" },
    { "_id": "asc" }
  ]
}
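
In client code, the step from one page to the next amounts to copying the previous request and attaching the last hit's sort values. The Python sketch below shows only that body manipulation; next_page_body is a hypothetical helper, the hit is a hand-written stand-in for a real response entry, and no request is actually sent.

```python
import copy


def next_page_body(previous_body, last_hit):
    """Build the next search_after request from the previous body and its last hit.

    Hypothetical helper: it only rewrites the request body and sends nothing.
    """
    body = copy.deepcopy(previous_body)        # leave the caller's body untouched
    body.pop("from", None)                     # search_after is not combined with a from offset
    body["search_after"] = list(last_hit["sort"])
    return body


first_page = {
    "size": 20,
    "query": {"match_all": {}},
    "sort": [{"price": "asc"}, {"_id": "asc"}],
}

# Stand-in for the last entry of hits.hits in the first response
last_hit = {"_id": "doc-123", "sort": [999, "doc-123"]}

second_page = next_page_body(first_page, last_hit)
print(second_page)
```

The sort clause must stay identical across pages, which is why the helper copies the whole body rather than rebuilding it.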

Option 3: scroll (full export)

// Open a scroll context
POST /products/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

// Continue with the returned scroll_id
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAA..."
}

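Option	Best for	Pros	Cons
from + size	Shallow pagination (from + size ≤ 10000)	Simple; supports jumping to any page	Slow for deep pages
search_after	Deep pagination	Fast	No jumping to arbitrary pages
scroll	Full export	Consistent snapshot	Holds resources; not real-time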