Elasticsearch Study Notes: Top Hits Aggregation

1 Overview

After aggregation, each bucket returns only the first N documents in a specified sort order.

2 Usage example

(1) Scenario: an ES index stores member data. Each member has an ID, the ID of the team they belong to, a personal score, and so on: id, team_id, score, age…

Given a list of team IDs: team_id IN (1, 5, 7)

Find the IDs of the 2 highest-scoring members in each team.

(2) ES query example:

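GET .../_search?routing=xxx  // If the data is known to live in one or a few routing partitions, setting routing improves performance.
{
  "size": 0,  // Return aggregation results only; do not return the matching documents.
  "query": {
    "bool": {
      "filter": [  // Filter conditions: narrow the data down before aggregating.
        {
          "terms": {
            "team_id": [
              1,
              5,
              7
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "group_aggs": {  // First-level aggregation: bucket the data by team_id.
      "terms": {
        "field": "team_id",
        "execution_hint": "map"  // If this level is known to produce few buckets, "map" can improve performance.
      },
      "aggs": {
        "top_score_member": {  // Second-level aggregation: run top_hits inside each first-level bucket.
          "top_hits": {
            "size": 2,  // Return only the top 2 documents.
            "sort": [   // Sort by score in descending order.
              {
                "score": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}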
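To make the semantics of the query above concrete, here is a minimal plain-Java sketch (no Elasticsearch involved) of what it computes: filter by team, bucket by team_id, then keep the n best-scoring members per bucket. The class TopHitsSketch, the Member type and the sample data are hypothetical and exist only for illustration; the sketch also ignores that a real terms aggregation limits the number of buckets it returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopHitsSketch {
    /** Hypothetical in-memory member; mirrors the id, team_id and score fields. */
    static final class Member {
        final int id;
        final int teamId;
        final double score;

        Member(int id, int teamId, double score) {
            this.id = id;
            this.teamId = teamId;
            this.score = score;
        }
    }

    /** Returns, per team, the ids of the n highest-scoring members. */
    static Map<Integer, List<Integer>> topNPerTeam(List<Member> members, List<Integer> teamIds, int n) {
        // bool.filter + terms query, followed by the terms aggregation on team_id
        Map<Integer, List<Member>> buckets = members.stream()
                .filter(m -> teamIds.contains(m.teamId))
                .collect(Collectors.groupingBy(m -> m.teamId, TreeMap::new, Collectors.toList()));

        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Member>> bucket : buckets.entrySet()) {
            List<Integer> topIds = bucket.getValue().stream()
                    .sorted(Comparator.comparingDouble((Member m) -> m.score).reversed()) // sort: score desc
                    .limit(n)                                                             // top_hits size
                    .map(m -> m.id)
                    .collect(Collectors.toList());
            result.put(bucket.getKey(), topIds);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Member> members = new ArrayList<>();
        members.add(new Member(101, 1, 80));
        members.add(new Member(102, 1, 95));
        members.add(new Member(103, 1, 60));
        members.add(new Member(201, 5, 70));
        members.add(new Member(301, 9, 99)); // team 9 is filtered out
        System.out.println(topNPerTeam(members, Arrays.asList(1, 5, 7), 2));
    }
}
```

A team with no matching members (team 7 in the sample data) simply produces no bucket, which is also why the response-parsing code should not assume that every requested team or every second hit is present.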

(3) Java query example:

TransportClient version:

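import com.google.common.collect.Lists;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.sort.SortOrder;

// Filter conditions
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.termsQuery("team_id", Lists.newArrayList(1, 5, 7)));

// Aggregations
AggregationBuilder groupAggBuilder = AggregationBuilders.terms("group_aggs")
        .field("team_id")
        .executionHint("map");  // If this level is known to produce few buckets, "map" can improve performance.
AggregationBuilder topScoreAggBuilder = AggregationBuilders.topHits("top_score_member")
        .sort("score", SortOrder.DESC)
        .size(2);
groupAggBuilder.subAggregation(topScoreAggBuilder);

// Run the query
SearchResponse response = transportClient.prepareSearch("index_name").setTypes("type_name")
                    .setRouting("xxx")  // If the data is known to live in one or a few routing partitions, setting routing improves performance.
                    .setSize(0)
                    .setQuery(boolQueryBuilder)
                    .addAggregation(groupAggBuilder)
                    .get();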

RestHighLevelClient example:

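import com.google.common.collect.Lists;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.unit.TimeValue;  // org.elasticsearch.core.TimeValue in newer 7.x clients
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

// Filter conditions
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.termsQuery("team_id", Lists.newArrayList(1, 5, 7)));

// Aggregations
AggregationBuilder groupAggBuilder = AggregationBuilders.terms("group_aggs")
        .field("team_id")
        .executionHint("map");  // If this level is known to produce few buckets, "map" can improve performance.
AggregationBuilder topScoreAggBuilder = AggregationBuilders.topHits("top_score_member")
        .sort("score", SortOrder.DESC)
        .size(2);
groupAggBuilder.subAggregation(topScoreAggBuilder);

// Build the search request
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(groupAggBuilder);
// Choose a sensible timeout based on the response times observed in production (300 ms here).
searchSourceBuilder.timeout(TimeValue.timeValueMillis(300));
SearchRequest request = new SearchRequest("index_name");
request.source(searchSourceBuilder);
request.routing("xxx");  // If the data is known to live in one or a few routing partitions, setting routing improves performance.

// Send the request
SearchResponse searchResponse = restHighLevelClient.search(request, RequestOptions.DEFAULT);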

Example of parsing the SearchResponse:

if (Objects.nonNull(response) && Objects.equals(response.status(), RestStatus.OK)) {
    Terms groupResult = response.getAggregations().get("group_aggs");
    if (Objects.nonNull(groupResult)) {
        for (Terms.Bucket groupBucket : groupResult.getBuckets()) {
            TopHits topScoreResult = groupBucket.getAggregations().get("top_score_member");
            if (Objects.nonNull(topScoreResult) && topScoreResult.getHits().getHits().length > 0) {
                SearchHit[] hits = topScoreResult.getHits().getHits();
                MemberDTO top1Member = JSON.parseObject(hits[0].getSourceAsString(), MemberDTO.class);
                // A team may have only one member, so guard the second hit.
                MemberDTO top2Member = hits.length > 1
                        ? JSON.parseObject(hits[1].getSourceAsString(), MemberDTO.class)
                        : null;
                // Other logic
            }
        }
    }
}

Reference:

https://blog.csdn.net/cuixianlong/article/details/104426160

Another query:

Taking the first document of each group: the ES query and the Java version

This example groups by trace_id, then sorts each group by log_time in ascending order and takes the first document. ES query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "log_level:ERROR",
            "fields": [],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        },
        {
          "range": {
            "log_time": {
              "from": "2021-06-02 18:00:44.727",
              "to": "2021-06-02 18:05:44.727",
              "include_lower": true,
              "include_upper": false,
              "format": "yyyy-MM-dd HH:mm:ss.SSS",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "aggs": {
    "group_by_trace_id": {
      "terms": {
        "field": "trace_id",
        "order": {
          "top_hit": "asc"
        }
      },
      "aggs": {
        "min_trace": {
          "min": {
            "field": "log_time"
          }
        },
        "top_test": {
          "top_hits": {
            "sort": {
              "log_time": "asc"
            },
        "size":1
          }
        },
        "top_hit": {
          "min": {
            "script": "_score"
          }
        }
      }
    }
  }
}

Java version:

MetricData elasticsearchMetric = new MetricData();
        ElasticsearchInfo elasticsearchInfo = new ElasticsearchInfo(metricContract.getDataSourceContract());
        EsRestClientContainer esRestClientContainer = elasticsearchSourceManager.findEsRestClientContainer(elasticsearchInfo);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                .must(new QueryStringQueryBuilder(metricContract.getQueryString()))
                .must(QueryBuilders.rangeQuery(metricContract.getDataNameContract().getTimestampField())
                        .from(start.toDateTimeISO().toString(dateTimeFormatter))
                        .to(end.toDateTimeISO().toString(dateTimeFormatter))
                        .includeLower(true)
                        .includeUpper(false)
                        .format(dateTimeFormatter));
        Map<String, String> dataNameProperties = metricContract.getDataNameContract().getSettings();
        String indexPrefix = dataNameProperties.get("indexPrefix");
        String datePattern = dataNameProperties.get("timePattern");
        String[] indices = esRestClientContainer.buildIndices(start, end, indexPrefix, datePattern);
        Long count = null;
        try {
            count = esRestClientContainer.totalCount(boolQueryBuilder, indices);
        } catch (Exception ex) {
            log.error("queryElasticsearchMetricValue failed:", ex);
            throw new RuntimeException("error when totalCount", ex);
        }
        if (metricContract.getAggregationType().equalsIgnoreCase(SymbolExpr.COUNT)) {
            elasticsearchMetric.setMetricValue(count);
        }
        if (count == 0) {
            elasticsearchMetric.setMetricValue(0);
            return elasticsearchMetric;
        }
        SearchRequest searchRequest = new SearchRequest(indices);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.trackScores(false);
        searchSourceBuilder.trackTotalHits(true);
        searchSourceBuilder.query(boolQueryBuilder).from(0).size(10)
                .sort(metricContract.getDataNameContract().getTimestampField(), SortOrder.DESC);
        attachAggregation(metricContract, searchSourceBuilder);
        // Aggregation search
        TermsAggregationBuilder termsBuilder = AggregationBuilders.terms("group_by_trace_id").field("trace_id");
        MinAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min_trace").field("log_time");
        TopHitsAggregationBuilder topHitsAggregationBuilder = AggregationBuilders.topHits("top_detail").sort("log_time", SortOrder.ASC).size(1);
        // _score is not a document field, so take its minimum through a script, mirroring the
        // "script": "_score" min aggregation in the ES query (requires org.elasticsearch.script.Script).
        MinAggregationBuilder minAggregationBuilderTopHit = AggregationBuilders.min("top_hit").script(new Script("_score"));
        termsBuilder.subAggregation(minAggregationBuilder);
        termsBuilder.subAggregation(topHitsAggregationBuilder);
        termsBuilder.subAggregation(minAggregationBuilderTopHit);
        searchSourceBuilder.aggregation(termsBuilder);
        // Execute the query
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = esRestClientContainer.fetchHighLevelClient().search(searchRequest, RequestOptions.DEFAULT);
        ParsedStringTerms stringTerms = searchResponse.getAggregations().get("group_by_trace_id");
        List<? extends Terms.Bucket> buckets = stringTerms.getBuckets();
        if (metricContract.getAggregationType().equalsIgnoreCase(SymbolExpr.COUNT)) {
            if (buckets.size() > 0) {
                elasticsearchMetric.setMetricValue(buckets.size());
            }
        } else {
            Double numericValue = findAggregationValue(metricContract, searchResponse);
            elasticsearchMetric.setMetricValue(numericValue);
        }
        List<Map<String, Object>> latestDocumentList = new ArrayList<>();
        for (Terms.Bucket bucket : buckets) {
            ParsedTopHits topDetail = bucket.getAggregations().get("top_detail");
            SearchHit[] hits = topDetail.getHits().getHits();
            for (SearchHit hit : hits) {
                Map<String, Object> latestDocument = hit.getSourceAsMap();
                latestDocument.put("esDataId", latestDocument.get("id"));
                latestDocumentList.add(latestDocument);
                elasticsearchMetric.setLatestDocumentList(latestDocumentList);
            }
        }
        return elasticsearchMetric;

Deleting indices in ES:

PASSWORD=jH9q52s82u5F33kyt74zxqwy
curl -u "elastic:$PASSWORD"  -X DELETE 'https://localhost:9200/kibana*' -k

Deleting data in ES:

PASSWORD=jH9q52s82u5F33kyt74zxqwy
curl -u "elastic:$PASSWORD" -X POST "https://localhost:9200/remote_statistics/_delete_by_query" -k -H 'Content-Type: application/json' -d'
{
  "query": {
   "term": {
     "beginTime": {
       "value": 0
     }
   }
  }
}
'
curl -u  "elastic:$PASSWORD" -X DELETE
PASSWORD=123456
curl -u  "elastic:$PASSWORD" -XDELETE "http://10.7.11.80:9201/remote_statistics" -k
curl -u "elastic:$PASSWORD" -X GET "https://localhost:9200/_cluster/health" -k
curl -u  "elastic:$PASSWORD" -XDELETE "http://10.7.11.80:9201/remote_statistics" -k
curl -u "elastic:$PASSWORD" -X GET "https://localhost:9200/_cat/indices?v&pretty" -k
# Fix for a yellow health status on a single-node Elasticsearch cluster
PASSWORD=jH9q52s82u5F33kyt74zxqwy
curl -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_settings" -k -H 'Content-Type: application/json' -d'
{
 "number_of_replicas":0
}
'

curl  -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_cluster/settings?pretty" -k -H 'Content-Type: application/json' -d'
{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'
DELETE /remote_statistics/_doc/1537410295226564608
DELETE   /remote_statistics/_doc/1537410404274274304
DELETE /remote_statistics/_doc/1537410394350551040
DELETE /remote_statistics/_doc/1537413001215344640
DELETE /remote_statistics/_doc/1537412942583169024
DELETE /remote_statistics/_doc/1537413009532649472
DELETE /remote_statistics/_doc/1537413006911209472
DELETE /remote_statistics/_doc/1537413984825769984
DELETE /remote_statistics/_doc/1537414138886750208
DELETE /remote_statistics/_doc/1537414142061838336
GET /remote_statistics/
GET /_cluster/health

GET /_cat/indices

ES API study notes

2.1. Check cluster health
curl -X GET "http://10.49.196.11:9200/_cat/health?v=true"
2.2. List the cluster nodes
curl -X GET "http://10.49.196.11:9200/_cat/nodes?v=true"

3. Index APIs

3.1. Create an index

This request sets both settings and mappings: settings holds the shard and replica counts, and mappings holds the detailed field definitions.

curl -X PUT -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
      }
  },
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "poems": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      },
      "about": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      },
      "success": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}'

3.2. Update the _mapping

New fields can be added. For an existing field, the type cannot be changed; only its search_analyzer property can be modified.

curl -X PUT -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_mapping' -d '
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    },
    "desc": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    }
  }
}'

3.3. Delete an index

curl -X DELETE 'http://10.49.196.11:9200/poet-index'

3.4. List indices

curl -X GET "http://10.49.196.11:9200/*"

or

curl -X GET "http://10.49.196.11:9200/_all"

3.5. Get index details

curl -X GET 'http://10.49.196.11:9200/poet-index'

4. Document APIs

4.1. Create a document

A. With the id set to 1

curl -X POST -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_create/1' -d '
{
  "age": 30,
  "name": "李白",
  "poems": "静夜思",
  "about": "字太白",
  "success": "创造了古代浪漫主义文学高峰、歌行体和七绝达到后人难及的高度"
}'

B. Without an id; one is generated automatically

curl -X POST -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_doc' -d '
{
  "age": 31,
  "name": "杜甫",
  "poems": "登高",
  "about": "字子美",
  "success": "唐代伟大的现实主义文学作家，唐诗思想艺术的集大成者"
}'

C. Bulk-create documents

curl -X POST -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_bulk' -d '
{"index":{"_id":"11"}}
{"age": 30,"name": "杜甫11","poems": "登高","about": "字子美","success": "唐代伟大的现实主义文学作家，唐诗思想艺术的集大成者"}
{"index":{"_id":"12"}}
{"age": 30,"name": "杜甫12","poems": "登高","about": "字子美","success": "唐代伟大的现实主义文学作家，唐诗思想艺术的集大成者"}

'

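Note: the blank line at the end is required, because the bulk body must end with a newline; without it the request fails.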
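When the bulk body is assembled in code rather than typed by hand, the trailing newline is easy to forget. The following is a minimal sketch, assuming each document is already serialized as single-line JSON and that ids need no JSON escaping; BulkBodySketch and buildBulkBody are hypothetical names used only for illustration, not part of any Elasticsearch client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BulkBodySketch {
    /**
     * Builds an NDJSON body for the _bulk endpoint: one action line and one
     * source line per document, every line (including the last) ending in '\n'.
     */
    static String buildBulkBody(Map<String, String> jsonSourceById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonSourceById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("11", "{\"age\":30,\"name\":\"poet-11\"}");
        docs.put("12", "{\"age\":30,\"name\":\"poet-12\"}");
        System.out.print(buildBulkBody(docs));
    }
}
```

Appending the newline after every line, rather than joining lines with a separator, guarantees that the final line is terminated as well.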

4.2. Delete a document

curl -X DELETE 'http://10.49.196.11:9200/poet-index/_doc/1'

4.3. Update a document

Only the fields given in the request are updated.

curl -X POST -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_update/1' -d '
{
  "doc": {
    "age": 32,
    "poems": "望庐山瀑布"
  }
}'

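4.4. Create or overwrite a document

If no document with the given id exists, it is created; if one exists, all of its fields are overwritten (equivalent to deleting it and then creating it again).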

curl -X PUT -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_doc/1' -d '
{
  "age": 31,
  "name": "李白",
  "poems": "静夜思",
  "about": "字太白"
}'
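The difference between the partial update in 4.3 and the overwrite in 4.4 can be sketched with plain maps. This is only an illustration under simplifying assumptions: documents are flat (no nested objects), and UpdateVsOverwriteSketch with its two methods is a hypothetical helper, not an Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UpdateVsOverwriteSketch {
    /** Like POST /{index}/_update/{id} with "doc": given fields replace stored ones, the rest are kept. */
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    /** Like PUT /{index}/_doc/{id}: the new source replaces the stored document entirely. */
    static Map<String, Object> overwrite(Map<String, Object> stored, Map<String, Object> newSource) {
        return new LinkedHashMap<>(newSource);
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("age", 30);
        stored.put("name", "Li Bai");
        stored.put("success", "romantic poetry");

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("age", 32);

        System.out.println(partialUpdate(stored, change)); // name and success survive
        System.out.println(overwrite(stored, change));     // only age remains
    }
}
```

This is why the PUT request in 4.4, which omits the success field, leaves the stored document without that field afterwards.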

5. Search APIs

5.1. Query all documents in an index

curl -X GET 'http://10.49.196.11:9200/poet-index/_search'

5.2. Get a document by id

curl -X GET 'http://10.49.196.11:9200/poet-index/_doc/1'

5.3. term query

A term query does not analyze (tokenize) the input; it searches for the input as a single exact term.

A. A single term

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "term": {
      "name": {
        "value": "李白"
      }
    }
  }
}'

B. Multiple terms

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "terms": {
      "name": ["李白", "杜甫"]
    }
  }
}'

5.4. range query

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 35
      }
    }
  }
}'

5.5. Full-text queries

5.5.1. match

The input is analyzed into terms first, and the search is then run on those terms.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "match": {
      "success": "理想主义"
    }
  },
  "from": 0,
  "size": 10,
  "sort": [{
    "name": {
      "order": "asc"
    }
  }]
}'

5.5.2. multi_match

Matches the query against multiple fields.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "multi_match": {
      "query": "太白",
      "fields": ["about", "success"]
    }
  }
}'

5.5.3. match_phrase

Matches the query string as a whole phrase: the analyzed terms must all appear, adjacent and in the same order.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "match_phrase": {
      "success": "文学作家"
    }
  }
}'

5.5.4. match_all

Returns all documents.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "match_all": {
     }
  }
}'

5.5.5. query_string

query_string can express all of the query types above.

A. Like match

curl -X GET  -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "default_field": "success",
      "query": "古典文学"
    }
  }
}'

B. Like multi_match

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "query": "古典文学",
      "fields": ["about", "success"]
    }
  }
}'

C. Like match_phrase

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "default_field": "success",
      "query": "\"古典文学\""
    }
  }
}'

D. Queries with operators; the words on either side of an operator are not analyzed further

1. Find documents that contain both "文学" and "伟大"

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "default_field": "success",
      "query": "文学 AND 伟大"
    }
  }
}'

or

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "fields": ["success"],
      "query": "文学 伟大",
      "default_operator": "AND"
    }
  }
}'

2. Find documents whose name or success field contains both "文学" and "伟大", or contains "李白".

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "query_string": {
      "query": "(文学 AND 伟大) OR 李白",
      "fields": ["name", "success"]
    }
  }
}'

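5.5.6. simple_query_string

Similar to query_string; the main differences are:

1. AND, OR and NOT are not supported as operators and are treated as plain text; use + for AND, | for OR, and - for NOT.
2. Invalid syntax is ignored instead of raising an error.

Find documents that contain both "文学" and "伟大":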

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "simple_query_string": {
      "fields": ["success"],
      "query": "文学 + 伟大"
    }
  }
}'

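5.6. Fuzzy queries

Parameters used for fuzzy matching:

Parameter	Description
fuzziness	Maximum edit distance allowed. Fuzzy matching is off by default in full-text queries, which is equivalent to fuzziness=0. Two formats are supported: (1) a number (0, 1 or 2) giving a fixed maximum edit distance; (2) automatic mode, AUTO:[low],[high], where a term shorter than low must match exactly (edit distance 0), a term whose length is in [low, high) may differ by 1 edit, and a term of length high or more may differ by 2 edits.
prefix_length	Minimum length of the common prefix the two strings must share: the first n characters may not be edited and must equal the query term. The default is 0; a value greater than 0 can improve query performance noticeably.
max_expansions	Maximum number of fuzzy variations generated.
transpositions	Whether swapping two adjacent characters counts as a single edit. Full-text queries such as match do not accept this parameter under this name; they use fuzzy_transpositions instead.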
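The AUTO rule for fuzziness is easiest to check with a few concrete term lengths. The sketch below encodes it directly; AutoFuzziness and maxEdits are hypothetical names, and the one-argument overload assumes the documented defaults low=3 and high=6 that apply when plain AUTO is used.

```java
class AutoFuzziness {
    /** Maximum number of edits allowed for a term of the given length under AUTO:[low],[high]. */
    static int maxEdits(int termLength, int low, int high) {
        if (termLength < low) {
            return 0; // short terms must match exactly
        }
        if (termLength < high) {
            return 1; // medium-length terms may differ by one edit
        }
        return 2;     // long terms may differ by two edits
    }

    /** Plain AUTO, assuming the default thresholds low=3 and high=6. */
    static int maxEdits(int termLength) {
        return maxEdits(termLength, 3, 6);
    }

    public static void main(String[] args) {
        for (int length = 1; length <= 7; length++) {
            System.out.println("length " + length + " -> max edits " + maxEdits(length));
        }
    }
}
```

With the defaults, a two-character query term gets no fuzziness at all, which matters for short Chinese terms produced by the analyzer.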

A. Fuzzy parameters in a full-text query

The input is analyzed first, and fuzzy variations are then computed for each term.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "match": {
        "success": {
         "query": "古典文化",
        "fuzziness": 1,
        "prefix_length": 0,
        "max_expansions": 5
        }
    }
  }
}'

B. Using the fuzzy query

The input is not analyzed; fuzzy variations are computed for it directly.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "fuzzy": {
      "success": {
        "value": "理想",
        "fuzziness": 1,
        "prefix_length": 0,
        "transpositions": true
      }
    }
  }
}'

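5.7. Compound queries

A compound query uses bool to combine multiple query clauses.

Clause	Description
must	All clauses must match
should	At least one of the clauses should match
must_not	None of the clauses may match
filter	Filters the results without computing a score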
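The matching rules in the table can be sketched with plain Java predicates. This is a simplified model, not Elasticsearch code: BoolQuerySketch is a hypothetical class, scoring is ignored, and minimum_should_match is assumed to be left at its default, under which should clauses are only mandatory when the bool query has no must or filter clauses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

class BoolQuerySketch {
    /** Returns whether doc matches a bool query made of the given clause lists. */
    static <T> boolean matches(T doc,
                               List<Predicate<T>> must,
                               List<Predicate<T>> should,
                               List<Predicate<T>> mustNot,
                               List<Predicate<T>> filter) {
        boolean allRequired = must.stream().allMatch(p -> p.test(doc))
                && filter.stream().allMatch(p -> p.test(doc));
        boolean anyExcluded = mustNot.stream().anyMatch(p -> p.test(doc));
        // Without must/filter clauses at least one should clause has to match;
        // otherwise should clauses are optional and would only influence the score.
        boolean shouldIsRequired = !should.isEmpty() && must.isEmpty() && filter.isEmpty();
        boolean shouldOk = !shouldIsRequired || should.stream().anyMatch(p -> p.test(doc));
        return allRequired && !anyExcluded && shouldOk;
    }

    public static void main(String[] args) {
        Predicate<Integer> atLeast20 = age -> age >= 20;
        Predicate<Integer> atMost40 = age -> age <= 40;
        List<Predicate<Integer>> ageRange = Arrays.asList(atLeast20, atMost40);
        List<Predicate<Integer>> none = Collections.emptyList();

        System.out.println(matches(30, ageRange, none, none, none)); // inside the range
        System.out.println(matches(50, ageRange, none, none, none)); // outside the range
    }
}
```

Moving the same predicates from must to filter gives the same set of matching documents; in Elasticsearch the difference is that filter clauses do not contribute to the score.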

A. Find documents where success contains "思想" and age is within [20, 40]:

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "bool": {
      "must": [{
        "simple_query_string": {
          "query": "思想",
          "fields": ["success"]
        }
      }, {
        "range": {
          "age": {
            "gte": 20,
            "lte": 40
          }
        }
      }]
    }
  }
}'

B. Filter for documents where success contains "思想" and age is within [20, 40], without computing a score:

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "bool": {
      "filter": [{
        "simple_query_string": {
          "query": "思想",
          "fields": ["success"]
        }
      }, {
        "range": {
          "age": {
            "gte": 20,
            "lte": 40
          }
        }
      }]
    }
  }
}'

5.8. Aggregation queries

A. Sum

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "aggs": {
    "age_sum": {
      "sum": {
        "field": "age"
      }
    }
  }
}'

B. Like SELECT COUNT(DISTINCT age) FROM poet-index (the cardinality aggregation returns an approximate count)

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}'

C. Count, max, min, average and sum

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  },
  "size": 0
}'

D. Like SELECT name, COUNT(*) FROM poet-index GROUP BY name

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "aggs": {
    "name_terms": {
      "terms": {
        "field": "name"
      }
    }
  },
  "size": 0
}'

E. Like SELECT name, age, COUNT(*) FROM poet-index GROUP BY name, age

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "aggs": {
    "name_terms": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age"
          }
        }
      }
    }
  },
  "size": 0
}'

F. Like SELECT AVG(age) FROM poet-index WHERE name = '李白'

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "name": "李白"
        }
      }
    }
  },
  "aggs": {
    "age_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0
}'

5.9. Suggesters

To have Elasticsearch propose alternative search terms based on what was searched for, use a suggester.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "suggest": {
    "success_suggest": {
      "text": "思考",
      "term": {
        "field": "success",
        "analyzer": "ik_max_word",
        "suggest_mode": "always",
        "min_word_length":2
      }
    }
  }
}'

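Suggest modes (suggest_mode):

Mode	Description
popular	Only suggest terms that occur more frequently than the original search term
missing	Only suggest when the search term is not found in the index
always	Always suggest, whatever the search term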

5.10. Highlighting

Highlights the matched keywords in the search results.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/poet-index/_search' -d '
{
  "query": {
    "match": {
      "success": "思想"
    }
  },
  "highlight": {
    "pre_tags": "<span style=\"color:red\">",
    "post_tags": "</span>",
    "fields": {
      "success": {}
    }
  }
}'

5.11. SQL queries

Elasticsearch supports querying data with SQL.

curl -X GET -H 'Content-Type:application/json' 'http://10.49.196.11:9200/_sql' -d '
{
  "query": "SELECT * FROM \"poet-index\" limit 3"
}'

For detailed usage of the Elasticsearch REST APIs, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/rest-apis.html


1. Check cluster status

curl '10.18.37.223:9200/_cat/health?v'
Green means everything is healthy; yellow means all data is available but some replicas have not been allocated yet; red means some data is unavailable for some reason.

2. List the cluster nodes

curl '10.18.37.223:9200/_cat/nodes?v'

3. List all indices

curl -X GET 'http://10.18.37.223:9200/_cat/indices?v'

4. Show all indices together with all of their types

curl '10.18.37.223:9200/_mapping?pretty=true'

5. Show all types under one index

curl '10.18.37.223:9200/test/_mapping?pretty=true'
Shows all types under test.

6. Query all data in an index

curl '10.18.37.223:9200/test/_search?pretty=true'

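7. Query the data of one type within an index

curl '10.18.37.223:9200/test/test_topic/_search?pretty=true'
Here index=test and type=test_topic. By design, Elasticsearch 6.x allows only one type per index, 7.x deprecates types, and 8.x removes them entirely, so check which version you are using.

8. Get the document with a given id under a type

curl '10.18.37.223:9200/test/test_topic/3525?pretty=true'
index=test, type=test_topic, id=3525

9. Query data in an SQL-like way

curl "10.18.37.223:9200/test/_search" -d '
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"],
  "sort": { "balance": { "order": "desc" } },
  "from": 10,
  "size": 10
}'
Note: type the body after -d with real line breaks (the Enter key) rather than newline escape characters, which ES cannot parse. query holds the search conditions (here everything, with no restriction), _source lists the fields to return, sort is the sort field, from: 10 skips the first 10 hits, and size: 10 returns 10 hits. Boolean, or, contains and range matching are also available; for more queries, see the official query API documentation.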

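10. Create an index

curl -X PUT '10.18.37.223:9200/test?pretty'
or
curl -X PUT '10.18.37.223:9200/test'
Creates an index named test. Note: index names must be lowercase, must not start with an underscore, and must not contain commas. If no document ID is specified explicitly when indexing data, ES generates a random ID; in that case the request must use POST.

11. Insert data into an index

curl -X PUT '10.18.37.223:9200/test/test_zhang/1?pretty' -d '{"name":"tom","age":18}'
Inserts the document {"name":"tom","age":18} with index=test, type=test_zhang and id=1. -X POST works as well.

12. Modify data

curl -X PUT '10.18.37.223:9200/test/test_zhang/1?pretty' -d '{"name":"pete","age":20}'
Note: this changes the document with index=test, type=test_zhang and id=1 from {"name":"tom","age":18} to {"name":"pete","age":20}. After it succeeds, fetching the document shows the latest data, and its version is incremented by one.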

13. Update fields and add a new field at the same time, within one index and type

curl -X POST '10.18.37.223:9200/test/test_zhang/1/_update?pretty' -d '{"doc":{"name":"Alice","age":18,"addr":"beijing"}}'
Note: this changes the name and age and adds a new field addr=beijing.

14. Update data with a script

curl -X POST '10.18.37.223:9200/test/test_zhang/1/_update?pretty' -d '{"script": "ctx._source.age += 5"}'
Note: this adds 5 to age. Since ES 1.4.3, inline scripts are disabled by default. To enable them, add the following to config/elasticsearch.yml:
script.inline: true
script.indexed: true
Then restart (in cluster mode, add the settings on every node and restart each one).

15. Delete a document

curl -X DELETE '10.18.37.223:9200/test/test_zhang/1'
Note: deletes the document with index=test, type=test_zhang and id=1.

16. Delete an index

curl -X DELETE '10.18.37.223:9200/test'
Deletes the index test together with its data.

17. Bulk operations

curl -X POST '10.18.37.223:9200/test/test_zhang/_bulk?pretty' -d '
{"index":{"_id":"2"}}
{"name":"zhangsan","age":12}
{"index":{"_id":"3"}}
{"name":"lisi"}
'
Note: adds two documents, id=2 and id=3, under index=test, type=test_zhang.

curl -X POST '10.18.37.223:9200/test/test_zhang/_bulk?pretty' -d '
{"update":{"_id":"2"}}
{"doc":{"name":"wangwu"}}
{"delete":{"_id":"3"}}
'
Note: updates the document with id=2 and deletes the document with id=3 in the same request, under index=test, type=test_zhang.

18. Delete by query

curl -X POST "10.18.37.223:9200/test/_delete_by_query" -d '{ "query": { "match": { "name": "pete" } } }'
This uses ES's _delete_by_query. In ES 2.0 and later 2.x versions it was removed from core, so to use this command there you need to install the delete-by-query plugin yourself: in the bin directory of the ES installation, run ./plugin install delete-by-query. In cluster mode, install it on every node and restart. From ES 5.0 on, _delete_by_query is part of core again and no plugin is needed.