# Elasticsearch Part 6: aggregate statistical query

There has been no record before Elasticsearch Aggregate queries or other complex queries . Take notes in this article , To facilitate testing , The index data is still the test index library generated in Article 5 db_student_test , The nickname is student_test

## The first part Basic aggregation

1、 Maximum max、 minimum value min、 Average avg 、 Sum sum

Scene ： Inquiry language 、 Mathematics 、 English The maximum of these three subjects 、 minimum value 、 Average

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"max_chinese" : { "max" : { "field" : "chinese" } },
"min_chinese" : { "min" : { "field" : "chinese" } },
"avg_chinese" : { "avg" : { "field" : "chinese" } },
"max_math": { "max" : { "field" : "math" } },
"min_math": { "min" : { "field" : "math" } },
"avg_math": { "avg" : { "field" : "math" } },
"max_english": { "max" : { "field" : "english" } },
"min_english": { "min" : { "field" : "english" } },
"avg_english": { "avg" : { "field" : "english" } }
}
}```

The query result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"avg_english": {
"value": 57.78366490546798
},
"max_chinese": {
"value": 98
},
"min_chinese": {
"value": 25
},
"min_math": {
"value": 15
},
"max_english": {
"value": 98
},
"avg_chinese": {
"value": 59.353859695794505
},
"avg_math": {
"value": 56.92907568735187
},
"min_english": {
"value": 21
},
"max_math": {
"value": 99
}
}
}```

You can also query the total score of Chinese subjects , It is equivalent to sql  Of sum  Logic , Although it doesn't make any sense here ：

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"sum_chinese" : { "sum" : { "field" : "chinese" } }
}
}```

2、 Count the number , It is equivalent to sql Of count Logic

Scene ： Query the total number of all students , It's easy here count One The column is OK , For example, the field of Mathematics

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"age_count": {
"value_count": {
"field": "math"
}
}
}
}```

The return result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"age_count": {
"value": 50084828
}
}
}```

The total number between classes is ：50084828 It's consistent with the amount of data we generated in Chapter 5

3、distinct polymerization , It is equivalent to sql  Of   count ( distinct )

Scene ： How many kinds of value does the statistics language achievement have

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "chinese"
}
}
}
}```

The return result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"type_count": {
"value": 74
}
}
}```

In terms of results , Only 74 A different score , Match with the rules of the fifth randomly generated data

4、 Statistical aggregation

Scene ： Query language scores Total number 、 Maximum 、 minimum value 、 Average 、 Sum up, etc

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_stats": {
"stats": {
"field": "chinese"
}
}
}
}```

The return result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_stats": {
"count": 50084828,
"min": 25,
"max": 98,
"avg": 59.353859695794505,
"sum": 2972727854
}
}
}```

5、 Enhanced statistical aggregation , The query results are based on the above , Plus the variance and other statistical data

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_stats": {
"extended_stats": {
"field": "chinese"
}
}
}
}```

6、 Quantile aggregation Statistics

The default quantile is 1%  5%  25%  50%  75%  95%  99%  《= The concept of

The concept of quantile ：25% The quantile of is 54, Less than or equal to 54 Of the total sample 25% , That is 54 This number will be at the bottom of 1/4 We're going to split it up .

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_percents": {
"percentiles": {
"field": "chinese"
}
}
}
}```

You can also customize the quantile ：

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_percents": {
"percentiles": {
"field": "chinese",
"percents" : [10,20,30,40,50,60,70,80,90]
}
}
}
}```

7、 Range aggregation Statistics

Scene ： The language scores are less than 40 branch 、 Smaller than 50 branch 、 Smaller than 60 The percentage of points

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"gge_perc_rank": {
"percentile_ranks": {
"field": "chinese",
"values": [40,50,60]
}
}
}
}```

The above results are less than 40, Smaller than 50, Smaller than 60 % of , The information we get is ： 21.29%   36.09%   51.12%  As you can see, it's a sequence that's close to arithmetic , It can be seen that the randomness of the test data is still very good .

## The second part Other polymerization methods

1、Term polymerization

Scene ： Want to know the students' Chinese achievement , The number of points on all fractions

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"genres" : {
"terms" : {
"field" : "chinese"
}
}
}
}```

This query will put the fields Chinese Aggregate , for example 87 To synthesize a group of ,88 To synthesize a group of , wait ;

But the default here is to sort by group size , And it doesn't show all the groups , Groups that are too small may be ignored , The results are as follows ：

```{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 42560269,
"buckets": [
{
"key": 61,
"doc_count": 752863
},
{
"key": 68,
"doc_count": 752835
},
{
"key": 55,
"doc_count": 752749
},
{
"key": 59,
"doc_count": 752444
},
{
"key": 76,
"doc_count": 752405
},
{
"key": 74,
"doc_count": 752309
},
{
"key": 56,
"doc_count": 752283
},
{
"key": 49,
"doc_count": 752273
},
{
"key": 52,
"doc_count": 752201
},
{
"key": 50,
"doc_count": 752197
}
]
}
}
}```

If you want to customize the filter criteria ,Term Aggregation can also be queried according to the following settings ：

```post  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"genres" : {
"terms" : {
"field" : "chinese",
"size" : 100,                     //  There may be 100 It's an unused score , We're going to show it all
"order" : { "_count" : "asc" },   //  Sort according to the number of groups
"min_doc_count": 752200          // Filter conditions ： The minimum number of groups is 752200
}
}
}
}```

The query result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 52,
"doc_count": 752201
},
{
"key": 49,
"doc_count": 752273
},
{
"key": 56,
"doc_count": 752283
},
{
"key": 74,
"doc_count": 752309
},
{
"key": 76,
"doc_count": 752405
},
{
"key": 59,
"doc_count": 752444
},
{
"key": 55,
"doc_count": 752749
},
{
"key": 68,
"doc_count": 752835
},
{
"key": 61,
"doc_count": 752863
}
]
}
}
}```

2、Filter polymerization

Filter Aggregation is conditional filtering first , In the process of polymerization

Scene ： Query the average score of students in South China University of technology （ First screen schools , And then we aggregate the scores ）

```{
"aggs" : {
"scut_math_avg" : {
"filter" : { "term": { "school": " South China University of Technology " } },
"aggs" : {
"avg_price" : { "avg" : { "field" : "math" } }
}
}
}
}```

The query result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"scut_math_avg": {
"doc_count": 1854993,
"avg_price": {
"value": 56.93080027795253
}
}
}
}```

3、Filters Multiple aggregation

Scene ： Check schools , Chinese 、 Mathematics 、 What is the average score of English , Multiple polymerization can be used , It may be a little slow , as follows

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"school_1" :   { "term" : { "school" : " South China University of Technology " }},
"school_2" : { "term" : { "school" : " Sun Yat sen University " }},
"school_3" : { "match" : { "school" : " Jinan University " }}
}
},
"aggs" : {
"avg_chinese" : { "avg" : { "field" : "chinese" } },
"avg_math" : { "avg" : { "field" : "math" } }
}
}
}
}```

So the result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"messages": {
"buckets": {
"school_1": {
"doc_count": 1854993,
"avg_chinese": {
"value": 59.353236912484306
},
"avg_math": {
"value": 56.93080027795253
}
},
"school_2": {
"doc_count": 1855016,
"avg_chinese": {
"value": 59.349129064115886
},
"avg_math": {
"value": 56.93540918245449
}
},
"school_3": {
"doc_count": 44519876,
"avg_chinese": {
"value": 59.35397212247402
},
"avg_math": {
"value": 56.92948502372289
}
}
}
}
}
}```

4、Range Range aggregation

Scene ： Want to query the number of Chinese scores in each segment , You can query

```POST  http://localhost:9200/student_test1/_search?size=0{
"aggs" : {
"chinese_ranges" : {
"range" : {
"field" : "chinese",
"ranges" : [
{ "to" : 60 },
{ "from" : 60, "to" : 75 },
{ "from" : 75, "to" : 85 },
{ "from" : 85 }
]
}
}
}
}```

The query result is ：

```{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_ranges": {
"buckets": [
{
"key": "*-60.0",
"to": 60,
"doc_count": 25096839
},
{
"key": "60.0-75.0",
"from": 60,
"to": 75,
"doc_count": 11278543
},
{
"key": "75.0-85.0",
"from": 75,
"to": 85,
"doc_count": 7424634
},
{
"key": "85.0-*",
"from": 85,
"doc_count": 6284812
}
]
}
}
}```

The group names of the returned results are *-60.0 60.0-75.0 75.0-85.0 85.0-*
If we don't want a group name like this , You can customize the group name , for example ：

```POST  http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"chinese_ranges" : {
"range" : {
"field" : "chinese",
"keyed" : true,
"ranges" : [
{ "key" : " fail, ", "to" : 60 },
{ "key" : " pass ", "from" : 60, "to" : 75 },
{ "key" : " good ", "from" : 75, "to" : 85 },
{ "key" : " Excellent ", "from" : 85 }
]
}
}
}
}```

The query result will be ：

```{
"took": 1675,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_ranges": {
"buckets": {
" fail, ": {
"to": 60,
"doc_count": 25096839
},
" pass ": {
"from": 60,
"to": 75,
"doc_count": 11278543
},
" good ": {
"from": 75,
"to": 85,
"doc_count": 7424634
},
" Excellent ": {
"from": 85,
"doc_count": 6284812
}
}
}
}
}```

There are all kinds of other 、 Complex aggregate queries , All of them can check information online , It even supports some calculation methods of recommendation system , For example, the concept of matrix and so on .

You can also refer to  https://blog.csdn.net/alex_xfboy/article/details/8610