The basic query to find duplicate counts is shown below, but it will not return the complete picture, because a terms aggregation returns only the top 10 buckets by default.
GET /index/type/_search
{
  "size": 0,
  "aggs": {
    "db": {
      "terms": {
        "field": "source-dbtype"
      },
      "aggs": {
        "count": {
          "terms": {
            "field": "column_name",
            "min_doc_count": 2
          }
        }
      }
    }
  }
}
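For reference, the response nests one bucket per source-dbtype value and, inside each, one bucket per column_name value that occurs at least twice. A trimmed, illustrative response (the keys and counts below are hypothetical) looks like this:
{
  "aggregations": {
    "db": {
      "buckets": [
        {
          "key": "oracle",
          "doc_count": 1200,
          "count": {
            "buckets": [
              { "key": "user_123", "doc_count": 3 },
              { "key": "user_456", "doc_count": 2 }
            ]
          }
        }
      ]
    }
  }
}
Because of the default terms size, at most 10 buckets come back at each level, which is why this output is incomplete.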
To get the complete details, we first need to know how many distinct values exist, which is what the cardinality aggregation gives us, as shown below.
Cardinality Aggregation
A single-value metrics aggregation that calculates an approximate count of distinct values. Values can be extracted either from specific fields in the document or generated by a script.
GET index/type/_search
{
  "size": 0,
  "aggs": {
    "maximum_match_counts": {
      "cardinality": {
        "field": "column_name",
        "precision_threshold": 100
      }
    }
  }
}
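The response contains a single approximate count of distinct column_name values; the number below is hypothetical:
{
  "aggregations": {
    "maximum_match_counts": {
      "value": 523
    }
  }
}
With precision_threshold set to 100, counts up to roughly 100 distinct values are expected to be close to exact; beyond that the count is an approximation (the parameter accepts values up to 40000).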
Take the value returned by the maximum_match_counts aggregation and use it as the size of the terms aggregation in the next query (replace the maximum_match_counts placeholder below with that number). Now you can get all duplicate values of column_name.
GET index/type/_search
{
  "size": 0,
  "aggs": {
    "column_name": {
      "terms": {
        "field": "column_name",
        "size": maximum_match_counts,
        "min_doc_count": 2
      }
    }
  }
}
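For example, if the cardinality aggregation returned 523 (the hypothetical value from above), the final request would be:
GET index/type/_search
{
  "size": 0,
  "aggs": {
    "column_name": {
      "terms": {
        "field": "column_name",
        "size": 523,
        "min_doc_count": 2
      }
    }
  }
}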
This will give you the complete output of the duplicates in your index.
Hope this helps :)
Author: Adil Mohammed