mongodb count num of distinct values per field/key mongodb count num of distinct values per field/key mongodb mongodb

mongodb count num of distinct values per field/key


MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

There is a shell db.collection.distinct() helper as well:

> db.countries.distinct('country');[ "Spain", "England", "France", "Australia" ]> db.countries.distinct('country').length4

As noted in the MongoDB documentation:

Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.


Here is example of using aggregation API. To complicate the case we're grouping by case-insensitive words from array property of the document.

db.articles.aggregate([    {        $match: {            keywords: { $not: {$size: 0} }        }    },    { $unwind: "$keywords" },    {        $group: {            _id: {$toLower: '$keywords'},            count: { $sum: 1 }        }    },    {        $match: {            count: { $gte: 2 }        }    },    { $sort : { count : -1} },    { $limit : 100 }]);

that give result such as

{ "_id" : "inflammation", "count" : 765 }{ "_id" : "obesity", "count" : 641 }{ "_id" : "epidemiology", "count" : 617 }{ "_id" : "cancer", "count" : 604 }{ "_id" : "breast cancer", "count" : 596 }{ "_id" : "apoptosis", "count" : 570 }{ "_id" : "children", "count" : 487 }{ "_id" : "depression", "count" : 474 }{ "_id" : "hiv", "count" : 468 }{ "_id" : "prognosis", "count" : 428 }


With MongoDb 3.4.4 and newer, you can leverage the use of $arrayToObject operator and a $replaceRoot pipeline to get the counts.

For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:

db.users.aggregate([    { "$group": {        "_id": { "$toLower": "$role" },        "count": { "$sum": 1 }    } },    { "$group": {        "_id": null,        "counts": {            "$push": { "k": "$_id", "v": "$count" }        }    } },    { "$replaceRoot": {        "newRoot": { "$arrayToObject": "$counts" }    } }    ])

Example Output

{    "user" : 67,    "superuser" : 5,    "admin" : 4,    "moderator" : 12}