Find duplicate records in MongoDB Find duplicate records in MongoDB database database

Find duplicate records in MongoDB


Use aggregation on name and get name with count > 1:

db.collection.aggregate([    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },     {"$project": {"name" : "$_id", "_id" : 0} }]);

To sort the results by most to least duplicates:

db.collection.aggregate([    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },     {"$sort": {"count" : -1} },    {"$project": {"name" : "$_id", "_id" : 0} }     ]);

To use with another column name than "name", change "$name" to "$column_name"


You can find the list of duplicate names using the following aggregate pipeline:

  • Group all the records having similar name.
  • Match those groups having records greater than 1.
  • Then group again to project all the duplicate names as an array.

The Code:

db.collection.aggregate([{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},{$match:{"count":{$gt:1}}},{$project:{"name":1,"_id":0}},{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},{$project:{"_id":0,"duplicateNames":1}}])

o/p:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }


The answer anhic gave can be very inefficient if you have a large database and the attribute name is present only in some of the documents.

To improve efficiency you can add a $match to the aggregation.

db.collection.aggregate(    {"$match": {"name" :{ "$ne" : null } } },     {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },    {"$match": {"count" : {"$gt": 1} } },     {"$project": {"name" : "$_id", "_id" : 0} })