Find duplicate records in MongoDB
Use aggregation on name
and get name
with count > 1
:
db.collection.aggregate([ {"$group" : { "_id": "$name", "count": { "$sum": 1 } } }, {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, {"$project": {"name" : "$_id", "_id" : 0} }]);
To sort the results by most to least duplicates:
db.collection.aggregate([ {"$group" : { "_id": "$name", "count": { "$sum": 1 } } }, {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, {"$sort": {"count" : -1} }, {"$project": {"name" : "$_id", "_id" : 0} } ]);
To use with another column name than "name", change "$name" to "$column_name"
You can find the list
of duplicate
names using the following aggregate
pipeline:
Group
all the records having similarname
.Match
thosegroups
having records greater than1
.- Then
group
again toproject
all the duplicate names as anarray
.
The Code:
db.collection.aggregate([{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},{$match:{"count":{$gt:1}}},{$project:{"name":1,"_id":0}},{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},{$project:{"_id":0,"duplicateNames":1}}])
o/p:
{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
The answer anhic gave can be very inefficient if you have a large database and the attribute name is present only in some of the documents.
To improve efficiency you can add a $match to the aggregation.
db.collection.aggregate( {"$match": {"name" :{ "$ne" : null } } }, {"$group" : {"_id": "$name", "count": { "$sum": 1 } } }, {"$match": {"count" : {"$gt": 1} } }, {"$project": {"name" : "$_id", "_id" : 0} })