Remove duplicates in MongoDB

Yes, dropDups is gone for good. But you can definitely achieve your goal with a little effort.

You first need to find all duplicate documents and then remove all but the first one in each group.

db.dups.aggregate([
  { $group: { _id: "$contact_id", dups: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
]).forEach(function (doc) {
  doc.dups.shift();
  db.dups.remove({ _id: { $in: doc.dups } });
});

As you can see, doc.dups.shift() removes the first _id from the array, and the remove() call then deletes every document whose _id is still in the dups array.

The script above removes all duplicate documents, keeping one document per contact_id.
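To make the logic easier to follow, here is a minimal in-memory sketch of what the pipeline and the shift() call compute. It runs on a plain array rather than a real collection, and idsToRemove is a hypothetical helper name, not a MongoDB API.

```javascript
// In-memory sketch of the pipeline's logic.
// `idsToRemove` is a hypothetical helper, not part of MongoDB.
function idsToRemove(docs, key) {
  // $group: collect the _id of every document sharing the same key value.
  const groups = new Map();
  for (const doc of docs) {
    const k = doc[key];
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(doc._id);
  }

  const toRemove = [];
  for (const ids of groups.values()) {
    // $match: only groups with more than one document contain duplicates.
    if (ids.length > 1) {
      // Same effect as shift(): skip the first _id, which is the one we keep.
      toRemove.push(...ids.slice(1));
    }
  }
  return toRemove;
}

// Made-up sample data for illustration.
const sample = [
  { _id: 1, contact_id: "a" },
  { _id: 2, contact_id: "b" },
  { _id: 3, contact_id: "a" },
  { _id: 4, contact_id: "a" },
];
console.log(idsToRemove(sample, "contact_id")); // [ 3, 4 ]
```

Documents 3 and 4 share contact_id "a" with document 1, so they are the ones selected for removal.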


This is a good pattern for MongoDB 3+. It uses allowDiskUse so that the $group stage does not run out of memory, which can happen with really big collections. You can save it to a dedup.js file, customize it, and run it against your desired database with: mongo localhost:27017/YOURDB dedup.js

var duplicates = [];

db.runCommand({
  aggregate: "YOURCOLLECTION",
  pipeline: [
    { $group: { _id: { DUPEFIELD: "$DUPEFIELD" }, dups: { "$addToSet": "$_id" }, count: { "$sum": 1 } } },
    { $match: { count: { "$gt": 1 } } }
  ],
  allowDiskUse: true
}).result.forEach(function (doc) {
  doc.dups.shift();
  doc.dups.forEach(function (dupId) {
    duplicates.push(dupId);
  });
});

printjson(duplicates); // optional: print the list of duplicates to be removed

db.YOURCOLLECTION.remove({ _id: { $in: duplicates } });
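One caveat: if the duplicates array grows very large, a single remove() with one huge $in list may run into the 16 MB BSON document size limit. A possible workaround, sketched here under that assumption, is to delete in batches. The chunk helper and the batch size of 1000 are illustrative choices, not part of the original script.

```javascript
// Hypothetical helper: split an array into consecutive batches
// of at most `size` items each.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]

// In the shell script above, the final remove could then become
// (the batch size of 1000 is an arbitrary choice):
//
// chunk(duplicates, 1000).forEach(function (batch) {
//   db.YOURCOLLECTION.remove({ _id: { $in: batch } });
// });
```

Smaller batches mean more round trips but keep each query document small.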


We can also use an $out stage to remove duplicates from a collection by replacing the content of the collection with only one occurrence per duplicate.

For instance, to only keep one element per value of x:

// > db.collection.find()
//     { "x" : "a", "y" : 27 }
//     { "x" : "a", "y" : 4  }
//     { "x" : "b", "y" : 12 }
db.collection.aggregate(
  { $group: { _id: "$x", onlyOne: { $first: "$$ROOT" } } },
  { $replaceWith: "$onlyOne" }, // prior to 4.2: { $replaceRoot: { newRoot: "$onlyOne" } }
  { $out: "collection" }
)
// > db.collection.find()
//     { "x" : "a", "y" : 27 }
//     { "x" : "b", "y" : 12 }

This:

$group groups documents by the field defining what a duplicate is (here x) and, for each group, keeps only one document (the $first one found) by assigning it the value $$ROOT, which is the document itself. At the end of this stage, we have something like:
```
{ "_id" : "a", "onlyOne" : { "x" : "a", "y" : 27 } }
{ "_id" : "b", "onlyOne" : { "x" : "b", "y" : 12 } }
```
$replaceWith replaces all existing fields in the input document with the content of the onlyOne field we created in the $group stage, restoring the original document format. At the end of this stage, we have something like:
```
{ "x" : "a", "y" : 27 }
{ "x" : "b", "y" : 12 }
```
$replaceWith is only available starting in Mongo 4.2. With prior versions, we can use $replaceRoot instead:
```
{ $replaceRoot: { newRoot: "$onlyOne" } }
```
$out inserts the result of the aggregation pipeline in the same collection. Note that $out conveniently replaces the content of the specified collection, making this solution possible.
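The $group and $replaceWith stages can be mimicked on a plain array to see the intermediate shapes. This is only an in-memory illustration under simplifying assumptions: keepFirstPerKey is a hypothetical helper, and unlike the real pipeline it processes documents strictly in array order.

```javascript
// Hypothetical in-memory stand-in for the $group + $replaceWith stages.
function keepFirstPerKey(docs, key) {
  // $group with onlyOne: { $first: "$$ROOT" }:
  // remember the first whole document seen for each key value.
  const grouped = new Map();
  for (const doc of docs) {
    if (!grouped.has(doc[key])) {
      grouped.set(doc[key], { _id: doc[key], onlyOne: doc });
    }
  }
  // $replaceWith: "$onlyOne":
  // promote the stored document back to the top level.
  return Array.from(grouped.values(), (g) => g.onlyOne);
}

const collection = [
  { x: "a", y: 27 },
  { x: "a", y: 4 },
  { x: "b", y: 12 },
];
console.log(keepFirstPerKey(collection, "x"));
// [ { x: 'a', y: 27 }, { x: 'b', y: 12 } ]
```

In a real collection, which document counts as the first in a group depends on the order in which documents reach $group, so adding a $sort stage before it may be worthwhile if you care which duplicate survives.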
