
mongodb get distinct records


MongoDB has a distinct command, which can be combined with a query. However, I believe it only returns the distinct values of the one key you name (in your case, you'd only get the id values back), so I'm not sure it will give you what you want if you need the whole documents. You may need MapReduce instead.

Documentation on distinct: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
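To make the difference concrete, here is a small plain-JavaScript sketch (not the MongoDB API itself) that imitates what `db.test.distinct("id")` would hand back versus what you get when you keep the first whole document per key. The sample documents are the ones from the example further down this thread with the ObjectIds left out, and the helper names `distinctValues` and `firstDocPerKey` are made up for illustration.

```javascript
// Sample documents from the example in this thread (ObjectIds omitted).
const docs = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 },
];

// Stand-in for db.test.distinct("id"): only the unique values of one key.
function distinctValues(documents, key) {
  return [...new Set(documents.map((doc) => doc[key]))];
}

// Stand-in for "one whole document per key": keep the first one seen.
function firstDocPerKey(documents, key) {
  const seen = new Map();
  for (const doc of documents) {
    if (!seen.has(doc[key])) {
      seen.set(doc[key], doc);
    }
  }
  return [...seen.values()];
}

console.log(distinctValues(docs, "id")); // [ 1, 2, 3 ]
console.log(firstDocPerKey(docs, "id")); // three full documents, one per id
```

The first call loses name, ttm and val entirely, which is why distinct on its own is probably not enough when the whole record is needed.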


You want to use aggregation. You could do that like this:

    db.test.aggregate([
        // Each object in the array is one pipeline stage.
        {
            $group: {
                originalId: {$first: '$_id'}, // Hold onto the original _id.
                _id: '$id', // Group by the (non-unique) 'id' field.
                val:  {$first: '$val'},
                name: {$first: '$name'},
                ttm:  {$first: '$ttm'}
            }
        }, {
            // This stage receives the output of the $group stage,
            // so the (originally) non-unique 'id' field is now
            // present as the _id field. We want to rename it.
            $project: {
                _id : '$originalId', // Restore the original _id.
                id  : '$_id',
                val : '$val',
                name: '$name',
                ttm : '$ttm'
            }
        }
    ])

This is very fast: about 90 ms on my test DB of 100,000 documents.

Example:

    db.test.find()
    // { "_id" : ObjectId("55fb595b241fee91ac4cd881"), "id" : 1, "name" : "x", "ttm" : 23, "val" : 5 }
    // { "_id" : ObjectId("55fb596d241fee91ac4cd882"), "id" : 1, "name" : "x", "ttm" : 34, "val" : 1 }
    // { "_id" : ObjectId("55fb59c8241fee91ac4cd883"), "id" : 1, "name" : "x", "ttm" : 24, "val" : 2 }
    // { "_id" : ObjectId("55fb59d9241fee91ac4cd884"), "id" : 2, "name" : "x", "ttm" : 56, "val" : 3 }
    // { "_id" : ObjectId("55fb59e7241fee91ac4cd885"), "id" : 2, "name" : "x", "ttm" : 76, "val" : 3 }
    // { "_id" : ObjectId("55fb59f9241fee91ac4cd886"), "id" : 3, "name" : "x", "ttm" : 54, "val" : 7 }

    db.test.aggregate(/* from first code snippet */)
    // output
    {
        "result" : [
            {
                "_id" : ObjectId("55fb59f9241fee91ac4cd886"),
                "val" : 7,
                "name" : "x",
                "ttm" : 54,
                "id" : 3
            },
            {
                "_id" : ObjectId("55fb59d9241fee91ac4cd884"),
                "val" : 3,
                "name" : "x",
                "ttm" : 56,
                "id" : 2
            },
            {
                "_id" : ObjectId("55fb595b241fee91ac4cd881"),
                "val" : 5,
                "name" : "x",
                "ttm" : 23,
                "id" : 1
            }
        ],
        "ok" : 1
    }

PROS: Almost certainly the fastest method.

CONS: It relies on the fairly complicated Aggregation API, and it is tightly coupled to the original schema of the document, since every field has to be listed by name. It may be possible to generalize this, though.
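One possible way to loosen that schema coupling, offered as a sketch rather than a tested replacement: group on the key and keep the entire first document with the `$$ROOT` variable, then promote it back to the top level with `$replaceRoot`. This assumes a server that supports both (`$replaceRoot` needs MongoDB 3.4 or later), and the helper name `buildDistinctPipeline` is hypothetical. The helper only builds the pipeline array, so it can be inspected without a database.

```javascript
// Hypothetical helper: builds a pipeline that keeps one whole document
// per distinct value of keyField, without naming the other fields.
function buildDistinctPipeline(keyField) {
  return [
    {
      // Group on the chosen field and keep the first full document seen.
      $group: {
        _id: "$" + keyField,
        doc: { $first: "$$ROOT" },
      },
    },
    {
      // Promote the saved document back to the top level,
      // which also restores its original _id.
      $replaceRoot: { newRoot: "$doc" },
    },
  ];
}

const pipeline = buildDistinctPipeline("id");
console.log(JSON.stringify(pipeline, null, 2));

// In the shell this would be used as (assumes a 'test' collection):
// db.test.aggregate(buildDistinctPipeline("id"))
```

As with the original snippet, which document counts as "first" in each group is only well defined if a `$sort` stage comes before the `$group`, so add one if it matters which record survives.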


I believe you can use aggregate like this:

    // The stage is wrapped in an array, since not every driver
    // accepts a bare stage object.
    collection.aggregate([{
        $group : {
            "_id" : "$id",
            "docs" : {
                $first : {
                    "name" : "$name",
                    "ttm" : "$ttm",
                    "val" : "$val"
                }
            }
        }
    }]);