Referencing the whole document in MongoDB Aggregation Pipeline
Use the $$ROOT variable:
References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage.
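As a minimal sketch of how $$ROOT could be used here, the pipeline below groups tweets by cluster and pushes each whole document into a members array. The collection name tweets and the field clusters.clusterID are taken from the question; the variable name pipeline and the choice of $push are assumptions for illustration. $$ROOT requires MongoDB 2.6 or later.

```javascript
// Sketch: group tweets by cluster ID and collect each full document.
// "$$ROOT" refers to the whole document entering the $group stage.
var pipeline = [
  {
    $group: {
      _id: "$clusters.clusterID",     // grouping key, as in the question
      members: { $push: "$$ROOT" }    // append the entire document
    }
  }
];

// Run only where a mongo shell `db` handle exists (assumption: the
// collection is called `tweets`); elsewhere this block just builds the pipeline.
if (typeof db !== "undefined") {
  db.tweets.aggregate(pipeline).forEach(function (doc) {
    printjson(doc);
  });
}
```

Each output document still has to fit within the 16MB BSON document limit, so this works best when individual clusters are reasonably small.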
There is currently no mechanism to access the full document in the aggregation framework. If you only need a subset of fields, you could do:
db.tweets.aggregate([
  { $group: {
      _id: '$clusters.clusterID',
      members: { $addToSet: {
        user: "$user",
        text: "$text",
        // etc. for the subset of fields you want
      } }
  } }
])
Don't forget that with a few hundred thousand tweets, aggregating the full documents will run into the 16MB limit on an aggregation framework result document.
You can do this via MapReduce like this:
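// Map: emit each tweet under its cluster ID, wrapped in a one-element array
var m = function() {
  emit(this.clusters.clusterID, { members: [this] });
};

// Reduce: merge the member arrays emitted for the same key
var r = function(k, v) {
  var res = { members: [] };
  v.forEach(function(val) {
    res.members = val.members.concat(res.members);
  });
  return res;
};

db.tweets.mapReduce(m, r, { out: "output" });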
I think MapReduce is more useful for this task.
As Asya Kamsky pointed out in the comments, my example is incorrect for MongoDB; please refer to the official MongoDB documentation.