Sub-query in MongoDB


No Joins in here, just us bears


So MongoDB "does not do joins". You might have tried something like this in the shell for example:

db.events.find().forEach(function(event) {
    event.user = db.users.findOne({ "_id": event.user });
    printjson(event);
})

But this does not do what you seem to think it does. It actually does exactly what it looks like: it runs a separate query on the "users" collection for every document returned from the "events" collection, with every round trip going "to and from" the client. None of it runs on the server.

For the same reason, your "embedded" statement within an aggregation pipeline does not work like that either. Unlike the above, the "whole pipeline" logic is sent to the server before execution. So if you did something like this to select "UK" users:

db.events.aggregate([    { "$match": {        "user": {             "$in": db.users.distinct("_id",{ "country": "UK" })        }    }}])

Then that .distinct() query is actually evaluated on the "client" and not the server, and therefore has no access to any document values in the aggregation pipeline. So the .distinct() runs first, returns its array as an argument, and only then is the whole pipeline sent to the server. That is the order of execution.
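In other words, the statement above is really just shorthand for running two separate operations from the shell, roughly like this sketch:

var ukUsers = db.users.distinct("_id", { "country": "UK" });   // evaluated on the client first

db.events.aggregate([                                           // only now is the whole pipeline sent to the server
    { "$match": { "user": { "$in": ukUsers } } }
])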

Correcting


You need at least some level of de-normalization for the sort of query you want to run to work. So you generally have two choices:

  1. Embed your whole user object data within the event data (a sketch of this shape follows this list).

  2. At least embed "some" of the user object data within the event data. In this case "country" becasue you are going to use it.

So then if you follow the "second" case there and at least "extend" your existing data a little to include the "country" like this:

{    "_id": ObjectId("yyyyy"),     "createdAt": ISODate(),     "user": {        "_id": ObjectId("xxxxx"),        "country": "UK"    }}

Then the "aggregation" process becomes simple:

db.events.aggregate([    { "$match": {        "createdAt": { "$gte": ISODate("2013-01-01T00:00:00Z") },        "user": { "$exists": true }    }},    { "$group": {        "_id": {            "year": { "$year": "$createdAt" },            "user_id": "$user._id"            "country": "$user.country"        },        "count": { "$sum": 1 }    }},    { "$group": {        "_id": "$_id.country",        "total": { "$sum": "$count" },        "distinct": { "$sum": 1 }    }}])

We're not normal


Fixing your data to include the information it needs on a single collection where we "do not do joins" is a relatively simple process. Just really a variant on the original query sample above:

var bulk = db.events.initializeUnorderedBulkOp(),
    count = 0;

db.users.find().forEach(function(user) {
    // update multiple events for user
    bulk.find({ "user": user._id }).update({
        "$set": { "user": { "_id": user._id, "country": user.country } }
    });
    count++;

    // Send batch every 1000
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.events.initializeUnorderedBulkOp();
    }
});

// Clear any queued
if ( count % 1000 != 0 )
    bulk.execute();
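Once that has run, a quick check from the shell (just an illustrative query, not part of the update itself) shows the events now carrying the embedded form:

db.events.findOne({ "user.country": { "$exists": true } })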

So that's what it's all about. Individual queries to a MongoDB server get "one collection" and "one collection only" to work with. Even the fantastic "Bulk Operations" as shown above can still only be "batched" on a single collection.

If you want to do things like "aggregate on related properties", then you "must" contain those properties in the collection you are aggregating data for. It is perfectly okay to live with data sitting in separate collections, as for instance "users" would generally have more information attached to them than just an "_id" and a "country".

But the point here is that if you need "country" for analysis of "event" data by "user", then include it in that data as well. The most efficient server "join" is a "pre-join", and that is the general theory being applied here.