Denormalization of data in MongoDB


In the SQL world you try to normalize the data

Not always. Normalising to the point of death inflicts performance hits, but it is true that I personally do not apply the same normalisation to MongoDB as I do to SQL.

If you are aware of the normal forms ( http://en.wikipedia.org/wiki/Database_normalization ), I like to think of MongoDB as going to 1NF and then back down to denormalised again.

You don't care if the data is duplicated?

Oh yes we do. Updating is a pain if the data is duplicated in the wrong way.

Let me give you an example: category and product would be two separate entities, there is no denying it. These two entities are normalised (the repeating data of product has been separated from category). Another way of thinking of it is to ask: are all products only going to exist in one category?

So for top-level entities, as you can see, much the same rules apply, and 1NF is easily applied to MongoDB.

On the duplication front you would, of course, not want to store each product separately within each category (I answered no to the question above), so you would naturally want to separate categories and products.

In SQL you would normally have a many-to-many relationship here, with a normalised table in the middle. This is where de-normalisation can come in. You can say that a category will have a list of products that is unique to that category, so you could de-normalise the many-to-many relational table into the category row as a list (or the other way around, into the product row). This will not generate duplication, since that list is (more than likely) unique to that category. This of course means that the category or product would house a list of _ids of the related rows instead of the objects themselves.
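As a rough sketch of that layout, here is the idea in Python, with plain dicts standing in for the two collections so that it runs without a database. The field name `product_ids`, the sample data and both helper functions are made up for illustration; they are not from any MongoDB API.

```python
# Plain dicts stand in for two MongoDB collections, keyed by _id.
# All names and values here are hypothetical.
products = {
    1: {"_id": 1, "name": "Kettle"},
    2: {"_id": 2, "name": "Toaster"},
    3: {"_id": 3, "name": "Desk lamp"},
}

# The many-to-many join table is folded into each category as a list of _ids.
categories = {
    10: {"_id": 10, "name": "Kitchen", "product_ids": [1, 2]},
    11: {"_id": 11, "name": "Gifts", "product_ids": [2, 3]},
}


def products_in_category(category_id):
    """Resolve the embedded _id list with a second lookup, in place of a JOIN."""
    category = categories[category_id]
    # Against a real database this would be roughly:
    #   db.products.find({"_id": {"$in": category["product_ids"]}})
    return [products[pid] for pid in category["product_ids"]]


def rename_product(product_id, new_name):
    """Each product is stored once, so a rename touches a single document."""
    products[product_id]["name"] = new_name
```

The toaster appears in two categories, but only its _id is repeated, so renaming it is still a single update.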

There are times when duplication is necessary, mainly for optimisation or as a workaround for not having JOINs; this rule applies to SQL as well, if you have ever built a big enough site.

Typical use cases for duplication are aggregate stat fields, like a Facebook post's share and comment counts; maybe even the 5 latest comments of that post would also be duplicated onto the post row.

So it is not a case of ignoring schema design but more of tuning it for MongoDB's characteristics. Normally, if you do that, you will find that you naturally design a good schema.
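A minimal sketch of that kind of duplication, again with plain Python structures standing in for collections. The field names (`comment_count`, `share_count`, `latest_comments`) and the `add_comment` helper are assumptions for illustration only.

```python
# Hypothetical post document carrying duplicated aggregate fields.
posts = {
    1: {
        "_id": 1,
        "body": "Hello",
        "comment_count": 0,
        "share_count": 0,
        "latest_comments": [],
    }
}

# The full comment history lives in its own collection.
comments = []


def add_comment(post_id, author, text):
    """Store the comment, then refresh the duplicated fields on the post."""
    comments.append({"post_id": post_id, "author": author, "text": text})

    post = posts[post_id]
    post["comment_count"] += 1
    # Keep only the 5 newest comments on the post itself.
    # In MongoDB the same effect is usually achieved in one update with
    # $inc on the counter and $push using $each and $slice: -5.
    post["latest_comments"] = (
        post["latest_comments"] + [{"author": author, "text": text}]
    )[-5:]
```

Reading the post then needs no second query for the counts or the preview, at the cost of one extra write whenever a comment is added.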

As an added reference you can refer here: http://docs.mongodb.org/manual/core/data-modeling