
MongoDB beginner - to normalize or not to normalize?


Try this approach:

Work out which entity (or entities) are the hero(s)

By 'hero', I mean the entity (or entities) that the database is centered around. Taking your example, the hero of the real-estate database is the house*.

Work out the ownerships

Go through the other entities, such as the owner, agency, images and reviews, and ask yourself whether it makes sense to store their information together with the house. Would you put a cascading delete on any of the corresponding foreign keys in a relational database? If so, that implies ownership, which makes the owned entity a good candidate for embedding in the house.
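A minimal sketch of what ownership means for embedding, using plain Python dicts as stand-ins for MongoDB documents (no driver involved; the in-memory `houses` list and all field names are hypothetical). Because the reviews are embedded in the house document, deleting the house removes them too, which is the document-database counterpart of a cascading delete:

```python
# Plain dicts stand in for MongoDB documents; field names are illustrative only.
houses = [
    {
        "_id": 1,
        "address": "12 Example Street",
        # Reviews are owned by the house, so they are embedded in it.
        "reviews": [
            {"author": "alice", "rating": 4},
            {"author": "bob", "rating": 5},
        ],
    },
    {
        "_id": 2,
        "address": "34 Sample Road",
        "reviews": [{"author": "carol", "rating": 3}],
    },
]


def delete_house(collection, house_id):
    """Remove a house; its embedded reviews disappear with it (an implicit cascade)."""
    return [doc for doc in collection if doc["_id"] != house_id]


def count_reviews(collection):
    """Count every review across all house documents."""
    return sum(len(doc["reviews"]) for doc in collection)


houses = delete_house(houses, 1)
print(count_reviews(houses))  # 1
```

With a real driver such as PyMongo, the delete would be a single `delete_one` call on the houses collection; no separate clean-up of reviews is needed.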

Work out whether it actually matters that data is de-normalised

If you embed them, agency (and probably owner) details will be duplicated across multiple houses. Does that matter?
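To see when that duplication matters, here is a small sketch (again plain dicts with hypothetical field names) in which each house embeds a copy of its agency's details. Changing the agency's phone number means updating every house that embeds it; miss one and the copies disagree:

```python
# Each house carries its own copy of the agency details (denormalised).
houses = [
    {"_id": 1, "agency": {"name": "Acme Realty", "phone": "555-0100"}},
    {"_id": 2, "agency": {"name": "Acme Realty", "phone": "555-0100"}},
    {"_id": 3, "agency": {"name": "Other Homes", "phone": "555-0199"}},
]


def update_agency_phone(collection, agency_name, new_phone):
    """Update the duplicated agency phone in every house; return how many changed."""
    changed = 0
    for doc in collection:
        if doc["agency"]["name"] == agency_name:
            doc["agency"]["phone"] = new_phone
            changed += 1
    return changed


def agency_phones(collection, agency_name):
    """Collect the distinct phone numbers stored for one agency."""
    return {
        doc["agency"]["phone"]
        for doc in collection
        if doc["agency"]["name"] == agency_name
    }


print(update_agency_phone(houses, "Acme Realty", "555-0123"))  # 2
print(agency_phones(houses, "Acme Realty"))  # {'555-0123'}
```

In MongoDB this would be a multi-document update (for example PyMongo's `update_many` with a filter on `agency.name`). If agency details rarely change, the duplication is cheap; if they change often, storing an agency reference instead may be the better trade-off.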

Your house collection will probably look like this:

    house: {
        owner,
        agency,
        images[],   // recommend references to GridFS here
        reviews[]   // you probably won't get too many of these for a single house
    }
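A concrete, hypothetical house document following that shape, written as a Python dict. The ids are plain strings rather than real `ObjectId`s, and the `images` entries stand in for references to files stored in GridFS; none of the values come from the original answer:

```python
house = {
    "_id": "house-1",
    "owner": {"name": "Jane Doe", "email": "jane@example.com"},
    "agency": {"name": "Acme Realty", "phone": "555-0100"},
    # References to files stored in GridFS, not the image bytes themselves.
    "images": ["gridfs-file-101", "gridfs-file-102"],
    # A single house usually gathers few reviews, so embedding them is cheap.
    "reviews": [
        {"author": "alice", "rating": 4, "text": "Bright and quiet."},
    ],
}


def average_rating(doc):
    """Average rating across the house's embedded reviews (None if there are none)."""
    ratings = [r["rating"] for r in doc.get("reviews", [])]
    return sum(ratings) / len(ratings) if ratings else None


print(average_rating(house))  # 4.0
```

Everything needed to render a listing comes back in one read; only the image bytes need a second lookup in GridFS.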

*Actually, the hero is probably the ad for the house (houses are typically advertised on a real-estate website, and that's probably what you're really interested in), so consider that instead.


Sarah Mei wrote an informative article about the kinds of data-integrity issues that can arise in NoSQL databases: the choice between duplicating data or storing IDs, code-based joins, and the challenges of keeping data consistent. Her take is that any NoSQL database relying on code-based joins will lose data integrity at some point. IMHO, the article's comments are as valuable as the article itself for understanding these issues and possible resolutions.
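To illustrate the integrity problem with code-based joins, here is a minimal sketch (plain dicts, hypothetical names) in which houses store an `agency_id` reference and the application performs the join itself. Nothing in the data layer stops an agency from being deleted while houses still point at it, so the join quietly encounters dangling references:

```python
agencies = {
    "a1": {"name": "Acme Realty"},
    "a2": {"name": "Other Homes"},
}
houses = [
    {"_id": 1, "agency_id": "a1"},
    {"_id": 2, "agency_id": "a2"},
    {"_id": 3, "agency_id": "a2"},
]


def join_agencies(house_docs, agency_docs):
    """Application-side join: attach each house's agency, or None if the reference dangles."""
    return [{**h, "agency": agency_docs.get(h["agency_id"])} for h in house_docs]


def dangling_references(house_docs, agency_docs):
    """Ids of houses whose agency_id no longer matches any agency."""
    return [h["_id"] for h in house_docs if h["agency_id"] not in agency_docs]


# Deleting an agency does not touch the houses that reference it.
del agencies["a2"]
print(dangling_references(houses, agencies))  # [2, 3]
print(join_agencies(houses, agencies)[0]["agency"])  # {'name': 'Acme Realty'}
```

Keeping references consistent then becomes the application's job, for example by checking for dependants before deleting or by running a periodic clean-up.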

Link: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/comment-page-1/


I would just like to give a normalization refresher from MongoDB's perspective.

What are the goals of normalization?

  • Free the database from modification anomalies: in MongoDB, these mostly arise from embedding (and therefore duplicating) data, so we should avoid embedding data in documents where it could create such anomalies. Occasionally we might need to duplicate data across documents for performance reasons, but that's not the default approach; the default is to avoid it.
  • Minimize redesign when extending the schema: MongoDB is flexible here, because it lets you add new keys without redesigning all the existing documents.
  • Avoid bias toward any particular access pattern: this is something we're not going to worry about when designing a schema in MongoDB. One of the ideas behind MongoDB is to tune your database to the application you're writing and the problem you're trying to solve.
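The "minimize redesign" point can be sketched with plain dicts standing in for documents (the `energy_rating` key and other field names are hypothetical): a new key can appear in newer documents only, and readers supply a default for older documents that lack it, so existing data needs no migration:

```python
houses = [
    {"_id": 1, "address": "12 Example Street"},  # written before the new key existed
    {"_id": 2, "address": "34 Sample Road", "energy_rating": "B"},  # newer document
]


def energy_rating(doc, default="unknown"):
    """Read the optional key, falling back to a default for older documents."""
    return doc.get("energy_rating", default)


print([energy_rating(h) for h in houses])  # ['unknown', 'B']
```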