
Relational to NoSQL Database


First, NoSQL is not one size fits all. In SQL, almost every 1:N and M:N relation is modeled in the same way. The NoSQL philosophy is that the way you model the data depends on the data and its use patterns.

Second, I agree with Mark Baker: scaling is hard, and it's achieved by loosening constraints. It's not a matter of technology. I love working with MongoDB, but for other reasons (no need to write ugly SQL, no need for a complicated, bloated ORM, etc.).

Now let's review your options:

Option 1 copies more data than needed. You will often have to denormalize some data, but never all of it. If you really need all of it, it's cheaper to fetch the referenced object.

Options 2 and 3 are very similar. The key question is: who's writing? You don't want a lot of clients having write access to the same document, because that will force you to use a locking mechanism and/or restrict yourself to modifier operations only. Therefore, option 2 is probably better than option 3. However, if A attacks B, that also triggers a write to user B, so you have to make sure your writes are safe.
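To illustrate the multiple-writer problem, here is a minimal sketch in plain Python with no database involved. The `battle` document, the `participants` field and the `push` helper are made up for illustration; `push` only stands in for an in-place modifier such as MongoDB's `$push`. The first function has two clients read the document, change their copy and write the whole thing back; the second applies modifiers to the stored document directly.

```python
import copy


def read_modify_write_race():
    """Two clients replace the whole document; the last write wins."""
    battle = {"_id": 4354, "participants": []}  # hypothetical stored document

    # Both clients read the same version before either one writes.
    copy_a = copy.deepcopy(battle)
    copy_b = copy.deepcopy(battle)
    copy_a["participants"].append("A")
    copy_b["participants"].append("B")

    battle = copy_a  # client A saves its copy
    battle = copy_b  # client B saves its copy and silently drops A's change
    return battle["participants"]


def modifier_style():
    """Each client sends a modifier that is applied to the stored document."""
    battle = {"_id": 4354, "participants": []}

    def push(doc, field, value):
        # Stand-in for an in-place modifier; no stale copy is written back.
        doc[field].append(value)

    push(battle, "participants", "A")
    push(battle, "participants", "B")
    return battle["participants"]
```

With whole-document replacement, client A's change is lost; with modifiers, both survive. That is why sharing a document between many writers pushes you towards locking or modifier-only updates.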

Option 4: Partial denormalization. Your user object seems to be most important, so how about this:

    user {
      battles : [ {"Name" : "The battle of foo", "Id" : 4354 }, ... ]
      ...
    }

This will make it easier to show e.g. a user dashboard, because you don't need to know all the details in the dashboard. Note: the data structure is then coupled to details of the presentation.
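As a rough sketch of what partial denormalization buys you and what it costs, assume two in-memory dictionaries standing in for the `users` and `battles` collections. The field names other than `Name` and `Id`, and both helper functions, are made up for illustration.

```python
# Hypothetical in-memory stand-ins for two collections.
battles = {
    4354: {"_id": 4354, "name": "The battle of foo", "participants": [3443]},
}
users = {
    3443: {
        "_id": 3443,
        # Only the fields the dashboard needs are copied into the user.
        "battles": [{"Name": "The battle of foo", "Id": 4354}],
    },
}


def dashboard_lines(user):
    """Build dashboard rows from the embedded summaries; no battle lookup."""
    return [f'{b["Name"]} (#{b["Id"]})' for b in user["battles"]]


def rename_battle(battle_id, new_name):
    """The price of the copy: a rename has to touch every embedded summary."""
    battles[battle_id]["name"] = new_name
    for user in users.values():
        for summary in user["battles"]:
            if summary["Id"] == battle_id:
                summary["Name"] = new_name
```

The dashboard is served from one document, but every denormalized field needs an update path like `rename_battle`, so it pays to copy only fields that rarely change.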

Option 5: Data on edges. Often, the relation needs to hold data as well:

    user {
      battles : [ {"Name" : "The battle of foo", "unitsLost" : 54, "Id" : 34354 }, ... ]
    }

Here, unitsLost is specific to the combination of user and battle, so the data sits on the edge of the graph. Unlike the battle's name, this data is not denormalized: it is stored in this one place only.

Option 6: Linker collections. Of course, such 'edge data' can grow huge and might even call for a separate collection (a linker collection). This fully eliminates the problem of access locks:

    user {
      "_id" : 3443
    }

    userBattles {
      userId : 3443,
      battleId : 4354,
      unitsLost : 43,
      itemsWon : [ <some list > ],
      // much more data
    }
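A linker collection can be sketched like this, again in plain Python with a list standing in for the `userBattles` collection. The helper functions and the sample values are assumptions for illustration, not part of any MongoDB API.

```python
# Hypothetical stand-in for the userBattles linker collection.
user_battles = []


def record_result(user_id, battle_id, units_lost, items_won):
    """Insert one document per (user, battle) pair; nothing shared is updated."""
    user_battles.append({
        "userId": user_id,
        "battleId": battle_id,
        "unitsLost": units_lost,
        "itemsWon": list(items_won),
    })


def results_for_user(user_id):
    """All edge documents belonging to a single user."""
    return [ub for ub in user_battles if ub["userId"] == user_id]


def total_units_lost(user_id):
    """Aggregate over the edges without loading user or battle documents."""
    return sum(ub["unitsLost"] for ub in results_for_user(user_id))


# Attacker and defender of battle 4354 each write their own document.
record_result(3443, 4354, 43, ["sword"])
record_result(3444, 4354, 12, [])
record_result(3443, 4360, 7, [])
```

Because each participant writes a separate document, no two clients update the same one. In a real collection you would presumably want an index on `userId` (and possibly `battleId`) to keep these lookups cheap.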

Which of these is best depends on a lot of details of your application. If users make a lot of clicks (i.e. you have a fine-grained interface), it makes sense to split up objects as in option 4 or 6. If you really need all the data in one batch, partial denormalization doesn't help, so option 2 would be preferable. Keep the multiple-writer problem in mind.


Option 2 is the way to go.

If you did this in a relational database, then at some point (when you have to start scaling horizontally) you would also need to start removing SQL joins and joining the data at the application level.

Even 10gen recommends using "manual" reference ids: http://www.mongodb.org/display/DOCS/Database+References
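Joining at the application level with manual references boils down to two lookups: fetch the user, then fetch the referenced battles in one batch. Here is a minimal sketch in plain Python; the `battleIds` field, the sample documents and both helpers are assumptions for illustration, and the batched lookup is comparable to a single query using `$in`.

```python
# Hypothetical in-memory stand-ins for the users and battles collections.
users = {
    3443: {"_id": 3443, "battleIds": [4354, 4360]},
}
battles = {
    4354: {"_id": 4354, "name": "The battle of foo"},
    4360: {"_id": 4360, "name": "The battle of bar"},
}


def find_by_ids(collection, ids):
    """One batched lookup for all referenced ids; unknown ids are skipped."""
    return [collection[i] for i in ids if i in collection]


def user_with_battles(user_id):
    """Resolve the manual references in application code."""
    user = users[user_id]
    joined = dict(user)  # shallow copy, so the stored document is untouched
    joined["battles"] = find_by_ids(battles, user["battleIds"])
    return joined
```

The cost is two round trips instead of one, regardless of how many battles the user references.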