MongoDB One to Many Relationship MongoDB One to Many Relationship mongodb mongodb

MongoDB One to Many Relationship


The problem is that you over normalize your data. An order is defined by a customer, who lives at a certain place at the given point in time, pays a certain price valid at the time of the order (which might heavily change over the application lifetime and which you have to document anyway and several other parameters which are all valid only in a certain point of time. So to document an order (pun intended), you need to persist all data for that certain point in time. Let me give you an example:

{ _id: "order123456789",  date: ISODate("2014-08-01T16:25:00.141Z"),  customer: ObjectId("53fb38f0040980c9960ee270"),  items:[ ObjectId("53fb3940040980c9960ee271"),          ObjectId("53fb3940040980c9960ee272"),          ObjectId("53fb3940040980c9960ee273")         ], Total:400 }

Now, as long as neither the customer nor the details of the items change, you are able to reproduce where this order was sent to, what the prices on the order were and alike. But now what happens if the customer changes it's address? Or if the price of an item changes? You would need to keep track of those changes in their respective documents. It would be much easier and sufficiently efficient to store the order like:

{  _id: "order987654321",  date: ISODate("2014-08-01T16:25:00.141Z"),  customer: {               userID: ObjectId("53fb3940040980c9960ee283"),               recipientName: "Foo Bar"               address: {                          street: "742 Evergreen Terrace",                          city: "Springfield",                          state: null                         }            },  items: [    {count:1, productId:ObjectId("53fb3940040980c9960ee300"), price: 42.00 },    {count:3, productId:ObjectId("53fb3940040980c9960ee301"), price: 0.99},    {count:5, productId:ObjectId("53fb3940040980c9960ee302"), price: 199.00}  ]}

With this data model and the usage of aggregation pipelines, you have several advantages:

  1. You don't need to independently keep track of prices and addresses or name changes or gift buys of a customer - it is already documented.
  2. Using aggregation pipelines, you can create a price trends without the need of storing pricing data independently. You simply store the current price of an item in an order document.
  3. Even complex aggregations such as price elasticity, turnover by state / city and alike can be done using pretty simple aggregations.

In general, it is safe to say that in a document oriented database, every property or field which is subject to change in the future and this change would create a different semantic meaning should be stored inside the document. Everything which is subject to change in the future but doesn't touch the semantic meaning (the users password in the example) may be linked via a GUID.


One to Many Relations

In this relationship, there is many, many entities or many entities that map to the one entity. e.g.: - a city have many persons who live in that city. Say NYC have 8 million people.

Let's assume the below data model:

   //city  {  _id: 1,  name: 'NYC',  area: 30,  people: [{      _id: 1,      name: 'name',      gender: 'gender'        .....    },    ....    8 million people data inside this array    ....  ]}

This won't work because that's going to be REALLY HUGE. Let's try to flip the head.

 //people { _id: 1, name: 'John Doe', gender: gender, city: {     _id: 1,     name: 'NYC',     area: '30'       .....   }}

Now the problem with this design is that if there are obviously multiple people living in NYC, so we've done a lot of duplication for city data.

Probably, the best way to model this data is to use true linking.

 //people { _id: 1, name: 'John Doe', gender: gender, city: 'NYC'}//city{_id: 'NYC',...}

In this case, people collection can be linked to the city collection. Knowing we don't have foreign key constraints, we've to be consistent about it. So, this is a one to many relation. It requires 2 collections. For small one to few (which is also one to many), relations like blog post to comments. Comments can be embedded inside post documents as an array.

So, if it's truly one to many, 2 collections works best with linking. But for one to few, one single collection is generally enough.