When should I create a new collections in MongoDB? When should I create a new collections in MongoDB? mongodb mongodb

When should I create a new collections in MongoDB?


The Two Most Popular Approaches to Schema Design in MongoDB

  1. Embed data into documents and store them in a single collection.
  2. Normalize data across multiple collections.

Embedding Data

There are several reasons why MongoDB doesn't support joins across collections, and I won't get into all of them here. But the main reason why we don't need joins is because we can embed relevant data into a single hierarchical JSON document. We can think of it as pre-joining the data before we store it. In the relational database world, this amounts to denormalizing our data. In MongoDB, this is about the most routine thing we can do.

Normalizing Data

Even though MongoDB doesn't support joins, we can still store related data across multiple collections and still get to it all, albeit in a round about way. This requires us to store a reference to a key from one collection inside another collection. It sounds similar to relational databases, but MongoDB doesn't enforce any of key constraints for us like most relational databases do. Enforcing key constraints is left entirely up to us. We're good enough to manage it though, right?

Accessing all related data in this way means we're required to make at least one query for every collection the data is stored across. It's up to each of us to decide if we can live with that.

When to Embed Data

  1. Embed data when that embedded data will be accessed at the same time as the rest of the document. Pre-joining data that is frequently used together reduces the amount of code we have to write to query across multiple collections. It also reduces the number of round trips to the server.
  2. Embed data when that embedded data only pertains to that single document. Like most rules, we need to give this some thought before blindly following it. If we're storing an address for a user, we don't need to create a separate collection to store addresses just because the user might have a roommate with the same address. Remember, we're not normalizing here, so duplicating data to some degree is ok.
  3. Embed data when you need "transaction-like" writes. Prior to v4.0, MongoDB did not support transactions, though it does guarantee that a single document write is atomic. It'll write the document or it won't. Writes across multiple collections could not be made atomic, and update anomalies could occur for how many ever number of scenarios we can imagine. This is no longer the case since v4.0, however it is still more typical to denormalize data to avoid the need for transactions.

When to Normalize Data

  1. Normalize data when data that applies to many documents changes frequently. So here we're talking about "one to many" relationships. If we have a large number of documents that have a city field with the value "New York" and all of a sudden the city of New York decides to change its name to "New-New York", well then we have to update a lot of documents. Got anomalies? In cases like this where we suspect other cities will follow suit and change their name, then we'd be better off creating a cities collection containing a single document for each city.
  2. Normalize data when data grows frequently. When documents grow, they have to be moved on disk. If we're embedding data that frequently grows beyond its allotted space, that document will have to be moved often. Since these documents are bigger each time they're moved, the process only grows more complex and won't get any better over time. By normalizing those embedded parts that grow frequently, we eliminate the need for the entire document to be moved.
  3. Normalize data when the document is expected to grow larger than 16MB. Documents have a 16MB limit in MongoDB. That's just the way things are. We should start breaking them up into multiple collections if we ever approach that limit.

The Most Important Consideration to Schema Design in MongoDB is...

How our applications access and use data. This requires us to think? Uhg! What data is used together? What data is used mostly as read-only? What data is written to frequently? Let your applications data access patterns drive your schema, not the other way around.


The scope you've described is definitely not too much for "one collection". In fact, being able to store everything in a single place is the whole point of a MongoDB collection.

For the most part, you don't want to be thinking about querying across combined tables as you would in SQL. Unlike in SQL, MongoDB lets you avoid thinking in terms of "JOINs"--in fact MongoDB doesn't even support them natively.

See this slideshare: http://www.slideshare.net/mongodb/migrating-from-rdbms-to-mongodb?related=1

Specifically look at slides 24 onward. Note how a MongoDB schema is meant to replace the multi-table schemas customary to SQL and RDBMS.

In MongoDB a single document holds all information regarding a record. All records are stored in a single collection.

Also see this question:MongoDB query multiple collections at once