120 mongodb collections vs single collection - which one is more efficient? 120 mongodb collections vs single collection - which one is more efficient? mongodb mongodb

120 mongodb collections vs single collection - which one is more efficient?


Single Sharded Collection

The edited version of the question makes the actual requirement clearer: you have a collection that can potentially grow very large and you want an approach to partition the data. The artificial collection limit is your own planned partitioning scheme.

In that case, I think you would be best off using a single collection and taking advantage of MongoDB's auto-sharding feature to distribute the data and workload to multiple servers as required. Multiple collections is still a valid approach, but unnecessarily complicates your application code & deployment versus leveraging core MongoDB features. Assuming you choose a good shard key, your data will be automatically balanced across your shards.

You can do not have to shard immediately; you can defer the decision until you see your workload actually requiring more write scale (but knowing the option is there when you need it). You have other options before deciding to shard as well, such as upgrading your servers (disks and memory in particular) to better support your workload. Conversely, you don't want to wait until your system is crushed by workload before sharding so you definitely need to monitor the growth. I would suggest using the free MongoDB Monitoring Service (MMS) provided by 10gen.

On another website someone suggested using many databases instead of many collections. But this means overhead and then I would have to use / manage many different connections.

Multiple databases will add significantly more administrative overhead, and would likely be overkill and possibly detrimental for your use case. Storage is allocated at the database level, so 120 databases would be consuming much more space than a single database with 120 collections.

Fixed number of collections (original answer)

If you can plan for a fixed number of collections (120 as per your original question description), I think it makes more sense to take this approach rather than using a monolithic collection.

NOTE: the design considerations below still apply, but since the question was updated to clarify that multiple collections are an attempted partitioning scheme, sharding a single collection would be a much more straightforward approach.

The motivations for using separate collections would be:

  • Your documents for a single large collection will likely have to include some indication of the collection subtype, which may need to be added to multiple indexes and could significantly increase index sizes. With separate collections the subtype is already implicit in the collection namespace.

  • Sharding is enabled at the collection level. A single large collection only gives you an "all or nothing" approach, whereas individual collections allow you to control which subset(s) of data need to be sharded and choose more appropriate shard keys.

  • You can use the compact to command to defragment individual collections. Note: compact is a blocking operation, so the normal recommendation for a HA production environment would be to deploy a replica set and use rolling maintenance (i.e. compact the secondaries first, then step down and compact the primary).

  • MongoDB 2.4 (and 2.2) currently have database-level write lock granularity. In practice this has not proven a problem for the vast majority of use cases, however multiple collections would allow you to more easily move high activity collections into separate databases if needed.

  • Further to the previous point .. if you have your data in separate collections, these will be able to take advantage of future improvements in collection-level locking (see SERVER-1240 in the MongoDB Jira issue tracker).


The main problem here is that you will gain very little performance in the current MongoDB versions if you separate out collections into the same database. To get any sort of extra performance over a single collection setup you would need to move the collections out into separate databases, then you will have operational overhead for judging what database you should query etc.

So yes, you could go for 120 collections easily however, you won't really gain anything currently due to: https://jira.mongodb.org/browse/SERVER-1240 not being implemented (anytime soon).

Housing billions of documents in a single collection isn't too bad. I presume that even if you was to house this in separate collections it probably would not be on a single server either, just like sharding a single collection, so any speed reduction due to multi server setup will also not matter in this case.

In my personal opinion, using a single collection is easier on everything.