
HBase, Cassandra, CouchDB, MongoDB: any fundamental difference?


Those are some long answers from @Bohzo (but they are good links).

The truth is, they're "kind of" competing. But they definitely have different strengths and weaknesses and they definitely don't all solve the same problems.

For example, Couch and Mongo both provide Map-Reduce engines as part of the main package. HBase is (basically) a layer on top of Hadoop, so you also get M-R via Hadoop. Cassandra is highly focused on being a key-value store and has plug-ins to "layer" Hadoop on top (so you can map-reduce).
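If "map-reduce" is new to you, here is a rough, engine-agnostic sketch of the idea in plain Python. It is not the API of Couch, Mongo, or Hadoop; the function names (map_phase, reduce_phase, map_reduce) and the sample documents are made up for illustration. The map step emits key/value pairs per document, and the reduce step combines all values that share a key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Emit one (tag, 1) pair for every tag on a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_phase(key, values):
    """Combine every value that was emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(d) for d in docs):
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical sample documents, only for illustration.
docs = [
    {"_id": 1, "tags": ["nosql", "mongodb"]},
    {"_id": 2, "tags": ["nosql", "hbase"]},
    {"_id": 3, "tags": ["nosql"]},
]

tag_counts = map_reduce(docs, map_phase, reduce_phase)
print(tag_counts)
```

A real engine runs the map and reduce steps across many nodes and handles the grouping for you; the shape of the two functions is the part that carries over.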

Some of the DBs provide MVCC (Multi-version concurrency control). Mongo does not.
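To make MVCC a bit more concrete, here is a toy sketch in Python. It assumes a single process and a simple counter as the clock, and the class and method names (MVCCStore, write, snapshot, read) are hypothetical; it does not describe how any of these databases implement it. The point is only that writers add new versions instead of overwriting, so a reader holding an older snapshot keeps seeing the data as it was.

```python
class MVCCStore:
    """Toy multi-version store: every write appends a new version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical clock, bumped on every write

    def write(self, key, value):
        """Append a new version and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that identifies the current state."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("doc", "v1")
snap = store.snapshot()                         # a reader starts here
store.write("doc", "v2")                        # a writer updates afterwards
old_view = store.read("doc", snap)              # reader still sees its snapshot
new_view = store.read("doc", store.snapshot())  # a fresh reader sees the update
```

The trade-off is that old versions pile up and have to be cleaned out at some point, which is part of why not every store goes this route.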

All of these DBs are intended to scale horizontally, but they do it in different ways. All of these DBs are also trying to provide flexibility in different ways. Flexible document sizes or REST APIs or high redundancy or ease of use, they're all making different trade-offs.

So to your question: "In other words, are they all competing in the exact same market and trying to solve the exact same problems?"

  1. Yes: they're all trying to solve the issue of database-scalability and performance.
  2. No: they're definitely making different sets of trade-offs.

What should you start with?

Man, that's a tough question. I work for a large company pushing tons of data and we've been through a few of these systems over the years. We tried Cassandra at one point a couple of years ago and it couldn't handle the load. We're using Hadoop everywhere, but it definitely has a steep learning curve and it hasn't worked out in some of our environments. More recently we've tried to do Cassandra + Hadoop, but it turned out to be a lot of configuration work.

Personally, my department is moving several things to MongoDB. Our reasons for this are honestly just simplicity.

Setting up Mongo on a Linux box takes minutes and doesn't require root access or a change to the file system or anything fancy. There are no crazy config files or Java recompiles required. So from that perspective, Mongo has been the easiest "gateway drug" for getting people onto KV/Document stores.


Short answer: test before you use in production.

I can offer my experience with both HBase (extensive) and MongoDB (just starting).

Even though they are not the same kind of stores, they solve the same problems:

  • scalable storage of data
  • random access to the data
  • low latency access

We were very enthusiastic about HBase at first. It is built on Hadoop (which is rock-solid), it is under Apache, it is active... what more could you want? Our experience:

  • HBase is fragile
  • administrator's nightmare (full of configuration settings whose defaults are less than perfect, opaque configuration, changes from version to version, ...)
  • loses data unless you have set the X configuration and changed Y to... you get the point :) We found that out when HBase crashed and we lost 2 hours (!!!) of data because the WAL was not set up properly
  • lacks secondary indexes
  • lacks any way to back up the database without shutting it down
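For anyone wondering why a misconfigured WAL (write-ahead log) costs you data: the log is what lets a store rebuild recent writes after a crash. Below is a minimal sketch of the general idea in Python, not HBase's actual mechanism or settings. The TinyStore class and the file name are hypothetical. Each write is appended to a log file and forced to disk before it is applied in memory, and a restart replays the log.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log, if any
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        """Re-apply every logged write in order."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["key"]] = rec["value"]

    def put(self, key, value):
        """Log the write durably first, then apply it in memory."""
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())  # without this, a crash can lose the write
        self.data[key] = value

    def close(self):
        self._wal.close()


wal_file = os.path.join(tempfile.mkdtemp(), "store.wal")

store = TinyStore(wal_file)
store.put("row1", "a")
store.put("row2", "b")
store.close()  # stands in for the process going away; memory state is discarded

recovered = TinyStore(wal_file)  # a fresh instance replays the log
recovered_data = dict(recovered.data)
recovered.close()
```

If the log is disabled, or written but never forced to durable storage, everything since the last flush of the in-memory state is gone after a crash, which is the kind of loss described in the list above.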

All in all, HBase was a nightmare. Wouldn't recommend it to anyone except to our direct competitors. :)

MongoDB solves all these problems and many more. It is a delight to set up, administering it is a simple and transparent job, and the default configuration settings actually make sense. You can perform (hot) backups, and you can have secondary indexes. From what I read, I wouldn't recommend MapReduce on MongoDB (JavaScript, 1 thread per node only), but you can use Hadoop for that.

And it is also VERY active when compared to HBase.

Also: http://www.google.com/trends?q=HBase%2CMongoDB

Need I say more? :)

UPDATE: many months later I must say MongoDB delivered on all counts and more. The only real downside is that hosting companies do not offer it the way they offer MySQL. ;) It also looks like MapReduce is bound to become multi-threaded in 2.2. Still, I wouldn't use MR this way. YMMV.