When to use CouchDB vs RDBMS [closed]


I recently attended the NoSQL conference in London and think I now have a better idea of how to answer the original question. I also wrote a blog post, and there are a couple of other good ones.

Key points:

  • We have accumulated probably 30 years' knowledge of administering relational databases, so we shouldn't replace them without careful consideration; non-relational data stores are less mature than relational ones, and so are inherently riskier to adopt
  • There are different types of non-relational data store; some are key-value stores, some are document stores, some are graph databases
  • You could use a hybrid approach, e.g. a combination of RDBMS and graph data store for a social software site
  • Document data stores (e.g. CouchDB and MongoDB) are probably the closest to relational databases. They store each record as a JSON structure with all its fields nested hierarchically, which avoids table joins and (some might argue) improves on the traditional object-relational mapping that most applications currently use
  • Many non-relational databases support replication (including master-master); relational databases support replication too, but it may not be as comprehensive as the non-relational option
  • Very large sites such as Twitter, Digg and Facebook use Cassandra, which is built from the ground up to support clustering
  • Relational databases are probably suitable for 90% of cases
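
To make the "hierarchical JSON instead of joins" point concrete, here is a minimal sketch in plain Python. It does not talk to CouchDB or MongoDB; the user/address data, the `_id` value and the function name are all made up for illustration. It only contrasts the two shapes the same data can take.

```python
import json

# Relational shape: two "tables" linked by a foreign key (hypothetical data).
users = [{"id": 1, "name": "alice"}]
addresses = [
    {"user_id": 1, "kind": "home", "city": "London"},
    {"user_id": 1, "kind": "work", "city": "Leeds"},
]


def load_user_relational(user_id):
    """Reassemble one user by joining the two tables in application code."""
    user = next(u for u in users if u["id"] == user_id)
    rows = [a for a in addresses if a["user_id"] == user_id]
    return {
        "name": user["name"],
        "addresses": [{"kind": a["kind"], "city": a["city"]} for a in rows],
    }


# Document shape: the same information stored as one nested JSON document.
user_doc = json.loads("""
{
  "_id": "user:1",
  "name": "alice",
  "addresses": [
    {"kind": "home", "city": "London"},
    {"kind": "work", "city": "Leeds"}
  ]
}
""")

joined = load_user_relational(1)
# Both routes yield the same nested structure; the document needed no join.
print(joined["addresses"] == user_doc["addresses"])
```

The trade-off is that the nested form is convenient when you usually read a user together with their addresses, and less so when you need to query addresses on their own.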

In summary, the consensus seems to be "proceed with caution".


Until someone gives a more in-depth answer, here are some pros and cons for CouchDB

Pros:

  • you don't need to fit your data into one of those pesky higher-order normal forms
  • you can change the "schema" of your data at any time
  • your data will be indexed exactly for your queries, so lookups stay fast (views are stored as B-tree indexes, so a lookup is roughly logarithmic rather than a full scan).
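
As a rough illustration of "indexed exactly for your queries", the sketch below imitates the idea of a view in plain Python: a map function emits key/value pairs, the pairs are kept sorted by key, and a query is a search over that sorted index. Real CouchDB views are written as JavaScript map functions and maintained by the server; the names `map_posts_by_author`, `build_view` and `query_view` are hypothetical and only mimic the concept.

```python
import bisect

# Hypothetical documents of mixed types, as a document store would hold them.
docs = [
    {"_id": "a", "type": "post", "author": "bob", "title": "Hello"},
    {"_id": "b", "type": "post", "author": "alice", "title": "Views"},
    {"_id": "c", "type": "comment", "author": "alice", "text": "Nice"},
    {"_id": "d", "type": "post", "author": "alice", "title": "Joins"},
]


def map_posts_by_author(doc):
    """Map function: emit (key, value) pairs for posts only."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


def build_view(documents, map_fn):
    """Run the map function over every document and keep rows sorted by key."""
    rows = sorted(
        (key, doc["_id"], value)
        for doc in documents
        for key, value in map_fn(doc)
    )
    return {"keys": [row[0] for row in rows], "rows": rows}


def query_view(view, key):
    """Return the values for one key using binary search on the sorted keys."""
    lo = bisect.bisect_left(view["keys"], key)
    hi = bisect.bisect_right(view["keys"], key)
    return [value for _, _, value in view["rows"][lo:hi]]


posts_by_author = build_view(docs, map_posts_by_author)
print(query_view(posts_by_author, "alice"))
```

The index answers exactly one kind of question (posts by author). A different question, say posts by title, needs its own map function and its own index.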

Cons:

  • you need to create a view for each and every query, i.e. ad-hoc queries (such as an SQL statement built by concatenating dynamic WHERE and ORDER BY clauses) are not available.
  • you will either have redundant data, or you will end up implementing join and sort logic yourself on the "client side" (e.g. sorting a many-to-many relationship on multiple fields)
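
For a sense of what "join and sort logic on the client side" can look like, here is a small sketch in plain Python. The student, course and enrollment documents are invented for the example, and in practice they would be fetched from the store first. It joins a many-to-many relationship and sorts on two fields, work that a single SQL statement with JOIN and ORDER BY would otherwise do.

```python
from operator import itemgetter

# Hypothetical documents for a many-to-many link between students and courses.
students = {"s1": {"name": "Ann"}, "s2": {"name": "Bob"}}
courses = {"c1": {"title": "Databases"}, "c2": {"title": "Algebra"}}
enrollments = [
    {"student": "s2", "course": "c1", "grade": 71},
    {"student": "s1", "course": "c1", "grade": 88},
    {"student": "s1", "course": "c2", "grade": 65},
]


def join_enrollments():
    """Resolve both sides of each link document, as SQL JOINs would."""
    return [
        {
            "name": students[e["student"]]["name"],
            "title": courses[e["course"]]["title"],
            "grade": e["grade"],
        }
        for e in enrollments
    ]


def sort_rows(rows):
    """Sort by title ascending, then grade descending (two stable passes)."""
    rows = sorted(rows, key=itemgetter("grade"), reverse=True)
    return sorted(rows, key=itemgetter("title"))


report = sort_rows(join_enrollments())
for row in report:
    print(row["title"], row["name"], row["grade"])
```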

Pros or Cons:

  • creating views is not as straightforward as writing SQL; it's more like solving a puzzle. Whether this is a pro or a con depends on your type :)


CouchDB is one of several available 'key/value stores'. Others include oldies like BDB, web-oriented ones like Persevere and MongoDB, new super-fast ones like memcached (RAM-only) and Tokyo Cabinet, and huge stores like Hadoop and Google's BigTable (MongoDB also claims to be in this space).

There's certainly space for both key/value stores and relational DBs. Traditionally, most RDBs are considered a layer above key/value. For example, MySQL used to offer BDB as an optional backend for tables. In short, key/value stores know nothing about fields and relationships, which are the foundations of SQL.

Key/value stores are typically easier to scale, which makes them an attractive choice when growing explosively, like Twitter did. Of course, that means that any relationships between the stored values have to be managed in your code, instead of just declared in SQL. CouchDB's approach is to store big 'documents' in the value part, making them (mostly) self-contained, so you can get most of the needed data in a single query. Many use cases fit this model; others don't.
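
The difference between a self-contained document and relationships managed in code can be sketched with a toy in-memory store. This is not CouchDB's API; `KeyValueStore`, `load_post` and the key names are hypothetical, and the read counter exists only to show how many lookups each layout needs.

```python
import json


class KeyValueStore:
    """Toy in-memory store: opaque string values looked up by key."""

    def __init__(self):
        self._data = {}
        self.reads = 0  # counts lookups, for illustration only

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        self.reads += 1
        return self._data[key]


store = KeyValueStore()

# Document style: one self-contained value holds the post and its comments.
store.put("post:1", json.dumps({
    "title": "Scaling notes",
    "comments": [{"by": "ann", "text": "Nice"}, {"by": "bob", "text": "+1"}],
}))

# Reference style: the store sees unrelated values; the code follows the links.
store.put("post:2", json.dumps({"title": "Caching", "comment_keys": ["comment:7"]}))
store.put("comment:7", json.dumps({"by": "ann", "text": "Agreed"}))


def load_post(key):
    """Load a post, resolving comment references in application code if present."""
    post = json.loads(store.get(key))
    if "comment_keys" in post:
        keys = post.pop("comment_keys")
        post["comments"] = [json.loads(store.get(k)) for k in keys]
    return post


embedded = load_post("post:1")
reads_for_embedded = store.reads
referenced = load_post("post:2")
reads_for_referenced = store.reads - reads_for_embedded
print(reads_for_embedded, reads_for_referenced)
```

The embedded post costs one lookup; the referenced one costs one lookup plus one per comment, and the store itself has no idea the values are related.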

The current theme I see is that after the "Rails doesn't scale!!" scare, many people are now realizing that it's not about your web framework but about intelligent caching, to avoid hitting the database, and even the webapp, when possible. The rising star there is memcached.
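
One common way to "avoid hitting the database" is the cache-aside pattern, sketched below. A plain dict stands in for memcached, and `SlowDatabase` and `CacheAside` are hypothetical names; a real setup would use a memcached client and would also need expiry and size limits, which this sketch leaves out.

```python
class SlowDatabase:
    """Stand-in for a real database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.rows[key]

    def store(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Check the cache first; fall back to the database and remember the result."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # a dict standing in for memcached

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.fetch(key)
        self.cache[key] = value
        return value

    def update(self, key, value):
        self.db.store(key, value)
        self.cache.pop(key, None)  # invalidate so the next read is fresh


db = SlowDatabase({"user:1": "alice"})
cached = CacheAside(db)

first = cached.get("user:1")   # miss: goes to the database
second = cached.get("user:1")  # hit: served from the cache
queries_after_two_reads = db.queries

cached.update("user:1", "alicia")
third = cached.get("user:1")   # miss again after invalidation
print(queries_after_two_reads, db.queries)
```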

As always, it all depends on your needs.