What are some "mental steps" a developer must take to begin moving from SQL to NO-SQL (CouchDB, FathomDB, MongoDB, etc)?


Firstly, each NoSQL store is different, so it's not like choosing between Oracle, SQL Server, or MySQL. The differences between them can be vast.

For example, with CouchDB you cannot execute ad-hoc queries (dynamic queries, if you like). It is very good at online/offline scenarios and is small enough to run on most devices. It has a RESTful interface, so there are no drivers and no ADO.NET libraries. To query it you use MapReduce (very common across the NoSQL space, but not ubiquitous) to create views; these can be written in a number of languages, though most of the documentation is for JavaScript. CouchDB is also designed to crash, which is to say that if something goes wrong it just restarts the process (the Erlang process or group of linked processes, that is; typically not the entire CouchDB instance).
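To make the MapReduce idea concrete, here is a minimal in-memory sketch in Python. It does not talk to CouchDB and is not how CouchDB implements views; the documents, the field names, and the `build_view` helper are all made up for illustration. It only shows the shape of the approach: a map function emits key/value pairs per document, and a reduce function collapses the values for each key.

```python
from collections import defaultdict

# Hypothetical documents; in a document store these would be stored as JSON.
docs = [
    {"_id": "1", "type": "order", "customer": "alice", "total": 30},
    {"_id": "2", "type": "order", "customer": "bob", "total": 15},
    {"_id": "3", "type": "order", "customer": "alice", "total": 20},
    {"_id": "4", "type": "customer", "name": "alice"},
]


def map_order_totals(doc):
    """Map step: emit (key, value) pairs for the documents we care about."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def build_view(documents, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


view = build_view(docs, map_order_totals, reduce_sum)
print(view)  # {'alice': 50, 'bob': 15}
```

The point to take away is that the "query" is defined up front as a pair of functions, rather than composed on the fly as with an ad-hoc SQL statement.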

MongoDB is designed to be highly performant and has drivers, so it seems like less of a leap for a lot of people in the .NET world. I believe, though, that in crash situations it is possible to lose data (it doesn't offer the same level of transactional guarantees around writes that CouchDB does).

Now, both of these are document databases, and as such they have in common that their data is unstructured: there are no tables and no defined schema, so they are schemaless. They are not like a key-value store, though, as they do insist that the data you persist is intelligible to them. With CouchDB this means JSON, and with MongoDB it means BSON.
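As a rough illustration of what "schemaless" means in practice, the sketch below stores two documents of the same kind with different fields and round-trips them through JSON using Python's standard `json` module. The documents and the `city_of` helper are invented for the example; no database is involved.

```python
import json

# Two hypothetical "person" documents with different shapes.
# Nothing forces them to share the same set of fields.
people = [
    {"_id": "p1", "name": "Ada", "email": "ada@example.com"},
    {"_id": "p2", "name": "Linus", "languages": ["C", "Perl"],
     "address": {"city": "Helsinki"}},
]

# Serialise to JSON text and back, as a document store's client would.
encoded = [json.dumps(person) for person in people]
decoded = [json.loads(text) for text in encoded]


def city_of(doc):
    """Read a nested, optional field without assuming it exists."""
    return doc.get("address", {}).get("city")


cities = [city_of(doc) for doc in decoded]
print(cities)  # [None, 'Helsinki']
```

Because no schema guarantees that a field is present, the reading code has to decide what a missing field means.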

There are many other differences between MongoDB and CouchDB, and these two are considered to be very close in design within the NoSQL space!

Other than document databases, there are network-oriented (graph) solutions like Neo4j, columnar stores (column-oriented rather than row-oriented in how they persist data), and many others.

Something common across most NoSQL solutions, besides MapReduce, is that they are not relational databases and that the majority do not use SQL-style syntax. Typically, querying follows an imperative mode of programming rather than the declarative style of SQL.

Another common trait is that absolute consistency, as typically provided by relational databases, is traded for eventual consistency models.
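A toy Python sketch of what an eventual consistency model can look like from the client's side is given here. The `Replica` and `Cluster` classes are hypothetical and greatly simplified (one primary, one secondary, replication triggered by hand); real stores differ widely in how and when they replicate.

```python
class Replica:
    """A single node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: writes go to the primary and reach the secondary later."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        return self.secondary.data.get(key)

    def replicate(self):
        """Apply all pending writes to the secondary, in order."""
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


cluster = Cluster()
cluster.write("greeting", "hello")
stale = cluster.read_from_secondary("greeting")  # replication has not run yet
cluster.replicate()
fresh = cluster.read_from_secondary("greeting")  # now the write is visible
print(stale, fresh)  # None hello
```

Whether a stale read like this is acceptable is exactly the kind of requirement worth pinning down before choosing a store.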

My advice to anyone looking to use a NoSQL solution would be to first really understand the requirements they have and understand the SLAs: what level of latency is required; how consistent must that latency remain as the solution scales; what scale of load is anticipated; is the load consistent or will it spike; how consistent does a user's view of the data need to be; should they always see their own writes when they query; should their writes be immediately visible to all other users; etc.

Understand that you can't have it all. Read up on Brewer's CAP theorem, which basically says you can't have absolute consistency, 100% availability, and partition tolerance (coping when nodes can't communicate) all at once.

Then look into the various NoSQL solutions and start to eliminate those which are not designed to meet your requirements. Understand that the move from a relational database is not trivial and has a cost associated with it (I have found that the cost of moving an organisation in that direction, in terms of meetings, discussions, etc., is itself very high, preventing focus on other areas of potential benefit).

Most of the time you will not need an ORM (the R part of that equation just went missing). Sometimes just binary serialisation may be OK (with something like db4o, for example, or a key-value store), and things like the Newtonsoft JSON/BSON library may help out, as may AutoMapper. I do find that when working with C# 3 there is a definite cost compared to working with a dynamic language like, say, Python. With C# 4 this may improve a little with things like ExpandoObject and dynamic from the DLR.

To look at your 3 specific questions: for all of them it depends on the NoSQL solution you adopt, so no single answer is possible. With that caveat, in very general terms:

  1. If persisting the object (or aggregate more likely) as a whole, your joins will typically be in code, though you can do some of this through MapReduce.

  2. Again, it depends, but with Couch you would execute a GET over HTTP against either a specific resource, or against a MapReduce view.

  3. Most likely nothing. Just keep an eye out for the serialisation and deserialisation scenarios. The difficulty I have found comes in how you manage versions of your code. If the property is purely for pushing to an interface (GUI, web service) then it tends to be less of an issue. If the property is a form of internal state which behaviour will rely on, then this can get more tricky.
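Points 1 and 3 can be sketched together in Python. This is only an illustration with made-up data and helper names (`load_customer`, `orders_with_customers`, `DEFAULTS`), and it assumes the third point is about a property that newer code expects but older stored documents lack. The join is done in application code, and missing properties are filled with defaults when a document is loaded.

```python
# Hypothetical documents, keyed by id, as they might come back from a store.
customers = {
    "c1": {"name": "Alice", "tier": "gold"},
    "c2": {"name": "Bob"},  # older document, saved before "tier" existed
}
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 30},
    {"_id": "o2", "customer_id": "c2", "total": 15},
]

# Defaults for properties that were added after some documents were written.
DEFAULTS = {"tier": "standard"}


def load_customer(doc):
    """Fill in properties that older documents do not have."""
    return {**DEFAULTS, **doc}


def orders_with_customers(orders, customers):
    """Join in application code: look up each order's customer by id."""
    joined = []
    for order in orders:
        customer = load_customer(customers[order["customer_id"]])
        joined.append({
            "order": order["_id"],
            "name": customer["name"],
            "tier": customer["tier"],
        })
    return joined


result = orders_with_customers(orders, customers)
for row in result:
    print(row)
```

A default works well for display-only properties; if behaviour depends on the property, you may need an explicit migration or version check instead.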

Hope it helps, good luck!


Just stop thinking about the database.

Think about modeling your domain. Build your objects to solve the problem at hand following good patterns and practices and don't worry about persistence.