NoSql vs Relational database NoSql vs Relational database database database

NoSql vs Relational database


Not all data is relational. For those situations, NoSQL can be helpful.

With that said, NoSQL stands for "Not Only SQL". It's not intended to knock SQL or supplant it.

SQL has several very big advantages:

  1. Strong mathematical basis.
  2. Declarative syntax.
  3. A well-known language in Structured Query Language (SQL).

Those haven't gone away.

It's a mistake to think about this as an either/or argument. NoSQL is an alternative that people need to consider when it fits, that's all.

Documents can be stored in non-relational databases, like CouchDB.

Maybe reading this will help.


The history seem to look like this:

  1. Google needs a storage layer for their inverted search index. They figure a traditional RDBMS is not going to cut it. So they implement a NoSQL data store, BigTable on top of their GFS file system. The major part is that thousands of cheap commodity hardware machines provides the speed and the redundancy.

  2. Everyone else realizes what Google just did.

  3. Brewers CAP theorem is proven. All RDBMS systems of use are CA systems. People begin playing with CP and AP systems as well. K/V stores are vastly simpler, so they are the primary vehicle for the research.

  4. Software-as-a-service systems in general do not provide an SQL-like store. Hence, people get more interested in the NoSQL type stores.

I think much of the take-off can be related to this history. Scaling Google took some new ideas at Google and everyone else follows suit because this is the only solution they know to the scaling problem right now. Hence, you are willing to rework everything around the distributed database idea of Google because it is the only way to scale beyond a certain size.

C - Consistency
A - Availability
P - Partition tolerance
K/V - Key/Value


NoSQL is better than RDBMS because of the following reasons/properities of NoSQL

  1. It supports semi-structured data and volatile data
  2. It does not have schema
  3. Read/Write throughput is very high
  4. Horizontal scalability can be achieved easily
  5. Will support Bigdata in volumes of Terra Bytes & Peta Bytes
  6. Provides good support for Analytic tools on top of Bigdata
  7. Can be hosted in cheaper hardware machines
  8. In-memory caching option is available to increase the performance of queries
  9. Faster development life cycles for developers

EDIT:

To answer "why RDBMS cannot scale", please take a look at RDBMS Overheads pdf written by Stavros Harizopoulos,Daniel J. Abadi,Samuel Madden and Michael Stonebraker

RDBMS's have challenges in handling huge data volumes of Terabytes & Peta bytes. Even if you have Redundant Array of Independent/Inexpensive Disks (RAID) & data shredding, it does not scale well for huge volume of data. You require very expensive hardware.

Logging: Assembling log records and tracking down all changes in database structures slows performance. Logging may not be necessary if recoverability is not a requirement or if recoverability is provided through other means (e.g., other sites on the network).

Locking: Traditional two-phase locking poses a sizeable overhead since all accesses to database structures are governed by a separate entity, the Lock Manager.

Latching: In a multi-threaded database, many data structures have to be latched before they can be accessed. Removing this feature and going to a single-threaded approach has a noticeable performance impact.

Buffer management: A main memory database system does not need to access pages through a buffer pool, eliminating a level of indirection on every record access.

This does not mean that we have to use NoSQL over SQL.

Still, RDBMS is better than NoSQL for the following reasons/properties of RDBMS

  1. Transactions with ACID properties - Atomicity, Consistency, Isolation & Durability
  2. Adherence to Strong Schema of data being written/read
  3. Real time query management ( in case of data size < 10 Tera bytes )
  4. Execution of complex queries involving join & group by clauses

We have to use RDBMS (SQL) and NoSQL (Not only SQL) depending on the business case & requirements