Recommended Setup for BigData Application


First, let's talk about Cassandra.

This is a NoSQL database with eventual consistency. In practice this means that different nodes in a Cassandra cluster may hold different 'snapshots' of the data when there is a communication or availability problem between nodes. The data will eventually become consistent, however.
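To make the idea concrete, here is a toy sketch of two replicas that diverge while one is unreachable and converge again after they exchange data. It is not Cassandra code: the `Replica` class, the `sync` function and the sample keys are all made up for illustration, and the conflict rule used (the write with the newest timestamp wins) is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): maps key -> (timestamp, value)."""
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                ts, value = src.data[key]
                dst.write(key, value, ts)


node1, node2 = Replica(), Replica()
node1.write("user:42", "alice@old.example", timestamp=1)
sync(node1, node2)                                         # both nodes agree

node1.write("user:42", "alice@new.example", timestamp=2)  # node2 is unreachable
stale = node2.read("user:42")                              # node2 still serves the old value

sync(node1, node2)                                         # connectivity restored
fresh = node2.read("user:42")                              # node2 has caught up
```

Between the second write and the second `sync`, a client reading from `node2` sees stale data; that window is what "eventual" refers to.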

Since you are considering it as a 'frontend' database, what you need to understand is how you will model your data. Cassandra can take advantage of indexes, but you still need to define your access patterns up front.
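A rough way to picture "define the access pattern up front" is a table laid out for exactly one kind of query. The sketch below is plain Python, not Cassandra; the sensor-reading schema and the function names are hypothetical. Rows are grouped by a partition key (`sensor_id`) and kept sorted by a clustering key (`timestamp`), so the query the layout was designed for is cheap, while a query on any other attribute has to scan everything.

```python
from bisect import insort
from collections import defaultdict

# partition key (sensor_id) -> rows kept sorted by clustering key (timestamp)
readings_by_sensor = defaultdict(list)


def insert_reading(sensor_id, timestamp, value):
    insort(readings_by_sensor[sensor_id], (timestamp, value))


def latest_readings(sensor_id, limit):
    """The pattern the layout was designed for: one partition, newest first."""
    return readings_by_sensor[sensor_id][::-1][:limit]


def readings_above(threshold):
    """A pattern the layout does not serve well: it must scan every partition."""
    return [
        (sensor_id, ts, value)
        for sensor_id, rows in readings_by_sensor.items()
        for ts, value in rows
        if value > threshold
    ]


insert_reading("s1", 100, 20.5)
insert_reading("s1", 300, 22.0)
insert_reading("s1", 200, 21.0)
insert_reading("s2", 150, 30.0)

newest = latest_readings("s1", 2)
hot_values = readings_above(21.5)
```

If "all readings above a threshold" were a query you needed often, the usual answer would be a second table keyed for that query rather than a scan of the first one.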

Normally there is no relation between Cassandra and Hadoop (except that both are written in Java); however, the DataStax distribution (enterprise version) offers Hadoop support directly on top of Cassandra.

As a general workflow, you would read and write the most recent data (say, the last 24 hours) in your 'small' database, which gives you enough performance (Cassandra has excellent support for this), and you would move anything older than X (for example, older than 24 hours) to 'long term storage' such as Hadoop, where you can run all sorts of MapReduce jobs and similar batch analysis.
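The age-based split in that workflow can be sketched as follows. This is only an illustration of the cutoff logic: the record shape (`created_at` field), the `split_by_age` helper and the 24-hour window are assumptions, and the actual reads from Cassandra and writes to Hadoop are left out.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(hours=24)  # the "X" above; tune to your workload


def split_by_age(records, now, hot_window=HOT_WINDOW):
    """Return (hot, cold): records newer than the cutoff stay in the frontend store."""
    cutoff = now - hot_window
    hot, cold = [], []
    for record in records:
        (hot if record["created_at"] >= cutoff else cold).append(record)
    return hot, cold


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": now - timedelta(hours=1)},
    {"id": 2, "created_at": now - timedelta(hours=30)},
    {"id": 3, "created_at": now - timedelta(hours=23, minutes=59)},
]

hot, cold = split_by_age(records, now)
# "hot" would stay in the frontend database; "cold" would be archived to long term storage.
```

A periodic job would run this kind of split, copy the cold records to long term storage, and only then delete them from the frontend database.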

As for text search, it really depends on what you need. Elasticsearch and Solr are direct competitors. You can see for yourself how they compare here: http://solr-vs-elasticsearch.com/


As for your third question,

I think Cassandra is best seen as the database where you store the data.

Hadoop provides a computation model that lets you analyze the large volume of data held in Cassandra, so it is very helpful to combine Cassandra with Hadoop.
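For readers unfamiliar with that computation model, here is a single-process sketch of the map, shuffle and reduce steps. It does not use Hadoop or Cassandra at all; the row shape and the function names are invented for the example, and a real job would run the same phases distributed across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit one (key, value) pair per input row.
    yield record["country"], 1


def shuffle(pairs):
    # Group all values that share a key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse the grouped values for one key into a single result.
    return key, sum(values)


def run_job(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


rows = [
    {"user": "a", "country": "DE"},
    {"user": "b", "country": "US"},
    {"user": "c", "country": "DE"},
]
counts = run_job(rows)
```

Here `counts` holds the number of rows per country; the same shape of job scales to data that does not fit on one machine because the map and reduce steps work on independent pieces.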

There are other options you can consider as well, such as combining MongoDB with Hadoop, since MongoDB has connector support (mongo-connector) between Hadoop and its data.

If you also have search requirements, you can use Solr, with the index generated directly from MongoDB.