Large Data Sets - NoSQL, NewSQL, SQL..? Brain Fried

mysql nosql hadoop cassandra hbase

You need to think carefully about what types of queries you will need to run over these docs. Cassandra and similar stores may well be a good fit if your queries are basic, but richer SQL-like queries are not possible. The largest Cassandra deployments are on the order of 150 TB, so your data volumes should not be a problem; but Cassandra's performance may be overkill, and you will sacrifice query richness.

If you just want text indexing, also consider Lucene. I believe Lucene can now handle over 100 GB/hour for batch indexing, so overnight indexing of 1 TB would be possible, and Lucene now claims comparable speeds for incremental indexing too.
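As a rough sanity check on the "overnight" estimate, the arithmetic can be sketched as below. The function name `indexing_hours` is made up for illustration, 1 TB is taken as 1000 GB, and the 100 GB/hour figure is the one quoted above, not a measured benchmark; real throughput will depend on hardware, analyzers, and document shape.

```python
def indexing_hours(data_gb: float, throughput_gb_per_hour: float) -> float:
    """Hours needed to index data_gb at a sustained throughput (hypothetical helper)."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


# Assumption: 1 TB = 1000 GB, and the quoted 100 GB/hour is sustained for the whole run.
hours = indexing_hours(1000, 100)
print(f"{hours:.1f} hours")  # prints "10.0 hours"
```

At that rate a 1 TB batch takes about ten hours, which fits in a night only if nothing else slows the run down.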

Check out RavenDB. It is a document database that supports Map/Reduce and is built on Lucene, so it also provides full-text search natively through its querying API.

Sharding and replication capabilities are built in and very advanced. Using Esent as storage, each node can store up to 16 TB of data.

The choice of database mainly depends on your use cases. I would suggest going with Cassandra or HBase.

For real-time analysis over Cassandra you can use Apache Spark and Spark Streaming; both work well.

Also try Elasticsearch or Solr for text search. All of these are open source and well worth trying.

For real-time analysis you can also look at Presto (PrestoDB), Facebook's open-source query engine, but I didn't find much information beyond the Presto website, and most people suggest going with Cassandra and Apache Spark.
