HBase for real-time application



In general, HBase is chosen based on the following:

Volume: HBase shines at millions to billions of rows; at thousands to a few million rows, a traditional RDBMS is usually the better fit.

Features: choose HBase when you do not need transactions, secondary indexes, or other RDBMS features such as joins.

Hardware: make sure you have sufficient hardware for the region servers; running an HBase cluster involves a good amount of operational maintenance.

More specifically:

HBase is best suited for web applications because of its fast random reads, but that speed only comes with a very good row key design. This means planning your end queries well in advance and designing the row key around them. Special care is needed in row key design if you have time-based data and your queries depend heavily on it; in short, you should avoid hotspotting, where monotonically increasing keys pile all writes onto a single region server.
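To make the hotspotting point concrete, here is a minimal sketch of one common row key scheme, salting plus timestamp reversal. The function name, key layout, and bucket count are my own illustrative choices, not an HBase API; the same idea would normally be implemented in the client that writes the keys.

```python
import hashlib

def salted_row_key(device_id: str, ts_millis: int, buckets: int = 16) -> str:
    """Hypothetical key scheme: salt|device|reversed-timestamp."""
    # A salt bucket derived from the entity id spreads otherwise
    # monotonically increasing keys across region servers.
    salt = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % buckets
    # Reversing the timestamp makes the newest events sort first
    # within one entity, which suits "latest N" scans.
    reversed_ts = 2**63 - 1 - ts_millis
    return f"{salt:02d}|{device_id}|{reversed_ts:019d}"

older = salted_row_key("sensor-1", 1000)
newer = salted_row_key("sensor-1", 2000)
```

Because both keys share the same salt and device prefix, `newer` sorts lexicographically before `older`, so a prefix scan returns the most recent events first, while different devices land in different salt buckets.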

Apart from this, selecting by other column values is possible using HBase filters, but because such queries scan rather than seek, they suit only a few selective use cases and may not guarantee web-app response times.

Also, if your rows have a variable number of columns and your queries do not need every column, HBase is again a good choice: storage is sparse, so absent columns cost nothing.
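The sparse-row model can be pictured as a map of column qualifiers per row; only cells that exist are stored or returned. This is a conceptual sketch with made-up row keys and qualifiers, not the HBase client API:

```python
# Hypothetical data: each HBase row is effectively a sparse map of
# "family:qualifier" -> value; rows need not share a schema.
rows = {
    "user#1001": {"profile:name": "Alice", "profile:email": "a@example.com"},
    "user#1002": {"profile:name": "Bob", "activity:last_login": "2020-01-01"},
}

def get_columns(row_key, qualifiers):
    """Return only the requested qualifiers that actually exist,
    mirroring how a per-column Get skips absent cells."""
    row = rows.get(row_key, {})
    return {q: row[q] for q in qualifiers if q in row}
```

Asking `user#1002` for `profile:email` simply returns nothing for that qualifier, rather than a NULL cell occupying space as it would in a fixed-schema table.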

Region server failover is built into HBase, so after a node failure your data stays safe and its regions are reassigned to other servers.

HBase can be used for both batch and streaming workloads. For streaming, it is among the best sinks available in the big data stack, though this also depends on your streaming pipeline (Kafka, Spark Streaming, Storm, etc.).

Since you mentioned Phoenix, I assume you may want to stick to a SQL view of HBase; that can give you better options, such as secondary indexes and familiar SQL queries. At the core, however, row key design is still at the heart of HBase performance.