what should be considered before choosing hbase?


One of my favourite books describes this well.

These are points to note before taking a decision.

Coming to @Whitefret's last point: there is something called the CAP theorem, on which the decision can be based.

  • Consistency (all nodes see the same data at the same time)

  • Availability (every request receives a response about whether it succeeded or failed)

  • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)

In this context, HBase is a CP system: it favours consistency and partition tolerance over availability.

However, for migrating data from an RDBMS to HBase you can use Sqoop.
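As a sketch, a Sqoop import straight into an HBase table might look like the following. The connection string, table, and column names here are hypothetical; adjust them to your setup:

```shell
# Import a relational table into HBase in one step.
# Each source row becomes one HBase row keyed by order_id;
# the remaining columns land in the 'cf' column family.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key order_id \
  --hbase-create-table
```

Note that Sqoop only moves the data; as discussed below, the schema itself usually needs to be rethought rather than copied one-to-one.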


It's a difficult question; there are many things to consider.

  1. Can you optimize your RDBMS? Adding indexes, denormalizing joins that cost too much ... There are many paths to consider, and I am no expert.
  2. Is your data big? This is very vague, and there is a grey area between RDBMS and Big Data where you can't be sure which one to use. Millions of rows can still be handled efficiently by an RDBMS.
  3. Do you need relations in your data? NoSQL databases don't use relations, which can be hard for people from a SQL background. There are frameworks that give SQL access to HBase (such as Apache Phoenix), but it is generally a bad idea to keep an RDBMS model when using Big Data.
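To make point 3 concrete, here is a minimal Python sketch (the table and column names are made up) of how a one-to-many SQL relation is typically denormalized into a single wide HBase-style row, with one column qualifier per child record instead of a join:

```python
def denormalize(users, orders):
    """Fold a users table and an orders table (joined on user_id in SQL)
    into one wide row per user, HBase-style: 'family:qualifier' -> value."""
    rows = {}
    for uid, user in users.items():
        # User attributes go into an 'info' column family.
        rows[uid] = {f"info:{k}": v for k, v in user.items()}
    for order in orders:
        # Each order becomes a qualifier in an 'orders' column family,
        # so fetching one user row retrieves all of their orders.
        rows[order["user_id"]][f"orders:{order['order_id']}"] = order["total"]
    return rows

users = {"u1": {"name": "Ada"}}
orders = [{"user_id": "u1", "order_id": "o1", "total": 40}]
print(denormalize(users, orders))
# One row per user; reads need no join, but updates touch a wide row.
```

The trade-off shown here is typical of NoSQL modelling: reads that would require a join become a single row lookup, at the cost of duplicating and restructuring data around the queries you intend to run.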

If you can answer those questions and you think NoSQL is the way to go, ask your team how they feel about it. NoSQL databases come with problems you would never meet in the SQL world. They should build a prototype first to understand how it all works, and maybe some training should be made available to them.

In summary:
- Find out whether you need a non-relational database
- Choose the right one (is HBase really what you need? Why not consider Cassandra or MongoDB?)


HBase, like all NoSQL databases, comes with great new features, but sadly nothing is free (not even mentioning the monetary cost).

With HBase, you should really check whether all the queries you might want to run can be fulfilled by the HBase data model. An important thing to consider is schema design (the design of the row key first and foremost). I advise you to read this really good paper:
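To illustrate what row-key design involves, here is a small Python sketch (the function and field names are my own invention) of two common techniques: salting the key with a hash prefix so that sequential IDs spread across regions instead of hotspotting one server, and reversing the timestamp so that the newest entry for an entity sorts first, since HBase stores rows in ascending byte order:

```python
import hashlib

def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    # Short hash prefix ("salt") spreads lexically adjacent user IDs
    # across region servers, avoiding a single hot region.
    salt = hashlib.md5(user_id.encode()).hexdigest()[:4]
    # Subtract from Long.MAX_VALUE so newer timestamps produce
    # smaller (earlier-sorting) keys: newest-first scans become cheap.
    reversed_ts = 2**63 - 1 - timestamp_ms
    # Zero-pad so lexicographic order matches numeric order.
    return f"{salt}|{user_id}|{reversed_ts:019d}".encode()

newer = make_row_key("u1", 2_000)
older = make_row_key("u1", 1_000)
assert newer < older  # the newer event sorts first
```

This is only a sketch of the ideas; the paper linked below walks through row-key design in much more depth.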

http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf

I think that a really good answer to your question can be found on the HBase official site.

"HBase isn’t suitable for every problem.

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example. Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.

Third, make sure you have enough hardware. Even HDFS doesn’t do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only."

https://hbase.apache.org/book.html