Hadoop, Hive, Pig, HBase, Cassandra - when to use what? [closed]


Your guesses are somewhat accurate.

By Hadoop, I guess you are referring to MapReduce? Hadoop itself is an ecosystem made up of many components, including MapReduce, HDFS, Pig and Hive.

MapReduce is a good fit when you need to write the processing logic yourself, at the level of the Map() and Reduce() methods. In my work, I find MapReduce very useful when I'm dealing with data that is unstructured and needs to be cleansed.
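To make the Map()/Reduce() split concrete, here is a minimal sketch in plain Python that imitates the map, shuffle and reduce phases in memory. It is not Hadoop's Java API; the names `map_record`, `reduce_key` and `run_job` are hypothetical and chosen only for illustration, and the "cleansing" rule (lowercase, drop punctuation) is an assumption about what messy input might need.

```python
import re
from itertools import groupby


def map_record(line):
    """Map step: cleanse one raw line and emit (word, 1) pairs."""
    # Assumed cleansing rule: lowercase, replace anything that is not
    # a letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", line.lower())
    for word in cleaned.split():
        yield word, 1


def reduce_key(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle (sort by key) and reduce in memory."""
    # Map phase: every input line is processed independently.
    pairs = [pair for line in lines for pair in map_record(line)]
    # Shuffle phase: sort so that equal keys sit next to each other.
    pairs.sort(key=lambda kv: kv[0])
    # Reduce phase: one reduce call per distinct key.
    return dict(
        reduce_key(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    )
```

Calling `run_job(["Hello, WORLD!!", "hello   hadoop"])` cleans both lines in the map step and counts the surviving words in the reduce step. On a real cluster the framework does the shuffle and distributes the map and reduce calls across machines; you only write the two functions.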

Hive and Pig: Both are good for batch processes that run periodically (say, every few hours or days).

HBase and Cassandra: Both support low-latency calls, so they can be used for real-time applications where response time is key. Have a look at this discussion to get a better idea about HBase vs Cassandra.
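The difference in access pattern can be sketched roughly as follows. This is an in-memory stand-in written in plain Python, not the HBase or Cassandra client API; `KeyedStore`, `batch_page_views` and the sample `events` are all hypothetical. It assumes one common arrangement: a periodic batch job scans everything and writes its results into a keyed store, which then answers single-key requests quickly.

```python
# Hypothetical sample data; a real job would read this from HDFS.
events = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart"},
    {"user_id": "u1", "page": "/checkout"},
]


def batch_page_views(records):
    """Batch style: scan every record once and build a full report."""
    counts = {}
    for record in records:
        user = record["user_id"]
        counts[user] = counts.get(user, 0) + 1
    return counts


class KeyedStore:
    """Stand-in for a store that serves lookups by row key."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, value):
        # Write one row, addressed by its key.
        self._rows[row_key] = value

    def get(self, row_key):
        # Read one row by key without scanning the others.
        return self._rows.get(row_key)
```

A usage sketch: run `batch_page_views(events)` on a schedule, `put` each result into a `KeyedStore`, and let the application call `get("u1")` per request. The batch step touches all the data and can take as long as it needs; the lookup touches one key, which is the kind of call where response time matters.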