Cassandra and Pig integration - Is hadoop optional?


Hadoop is only optional when you are testing things out. To do anything at scale, you will need Hadoop as well.

Running without Hadoop means you are running Pig in local mode, where all of the data is processed by the single Pig process you launched. This works fine with a single node and sample data.

When you have any significant amount of data or multiple machines, you want to run Pig in Hadoop (MapReduce) mode. If you run Hadoop TaskTrackers on your Cassandra nodes, Pig can take advantage of what MapReduce provides: the workload is distributed across the cluster, and data locality (running each task on the node that already holds its data) reduces network transfer.
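As a rough illustration of why locality matters, the toy Python model below compares the two modes on a hypothetical three-node cluster. It is not Pig or Hadoop code: the names `local_mode`, `mapreduce_mode`, and `EXAMPLE_CLUSTER` are made up for this sketch. It assumes the job is a simple count whose map output can be pre-aggregated on each node (as a combiner would do), and it ignores replication, input splits, and task scheduling.

```python
from collections import Counter

# Hypothetical cluster: node name -> values stored on that node.
EXAMPLE_CLUSTER = {
    "node1": ["a", "b", "a", "a"],
    "node2": ["b", "b", "c"],
    "node3": ["a", "c", "c", "c"],
}


def local_mode(cluster):
    """Local-mode analogue: every row is pulled into one process and counted there."""
    shipped = 0
    counts = Counter()
    for rows in cluster.values():
        for value in rows:
            shipped += 1  # each raw row crosses the network to the single process
            counts[value] += 1
    return counts, shipped


def mapreduce_mode(cluster):
    """MapReduce analogue: count on the node that owns the rows, then merge partials."""
    shipped = 0
    counts = Counter()
    for rows in cluster.values():
        partial = Counter(rows)  # map + combine runs locally, next to the data
        shipped += len(partial)  # only one (value, count) pair per key leaves the node
        counts.update(partial)   # reduce step merges the partial counts
    return counts, shipped
```

With the example data both functions return the same counts, but the local-mode version moves all 11 rows to one process while the locality-aware version moves only 6 partial results. The gap grows with the amount of data stored on each node.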


It's optional. Cassandra has its own implementation of Pig's LoadFunc and StoreFunc, which lets you query data in Cassandra and store results back into it.
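To make the LoadFunc/StoreFunc point concrete, here is a hypothetical Python model of that contract. In Pig these are Java abstract classes: a loader's `getNext()` returns one tuple at a time (null at the end of input) and a storer's `putNext(Tuple)` writes one tuple. Their other methods, such as `setLocation` and `prepareToRead`, are omitted here. The names `ToyColumnFamily`, `ToyLoader`, `ToyStorer`, and `run_script` are invented for this sketch, and the dict-backed column family only stands in for Cassandra.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# A row is (row key, {column name: value}); a stand-in for a Pig tuple.
Row = Tuple[str, Dict[str, str]]


class ToyColumnFamily:
    """Hypothetical in-memory stand-in for a Cassandra column family."""

    def __init__(self, rows: Optional[Dict[str, Dict[str, str]]] = None):
        self.rows: Dict[str, Dict[str, str]] = dict(rows or {})


class ToyLoader:
    """Shaped like LoadFunc: get_next() returns one row, or None when input is exhausted."""

    def __init__(self, cf: ToyColumnFamily):
        self._it: Iterator[Row] = iter(sorted(cf.rows.items()))

    def get_next(self) -> Optional[Row]:
        return next(self._it, None)


class ToyStorer:
    """Shaped like StoreFunc: put_next() writes one row into the target."""

    def __init__(self, cf: ToyColumnFamily):
        self._cf = cf

    def put_next(self, row: Row) -> None:
        key, columns = row
        self._cf.rows.setdefault(key, {}).update(columns)


def run_script(src: ToyColumnFamily, dst: ToyColumnFamily,
               transform: Callable[[Row], Optional[Row]]) -> None:
    """Roughly what LOAD -> transform -> STORE reduces to; None drops a row."""
    loader, storer = ToyLoader(src), ToyStorer(dst)
    while (row := loader.get_next()) is not None:
        out = transform(row)
        if out is not None:
            storer.put_next(out)
```

The point of the pattern is that the script's operators never touch storage directly. They only see the tuples the loader produces and hand their results to the storer.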

Hadoop and Cassandra differ in many ways, so it's hard to say what you lose without knowing exactly what you are trying to accomplish.