
Multi-node Hadoop cluster with Docker


As of September 2016 there is no quick answer.

https://github.com/Lewuathe/docker-hadoop-cluster does not seem like a good start, as it should be universal for your option B.

Keep an eye on https://github.com/sequenceiq/hadoop-docker and https://github.com/kiwenlau/hadoop-cluster-docker.


To address your question C, you may want to check out BlueData's software platform: http://www.bluedata.com/blog/2015/06/docker-containers-big-data-clusters

It's designed to run multi-node Hadoop clusters in a Docker-based environment, and there is a free version available for download (you can also run it in an AWS EC2 instance).


This work has already been done for you, actually:

https://hub.docker.com/r/cloudera/clusterdock/

It includes a pre-packaged multi-node CDH cluster, with Cloudera Manager as an optional component for cluster management, etc.