
Hadoop on Kubernetes vs Standard Hadoop


As people have said, "the only difference is you are in kubernetes/container". In practice, that has a couple of big consequences for actual operation:

  • The helm chart linked above is a toy.
    • It builds vanilla Hadoop (i.e. not HDP or CDH)
    • It doesn't do HA NameNodes
    • It doesn't do Kerberos
  • You have to manage your own volumes
    • If you are running on a public cloud this isn't a big deal, as you can provision storage dynamically
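
To make the volume point concrete, here is a minimal sketch of what "dynamically getting storage" looks like: a PersistentVolumeClaim that names a storage class and leaves it to the cluster's provisioner to create the backing disk. The helper name `datanode_pvc`, the claim name `hdfs-datanode-0`, the 100Gi size, and the `standard` storage class are all assumptions for illustration, not something the chart defines; check what storage classes your cluster actually offers.

```python
import json


def datanode_pvc(name, size_gi, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    All argument values are hypothetical; the storage class must
    already exist in the cluster for dynamic provisioning to work.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node mounts the volume read-write, as for a single DataNode.
            "accessModes": ["ReadWriteOnce"],
            # The provisioner behind this class creates the disk on demand.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON; kubectl accepts JSON as well as YAML.
    print(json.dumps(datanode_pvc("hdfs-datanode-0", 100, "standard"), indent=2))
```

Saved to a file, the printed JSON could be applied with `kubectl apply -f`. On bare metal there is usually no such provisioner, which is where managing your own volumes becomes real work.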

So unless you just want a super lightweight HDFS deployment, are comfortable building out your own more sophisticated Hadoop deployment on k8s, or are willing to pay for a third-party Kubernetes stack with Hadoop support (e.g. robin.io), I would say that in general it is not worth running Hadoop on k8s at this point.

Note that if/when the Hadoop vendors release their own operators, this might change.


  • Standard Hadoop is just Hadoop with MapReduce, Spark, etc., backed by HDFS.

  • Hadoop on Kubernetes is just standard Hadoop as above, but running on Kubernetes.

In the case of Hadoop on K8s, you get all the benefits that Kubernetes usually offers over traditional infrastructure.

There is a helm chart as well:

https://github.com/helm/charts/tree/master/stable/hadoop


You might want to consider looking at this set of charts. In short, it is a collection of Helm charts to spin up Hadoop services on a K8s cluster.

To mention a few highlights:

  • support for HA NameNodes
  • support for Kerberos
  • support for k8s persistent volumes
  • support for data volumes
  • etc.

Hope this helps. Cheers