Hadoop on Kubernetes vs Standard Hadoop
As people have said, "the only difference is you are in kubernetes/container". In practice, that has a couple of big operational consequences:
- The Helm chart linked above is a toy.
  - It builds vanilla Hadoop (i.e. not HDP or CDH)
  - It doesn't do HA NameNodes
  - It doesn't do Kerberos
- You have to manage your own volumes
  - If you are running on a public cloud this isn't a big deal, as you can provision storage dynamically
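To make the volume point concrete, here is a minimal sketch of what "dynamically getting storage" looks like: a PersistentVolumeClaim that names a StorageClass, so the cluster provisions the disk on demand. The claim name `hdfs-datanode-0`, the class name `standard`, and the size are placeholders I made up, not values from the chart above. It is written in Python only to keep it self-contained; it prints JSON, which `kubectl apply -f` accepts just like YAML.

```python
import json


def build_datanode_pvc(name, storage_class, size):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    All three arguments are illustrative; use whatever StorageClass
    your cluster or cloud provider actually offers.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node mounts the volume read-write, which is the
            # usual mode for a block-storage-backed data disk.
            "accessModes": ["ReadWriteOnce"],
            # Naming a StorageClass is what triggers dynamic provisioning.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    pvc = build_datanode_pvc("hdfs-datanode-0", "standard", "100Gi")
    print(json.dumps(pvc, indent=2))
```

On bare metal there is usually no provisioner behind the StorageClass, which is where the "manage your own volumes" work comes from: you have to create the PersistentVolumes yourself.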
So unless you just want a super lightweight HDFS deployment, are comfortable building out your own more sophisticated Hadoop deployment on k8s, or are willing to pay for a third-party Kubernetes stack with Hadoop support (e.g. robin.io), I would say that in general it is not worth running Hadoop on k8s at this point.
Note that if/when the Hadoop vendors ship their own operators, this might change.
Standard Hadoop is just Hadoop with MapReduce, Spark, etc., backed by HDFS.
Hadoop on Kubernetes is just standard Hadoop as above, but running on Kubernetes.
With Hadoop on K8s, you get all the benefits that Kubernetes usually offers over traditional infrastructure.
There is a helm chart as well:
You might want to consider looking at this set of charts. In short, it is a collection of Helm charts to spin up Hadoop services on a K8s cluster.
To mention a few highlights:
- support for HA NameNodes
- support for Kerberos
- support for k8s persistent volumes
- support for data volumes
- etc.
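For a feel of what HA NameNode support involves at the config level, here is a rough sketch of the `hdfs-site.xml` properties a quorum-journal HA setup typically needs. The property names are standard HDFS ones, but everything else is my assumption: the nameservice name, the pod DNS names, and the ports (8020 for NameNode RPC, 8485 for JournalNodes) are placeholders, and none of this is taken from the charts themselves.

```python
def ha_namenode_properties(nameservice, namenode_hosts, journal_hosts, rpc_port=8020):
    """Build the core HDFS HA properties as a dict of name -> value.

    Host names and ports are illustrative; a real chart would derive
    them from its own StatefulSet and Service names.
    """
    # Logical NameNode IDs: nn0, nn1, ...
    ids = [f"nn{i}" for i in range(len(namenode_hosts))]
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(ids),
        # Clients use this class to find the currently active NameNode.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        # Shared edit log kept on a quorum of JournalNodes.
        "dfs.namenode.shared.edits.dir":
            "qjournal://"
            + ";".join(f"{host}:8485" for host in journal_hosts)
            + f"/{nameservice}",
    }
    for nn_id, host in zip(ids, namenode_hosts):
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
    return props


if __name__ == "__main__":
    # Hypothetical StatefulSet pod DNS names behind headless services.
    namenodes = [
        "hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local",
        "hdfs-namenode-1.hdfs-namenode.default.svc.cluster.local",
    ]
    journalnodes = [
        f"hdfs-journalnode-{i}.hdfs-journalnode.default.svc.cluster.local"
        for i in range(3)
    ]
    for key, value in ha_namenode_properties("hdfs-k8s", namenodes, journalnodes).items():
        print(f"{key} = {value}")
```

The stable per-pod DNS names are the reason StatefulSets are the usual fit here: each NameNode and JournalNode needs an address that survives a pod restart.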
Hope this helps. Cheers