How can I deploy HDFS (Hadoop Distributed FS) to a K8s (Kubernetes) cluster?

How can I deploy HDFS (Hadoop Distributed FS) to a K8s (Kubernetes) cluster?


In general, I suggest you don't use HDFS within k8s...

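  1. The NameNode, including its HA setup, would need to be containerized, and the NameNode's filesystem metadata must live on stateful (persistent) storage.
  2. NameNode HA needs ZooKeeper for automatic failover, alongside the Quorum Journal Manager (QJM) JournalNodes. ZooKeeper competes with etcd, in a way, for leader election purposes.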

HDFS was designed before k8s persistent volumes were really thought about. The Hadoop Ozone project is meant to work around these limitations; it was still in development at the time of writing, though it already has k8s deployment instructions.

Alternatively, I suggest you look into using MinIO or Project Rook (which runs Ceph on k8s), both of which can be accessed through a Hadoop-compatible file system (HCFS).
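As a rough illustration of the MinIO route: Hadoop clients usually talk to S3-compatible stores through the S3A connector, configured in `core-site.xml`. The sketch below generates such a fragment. The `fs.s3a.*` property names come from the Hadoop S3A connector; the endpoint URL, the credentials, and the helper name `s3a_core_site` are made up for the example, and a real deployment would likely need more settings.

```python
import xml.etree.ElementTree as ET


def s3a_core_site(endpoint, access_key, secret_key):
    """Build a minimal core-site.xml fragment pointing S3A at an S3-compatible store."""
    props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style URLs (http://host/bucket/key) are typically needed for
        # self-hosted stores that do not use per-bucket DNS names.
        "fs.s3a.path.style.access": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical in-cluster MinIO service address and placeholder credentials.
    print(s3a_core_site("http://minio.example.svc:9000", "EXAMPLE_KEY", "EXAMPLE_SECRET"))
```

With a configuration like this in place, paths of the form `s3a://bucket/key` can be used where an `hdfs://` path would otherwise go.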


If you must use HDFS, then set it up outside k8s, then make requests to it from within the containers.
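One way to make such requests from inside a container is the WebHDFS REST API, which needs only HTTP access to the NameNode. This is a minimal sketch, not a full client. It assumes WebHDFS is enabled, that the NameNode HTTP port is 9870 (the Hadoop 3 default; Hadoop 2 used 50070), and that the cluster is unsecured. The hostname and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(namenode_host, path, op, port=9870, **params):
    """Build a WebHDFS URL for the given absolute HDFS path and operation."""
    query = urllib.parse.urlencode({"op": op, **params})
    quoted_path = urllib.parse.quote(path)  # keeps "/" separators intact
    return f"http://{namenode_host}:{port}/webhdfs/v1{quoted_path}?{query}"


def list_status(namenode_host, path, port=9870, timeout=10):
    """Return the entry names in an HDFS directory via the LISTSTATUS operation."""
    url = webhdfs_url(namenode_host, path, "LISTSTATUS", port=port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Hypothetical NameNode address reachable from the pod network.
    print(webhdfs_url("namenode.example.internal", "/data/logs", "LISTSTATUS"))
```

From a pod, `list_status("namenode.example.internal", "/data/logs")` would return the directory's entry names, provided the pod network can reach the NameNode. File reads and writes are redirected to DataNodes, so those must be reachable as well.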

Regarding YARN, make sure to watch the Apache YuniKorn project (a YARN-style resource scheduler that runs on k8s).