Is Namenode still necessary if I use S3 instead of HDFS? (hadoop)

Is Namenode still necessary if I use S3 instead of HDFS?


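No, provided you have a way to deal with the fact that S3 lacks the consistency that the work committers shipped with Hadoop and Spark rely on. Those committers list the directories of task output and rename the files into place to commit a job. Every so often, if S3's listings are inconsistent enough, a commit will miss files: your results will be invalid and you won't even notice, because nothing fails.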
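To make the failure mode concrete, here is a toy simulation in plain Python. It is not the real committer or the S3 API: `ToyObjectStore`, `commit_by_listing` and `run_job` are hypothetical names made up for this sketch, and the "lagging" flag is just an assumed stand-in for a listing that has not caught up with a recent write.

```python
class ToyObjectStore:
    """Hypothetical in-memory store whose listings can omit recent writes."""

    def __init__(self):
        self.objects = {}     # key -> data; the data itself is always stored
        self.lagging = set()  # keys that exist but do not show up in listings

    def put(self, key, data, listed=True):
        self.objects[key] = data
        if not listed:
            self.lagging.add(key)

    def list(self, prefix):
        # An inconsistent listing: lagging keys are silently left out.
        return sorted(
            k for k in self.objects
            if k.startswith(prefix) and k not in self.lagging
        )

    def rename(self, src, dst):
        self.objects[dst] = self.objects.pop(src)


def commit_by_listing(store, task_dir, final_dir):
    """Commit by listing task output and renaming each file that is found."""
    committed = []
    for key in store.list(task_dir):
        dst = final_dir + key[len(task_dir):]
        store.rename(key, dst)
        committed.append(dst)
    return committed


def run_job(lagging_parts=()):
    """Write three part files, then commit; some may lag in the listing."""
    store = ToyObjectStore()
    for i in range(3):
        store.put(
            f"out/_temporary/part-{i}",
            f"rows-{i}",
            listed=i not in lagging_parts,
        )
    return commit_by_listing(store, "out/_temporary/", "out/")
```

With a consistent listing, `run_job()` commits all three part files. With `run_job(lagging_parts={1})` the commit returns only `out/part-0` and `out/part-2`, and no exception is raised anywhere, which is the "you won't even notice" part.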

Different suppliers of Spark on AWS each solve this in their own way. If you are using ASF (Apache) Spark, there is nothing bundled that can do this.

https://www.youtube.com/watch?v=BgHrff5yAQo