Using AWS EMRFS in Apache Spark hosted on EC2


No. EMRFS is available only on EMR, where it is the easy way to make S3 look like part of HDFS. From Spark on plain EC2 you can still connect to S3, but it takes more setup than on EMR, because S3 is not tightly coupled to EC2. Yes, parallelism is still applied, but not according to MapReduce-style data locality, where the worker runs on the same node that holds the data (the data node).


EMR uses a closed-source S3 connector with proprietary features, "emrfs". You don't get to see the source, you can't get support from anyone else, and you can't use it except when you run EMR. For independent apps, the s3a connector is great, but it is not a full replacement for HDFS.
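
To make the s3a route a bit more concrete, here is a minimal sketch of the Spark settings you would typically pass when running outside EMR. The helper names (`s3a_spark_conf`, `to_submit_args`), the version `3.3.4`, the endpoint, and `job.py` are illustrative assumptions, not anything from the question; the hadoop-aws version in particular has to match the Hadoop version your Spark build ships with.

```python
def s3a_spark_conf(hadoop_version, endpoint=None):
    """Build Spark settings that enable the s3a connector outside EMR.

    hadoop_version is a placeholder: it should match the Hadoop version
    bundled with your Spark distribution.
    """
    conf = {
        # hadoop-aws is the module that contains the s3a connector.
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
    }
    if endpoint is not None:
        # Optional: pin a specific S3 endpoint (hypothetical value below).
        conf["spark.hadoop.fs.s3a.endpoint"] = endpoint
    return conf


def to_submit_args(conf):
    """Turn a settings dict into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Assumed example values; replace them with your own.
    conf = s3a_spark_conf("3.3.4", endpoint="s3.eu-west-1.amazonaws.com")
    print(" ".join(["spark-submit", *to_submit_args(conf), "job.py"]))
```

With settings like these in place, the job refers to data with `s3a://` URLs (for example `s3a://some-bucket/path/`, a made-up bucket) rather than `hdfs://` paths. Credentials are left out on purpose: as far as I know, s3a's default credential chain can pick them up from environment variables or from the EC2 instance's IAM role, but check that against your Hadoop version.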