"No space left on device" exception on Amazon EMR medium instances and S3
The error means that there is no space left to store the output (or temporary output) of your MapReduce job.
Some things to check:
- Have you deleted unnecessary files from HDFS? Run
hadoop fs -ls /
to see which files are stored on HDFS. (If the trash feature is enabled, make sure you empty it, too.)
- Do you compress the output (or temporary output) of your jobs? You can compress the final output by using SequenceFileOutputFormat with output compression enabled, and the intermediate map output by calling, on the JobConf,
setCompressMapOutput(true);
- What is the replication factor? Hadoop's default is 3, but if space is the issue, you can risk setting it to 2 or even 1 in order to get your program to run. Keep in mind that fewer replicas mean less protection against losing data when a node fails.
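To see why the replication factor matters for space, here is a rough back-of-the-envelope sketch. The class `HdfsSpaceEstimate` and the 50 GiB figure are made up for illustration and are not part of Hadoop; the estimate simply multiplies the logical size by the replication factor and ignores block rounding and metadata overhead.

```java
// Hypothetical helper, not part of Hadoop: rough estimate of raw HDFS usage.
class HdfsSpaceEstimate {

    /** Raw bytes used across the cluster for a logical size and replication factor. */
    static long rawBytes(long logicalBytes, int replication) {
        if (logicalBytes < 0 || replication < 1) {
            throw new IllegalArgumentException("size must be >= 0 and replication >= 1");
        }
        // Every HDFS block is stored 'replication' times.
        return Math.multiplyExact(logicalBytes, (long) replication);
    }

    public static void main(String[] args) {
        final long gib = 1024L * 1024 * 1024;
        long output = 50 * gib; // assumed job output size, for illustration only
        for (int r = 3; r >= 1; r--) {
            System.out.println("replication=" + r + " -> "
                    + rawBytes(output, r) / gib + " GiB raw");
        }
    }
}
```

Under these assumptions, dropping from 3 replicas to 1 cuts the HDFS footprint of the same output to a third.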
It could also be that some of your reducers output significantly more data than others (skewed keys, for example) and fill up the disks of the nodes they run on, so check your code, too.
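One way to spot such a reducer is to compare the sizes of the per-reducer output files, for example as listed by `hadoop fs -du` on the job's output directory. The sketch below is only an illustration: `ReducerSkewCheck`, the sample sizes, and the "3 times the median" threshold are assumptions of mine, not anything Hadoop provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: flags reducers whose output is much larger than the median.
class ReducerSkewCheck {

    /** Median of the given sizes; the array must be non-empty. */
    static double median(long[] sizes) {
        long[] s = sizes.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : s[n / 2 - 1] / 2.0 + s[n / 2] / 2.0;
    }

    /** Indexes of reducers whose output size exceeds factor * median. */
    static List<Integer> skewed(long[] sizes, double factor) {
        List<Integer> result = new ArrayList<>();
        if (sizes.length == 0) {
            return result;
        }
        double limit = median(sizes) * factor;
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] > limit) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sizes in MB for part-r-00000 .. part-r-00003.
        long[] sizes = {100, 120, 110, 900};
        System.out.println("Skewed reducers: " + skewed(sizes, 3.0));
    }
}
```

If one index stands out, look at which keys are routed to that reducer and whether the partitioner or the key design can spread them more evenly.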
I've gotten out-of-space errors on AMI 3.2.x that I never got on AMI 3.1.x. Try switching AMIs and see what happens.