Disk Spill during MapReduce


A Mapper's intermediate files (spill files) are stored on the local filesystem of the worker node where the Mapper is running, not in HDFS. Similarly, data streamed from one node to another (for example, map output fetched by a Reducer during the shuffle) is stored on the local filesystem of the worker node where the receiving task is running.

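This local filesystem path is derived from the hadoop.tmp.dir property, which by default is '/tmp/hadoop-${user.name}'. The task-local directories themselves are set by mapreduce.cluster.local.dir (mapred.local.dir in older releases), which defaults to '${hadoop.tmp.dir}/mapred/local'; under YARN, the NodeManager's yarn.nodemanager.local-dirs setting plays this role.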
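
To see where spill files would land on a given node, it can help to resolve the configured path by hand. The following is a minimal Python sketch, not Hadoop code: the helper names load_site_properties and resolve are hypothetical, the ${...} expansion is a simplified imitation of what Hadoop's configuration loader does, and the defaults in DEFAULTS are assumptions taken from the stock core-default.xml and mapred-default.xml, so check them against your Hadoop version.

```python
import re
import xml.etree.ElementTree as ET

# Assumed defaults (from the stock core-default.xml / mapred-default.xml);
# verify against the defaults shipped with your Hadoop release.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapreduce.cluster.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def load_site_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve(name, props, user):
    """Return the value of `name` with ${...} references expanded.

    Site properties override DEFAULTS; ${user.name} is taken from `user`.
    Unknown references are left untouched.
    """
    merged = {**DEFAULTS, **props, "user.name": user}
    value = merged[name]
    for _ in range(20):  # bound the number of substitutions to avoid cycles
        match = _VAR.search(value)
        if match is None or match.group(1) not in merged:
            break
        value = value[:match.start()] + merged[match.group(1)] + value[match.end():]
    return value


if __name__ == "__main__":
    # Hypothetical core-site.xml that moves hadoop.tmp.dir off /tmp.
    site = """<configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop-tmp</value>
      </property>
    </configuration>"""
    print(resolve("mapreduce.cluster.local.dir", load_site_properties(site), "alice"))
    # prints /data/hadoop-tmp/mapred/local
    print(resolve("mapreduce.cluster.local.dir", {}, "alice"))
    # prints /tmp/hadoop-alice/mapred/local
```

Pointing hadoop.tmp.dir (or the local directory property directly) at a disk with enough free space is the usual reason to change it, since large spills can otherwise fill up /tmp.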

After the job completes or fails, the temporary location used on the local filesystem is cleared automatically. You don't have to perform any cleanup yourself; the framework handles it.