
Records proactively spilled in Hadoop Pig?


The first two counters show the total number of records/bytes written to HDFS by your MR job.
It can happen that during an MR job not all records fit into memory. The spill counters indicate how many records have been written to the local disks of your datanodes to avoid running out of memory.

Pig uses two mechanisms to control memory usage and spill to disk if necessary:

1. Spillable Memory Manager:

This is a central place where the spillable bags are registered. In low-memory situations the manager goes through the list of registered bags and spills them to disk, triggering a GC afterwards to free the memory (its thresholds can be tuned, see the sketch after this list).


2. Proactive (self) spilling:

Bags can also spill themselves when their memory limit is reached (see pig.cachedbag.memusage). Both mechanisms can be tuned, as shown below.
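A minimal Pig Latin sketch of the relevant knobs, assuming the documented properties pig.cachedbag.memusage (fraction of the heap all cached bags may use before self-spilling) and the SpillableMemoryManager settings pig.spill.size.threshold / pig.spill.gc.activation.size; the values here are illustrative, not recommendations:

    -- fraction of the heap available to cached bags before they
    -- proactively spill themselves (default 0.2, i.e. 20% of the heap)
    SET pig.cachedbag.memusage '0.1';

    -- SpillableMemoryManager: don't bother spilling bags smaller
    -- than this many bytes
    SET pig.spill.size.threshold '5000000';

    -- invoke a GC after a spill that frees more than this many bytes
    SET pig.spill.gc.activation.size '40000000';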


Back to the statistics you have:

  • Total bags proactively spilled: # of bags that have been spilled
  • Total records proactively spilled: # of records in those bags

It's always good to check the spill stats of your job, since heavy spilling usually indicates a serious performance hit that should be avoided.
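If the counters show heavy proactive spilling, two common mitigations are giving each task a larger heap and spreading the data over more reducers so each one holds smaller bags. A hedged sketch (mapred.child.java.opts is the classic MR1 heap setting; the values are placeholders):

    -- larger task heap so bags are more likely to fit in memory
    SET mapred.child.java.opts '-Xmx2048m';

    -- more reducers, so each one holds smaller bags
    SET default_parallel 50;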