
Records proactively spilled in Hadoop Pig?


The first two counters show the total number of records/bytes written to HDFS by your MR job.
It can happen that during an MR job not all records fit into memory. The spill counters indicate how many records have been written to the local disks of your datanodes to avoid running out of memory.

Pig uses two mechanisms to control memory usage and spill to disk if necessary:

1. Spillable Memory Manager:

This is a central place where the spillable bags are registered. In low-memory situations the manager goes through the list of registered bags and spills them to disk, triggering a GC afterwards to free the memory (its thresholds can be tuned, see the sketch after this list).


2. Proactive (self) spilling:

Bags can also spill themselves when their memory limit is reached (see pig.cachedbag.memusage). Both mechanisms can be tuned, as shown below.
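A minimal Pig Latin sketch of the relevant knobs, assuming the documented properties pig.cachedbag.memusage (fraction of the heap all cached bags may use before self-spilling) and the SpillableMemoryManager settings pig.spill.size.threshold / pig.spill.gc.activation.size; the values here are illustrative, not recommendations:

    -- fraction of the heap available to cached bags before they
    -- proactively spill themselves (default 0.2, i.e. 20% of the heap)
    SET pig.cachedbag.memusage '0.1';

    -- SpillableMemoryManager: don't bother spilling bags smaller
    -- than this many bytes
    SET pig.spill.size.threshold '5000000';

    -- invoke a GC after a spill that frees more than this many bytes
    SET pig.spill.gc.activation.size '40000000';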


Back to the statistics you have:

  • Total bags proactively spilled: # of bags that have been spilled
  • Total records proactively spilled: # of records in those bags

It's always good to check the spill stats of your job, since heavy spilling usually indicates a serious performance hit that should be avoided.
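If the counters show heavy proactive spilling, two common mitigations are giving each task a larger heap and spreading the data over more reducers so each one holds smaller bags. A hedged sketch (mapred.child.java.opts is the classic MR1 heap setting; the values are placeholders):

    -- larger task heap so bags are more likely to fit in memory
    SET mapred.child.java.opts '-Xmx2048m';

    -- more reducers, so each one holds smaller bags
    SET default_parallel 50;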