Fine-tuning Pig for local execution

hadoop mapreduce apache-pig

Pig's documentation makes clear that local mode is intended to run single-threaded, and that it takes different code paths for certain operations that would otherwise use a distributed sort. As a result, optimizing Pig's local mode seems like the wrong solution to the problem as presented.

Have you considered running a local, "pseudo-distributed" cluster instead of investing in a full cluster setup? You can follow Hadoop's instructions for pseudo-distributed operation, then point Pig at localhost. This gets you parallel execution on a single machine, at the cost of a two-step startup and teardown.
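As a rough sketch of what that setup involves, the Python below prints the three `*-site.xml` files for a single-machine cluster. The property names (`fs.default.name`, `dfs.replication`, `mapred.job.tracker`) and the ports 9000 and 9001 are assumptions based on the Hadoop 1.x single-node guide; newer (YARN-based) Hadoop releases use different names, so check the documentation for your version.

```python
import xml.etree.ElementTree as ET

# Assumed Hadoop 1.x property names and addresses, following the
# pseudo-distributed example in Hadoop's single-node setup guide.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # one machine, so one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_configuration(properties):
    """Return a Hadoop-style <configuration> document for the given name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Print each file so it can be reviewed before copying into $HADOOP_HOME/conf.
    for filename, properties in PSEUDO_DISTRIBUTED.items():
        print(f"<!-- {filename} -->")
        print(render_configuration(properties))
```

With these files in place and HDFS formatted, Hadoop 1.x's `start-dfs.sh` and `start-mapred.sh` bring the daemons up (`stop-mapred.sh` and `stop-dfs.sh` take them down), and `pig -x mapreduce` should then submit jobs to the local cluster, assuming Pig can see this configuration directory on its classpath.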

You'll want to raise the maximum number of map and reduce tasks each TaskTracker runs concurrently, so that jobs can use all the cores on your machine. Fortunately, this is reasonably well documented (admittedly, in the cluster setup documentation): simply define mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in your local copy of $HADOOP_HOME/conf/mapred-site.xml.
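A minimal sketch of that mapred-site.xml, generated from the machine's core count. Giving every core both a map slot and a reduce slot is a hypothetical starting heuristic, not something Hadoop prescribes; it can oversubscribe CPU or memory when map and reduce tasks overlap, so treat the numbers as a knob to tune. The `mapred.job.tracker` entry repeats the assumed pseudo-distributed address `localhost:9001` so the generated file stays complete.

```python
import os
import xml.etree.ElementTree as ET


def tasktracker_slots(cores=None):
    """Hypothetical heuristic: one map slot and one reduce slot per core."""
    cores = cores or os.cpu_count() or 1
    return {
        "mapred.tasktracker.map.tasks.maximum": str(cores),
        "mapred.tasktracker.reduce.tasks.maximum": str(cores),
    }


def mapred_site_xml(cores=None, job_tracker="localhost:9001"):
    """Render mapred-site.xml for a pseudo-distributed (single-machine) cluster."""
    properties = {"mapred.job.tracker": job_tracker}  # assumed pseudo-distributed address
    properties.update(tasktracker_slots(cores))
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(mapred_site_xml())
```

The TaskTracker reads these limits when it starts, so restart the MapReduce daemons after changing them.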

