How can I debug Hadoop MapReduce? [duplicate]
Because you are processing big data, the volume of your tracing messages can be huge, and that alone can cause problems. It is worth considering alternatives to "System.out.println"-style logging:
- use Counters (here is a simple example)
- write logs to HDFS using MultipleOutputs
The best thing about Counters and MultipleOutputs is that you can access them programmatically; in the case of MultipleOutputs, you can even run a map/reduce job over the logs to extract statistics.
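To make the two options concrete, here is a minimal sketch of a mapper that uses both. The class name TracingMapper, the RecordQuality enum, the "rejects" output name and the tab-separated input format are all made up for illustration; only the Hadoop calls (Context.getCounter, MultipleOutputs.addNamedOutput, MultipleOutputs.write) come from the org.apache.hadoop.mapreduce API.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TracingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Hypothetical counter names; each enum constant becomes one job-wide counter.
    public enum RecordQuality { WELL_FORMED, MALFORMED }

    // Hypothetical named output; Hadoop requires the name to be alphanumeric.
    public static final String REJECTS = "rejects";

    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();
    private MultipleOutputs<Text, IntWritable> sideOutput;

    // Call this from the driver before submitting the job.
    public static void configureSideOutput(Job job) {
        MultipleOutputs.addNamedOutput(
                job, REJECTS, TextOutputFormat.class, NullWritable.class, Text.class);
    }

    @Override
    protected void setup(Context context) {
        sideOutput = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Assumption: valid records have at least two tab-separated fields.
        String[] fields = line.toString().split("\t");
        if (fields.length < 2) {
            // Count the bad record instead of printing it ...
            context.getCounter(RecordQuality.MALFORMED).increment(1);
            // ... and keep the raw line in a separate file next to the job output.
            sideOutput.write(REJECTS, NullWritable.get(), line);
            return;
        }
        context.getCounter(RecordQuality.WELL_FORMED).increment(1);
        outKey.set(fields[0]);
        context.write(outKey, ONE);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Without close() the side files may be left incomplete.
        sideOutput.close();
    }
}
```

After job.waitForCompletion(true) returns, the driver can read a total with job.getCounters().findCounter(TracingMapper.RecordQuality.MALFORMED).getValue(). The rejected lines end up in files whose names start with "rejects" inside the job output directory, so they can be fed to another job.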
Another alternative to debugging in the production environment is unit testing: MiniMRCluster lets you run your map-reduce jobs inside unit tests.
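A rough sketch of such a test, assuming JUnit 4 and the Hadoop test jars are on the classpath. The test class name and the input data are invented, and the job uses Hadoop's default identity mapper and reducer as a stand-in for your own classes. Note that MiniMRCluster is marked deprecated in newer Hadoop releases, so treat this as a starting point rather than the recommended setup for every version.

```java
import static org.junit.Assert.assertTrue;

import java.nio.file.Files;
import java.util.Arrays;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.apache.hadoop.mapred.RunningJob;
import org.junit.Test;

public class IdentityJobTest {

    @Test
    public void jobSucceedsOnMiniCluster() throws Exception {
        // Hypothetical test data written to a local temp directory.
        java.nio.file.Path workDir = Files.createTempDirectory("mr-test");
        java.nio.file.Path inputDir = Files.createDirectory(workDir.resolve("input"));
        Files.write(inputDir.resolve("lines.txt"), Arrays.asList("first line", "second line"));

        // One task tracker, local file system instead of HDFS, one local directory.
        MiniMRCluster cluster = new MiniMRCluster(1, "file:///", 1);
        try {
            JobConf conf = cluster.createJobConf();
            conf.setJobName("identity-smoke-test");
            // No mapper or reducer is set, so the identity defaults are used;
            // plug in your own classes with conf.setMapperClass(...) and so on.
            FileInputFormat.setInputPaths(conf, new Path(inputDir.toUri()));
            FileOutputFormat.setOutputPath(conf, new Path(workDir.resolve("output").toUri()));

            RunningJob job = JobClient.runJob(conf);
            assertTrue(job.isSuccessful());
        } finally {
            cluster.shutdown();
        }
    }
}
```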
I develop my map/reduce code in Eclipse, using Maven to build the runtime jar and manage dependencies. Once I have Hadoop installed and running on my machine to support HDFS, I can run and debug my code in Eclipse. That means using breakpoints and everything else in the Eclipse debug perspective.
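For breakpoints in a mapper or reducer to be hit, the tasks have to run in the same JVM that Eclipse launched. One way to get that (an assumption on my part, since the setup above does not say which mode is used) is to force the local job runner in the driver. The class name LocalDebugDriver is hypothetical, and the base Mapper class is used as an identity mapper in place of your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalDebugDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run map and reduce tasks inside this JVM instead of submitting to a cluster.
        conf.set("mapreduce.framework.name", "local");

        Job job = Job.getInstance(conf, "local-debug-run");
        job.setJarByClass(LocalDebugDriver.class);
        // The base Mapper passes records through unchanged; replace it with your own.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0); // map-only job keeps the example small
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // args[0]: input path, args[1]: output path (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Launch this class with "Debug As > Java Application" and set the two paths as program arguments. The input can still live on HDFS if fs.defaultFS in your configuration points at the local HDFS instance.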