
How do I get more specific error info on a killed job in Oozie?


One suggestion is to catch the exception in your main method and export a property (for example, 'exceptionTrace') whose value is the serialized exception. With the capture-output flag set on the Java action, you can then read that value with the wf:actionData('myJavaAction')['exceptionTrace'] EL function.

http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a3.2.7_Java_Action
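A minimal sketch of that suggestion, under a few assumptions: the class name ExceptionCapturingMain, the runJob placeholder, and the MAX_TRACE_CHARS cap are hypothetical and not from the original answer. It relies on the Java action contract described in the spec linked above, where Oozie passes the path of the output file in the oozie.action.output.properties system property when capture-output is declared. As far as I know, Oozie also limits the size of captured output and only reads it when the action itself finishes successfully, so the sketch truncates the trace and returns normally instead of rethrowing.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Properties;

public class ExceptionCapturingMain {

    // System property that names the file Oozie reads back for <capture-output/>.
    static final String OUTPUT_FILE_PROPERTY = "oozie.action.output.properties";

    // Hypothetical cap, chosen to stay well below Oozie's limit on captured output.
    static final int MAX_TRACE_CHARS = 1500;

    // Render the stack trace as a string, truncated to MAX_TRACE_CHARS.
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        String trace = buffer.toString();
        return trace.length() <= MAX_TRACE_CHARS ? trace : trace.substring(0, MAX_TRACE_CHARS);
    }

    // Write the trace under the 'exceptionTrace' key in Java properties format.
    static void exportTrace(Throwable t, File outputFile) throws IOException {
        Properties props = new Properties();
        props.setProperty("exceptionTrace", stackTraceOf(t));
        try (OutputStream out = new FileOutputStream(outputFile)) {
            props.store(out, null);
        }
    }

    // Hypothetical stand-in for the real work of the action.
    static void runJob(String[] args) throws Exception {
        throw new IllegalStateException("simulated failure");
    }

    public static void main(String[] args) throws Exception {
        try {
            runJob(args);
        } catch (Throwable t) {
            String path = System.getProperty(OUTPUT_FILE_PROPERTY);
            if (path == null) {
                throw t; // no capture-output file available: fail as usual
            }
            exportTrace(t, new File(path));
            // Return normally so that Oozie picks up the exported property.
        }
    }
}
```

Because the action then ends successfully even when the job failed, the workflow has to do the routing itself, for instance with a decision node that checks whether wf:actionData('myJavaAction')['exceptionTrace'] is non-empty and transitions to a kill node whose message includes that value.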


I found a way to handle errors and access their cause by using Counters. That may not be what they are designed for, but it seems to be the only way out.

So I catch every Throwable in the mapper and the reducer like this:

    } catch (Throwable t) {
        // One counter per exception class, in the "Exceptions" group.
        Counters.Counter counter = reporter.getCounter("Exceptions", t.getClass().getSimpleName());
        counter.increment(1);
        // Carry the last failed key and the stack trace in the display name.
        counter.setDisplayName(t.getClass().getSimpleName()
                + "\n last failed key: " + key.toString()
                + "\n " + ExceptionUtils.getStackTrace(t));
        // Running total across all exception types.
        reporter.incrCounter("Exceptions", "TOTAL_COUNT", 1);
        reporter.progress();
    }

These counters are easily accessible in the Tool via RunningJob after the job has finished. The "Exceptions" group contains one counter per exception type, with all the needed information in its displayName field.

Please comment if you see any problems with this approach or if you know a better one.