
Oozie output-events


There is at least one practical reason to specify <output-events> in your coordinator. When you rerun a coordinator for a range of dates (using the oozie job -rerun command), all the corresponding paths specified as <output-events> are deleted before the actions run again (unless you pass -nocleanup).

Sometimes it is useful to remove all the outputs generated by a coordinator's instances. For example, you may want to start another coordinator that has those paths as <input-events> and make sure it processes the rerun data instead of the old data. Because the old paths are deleted, the downstream actions wait until the rerun has regenerated them rather than picking up stale output.
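A minimal sketch of such a declaration, assuming a daily coordinator. The names coord-a, DS1 and out, the dates, the HDFS paths and the schema version uri:oozie:coordinator:0.4 are illustrative placeholders, not taken from the question. Each <data-out> resolves to one dataset instance per action, and that resolved directory is what gets deleted when the action is rerun.

```xml
<!-- Hypothetical coordinator A: names, dates and paths are placeholders -->
<coordinator-app name="coord-a" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One directory per day -->
    <dataset name="DS1" frequency="${coord:days(1)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds1/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <output-events>
    <!-- The path resolved here for an action is what a rerun of that action deletes -->
    <data-out name="out" dataset="DS1">
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode:8020/apps/wf-a</app-path>
      <configuration>
        <property>
          <!-- Pass the resolved output directory to the workflow -->
          <name>outputDir</name>
          <value>${coord:dataOut('out')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The workflow is expected to write into outputDir; since that same path is the declared output event, the coordinator knows which directory to clean up on a rerun.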


If you are talking about Oozie, output datasets are what connect different coordinator jobs. Consider a large DAG of coordinator jobs in which some jobs take other jobs' output as their input: the datasets are the edges of that DAG.

For example, if your coordinator definitions specify that Coordinator A's output is DS1, Coordinator B's output is DS2, and Coordinator C's inputs are DS1 and DS2, then Oozie guarantees that the action in Coordinator C for a given time will not run until the matching instances of DS1 and DS2 are ready.
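Coordinator C from this example could be sketched as follows, assuming daily datasets. DS1 and DS2 follow the example; the app name coord-c, the data-in names in1 and in2, the dates and the paths are hypothetical. Since no <done-flag> is given, Oozie by default treats an instance as ready once its directory contains a _SUCCESS file, so each daily action stays waiting until both directories have that flag.

```xml
<!-- Hypothetical coordinator C: names, dates and paths are placeholders -->
<coordinator-app name="coord-c" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Must match the datasets that coordinators A and B declare as output events -->
    <dataset name="DS1" frequency="${coord:days(1)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds1/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
    <dataset name="DS2" frequency="${coord:days(1)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/ds2/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action for a given day waits until both instances are ready -->
    <data-in name="in1" dataset="DS1">
      <instance>${coord:current(0)}</instance>
    </data-in>
    <data-in name="in2" dataset="DS2">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode:8020/apps/wf-c</app-path>
      <configuration>
        <!-- Pass the resolved input directories to the workflow -->
        <property>
          <name>input1</name>
          <value>${coord:dataIn('in1')}</value>
        </property>
        <property>
          <name>input2</name>
          <value>${coord:dataIn('in2')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The shared dataset definitions are the only link between the three coordinators: A and B list them under <output-events>, C lists them under <input-events>, and Oozie resolves the dependency from the matching paths.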