What is the significance of the Oozie MR launcher?


I posted the same question in the Apache Flume forums, and here is the response:

It's also there to keep the Oozie server from being bogged down or becoming unstable. For example, if you have a bunch of workflows running Pig jobs, the Oozie server would otherwise be running multiple copies of the Pig client (a relatively "heavy" program) directly. By moving all of the user code and external clients into map tasks of the launcher job, the Oozie server stays lightweight and less prone to errors.

It is also much more scalable this way, because the launcher jobs distribute the job launching and monitoring to other machines in the cluster. Otherwise, with the Oozie server doing everything, we'd have to limit the number of concurrent workflows based on the Oozie server machine's specs (RAM, CPU, etc.).

Finally, from an architectural standpoint, the Oozie server itself is stateless: everything is stored in the database, and the Oozie server can be taken down at any point without losing anything. If we were to launch jobs directly from the Oozie server, it would now hold some state (e.g. a Pig client cannot be restarted and resumed).
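To make the division of labour in that answer concrete, here is a toy Python sketch. It assumes nothing about Oozie's real internals: `Cluster`, `Server`, and `heavy_client` are hypothetical names, a thread pool stands in for the cluster's worker machines, and an SQLite table stands in for Oozie's database. The server only submits launchers and records their ids, so a fresh `Server` pointed at the same database can resume monitoring after the old one is taken down.

```python
import sqlite3
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class Cluster:
    """Hypothetical stand-in for the Hadoop cluster: runs "launcher" callables
    on worker threads and reports their status by job id."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, fn):
        job_id = "job_" + uuid.uuid4().hex[:8]
        with self._lock:
            self._jobs[job_id] = self._pool.submit(fn)
        return job_id

    def status(self, job_id):
        with self._lock:
            fut = self._jobs[job_id]
        if not fut.done():
            return "RUNNING"
        return "SUCCEEDED" if fut.exception() is None else "FAILED"

    def shutdown(self):
        # Blocks until every submitted launcher has finished.
        self._pool.shutdown(wait=True)


def heavy_client(script):
    """Hypothetical placeholder for a heavyweight client such as Pig.
    It runs inside the launcher, i.e. on a cluster worker, not in the server."""
    return f"ran {script}"


class Server:
    """Stateless coordinator: everything it needs to know lives in the database."""

    def __init__(self, db, cluster):
        self.db = db
        self.cluster = cluster
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS actions "
            "(name TEXT PRIMARY KEY, launcher_id TEXT, status TEXT)"
        )

    def start_action(self, name, script):
        # The server only does a cheap submit; the client runs on the cluster.
        launcher_id = self.cluster.submit(lambda: heavy_client(script))
        self.db.execute(
            "INSERT INTO actions VALUES (?, ?, 'RUNNING')", (name, launcher_id)
        )
        self.db.commit()

    def poll(self):
        # Re-read running actions from the database and ask the cluster for updates.
        rows = self.db.execute(
            "SELECT name, launcher_id FROM actions WHERE status = 'RUNNING'"
        ).fetchall()
        for name, launcher_id in rows:
            self.db.execute(
                "UPDATE actions SET status = ? WHERE name = ?",
                (self.cluster.status(launcher_id), name),
            )
        self.db.commit()
        return dict(self.db.execute("SELECT name, status FROM actions ORDER BY name"))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    cluster = Cluster()
    server = Server(db, cluster)
    server.start_action("pig-1", "script1.pig")
    server.start_action("pig-2", "script2.pig")
    del server            # simulate taking the server down
    cluster.shutdown()    # demo only: wait for the launchers to finish
    restarted = Server(db, cluster)  # a fresh server sharing the same database
    print(restarted.poll())  # {'pig-1': 'SUCCEEDED', 'pig-2': 'SUCCEEDED'}
```

The design choice the sketch mirrors is that the server never holds a handle to a running client, only an id stored in the database. In Oozie itself, per the answer above, the user code runs in map tasks of the launcher job and the state lives in Oozie's database; the toy only imitates that split.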