How to invoke an oozie workflow via shell script and block/wait till workflow completion
You can do that by capturing the job ID when you start the job, then looping and parsing the output of oozie job -info until the status changes. The shell code for this is below.
Start the Oozie job:
oozie_job_id=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -run)
echo "$oozie_job_id"
sleep 30
Parse the job ID from the output. The output has the format "job: jobid".
job_id=$(echo "$oozie_job_id" | sed -n 's/job: \(.*\)/\1/p')
echo "$job_id"
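The sed expression above prints nothing when the output is not in the expected form, which leaves job_id empty and makes the later -info calls fail. A minimal sketch of a stricter parse follows. It assumes the output contains a line of the form "job: jobid"; the function name extract_job_id is made up for this example and is not part of the Oozie CLI.

```shell
# extract_job_id is a hypothetical helper, not part of the Oozie CLI.
# It prints the id from a line of the form "job: <id>" and returns
# non-zero when no such line is present.
extract_job_id() {
    id=$(printf '%s\n' "$1" | sed -n 's/^job: *\([^ ]*\) *$/\1/p' | head -n 1)
    if [ -z "$id" ]; then
        echo "could not find a job id in: $1" >&2
        return 1
    fi
    printf '%s\n' "$id"
}
```

It could be used as job_id=$(extract_job_id "$oozie_job_id") || exit 1, so the script stops early instead of polling with an empty id.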
Check the job status at a regular interval to see whether it is still running:
while true
do
    job_status=$(oozie job -oozie http://<oozie-server>/oozie -info "$job_id" | sed -n 's/Status\(.*\): \(.*\)/\2/p')
    if [ "$job_status" != "RUNNING" ]; then
        echo "Job is completed with status $job_status"
        break
    fi
    # this sleep depends on your job, please change the value accordingly
    echo "sleeping for 5 minutes"
    sleep 5m
done
This is a basic way to do it; you can modify it to fit your use case.
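One thing to watch in the loop above: it stops as soon as the status is anything other than RUNNING, so a workflow that is still in PREP or is SUSPENDED would be reported as completed. Below is a rough sketch of a wrapper that treats those states as still in progress and returns success only for SUCCEEDED. The function name wait_for_oozie_job and the OOZIE_URL and POLL_SECONDS variables are made up for this example, the default URL is only a placeholder, and the sketch assumes the -info output has a "Status : value" line like the one parsed above.

```shell
# Hypothetical settings; adjust to your cluster and job length.
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}  # placeholder URL
POLL_SECONDS=${POLL_SECONDS:-300}

# wait_for_oozie_job <job-id>
# Blocks until the workflow leaves PREP/RUNNING/SUSPENDED.
# Returns 0 for SUCCEEDED, 1 for anything else.
wait_for_oozie_job() {
    while true; do
        # Take the first "Status : VALUE" line from the -info output.
        status=$(oozie job -oozie "$OOZIE_URL" -info "$1" |
            sed -n 's/^Status *: *\([A-Z_]*\).*/\1/p' | head -n 1)
        case "$status" in
            PREP|RUNNING|SUSPENDED)
                echo "status is $status, sleeping for $POLL_SECONDS seconds"
                sleep "$POLL_SECONDS"
                ;;
            SUCCEEDED)
                echo "Job finished with status $status"
                return 0
                ;;
            *)
                # KILLED, FAILED, or an empty/unknown status
                echo "Job ended with status '$status'" >&2
                return 1
                ;;
        esac
    done
}
```

Because the function returns non-zero for anything but SUCCEEDED, a calling script can write wait_for_oozie_job "$job_id" || exit 1 and let a scheduler or parent script see the failure.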
To upload the workflow definition to HDFS, use the following command:
hdfs dfs -copyFromLocal -f workflow.xml /user/hdfs/workflows/workflow.xml
To start the Oozie job you need the two commands below. Note that each one must be written on a single line.
JOB_ID=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -submit)
oozie job -oozie http://<oozie-server>/oozie -start ${JOB_ID#*:} -config job.properties
You need to parse the output of the command below when its exit code is 0;
any other exit code means the command failed. Simply loop, sleeping for X amount of time after each attempt.
oozie job -oozie http://<oozie-server>/oozie -info ${JOB_ID#*:}
echo $? # shows whether the command executed successfully or not
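The two checks described above (exit code first, then the status text) can be combined in one helper. This is only a sketch: the function name get_job_status and the OOZIE_URL variable are made up for this example, the default URL is a placeholder, and the "Status : value" line format is an assumption about the -info output.

```shell
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}  # placeholder URL

# get_job_status <job-id>
# Prints the workflow status and returns 0 when the -info call worked.
# Returns 1 when the call itself failed or printed no Status line.
get_job_status() {
    if ! info=$(oozie job -oozie "$OOZIE_URL" -info "$1"); then
        echo "oozie -info failed for $1" >&2
        return 1
    fi
    status=$(printf '%s\n' "$info" |
        sed -n 's/^Status *: *\([A-Z_]*\).*/\1/p' | head -n 1)
    [ -n "$status" ] || return 1
    printf '%s\n' "$status"
}
```

A loop can then call get_job_status ${JOB_ID#*:}, sleep while it prints RUNNING, and stop on any other status or on a non-zero return.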