Wait for kubernetes job to complete on either failure/success using command line


Run the first wait condition as a subprocess and capture its PID. If the condition is met, this process will exit with an exit code of 0.

kubectl wait --for=condition=complete job/myjob &
completion_pid=$!

Do the same for the failure wait condition. The trick here is to add && exit 1 so that the subprocess returns a non-zero exit code when the job fails.

kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!

Then use the Bash builtin wait -n $PID1 $PID2 to wait for one of the conditions to succeed. The command will capture the exit code of the first process to exit:

wait -n $completion_pid $failure_pid

Finally, you can check the actual exit code of wait -n to see whether the job failed or not:

exit_code=$?
if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi
exit $exit_code

Complete example:

# wait for completion as background process - capture PID
kubectl wait --for=condition=complete job/myjob &
completion_pid=$!

# wait for failure as background process - capture PID
kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!

# capture exit code of the first subprocess to exit
wait -n $completion_pid $failure_pid

# store exit code in variable
exit_code=$?

if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi

exit $exit_code
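One caveat with the script above: kubectl wait gives up after its default timeout of 30 seconds, so longer-running jobs need an explicit --timeout. The following is a minimal sketch, not a definitive implementation, that wraps the same wait -n technique in a function. The function name wait_for_job, its parameters, and the use of exit code 2 for a failed job (so that it can be told apart from a timeout or kubectl error, which exits with 1) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: wait for a Job to complete or fail, with an explicit timeout.
# wait_for_job and its exit codes are hypothetical, not part of kubectl.
# Returns 0 = completed, 2 = failed, 1 = timed out or kubectl error.
wait_for_job() {
  local job=$1 timeout=${2:-300s}
  local completion_pid failure_pid

  # Watch for completion in the background; exits 0 once the job completes.
  kubectl wait --for=condition=complete "job/${job}" --timeout="${timeout}" >/dev/null 2>&1 &
  completion_pid=$!

  # Watch for failure in the background; exits 2 once the job has failed.
  kubectl wait --for=condition=failed "job/${job}" --timeout="${timeout}" >/dev/null 2>&1 && exit 2 &
  failure_pid=$!

  # Return the exit code of whichever watcher finishes first (needs Bash 4.3+).
  wait -n "$completion_pid" "$failure_pid"
}
```

A call such as wait_for_job myjob 10m then returns 0, 2, or 1, which the script can pass straight to exit. Note that the watcher that did not fire keeps running in the background until its own timeout expires.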


You can leverage the behaviour of kubectl wait when --timeout=0 is set.

In this scenario, the command returns immediately with exit code 0 if the condition is already met, or 1 otherwise. Here's an example:

retval_complete=1
retval_failed=1
while [[ $retval_complete -ne 0 ]] && [[ $retval_failed -ne 0 ]]; do
  sleep 5
  output=$(kubectl wait --for=condition=failed job/job-name --timeout=0 2>&1)
  retval_failed=$?
  output=$(kubectl wait --for=condition=complete job/job-name --timeout=0 2>&1)
  retval_complete=$?
done

if [ $retval_failed -eq 0 ]; then
    echo "Job failed. Please check logs."
    exit 1
fi

So when either condition=failed or condition=complete is true, execution will exit the while loop (retval_complete or retval_failed will be 0).

Next, you only need to check and act on the condition you want. In my case, I want to fail fast and stop execution when the job fails.
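As one way of acting on the failure case, the script could print the job's most recent log lines before exiting, rather than only asking the reader to check them. This is a minimal sketch, assuming the logs of the job's pods are still available; the function name fail_with_logs and the --tail value are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: report a failed Job and show its most recent log lines.
# fail_with_logs is a hypothetical helper, not a kubectl feature.
fail_with_logs() {
  local job=$1
  echo "Job ${job} failed. Last log lines:" >&2
  # kubectl logs accepts a job/<name> reference and picks one of the job's pods.
  # "|| true" keeps the helper usable under set -e if the logs are already gone.
  kubectl logs "job/${job}" --tail=50 >&2 || true
  return 1
}

# Usage after the polling loop (retval_failed is set by the loop above):
#   if [ "$retval_failed" -eq 0 ]; then
#     fail_with_logs job-name || exit 1
#   fi
```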


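The wait -n approach does not work for me because I need the script to run on both Linux and macOS: wait -n requires Bash 4.3 or later, while macOS ships with Bash 3.2.

I improved slightly on the answer provided by Clayton, because his script would not work with set -e -E enabled. The following works even in that case, because kubectl wait runs as the condition of an if statement, where a non-zero exit code does not abort the script.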

while true; do
  if kubectl wait --for=condition=complete --timeout=0 job/name 2>/dev/null; then
    job_result=0
    break
  fi

  if kubectl wait --for=condition=failed --timeout=0 job/name 2>/dev/null; then
    job_result=1
    break
  fi

  sleep 3
done

if [[ $job_result -eq 1 ]]; then
    echo "Job failed!"
    exit 1
fi

echo "Job succeeded"

Depending on your situation, you might want to add a timeout to avoid an infinite loop.
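One way to add such a timeout is to compare Bash's built-in SECONDS counter against a deadline on every iteration. This is a minimal sketch under assumptions: the function name poll_job, its default values, and the return code 2 for a timeout are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: poll a Job until it completes, fails, or a deadline passes.
# poll_job is a hypothetical helper. Returns 0 = complete, 1 = failed, 2 = timed out.
poll_job() {
  local job=$1 max_seconds=${2:-600} interval=${3:-3}
  # SECONDS counts the seconds since the shell started (also available in Bash 3.2).
  local deadline=$(( SECONDS + max_seconds ))

  while (( SECONDS < deadline )); do
    # Inside an if condition, a non-zero exit code does not trigger set -e.
    if kubectl wait --for=condition=complete --timeout=0 "job/${job}" >/dev/null 2>&1; then
      return 0
    fi
    if kubectl wait --for=condition=failed --timeout=0 "job/${job}" >/dev/null 2>&1; then
      return 1
    fi
    sleep "$interval"
  done

  # Neither condition was reached before the deadline.
  return 2
}
```

With set -e enabled, the result can be captured without aborting the script, for example: job_result=0; poll_job name 900 || job_result=$?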