Wait for a Kubernetes job to complete on either failure or success using the command line
Run the first wait condition as a subprocess and capture its PID. If the condition is met, this process will exit with an exit code of 0.
kubectl wait --for=condition=complete job/myjob &
completion_pid=$!
Do the same for the failure wait condition. The trick here is to add && exit 1 so that the subprocess returns a non-zero exit code when the job fails.
kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!
Then use the Bash builtin wait -n $PID1 $PID2 to wait until one of the two processes exits. wait -n returns the exit code of the first process to exit:
wait -n $completion_pid $failure_pid
Finally, you can check the actual exit code of wait -n to see whether the job failed or not:
exit_code=$?
if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi
exit $exit_code
Complete example:
# wait for completion as background process - capture PID
kubectl wait --for=condition=complete job/myjob &
completion_pid=$!

# wait for failure as background process - capture PID
kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!

# capture exit code of the first subprocess to exit
wait -n $completion_pid $failure_pid

# store exit code in variable
exit_code=$?

if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi

exit $exit_code
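One thing to watch with the example above: as far as I know, kubectl wait gives up after its default --timeout of 30s, so a longer-running job would be reported as failed once the completion watcher times out. Below is a minimal sketch of the same wait -n technique wrapped in a function with an explicit timeout. The function name wait_for_job, its arguments, and the 600s default are hypothetical choices for illustration, and wait -n needs Bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_job and its arguments are hypothetical names,
# not part of kubectl. Requires Bash 4.3+ for `wait -n`.
wait_for_job() {
  local job=$1
  local timeout=${2:-600s}   # assumed default, adjust to your job's runtime

  # Watcher 1: exits 0 once the job reports condition=complete
  # (non-zero if kubectl wait times out first).
  kubectl wait --for=condition=complete --timeout="$timeout" "job/$job" &
  local completion_pid=$!

  # Watcher 2: exits 1 once the job reports condition=failed
  # (also non-zero if kubectl wait itself times out).
  kubectl wait --for=condition=failed --timeout="$timeout" "job/$job" && exit 1 &
  local failure_pid=$!

  # The function returns the exit code of whichever watcher finishes first.
  wait -n "$completion_pid" "$failure_pid"
}
```

Usage would look like wait_for_job myjob 900s followed by a check of $?. Note that the watcher that loses the race keeps running in the background until its own timeout expires.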
You can leverage the behaviour of kubectl wait when --timeout=0 is set. In this scenario, the command returns immediately with an exit code of either 0 or 1. Here's an example:
retval_complete=1
retval_failed=1
while [[ $retval_complete -ne 0 ]] && [[ $retval_failed -ne 0 ]]; do
  sleep 5
  output=$(kubectl wait --for=condition=failed job/job-name --timeout=0 2>&1)
  retval_failed=$?
  output=$(kubectl wait --for=condition=complete job/job-name --timeout=0 2>&1)
  retval_complete=$?
done

if [ $retval_failed -eq 0 ]; then
  echo "Job failed. Please check logs."
  exit 1
fi
So when either condition=failed or condition=complete is true, execution will exit the while loop (retval_complete or retval_failed will be 0).
Next, you only need to check and act on the condition you want. In my case, I want to fail fast and stop execution when the job fails.
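If you want to act on both outcomes rather than only on failure, the two --timeout=0 checks can be folded into one helper. This is a rough sketch; the function name job_state and the three words it prints are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: job_state and its output strings are hypothetical names.
# Each kubectl wait call uses --timeout=0, so it checks once and returns.
job_state() {
  local job=$1
  if kubectl wait --for=condition=complete --timeout=0 "job/$job" >/dev/null 2>&1; then
    echo "complete"
  elif kubectl wait --for=condition=failed --timeout=0 "job/$job" >/dev/null 2>&1; then
    echo "failed"
  else
    # Neither condition is set: the job may still be running, may not
    # have started yet, or may not exist at all.
    echo "unfinished"
  fi
}
```

You could then branch on the result, for example with case "$(job_state job-name)" in complete) ... ;; failed) ... ;; esac.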
The wait -n approach does not work for me as I need it to work both on Linux and Mac.
I improved on the answer provided by Clayton a little, because his script would not work with set -e -E enabled. The following will work even in that case.
while true; do
  if kubectl wait --for=condition=complete --timeout=0 job/name 2>/dev/null; then
    job_result=0
    break
  fi
  if kubectl wait --for=condition=failed --timeout=0 job/name 2>/dev/null; then
    job_result=1
    break
  fi
  sleep 3
done

if [[ $job_result -eq 1 ]]; then
  echo "Job failed!"
  exit 1
fi
echo "Job succeeded"
Depending on your situation, you might want to add a timeout to avoid an infinite loop.
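One way to bound the loop is to compare Bash's SECONDS variable against a deadline. The sketch below assumes a hypothetical function poll_job with made-up arguments and a made-up return code of 2 for "gave up waiting". The kubectl calls sit inside if conditions, so it should also be safe under set -e.

```shell
#!/usr/bin/env bash
# Sketch only: poll_job, its arguments, and return code 2 are hypothetical.
# Returns 0 if the job completed, 1 if it failed, 2 if the deadline passed.
poll_job() {
  local job=$1
  local max_seconds=${2:-300}   # assumed default overall limit
  local interval=${3:-3}        # seconds to sleep between polls
  local deadline=$(( SECONDS + max_seconds ))

  while true; do
    if kubectl wait --for=condition=complete --timeout=0 "job/$job" 2>/dev/null; then
      return 0
    fi
    if kubectl wait --for=condition=failed --timeout=0 "job/$job" 2>/dev/null; then
      return 1
    fi
    # Stop polling once the deadline has been reached.
    if (( SECONDS >= deadline )); then
      return 2
    fi
    sleep "$interval"
  done
}
```

A call such as poll_job name 600 5 would poll every 5 seconds for up to roughly 10 minutes. Because it avoids wait -n, it should also run on the older Bash that ships with macOS.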