
Is there a way to monitor kube cron jobs using Prometheus?


I'm using these rules with kube-state-metrics:

```yaml
groups:
- name: job.rules
  rules:
  - alert: CronJobRunning
    expr: time() - kube_cronjob_next_schedule_time > 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      description: CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete
      summary: CronJob didn't finish after 1h
  - alert: JobCompletion
    expr: kube_job_spec_completions - kube_job_status_succeeded > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{ $labels.namespace }}/{{ $labels.job }} is taking more than 1h to complete
      summary: Job {{ $labels.job }} didn't complete after 1h
  - alert: JobFailed
    expr: kube_job_status_failed > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{ $labels.namespace }}/{{ $labels.job }} failed to complete
      summary: Job failed
```


The tricky part here is that the cronjobs themselves have no useful status; you have to match them to the jobs they create. I've written up an article on how to achieve this:

https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511

The article goes into some detail on how things work, but the alert config is as follows:

```yaml
groups:
- name: kube-cron
  rules:
  - record: job_cronjob:kube_job_status_start_time:max
    expr: |
      label_replace(
        label_replace(
          max(
            kube_job_status_start_time
            * ON(exported_job) GROUP_RIGHT()
            kube_job_labels{label_cronjob!=""}
          ) BY (exported_job, label_cronjob)
          == ON(label_cronjob) GROUP_LEFT()
          max(
            kube_job_status_start_time
            * ON(exported_job) GROUP_RIGHT()
            kube_job_labels{label_cronjob!=""}
          ) BY (label_cronjob),
          "job", "$1", "exported_job", "(.+)"),
        "cronjob", "$1", "label_cronjob", "(.+)")
  - record: job_cronjob:kube_job_status_failed:sum
    expr: |
      clamp_max(
        job_cronjob:kube_job_status_start_time:max,
      1)
      * ON(job) GROUP_LEFT()
      label_replace(
        label_replace(
          (kube_job_status_failed != 0),
          "job", "$1", "exported_job", "(.+)"),
        "cronjob", "$1", "label_cronjob", "(.+)")
  - alert: CronJobStatusFailed
    expr: |
      job_cronjob:kube_job_status_failed:sum
      * ON(cronjob) GROUP_RIGHT()
      kube_cronjob_labels
      > 0
    for: 1m
    annotations:
      description: '{{ $labels.cronjob }} last run has failed {{ $value }} times.'
```
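To check that this rule group behaves as intended before deploying it, one option is a `promtool test rules` unit test. The sketch below assumes the group above is saved as `kube-cron.rules.yml` (a hypothetical file name) and feeds it made-up series for a single failed Job `nightly-report-1` created by a hypothetical CronJob `nightly-report`. The series labels mirror what the rules expect (`exported_job`, `label_cronjob`); depending on your kube-state-metrics version and scrape config the label names may differ, so adjust them to what your Prometheus actually shows.

```yaml
rule_files:
  - kube-cron.rules.yml   # hypothetical file containing the kube-cron group

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Made-up data: one Job owned by the CronJob "nightly-report" that failed once.
      - series: 'kube_job_status_start_time{exported_job="nightly-report-1", namespace="default"}'
        values: '1000x10'
      - series: 'kube_job_labels{exported_job="nightly-report-1", namespace="default", label_cronjob="nightly-report"}'
        values: '1x10'
      - series: 'kube_job_status_failed{exported_job="nightly-report-1", namespace="default"}'
        values: '1x10'
      - series: 'kube_cronjob_labels{cronjob="nightly-report", namespace="default"}'
        values: '1x10'
    alert_rule_test:
      # The alert goes pending at 0m and fires once "for: 1m" has elapsed.
      - eval_time: 5m
        alertname: CronJobStatusFailed
        exp_alerts:
          # GROUP_RIGHT() keeps the kube_cronjob_labels labels on the result.
          - exp_labels:
              cronjob: nightly-report
              namespace: default
            exp_annotations:
              description: 'nightly-report last run has failed 1 times.'
```

It can be run with `promtool test rules <test-file>.yml`; adding a second, newer Job for the same CronJob with `kube_job_status_failed` at 0 is a quick way to confirm the alert clears once the latest run succeeds.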

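The `jobTemplate` must include a label called `cronjob` whose value matches the name of the CronJob object. kube-state-metrics exposes that label on `kube_job_labels` as `label_cronjob`, which is what the rules above join on.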
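A minimal CronJob manifest carrying that label might look like the following. The name `nightly-report`, the schedule and the container are placeholders; `batch/v1` assumes Kubernetes 1.21 or later (older clusters use `batch/v1beta1`).

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical CronJob name
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    metadata:
      labels:
        cronjob: nightly-report   # must equal metadata.name above
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36
            command: ["sh", "-c", "echo generating report"]   # placeholder work
```

Every Job the controller creates from this template inherits `cronjob: nightly-report`, so its `kube_job_labels` series gets `label_cronjob="nightly-report"`.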


The way to monitor cronjobs with Prometheus is to have them push a metric recording the last time they succeeded to the Pushgateway. You can then alert if the cronjob hasn't succeeded recently enough.
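A minimal sketch of the push side, assuming a Pushgateway reachable at the hypothetical address `pushgateway.monitoring.svc:9091` and a hypothetical metric name `nightly_report_last_success_timestamp_seconds`. It uses the Pushgateway's plain-text HTTP API (posting to `/metrics/job/<job>`) via `curl`, and `set -e` makes the script stop before the push if the real work fails, so the metric only ever records successful runs.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report                 # hypothetical name
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: curlimages/curl:8.5.0   # any image with sh, date and curl works
            command:
            - sh
            - -c
            - |
              set -e
              echo "generating report"   # placeholder for the actual work
              # Reached only if the work above succeeded; records the success time.
              echo "nightly_report_last_success_timestamp_seconds $(date +%s)" \
                | curl --fail --silent --show-error --data-binary @- \
                  http://pushgateway.monitoring.svc:9091/metrics/job/nightly-report
```

On the Prometheus side, an alert expression along the lines of `time() - nightly_report_last_success_timestamp_seconds > 26 * 3600` would then fire when this daily job has not succeeded for a little over a day; the threshold should follow the schedule. The Pushgateway documentation recommends scraping it with `honor_labels: true` so the pushed `job` label is preserved.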