Is there a way to monitor Kubernetes cron jobs using Prometheus?

kubernetes prometheus

I'm using these rules with kube-state-metrics:

    groups:
    - name: job.rules
      rules:
      # Fires when a CronJob's next scheduled run is more than 1h overdue.
      - alert: CronJobRunning
        expr: time() - kube_cronjob_next_schedule_time > 3600
        for: 1h
        labels:
          severity: warning
        annotations:
          description: CronJob {{$labels.namespace}}/{{$labels.cronjob}} is taking more than 1h to complete
          summary: CronJob didn't finish after 1h
      # Fires when a Job still has outstanding completions after 1h.
      - alert: JobCompletion
        expr: kube_job_spec_completions - kube_job_status_succeeded > 0
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Job {{$labels.namespace}}/{{$labels.job}} is taking more than 1h to complete
          summary: Job {{$labels.job}} didn't complete after 1h
      # Fires when a Job has reported failed pods for 1h.
      - alert: JobFailed
        expr: kube_job_status_failed > 0
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Job {{$labels.namespace}}/{{$labels.job}} failed to complete
          summary: Job failed


The tricky part is that the cronjobs themselves expose no useful status; you have to match them to the jobs they create. I've written an article on how to achieve this:

https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511

The article goes into a bit of detail about how things work, but the alert config is as follows:

    groups:
    - name: kube-cron
      rules:
      - record: job_cronjob:kube_job_status_start_time:max
        expr: |
          label_replace(
            label_replace(
              max(
                kube_job_status_start_time
                * ON(exported_job) GROUP_RIGHT()
                kube_job_labels{label_cronjob!=""}
              ) BY (exported_job, label_cronjob)
              == ON(label_cronjob) GROUP_LEFT()
              max(
                kube_job_status_start_time
                * ON(exported_job) GROUP_RIGHT()
                kube_job_labels{label_cronjob!=""}
              ) BY (label_cronjob),
              "job", "$1", "exported_job", "(.+)"),
            "cronjob", "$1", "label_cronjob", "(.+)")
      - record: job_cronjob:kube_job_status_failed:sum
        expr: |
          clamp_max(
            job_cronjob:kube_job_status_start_time:max,
          1)
          * ON(job) GROUP_LEFT()
          label_replace(
            label_replace(
              (kube_job_status_failed != 0),
              "job", "$1", "exported_job", "(.+)"),
            "cronjob", "$1", "label_cronjob", "(.+)")
      - alert: CronJobStatusFailed
        expr: |
          job_cronjob:kube_job_status_failed:sum
          * ON(cronjob) GROUP_RIGHT()
          kube_cronjob_labels
          > 0
        for: 1m
        annotations:
          description: '{{ $labels.cronjob }} last run has failed {{ $value }} times.'

The jobTemplate must include a label called cronjob that matches the name of the cronjob object.
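
As a rough illustration of that labelling requirement, a CronJob manifest might look like the sketch below. The name `nightly-report`, the schedule, and the image are hypothetical placeholders; only the `cronjob` label under `jobTemplate.metadata.labels` is what the rules rely on.

```yaml
# Hypothetical CronJob; name, schedule, and image are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    metadata:
      labels:
        # Must equal metadata.name above so the Job can be matched
        # back to its CronJob (surfaces as label_cronjob on kube_job_labels).
        cronjob: nightly-report
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: example.com/nightly-report:latest
```

Depending on the kube-state-metrics version, custom labels may only show up on kube_job_labels if they are explicitly allowed (for example via the --metric-labels-allowlist flag in v2), so it is worth checking that label_cronjob actually appears before relying on the rules.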


One way to monitor cronjobs with Prometheus is to have them push a metric recording the last time they succeeded to the Pushgateway. You can then alert if a cronjob hasn't succeeded recently enough.
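
A minimal sketch of such an alert is shown below. It assumes each cronjob pushes a gauge holding the Unix timestamp of its last success; the metric name `cronjob_last_success_timestamp_seconds`, the group name, and the 24h threshold are assumptions for illustration, not something defined by the Pushgateway.

```yaml
groups:
- name: cronjob-pushgateway
  rules:
  - alert: CronJobNotSucceededRecently
    # Hypothetical metric: Unix time of the last successful run,
    # pushed by the cronjob itself when it finishes.
    # Fires when that timestamp is more than 24h (86400s) old.
    expr: time() - cronjob_last_success_timestamp_seconds > 86400
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'CronJob {{ $labels.job }} has not succeeded in the last 24h'
```

The threshold should be somewhat longer than the cronjob's schedule interval. The `job` label in the summary is only the pushed job name if the Pushgateway scrape config sets `honor_labels: true`. A cronjob that has never pushed the metric will not trigger this rule, so a separate `absent()` check may be needed.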
