How does Prometheus know when a pod crashed?


Use sum(kube_pod_container_status_waiting_reason) by (reason) to count the waiting containers per reason (for example CrashLoopBackOff), if there are any.
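As a rough illustration of what that query gives you, here is a minimal sketch that parses an instant-query response from the Prometheus HTTP API (/api/v1/query) and keeps only the reasons that have at least one waiting container. The sample payload, its values, and the function name waiting_reasons are made up for the example; only the response shape follows the Prometheus API.

```python
import json

# Hypothetical response to:
#   sum(kube_pod_container_status_waiting_reason) by (reason)
# The shape follows the Prometheus HTTP API; the numbers are invented.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"reason": "CrashLoopBackOff"}, "value": [1700000000.0, "2"]},
      {"metric": {"reason": "ImagePullBackOff"}, "value": [1700000000.0, "1"]},
      {"metric": {"reason": "ContainerCreating"}, "value": [1700000000.0, "0"]}
    ]
  }
}
"""


def waiting_reasons(response_text):
    """Return {reason: count} for every reason with at least one waiting container."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    counts = {}
    for sample in payload["data"]["result"]:
        reason = sample["metric"].get("reason", "unknown")
        # Each instant-vector sample is [timestamp, "value as a string"].
        count = float(sample["value"][1])
        if count > 0:
            counts[reason] = count
    return counts


print(waiting_reasons(SAMPLE_RESPONSE))
```

In a real setup the response text would come from your own Prometheus server rather than a hard-coded string.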


The most common way for Prometheus to collect metrics and health information is scraping, usually over an HTTP endpoint. Since a pod can have multiple containers, it is best to scrape an HTTP endpoint exposed by the running container itself.

If Prometheus does not get a good response from this endpoint, it can conclude that the container is down.
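To make the scraping idea concrete, here is a minimal sketch of a container exposing a /metrics endpoint in the Prometheus text format, using only the Python standard library. The metric name app_requests_total, the fixed value 42, and port 8000 are assumptions for the example; a real application would more likely use an official Prometheus client library. When a scrape of a target fails, Prometheus records the built-in up metric as 0 for that target, which is what lets it treat the container as down.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(requests_total):
    """Render a single counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled (hypothetical metric).\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {requests_total}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(requests_total=42).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port (assumed port)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() inside the container starts the endpoint; if the process crashes, the scrape fails and up drops to 0 for that target.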

Prometheus itself does not send alert notifications; you normally delegate that to Alertmanager.


kube-state-metrics gathers information from kube-apiserver about the state of Kubernetes objects (such as pods, deployments, etc.). It is commonly deployed together with prometheus-operator. To answer your question: the pod does not need to be up for you to scrape its status metrics, because that state comes from the apiserver (by scraping the kube-state-metrics endpoint).

To see which pod-level metrics kube-state-metrics exposes, check: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md

Per the answer above, you can use the kube_pod_container_status_waiting_reason metric, or, if you just want to alert on a threshold regardless of the reason, kube_pod_container_status_waiting.
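The two options could be written as expressions along these lines. This is only a sketch: the server address prometheus.example:9090, the CrashLoopBackOff filter, and the threshold of 0 are assumptions chosen for illustration, and the helper names are hypothetical. The code builds URLs for the Prometheus /api/v1/query endpoint so you can try each expression by hand.

```python
from urllib.parse import urlencode

# Assumed address of the Prometheus server; replace with your own.
PROMETHEUS_URL = "http://prometheus.example:9090"


def waiting_for_reason(reason):
    """Expression matching containers that are waiting for one specific reason."""
    return f'kube_pod_container_status_waiting_reason{{reason="{reason}"}} > 0'


def waiting_above(threshold=0):
    """Expression that is non-empty once more than `threshold` containers are waiting."""
    return f"sum(kube_pod_container_status_waiting) > {threshold}"


def instant_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


print(instant_query_url(waiting_for_reason("CrashLoopBackOff")))
print(instant_query_url(waiting_above(0)))
```

The same expressions can serve as the condition of an alerting rule; the threshold that makes sense depends on your cluster.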