StatefulSet recreates pod, why?
You should look into two things:
Check the current state of the pod and recent events with the following command:
kubectl describe pods ${POD_NAME}
Look at the state of the containers in the pod. Are they all Running? Have there been recent restarts? Continue debugging depending on the state of the pods.
Especially take a closer look at why the Pod crashed.
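If the full describe output is too noisy, something like the following sketch may help narrow it down to restart counts and the last termination reason per container. The function name pod_restart_summary is made up for illustration; the jsonpath fields are the standard container status fields, and the reason column will simply be empty for containers that have never terminated.

```shell
# Hypothetical helper (not a kubectl subcommand): prints one line per container
# with its name, restart count, and the reason for its last termination.
pod_restart_summary() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage: pod_restart_summary "${POD_NAME}"
```

A reason such as OOMKilled or Error in the last column is usually the quickest pointer to why the Pod was replaced.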
More info can be found in the links I have provided.
- Debug StatefulSets.
StatefulSets provide a debug mechanism to pause all controller operations on Pods using an annotation. Setting the pod.alpha.kubernetes.io/initialized annotation to "false" on any StatefulSet Pod will pause all operations of the StatefulSet. When paused, the StatefulSet will not perform any scaling operations. Once the debug hook is set, you can execute commands within the containers of StatefulSet pods without interference from scaling operations. You can set the annotation to "false" by executing the following:
kubectl annotate pods <pod-name> pod.alpha.kubernetes.io/initialized="false" --overwrite
When the annotation is set to "false", the StatefulSet will not respond to its Pods becoming unhealthy or unavailable. It will not create replacement Pods till the annotation is removed or set to "true" on each StatefulSet Pod.
Please let me know if that helped.
Another nifty little trick I came up with is to describe
the pod as soon as it stops logging, by using
kubectl logs -f mypod && kubectl describe pod mypod
When the pod fails and stops logging, kubectl logs -f mypod will terminate and the shell will immediately execute kubectl describe pod mypod, (hopefully) letting you catch the state of the failing pod before it is recreated.
In my case it was showing
Last State: Terminated Reason: OOMKilled Exit Code: 137
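One caveat with the && form of the trick above: the describe only runs if kubectl logs -f exits successfully. A possible variation, assuming you want the describe to run regardless of how the log stream ended, is to run the two commands in sequence. The function name watch_then_describe is just an illustrative wrapper.

```shell
# Hypothetical wrapper: follow the logs until the stream ends, then describe
# the pod no matter what exit status "kubectl logs" returned.
watch_then_describe() {
  kubectl logs -f "$1"
  kubectl describe pod "$1"
}

# Usage: watch_then_describe mypod
```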
In line with what Timothy is saying:
kubectl logs -p postgresPod
The -p flag will give you the logs of the previous container instance (if it's a simple restart).
There's a whole bunch of "know the rest of your environment" questions that beg to be asked here. Do you know how many nodes make up your cluster (are we talking one or two, or tens, hundreds, or more)? Do you know whether they are dedicated instances, or are you on a cloud provider like AWS using spot instances?
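Take a look at kubectl get nodes, which should give you the age of your nodes. A node that is much younger than the rest was probably replaced recently, and any pods on it would have been recreated along with it.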
Do you have requests and limits set on your pod? Run kubectl describe pod ${POD_NAME}. Among the requests, limits, etc., you'll see which node the pod is on. Describe that node; the output will include CPU and memory details. Can your pod live within those? Is your app configured to live within those limits? If you don't have limits set, then your pod could easily consume so many resources that the kernel OOM killer terminates it. If you do have pod limits but have misconfigured your app, then K8s may be killing your app because it is breaching the limits.
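As a rough sketch of the checks above, the two helpers below pull out just the relevant parts. Both function names are made up for illustration, and the grep context of 8 lines is an assumption about how long the "Allocated resources" section of kubectl describe node is on your cluster; adjust it if the output is cut off.

```shell
# Hypothetical helper: print the node the pod is scheduled on, then each
# container's name and its resources (requests/limits) as reported by the API.
pod_resources() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeName}{"\n"}{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}

# Hypothetical helper: show the "Allocated resources" summary for a node.
# The 8 lines of trailing context are a guess, not a fixed format.
node_allocated() {
  kubectl describe node "$1" | grep -A 8 'Allocated resources'
}

# Usage:
#   pod_resources "${POD_NAME}"
#   node_allocated <node-name-from-first-line>
```

An empty resources value for a container means no requests or limits are set for it.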
If you have access to the node, take a look at dmesg to see if the OOM killer has terminated any of your pods. If you don't have access, get someone who does to take a look at the logs. When you're describing the node, look for pods with 0 limits, as that means unlimited; they may be misbehaving and causing your app to be killed because it was unlucky enough to request more resources from the system when there were none available due to the unlimited apps.
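For the dmesg check, a filter along these lines may save some scrolling. The function name oom_events and the exact pattern list are my own choices; kernel message wording varies between versions, so treat the pattern as a starting point rather than a complete list.

```shell
# Hypothetical filter: keep only kernel log lines that look like OOM killer
# activity. Reads from stdin so it works on dmesg output or a saved log file.
oom_events() {
  grep -i -E 'out of memory|oom-killer|oom_kill|killed process'
}

# Usage on the node (may require root):
#   dmesg -T | oom_events
```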