
Setting up a readiness, liveness or startup probe


You're just not waiting long enough.

The deployment artifacts you're showing here look pretty normal. It's even totally normal for your application to fail fast if it can't reach the database, say because the database hasn't started up yet. Every pod has a restart policy, though, which defaults to Always. So, when the pod fails, Kubernetes will restart it; when it fails again, it will get restarted again; and when it keeps failing, Kubernetes will back off, pausing tens of seconds between restarts (the dreaded CrashLoopBackOff state).
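
For reference, here's a minimal sketch of where that restart policy lives in a Deployment's pod template; the names and image here are placeholders, not from your manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      restartPolicy: Always          # the default; Kubernetes restarts the container whenever it exits
      containers:
        - name: api
          image: registry.example.com/api:latest   # placeholder image
```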

Eventually, while you're in this wait-and-restart loop, the database will actually come up; on the next restart of your application pods, the application will start up normally.

The only thing that I'd change here is that your readiness probes for the two pods should probe the pods themselves, not some other service. You probably want the path to be /, /healthz, or some other path the application actually serves over HTTP. That endpoint can return 503 Service Unavailable if it detects that its dependency isn't available, or the application can just crash. Just crashing is fine.
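
As a sketch, assuming your container listens on port 8000 and serves a /healthz endpoint (adjust both to match your application), a readiness probe that hits the pod itself would look like this:

```yaml
readinessProbe:
  httpGet:
    path: /healthz     # assumed path; use whatever your app actually serves
    port: 8000         # assumed container port
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3  # mark the pod unready after 3 consecutive failures
```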

This is a totally normal setup in Kubernetes; there's no way to more directly say that pod A can't start until service B is ready. The flip side of this is that the pattern is actually pretty generic: if your application crashes and restarts whenever it can't reach its database, it doesn't matter if the database is hosted outside the cluster, or if it crashes sometime well after startup time; the same logic will try to restart your application until it works again.


Actually, I think I might have sorted it out.

Part of the problem is that even though restartPolicy: Always is the default, the Pods are not aware that Django has failed, so Kubernetes still considers them healthy.

My thinking was wrong in that I originally assumed I needed to check whether the DB deployment had started before starting the API deployment. Instead, I needed to check whether Django had failed and restart it if it had.

Doing the following accomplished this for me:

```yaml
livenessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
readinessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
```
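
In case it helps anyone else: these probe blocks go on the container entry in the Deployment's pod template. A rough sketch with a hypothetical container name and placeholder image:

```yaml
spec:
  template:
    spec:
      containers:
        - name: api                                # hypothetical container name
          image: registry.example.com/api:latest   # placeholder image
          ports:
            - containerPort: 5000
          livenessProbe:
            tcpSocket:
              port: 5000
            initialDelaySeconds: 2
            periodSeconds: 2
          readinessProbe:
            tcpSocket:
              port: 5000
            initialDelaySeconds: 2
            periodSeconds: 2
```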

I'm learning Kubernetes, so please correct me if there is a better way to do this or if this is just plain wrong; I just know it accomplishes what I want.