
Managing DB migrations on Kubernetes cluster


Ideal solution would be to stop all pods, run the migration and recreate them. But I am not sure how to achieve it properly with Kubernetes.

I see from one of the comments that you use Helm, so I'd like to propose a solution leveraging Helm's hooks:

Helm provides a hook mechanism to allow chart developers to intervene at certain points in a release's life cycle. For example, you can use hooks to:

  • Load a ConfigMap or Secret during install before any other charts are loaded.

  • Execute a Job to back up a database before installing a new chart, and then execute a second job after the upgrade in order to restore data.

  • Run a Job before deleting a release to gracefully take a service out of rotation before removing it.

https://helm.sh/docs/topics/charts_hooks/

You could package your migration as a k8s Job and leverage the pre-install or pre-upgrade hook to run the job. These hooks run after templates are rendered, but before any new resources are created in Kubernetes. Thus, your migrations will run before your Pods are deployed.
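For illustration, here is a minimal sketch of what such a migration Job could look like (the image and command are placeholders for your own migration tooling, not something prescribed by Helm):

apiVersion: batch/v1
kind: Job
metadata:
  name: "db-migrations"
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        # Placeholder image and command - replace with your own migration tooling
        image: "registry.example.com/app-migrations:latest"
        command: ["./run-migrations.sh"]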

To delete the deployments prior to running your migrations, create a second pre-install/pre-upgrade hook with a lower helm.sh/hook-weight that deletes the target deployments:

apiVersion: batch/v1
kind: Job
metadata:
  name: "pre-upgrade-hook1"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-1"                   # lower weight, so it runs before the migration Job
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    metadata:
      name: "pre-upgrade-hook1"
    spec:
      restartPolicy: Never
      serviceAccountName: "<an SA with delete RBAC permissions>"
      containers:
      - name: kubectl
        image: "lachlanevenson/k8s-kubectl:latest"
        command: ["kubectl", "delete", "deployment", "deploy1", "deploy2"]
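The serviceAccountName above has to reference a ServiceAccount that is allowed to delete Deployments. A minimal sketch of such RBAC (all names are illustrative, and these resources must already exist when the hook runs, e.g. by marking them as hooks with an even lower weight or managing them outside the chart):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pre-upgrade-hook-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-deleter
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pre-upgrade-hook-deleter
subjects:
- kind: ServiceAccount
  name: pre-upgrade-hook-sa
  namespace: default            # replace with your release namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-deleter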

The lower hook-weight ensures the deletion Job runs before the migration Job, which gives the following series of events:

  1. You run helm upgrade
  2. The helm hook with the lowest hook-weight runs and deletes the relevant deployments
  3. The second hook runs your migrations
  4. Your Chart will install with new Deployments, Pods, etc.

Just make sure to keep all of the relevant Deployments in the same Chart.


From an automation/orchestration perspective, my sense is that problems like this are intended to be solved with Operators, using the recently released Operator Framework:

https://github.com/operator-framework

The idea is that there would be a Postgres Migrations Operator (which, to my knowledge, doesn't exist yet) that would lie idle, waiting for a custom resource (an instance of a CRD) describing the migration to be posted to the cluster/namespace.

The Operator would wake up, understand what's involved in the intended migration, do some analysis on the cluster to construct a migration plan, and then perform the steps as you describe (a hypothetical example of such a resource is sketched after this list):

  • put the application into some kind of user-visible maintenance mode
  • take down the existing pods
  • run the migration
  • verify
  • recreate the application pods
  • test
  • take the application out of maintenance mode
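Purely to illustrate the idea, a custom resource for such a hypothetical operator might look like this (the API group, kind and fields are all invented for the example):

apiVersion: migrations.example.com/v1alpha1    # hypothetical API group
kind: PostgresMigration                        # hypothetical kind
metadata:
  name: add-orders-table
spec:
  database: orders-db                               # illustrative target database
  image: registry.example.com/app-migrations:v42    # image containing the migration scripts
  targetDeployments:                                # deployments to take down and recreate around the migration
  - orders-api
  - orders-worker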

That doesn't help you now, though.


Ideal solution would be to stop all pods, run the migration and recreate them. But I am not sure how to achieve it properly with Kubernetes.

This largely depends on your approach, specifically on your CI/CD tooling. There are several strategies you can apply; as an illustration, assuming you have a GitLab pipeline (Jenkins could do the same, only the terminology differs), here are the steps:

  • Make the following stages in gitlab-ci.yaml (a sketch of such a pipeline follows this list):
    • Build: create all necessary images and prepare the migrations before anything is deployed.
    • Stop all affected assets - deployments, services, statefulsets. This can be a rather simple kubectl delete -f all_required_assets.yaml, where a single manifest defines all the resources you want stopped completely. You can set a grace period or force termination, and you don't need to remove static assets - only the resources that actually have to be stopped. Note that to stop pods you must delete their top-level owning resource, be it a bare pod, deployment, replication controller or statefulset; deleting the owner stops them completely rather than simply restarting them.
    • Migrate: implemented either as a Job or as a Pod that runs the migrations against the database (say, kubectl create -f migrate_job.yaml). A Job is preferable, since its status can be checked for errors after it finishes.
    • Start all assets: the same manifest file with the affected resource definitions as in the stop stage (say, kubectl create -f all_required_assets.yaml), so all start/stop resources are handled through a single file. If the start order matters for some reason, a separate file is required, but with careful consideration one file should suffice for most scenarios.
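
A minimal sketch of such a gitlab-ci.yaml (image names, manifest paths and the Job name are illustrative and depend on your own setup):

stages:
  - build
  - stop
  - migrate
  - start

build:
  stage: build
  script:
    # build and push the application/migration images; illustrative only
    - docker build -t registry.example.com/app:$CI_COMMIT_SHA .
    - docker push registry.example.com/app:$CI_COMMIT_SHA

stop:
  stage: stop
  image: bitnami/kubectl:latest   # any image with kubectl configured against your cluster
  script:
    - kubectl delete -f all_required_assets.yaml --ignore-not-found

migrate:
  stage: migrate
  image: bitnami/kubectl:latest
  script:
    - kubectl create -f migrate_job.yaml
    # 'db-migrations' stands for whatever the Job in migrate_job.yaml is named
    - kubectl wait --for=condition=complete --timeout=300s job/db-migrations

start:
  stage: start
  image: bitnami/kubectl:latest
  script:
    - kubectl create -f all_required_assets.yaml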

This same principle can be exercised in other orchestration/deployment tools as well, and you can even write a simple script that runs those kubectl commands one after another, each step executing only if the previous one succeeded.