etcd DB cluster on kubernetes misbehaving


I don't think 🤔 the heartbeats are the main problem, and the logs you are seeing 👀 appear to be Warning logs. So it's possible that some heartbeats are missed here and there, but your node(s) are not crashing or mirroring.
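One way to check that the warnings are only warnings is to ask every member whether it can commit a proposal, using etcdctl's v3 `endpoint health --cluster` command. The small sketch below counts the healthy lines; it assumes each endpoint is reported on one line as "<url> is healthy: ..." or "<url> is unhealthy: ...", and the helper name `count_healthy` is hypothetical.

```shell
#!/bin/sh
# Sketch: count healthy members from the output of
#   ETCDCTL_API=3 etcdctl endpoint health --cluster
# Assumption: each endpoint is reported on one line, either
#   "<url> is healthy: ..." or "<url> is unhealthy: ...".
count_healthy() {
  # grep -c prints the number of matching lines (0 if none);
  # "|| true" keeps a zero count from being treated as a failure.
  grep -c ' is healthy:' || true
}

# Usage, from a pod that can reach every member (stderr is merged because
# unhealthy endpoints may be reported there):
#   ETCDCTL_API=3 etcdctl endpoint health --cluster 2>&1 | count_healthy
```

If the count matches the number of members you expect, the heartbeat warnings are not taking nodes out of the cluster.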

It's likely that you changed the number of replicas and your new replicas are not joining the cluster. So I would recommend following this guide to add the new members to the cluster. Basically, with etcdctl, something like this:

etcdctl member add node2 --peer-urls=http://node2:2380
etcdctl member add node3 --peer-urls=http://node3:2380

Note that you will have to run these commands in a pod that has access to all the etcd nodes in your cluster.
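For `member add`, `--peer-urls` is the peer URL of the member being added. With a StatefulSet behind a headless Service (like the `etcd-N.etcd-headless.wallet.svc.cluster.local` names in the member list further down), that URL can be derived from the pod name. The sketch below only prints the command; the helper names are hypothetical, and the DNS pattern, plain http and default peer port 2380 are assumptions about your setup. Add members one at a time, and start each new member with `--initial-cluster-state=existing` (ETCD_INITIAL_CLUSTER_STATE=existing) before adding the next one.

```shell
#!/bin/sh
# Hypothetical helpers for a StatefulSet behind a headless Service.
# Assumptions: pods are reachable as <pod>.<service>.<namespace>.svc.cluster.local
# and etcd listens for peers on its default port 2380 over plain http.
peer_url() {
  printf 'http://%s.%s.%s.svc.cluster.local:2380\n' "$1" "$2" "$3"
}

# Print (do not run) the "member add" command for ONE new member.
# --peer-urls is the peer URL of the member being added, not of the existing ones.
member_add_cmd() {
  printf 'etcdctl member add %s --peer-urls=%s\n' "$1" "$(peer_url "$1" "$2" "$3")"
}

# Example: prints
#   etcdctl member add etcd-2 --peer-urls=http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380
# member_add_cmd etcd-2 etcd-headless wallet
```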

You could also consider managing your etcd cluster with the etcd operator 🔧, which should be able to take care of scaling and of adding/removing nodes.

✌️


Okay, I had two problems:

  • "failed to send out heartbeat" Warning messages.

  • "No leader election".

The next day I found the reason for the second problem: I had a startup parameter set in the pod definition, ETCDCTL_API: 3.

So when I run "etcdctl member list" with API v3, it doesn't show which member is elected as leader.

$ ETCDCTL_API=3 etcdctl member list
3d0bc1a46f81ecd9, started, etcd-2, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2379, false
b6a5d762d566708b, started, etcd-1, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2379, false

$ ETCDCTL_API=2 etcdctl member list
3d0bc1a46f81ecd9, started, etcd-2, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2379, false
b6a5d762d566708b, started, etcd-1, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2379, true
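With API v3, leadership is not part of the `member list` output (in etcd 3.4 and later the trailing boolean there is the IS LEARNER flag). The v3 command that reports it is `etcdctl endpoint status --cluster -w table`, which has an IS LEADER column. The sketch below picks the leader's endpoint out of that table by header name rather than column position, since the set of columns differs between etcd versions; the helper name `leader_from_table` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the endpoint whose IS LEADER column is "true" in the output of
#   ETCDCTL_API=3 etcdctl endpoint status --cluster -w table
# Columns are located by header name, not by position.
leader_from_table() {
  awk -F'|' '
    # strip leading and trailing blanks from a cell
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # skip border lines such as +-----+-----+
    /^\+/ { next }
    # the first non-border line is the header: remember the column indexes
    !col {
      for (i = 1; i <= NF; i++) {
        h = trim($i)
        if (h == "ENDPOINT") ep = i
        if (h == "IS LEADER") col = i
      }
      next
    }
    trim($col) == "true" { print trim($ep) }
  '
}

# Usage:
#   ETCDCTL_API=3 etcdctl endpoint status --cluster -w table | leader_from_table
```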

So when I use API v2 I can see which node is elected leader, and there was no problem with leader election. I'm still working on the heartbeat warnings, but I guess I need to tune the config to avoid them.
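The relevant settings are `--heartbeat-interval` and `--election-timeout` (milliseconds, defaults 100 and 1000), which etcd also reads from the ETCD_HEARTBEAT_INTERVAL and ETCD_ELECTION_TIMEOUT environment variables, so they can go in the pod's env next to ETCDCTL_API. etcd's FAQ mostly attributes "failed to send out heartbeat on time" to a slow disk, CPU starvation or network latency, so it is worth ruling those out first. The sketch below checks a candidate pair against the limits etcd enforces at startup (election timeout at least 5x the heartbeat, at most 50000 ms); the helper name and the 250/2500 values in the usage line are illustrative, not recommendations.

```shell
#!/bin/sh
# Sketch: validate candidate Raft timing values (milliseconds) before putting
# them into the etcd container env. etcd reads each --flag from an ETCD_*
# variable, e.g. --heartbeat-interval -> ETCD_HEARTBEAT_INTERVAL.
check_raft_timing() {
  hb=$1 el=$2
  # etcd refuses to start if the election timeout is below 5x the heartbeat
  if [ "$el" -lt $((hb * 5)) ]; then
    echo "election timeout ${el}ms must be >= 5 x heartbeat (${hb}ms)" >&2
    return 1
  fi
  # etcd also rejects election timeouts above 50000ms
  if [ "$el" -gt 50000 ]; then
    echo "election timeout ${el}ms exceeds the 50000ms limit" >&2
    return 1
  fi
  # print the env entries to copy into the pod definition
  printf 'ETCD_HEARTBEAT_INTERVAL=%s\nETCD_ELECTION_TIMEOUT=%s\n' "$hb" "$el"
}

# Usage (illustrative values):
#   check_raft_timing 250 2500
```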

NB: I have 3 nodes; I stopped one for testing.
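Stopping one of three members is exactly the failure a 3-member cluster is sized for: Raft needs a majority (floor(N/2) + 1) to elect a leader, which is 2 of 3 here, so leader election keeps working, while stopping a second member would leave the cluster without a leader. The arithmetic as a small sketch (function names are illustrative):

```shell
#!/bin/sh
# Raft quorum for an N-member etcd cluster: floor(N/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }
# How many members can be down while a leader can still be elected.
fault_tolerance() { echo $(( $1 - $1 / 2 - 1 )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

Note that going from 3 to 4 members raises the quorum without increasing the number of failures tolerated, which is why odd cluster sizes are usual.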