Consul not deregistering zombie services


Using the HTTP API to remove services is another, much nicer solution. I only figured out how to remove services manually before I figured out how to use the HTTP API.

To delete a service with the HTTP API, use the following command:

curl -v -X PUT http://<consul_ip_address>:8500/v1/agent/service/deregister/<ServiceID>

Note that your ServiceID is a combination of three things: the IP address of the host machine the container is running on, the name of the container, and the inner port of the container (e.g. 80 for Apache, 3000 for Node.js, 8000 for Django, etc.), all separated by colons (:).

Here's an example of what that would actually look like:

curl -v -X PUT http://1.2.3.4:8500/v1/agent/service/deregister/192.168.1.1:sharp_apple:80
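A minimal sketch of how the ServiceID and the deregister URL fit together, assuming the ID layout described above. The helper names service_id and deregister_url are made up for illustration and are not part of Consul.

```shell
# Hypothetical helpers; the names are illustrative only.

# Join host IP, container name, and inner port with colons.
service_id() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Build the agent deregister URL from a Consul address and a ServiceID.
deregister_url() {
    printf 'http://%s:8500/v1/agent/service/deregister/%s\n' "$1" "$2"
}
```

With these defined, the example above could be written as: curl -v -X PUT "$(deregister_url 1.2.3.4 "$(service_id 192.168.1.1 sharp_apple 80)")"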

If you want an easy way to get the ServiceID, just curl the service that contains a zombie:

curl -s http://<consul_ip_address>:8500/v1/catalog/service/<your_services_name>

Here's a real example for a service called someapp; it returns every instance registered under that service:

curl -s http://1.2.3.4:8500/v1/catalog/service/someapp
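To pull just the IDs out of that response, something like the following might help. It is a rough sketch that assumes each entry carries a "ServiceID" string field and that the IDs contain no escaped quotes; the function name list_service_ids is made up for illustration. It uses grep and sed so it works without a JSON parser installed.

```shell
# Read a /v1/catalog/service/<name> JSON response on stdin and print one
# ServiceID per line. Assumes IDs contain no escaped double quotes.
list_service_ids() {
    grep -o '"ServiceID" *: *"[^"]*"' | sed 's/^"ServiceID" *: *"//; s/"$//'
}
```

Usage: curl -s http://1.2.3.4:8500/v1/catalog/service/someapp | list_service_ids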


Here is how you can absolutely delete all the zombie services: go into your Consul server, find the location of the JSON files containing the zombies, and delete them.

For example, I am running Consul in a container:

docker run --restart=unless-stopped -d -h consul0 --name consul0 -v /mnt:/data \
    -p $(hostname -i):8300:8300 \
    -p $(hostname -i):8301:8301 \
    -p $(hostname -i):8301:8301/udp \
    -p $(hostname -i):8302:8302 \
    -p $(hostname -i):8302:8302/udp \
    -p $(hostname -i):8400:8400 \
    -p $(hostname -i):8500:8500 \
    -p $(ifconfig docker0 | awk '/\<inet\>/ { print $2}' | cut -d: -f2):53:53/udp \
    progrium/consul -server -advertise $(hostname -i) -bootstrap-expect 3

Notice the flag -v /mnt:/data. It maps the directory where Consul stores all of its data onto the host; for me that was /mnt. Under this directory you will find several other directories:

config raft serf services tmp

Go into services and you will see the files that contain the JSON info for your services. Find any that contain the info of zombies and delete them, then restart Consul. Repeat for each server in your cluster that has zombies on it.
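Finding the right files by hand can be tedious, so here is a small sketch that lists the files mentioning a given ServiceID. It assumes the ID appears as plain text inside the JSON files, and it only prints matches; nothing is deleted. The function name find_zombie_files is made up for illustration.

```shell
# Print the files under a Consul services directory whose contents mention
# the given ServiceID. Nothing is deleted; review the list before removing.
find_zombie_files() {
    services_dir=$1
    zombie_id=$2
    # -r: recurse, -l: print file names only, -F: treat the ID as a literal
    grep -rlF -- "$zombie_id" "$services_dir"
}
```

Usage, with the paths from the example above: find_zombie_files /mnt/services 192.168.1.1:sharp_apple:80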


Use the agent endpoint instead of the catalog endpoint. The catalog is maintained by the agents, so even if you remove a service from the catalog, the agent will sync it back. Here is a shell script that removes zombie services:

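# Look up the cluster leader and strip the quotes and the :8300 port.
leader="$(curl http://ONE-OF-YOUR-CLUSTER:8500/v1/status/leader | sed 's/:8300//' | sed 's/"//g')"
while :
do
    # Take the first check in the critical state and read its service and node.
    serviceID="$(curl "http://$leader:8500/v1/health/state/critical" | ./jq '.[0].ServiceID' | sed 's/"//g')"
    node="$(curl "http://$leader:8500/v1/health/state/critical" | ./jq '.[0].Node' | sed 's/"//g')"
    echo "serviceID=$serviceID, node=$node"
    size=${#serviceID}
    echo "size=$size"
    # Anything shorter than 7 characters is not treated as a ServiceID, so stop.
    if [ "$size" -ge 7 ]; then
        # Deregister through the agent on the node that owns the service.
        curl --request PUT "http://$node:8500/v1/agent/service/deregister/$serviceID"
    else
        break
    fi
done
# Show whatever is still critical.
curl "http://$leader:8500/v1/health/state/critical"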

The JSON parser jq is used to extract the fields.
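The length check in the script appears to be a way to stop once there is no real ServiceID left: jq prints null when the list of critical checks is empty, and node-level checks carry an empty ServiceID. A more explicit test might look like the sketch below. The function name is made up for illustration, and it behaves differently from the original for IDs shorter than 7 characters.

```shell
# Succeed only for a non-empty value that is not the literal string "null".
is_real_service_id() {
    [ -n "$1" ] && [ "$1" != "null" ]
}
```

In the loop, this could replace the size comparison: if is_real_service_id "$serviceID"; then ... else break; fi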