Consul not deregistering zombie services
Using the HTTP API to remove services is another, much nicer solution. I only figured out how to remove services manually before I figured out how to use the HTTP API.
To delete a service with the HTTP API, use the following command:
curl -v -X PUT http://<consul_ip_address>:8500/v1/agent/service/deregister/<ServiceID>
Note that your <ServiceID> is a combination of three things: the IP address of the host machine the container is running on, the name of the container, and the inner port of the container (e.g. 80 for Apache, 3000 for Node.js, 8000 for Django, etc.), all separated by colons (:).
Here's an example of what that would actually look like:
curl -v -X PUT http://1.2.3.4:8500/v1/agent/service/deregister/192.168.1.1:sharp_apple:80
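The ID and URL in that example can be put together with two small shell helpers. This is only a sketch: the function names make_service_id and deregister_url are made up for illustration, and the addresses are the placeholder values used above.

```shell
# Build the three-part service ID: <host_ip>:<container_name>:<inner_port>.
make_service_id() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Build the agent deregister URL from a Consul address and a service ID.
deregister_url() {
  printf 'http://%s:8500/v1/agent/service/deregister/%s\n' "$1" "$2"
}

# Example use (placeholder addresses from this answer):
#   id="$(make_service_id 192.168.1.1 sharp_apple 80)"
#   curl -v -X PUT "$(deregister_url 1.2.3.4 "$id")"
```

Keeping the curl call out of the helpers means you can echo the URL first and check it before actually deregistering anything.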
If you want an easy way to get the ServiceID, just curl the service that contains a zombie:
curl -s http://<consul_ip_address>:8500/v1/catalog/service/<your_services_name>
Here's a real example for a service called someapp that will return all the services under it:
curl -s http://1.2.3.4:8500/v1/catalog/service/someapp
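The catalog response is a JSON array, so picking the IDs out by eye gets tedious. Below is a rough sketch of a filter that prints only the ServiceID values. The function name list_service_ids is hypothetical, and it assumes each entry carries a "ServiceID" string field. It uses grep and sed so it runs without extra tools; if you have jq installed, jq -r '.[].ServiceID' is the more robust choice.

```shell
# Read a catalog JSON response on stdin and print one ServiceID per line.
# Text-based extraction: fine for a quick look, not a real JSON parser.
list_service_ids() {
  grep -o '"ServiceID"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"ServiceID"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Example use (placeholder address from this answer):
#   curl -s http://1.2.3.4:8500/v1/catalog/service/someapp | list_service_ids
```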
Here is how you can absolutely delete all the zombie services: go into your Consul server, find the location of the JSON files containing the zombies, and delete them.
For example, I am running Consul in a container:
docker run --restart=unless-stopped -d -h consul0 --name consul0 -v /mnt:/data \
  -p $(hostname -i):8300:8300 \
  -p $(hostname -i):8301:8301 \
  -p $(hostname -i):8301:8301/udp \
  -p $(hostname -i):8302:8302 \
  -p $(hostname -i):8302:8302/udp \
  -p $(hostname -i):8400:8400 \
  -p $(hostname -i):8500:8500 \
  -p $(ifconfig docker0 | awk '/\<inet\>/ { print $2}' | cut -d: -f2):53:53/udp \
  progrium/consul -server -advertise $(hostname -i) -bootstrap-expect 3
Notice the flag -v /mnt:/data. This is where all the data Consul stores is located; for me it was /mnt. Under this directory you will find several other directories:
config raft serf services tmp
Go into services and you will see the files that contain the JSON info of your services. Find any that contain the info of zombies and delete them, then restart Consul. Repeat for each server in your cluster that has zombies on it.
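To find which files in that directory mention a given zombie, something like the following may help. It is a sketch only: find_service_files is a made-up name, /mnt/services is the path from my setup, and it assumes the container name (or ID) appears as plain text inside the file. It only lists candidate files, so you can inspect them before deleting anything.

```shell
# Print the files directly under a directory that contain a fixed string.
# $1 = services directory, $2 = text to look for (e.g. the container name)
find_service_files() {
  dir="$1"
  needle="$2"
  # -l: print file names only; -F: treat the needle as a literal string
  grep -lF -- "$needle" "$dir"/* 2>/dev/null
}

# Example use (path from my setup, container name from the example above):
#   find_service_files /mnt/services sharp_apple
```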
Use the agent endpoint instead of the catalog endpoint. The catalog is maintained by the agents, so even if you remove a service from the catalog, the agent will sync it back. The following shell script removes zombie services:
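# Ask any cluster member for the leader, then strip the quotes and the :8300 port.
leader="$(curl http://ONE-OF-YOUR-CLUSTER:8500/v1/status/leader | sed 's/:8300//' | sed 's/"//g')"
while :
do
  # Take the first check in the critical state and note its service ID and node.
  serviceID="$(curl "http://$leader:8500/v1/health/state/critical" | ./jq '.[0].ServiceID' | sed 's/"//g')"
  node="$(curl "http://$leader:8500/v1/health/state/critical" | ./jq '.[0].Node' | sed 's/"//g')"
  echo "serviceID=$serviceID, node=$node"
  size=${#serviceID}
  echo "size=$size"
  # An empty list makes jq print null (4 characters), which fails this test and ends the loop.
  if [ "$size" -ge 7 ]; then
    # Deregister through the agent on the node that owns the service.
    # This assumes the node name resolves as a hostname.
    curl --request PUT "http://$node:8500/v1/agent/service/deregister/$serviceID"
  else
    break
  fi
done
# Show whatever is still critical once the loop has finished.
curl "http://$leader:8500/v1/health/state/critical"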
The JSON parser jq is used to extract the fields.
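The leader lookup at the top of the script is the least obvious part: /v1/status/leader returns a quoted "ip:port" string, and the two sed calls reduce it to a bare IP. Here is that step on its own as a small sketch (leader_host is a made-up name, and 10.0.0.5 is just a sample address):

```shell
# Turn the raw /v1/status/leader reply, e.g. "10.0.0.5:8300" (with quotes),
# into a bare host address by dropping the quotes and the trailing :8300.
leader_host() {
  printf '%s\n' "$1" | sed 's/"//g; s/:8300$//'
}

# Example use:
#   leader="$(leader_host "$(curl -s http://ONE-OF-YOUR-CLUSTER:8500/v1/status/leader)")"
```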