Bad gateway with traefik and docker swarm during service update


Bad gateway means traefik is configured to forward requests, but it can't reach the container on the IP and port it has been configured to use. Common issues causing this are:

  • traefik and the service on different docker networks
  • service exists in multiple networks and traefik picks the wrong one
  • wrong port being used to connect to the container (use the container port and make sure it's listening on all interfaces, aka 0.0.0.0)
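To make those three checks concrete, here is a minimal sketch of the relevant parts of a service definition, using the same traefik 1.x labels that appear in the full compose file later in this answer. The service name `app`, the port `8000` and the network name `test_web` are taken from that file; the `test_` prefix assumes the stack is deployed under the name `test`, since swarm prefixes network names with the stack name.

```yaml
services:
  app:
    networks:
      - web   # must be a network the traefik service is also attached to
    deploy:
      labels:
        - traefik.enable=true
        # Container port the app listens on, not the published host port
        - traefik.port=8000
        # Tell traefik which network to use when the service is on several.
        # "test_web" assumes a stack named "test" and a network named "web".
        - traefik.docker.network=test_web

networks:
  web:
```

If the service is attached to only one network, the `traefik.docker.network` label is less critical, but setting it explicitly avoids traefik picking an address it cannot route to.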

From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.

You can configure containers with a healthcheck and send requests through swarm mode's VIP using a Dockerfile that looks like:

    FROM jwilder/whoami
    RUN echo $(date) >/build-date.txt
    HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000
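If rebuilding the image is inconvenient, the same check can most likely be declared on the service in the compose file instead. This is only a sketch of that alternative, not what I ran: it assumes compose file format 3.4 or newer (needed for `start_period`) and reuses the `test-whoami:1` image name from the compose file further down. A `healthcheck` defined here overrides any HEALTHCHECK baked into the image.

```yaml
version: '3.5'
services:
  app:
    image: test-whoami:1
    healthcheck:
      # Same probe as the Dockerfile HEALTHCHECK, run through the shell
      test: ["CMD-SHELL", "wget -O - -q http://localhost:8000"]
      start_period: 30s   # grace period before failures count
      retries: 1          # mark unhealthy after a single failed probe
```

Either way, swarm ends up with a health status it can consult before it moves on to the next task during a rolling update.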

And then in the docker-compose.yml:

    labels:
      - traefik.enable=true
      - traefik.backend=app
      - traefik.backend.loadbalancer.swarm=true
      ...

And I would also configure the traefik service with the following options:

    - "--retry.attempts=2"
    - "--forwardingTimeouts.dialTimeout=1s"

However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:

    labels:
      - traefik.enable=true
      - traefik.backend=app
      - traefik.backend.healthcheck.path=/
      ...

I would still leave the healthcheck on the container itself so Docker gives the new container time to start before stopping the other container. And leave the retry option on the traefik service so any request sent to a stopping container, or to one whose state hasn't yet been picked up by the healthcheck, has a chance to try again.
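One knob that affects how long a stopping container can go undetected is how often traefik probes the backend. The sketch below adds the traefik 1.x `traefik.backend.healthcheck.interval` label next to the path label; the `5s` value is an assumption for illustration, not something I tested, and I believe the default is considerably longer (30s in the 1.x docs), so check the documentation for your version.

```yaml
services:
  app:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        # Assumed value: probe more often so dead backends are dropped sooner
        - traefik.backend.healthcheck.interval=5s
```

A shorter interval narrows the window in which requests reach a container that is going away, and the retry option covers whatever falls inside that window.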


Here's the resulting compose file that I used in my environment:

    version: '3.5'
    services:
      app:
        image: test-whoami:1
        ports:
          - 6081:8000
        deploy:
          replicas: 2
          restart_policy:
            condition: on-failure
          update_config:
            parallelism: 1
            failure_action: rollback
          labels:
            - traefik.enable=true
            - traefik.backend=app
            - traefik.backend.healthcheck.path=/
            - traefik.frontend.rule=Path:/
            - traefik.port=8000
            - traefik.docker.network=test_web
        networks:
          - web
      reverse-proxy:
        image: traefik
        command:
          - "--api"
          - "--retry.attempts=2"
          - "--forwardingTimeouts.dialTimeout=1s"
          - "--docker"
          - "--docker.swarmMode"
          - "--docker.domain=localhost"
          - "--docker.watch"
          - "--docker.exposedbydefault=false"
          - "--docker.network=test_web"
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure
          update_config:
            parallelism: 1
            failure_action: rollback
          placement:
            constraints:
              - node.role == manager
        networks:
          - web
        ports:
          - 6080:80
          - 6880:8080
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
    networks:
      web:

The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.


As of today (June 2021), Traefik can't drain connections during an update.

To achieve a zero-downtime rolling update you should delegate the load-balancing to docker swarm itself:

    # traefik v2
    # docker-compose.yml
    services:
      your_service:
        deploy:
          labels:
            - traefik.docker.lbswarm=true

From the docs:

Enables Swarm's inbuilt load balancer (only relevant in Swarm Mode).

If you enable this option, Traefik will use the virtual IP provided by docker swarm instead of the containers IPs. Which means that Traefik will not perform any kind of load balancing and will delegate this task to swarm.
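For context, a slightly fuller sketch of how that label might sit alongside the other Traefik v2 labels a swarm service usually needs. Everything except `traefik.docker.lbswarm=true` is an assumption added for illustration: the router/service name `your_service`, the `PathPrefix` rule and port `8000` are placeholders to adapt to your own service.

```yaml
# traefik v2
# docker-compose.yml
services:
  your_service:
    deploy:
      labels:
        - "traefik.enable=true"
        # Send traffic to the swarm VIP instead of individual container IPs
        - "traefik.docker.lbswarm=true"
        # Placeholder routing rule; replace with your own Host/Path rule
        - "traefik.http.routers.your_service.rule=PathPrefix(`/`)"
        # In swarm mode the container port has to be declared explicitly
        - "traefik.http.services.your_service.loadbalancer.server.port=8000"
```

With this in place swarm decides which task receives each connection, so a container healthcheck on the service matters more: it is what lets swarm keep tasks that aren't ready out of the rotation.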

Further info:

https://github.com/traefik/traefik/issues/41

https://github.com/traefik/traefik/issues/1480