How to change Docker stack restarting behaviour?


  1. Is this behaviour configurable? For instance, I don't want Docker to restart my stack under any circumstances. If it is configurable, then how?

With a version 3 stack, the restart policy moved to the deploy section:

    version: '3'
    services:
      crash:
        image: busybox
        command: sleep 10
        deploy:
          restart_policy:
            condition: none
            # max_attempts: 2

Documentation on this is available at: https://docs.docker.com/compose/compose-file/#restart_policy
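
If the stack is already deployed, the same policy can also be changed on the running service from the command line. The following is a minimal sketch, assuming the service is named restart_crash as in the example stack; the function name disable_restarts is only illustrative. Note that a later docker stack deploy would reapply whatever the stack file says.

```shell
# Disable automatic restarts for one running swarm service, then print
# the restart policy that the swarm now records for it.
# disable_restarts is a hypothetical helper name, not a Docker command.
disable_restarts() {
  service="$1"
  docker service update --restart-condition none "$service" >/dev/null &&
    docker service inspect \
      --format '{{json .Spec.TaskTemplate.RestartPolicy}}' "$service"
}

# Usage (service name taken from the example stack):
#   disable_restarts restart_crash
```

The related flags --restart-max-attempts, --restart-delay and --restart-window cover the case where you want to limit restarts rather than disable them.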

  2. Is there any docker journal to keep any stack restarts as its entries?

Depending on the task history limit (configurable with docker swarm update --task-history-limit), you can view the previously run tasks for a service:

    $ docker service ps restart_crash
    ID                  NAME                  IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
    30okge1sjfno        restart_crash.1       busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
    papxoq1vve1a         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
    1hji2oko51sk         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 5 minutes ago

And you can inspect the state for any one task:

    $ docker inspect 30okge1sjfno --format '{{json .Status}}' | jq .
    {
      "Timestamp": "2018-11-06T19:55:02.208633174Z",
      "State": "complete",
      "Message": "finished",
      "ContainerStatus": {
        "ContainerID": "8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545",
        "PID": 0,
        "ExitCode": 0
      },
      "PortStatus": {}
    }
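
To get the same information for every retained task at once, the two commands can be combined. This is a rough sketch rather than a polished tool: task_exit_codes is a hypothetical helper name, and it assumes each task actually started a container, since the template fails on tasks that have no ContainerStatus.

```shell
# Print "<task id> <state> <exit code>" for every retained task of a service.
# Assumption: every listed task has a ContainerStatus; tasks that never
# started a container would make the template below report an error.
task_exit_codes() {
  service="$1"
  docker service ps --quiet "$service" | while read -r task_id; do
    docker inspect --format \
      '{{.ID}} {{.Status.State}} {{.Status.ContainerStatus.ExitCode}}' \
      "$task_id"
  done
}

# Usage:
#   task_exit_codes restart_crash
```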

There's also an event history in the docker engine that you can query:

    $ docker events --filter label=com.docker.swarm.service.name=restart_crash --filter event=die --since 15m --until 0s
    2018-11-06T14:54:09.417465313-05:00 container die f17d945b249a04e716155bcc6d7db490e58e5be00973b0470b05629ce2cca461 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=1hji2oko51skhv8fv1nw71gb8, com.docker.swarm.task.name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8)
    2018-11-06T14:54:32.391165964-05:00 container die d6f98b8aaa171ca8a2ddaf31cce7a1e6f1436ba14696ea3842177b2e5e525f13 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=papxoq1vve1adriw6e9xqdaad, com.docker.swarm.task.name=restart_crash.1.papxoq1vve1adriw6e9xqdaad, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.papxoq1vve1adriw6e9xqdaad)
    2018-11-06T14:55:00.126450155-05:00 container die 8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=30okge1sjfnoicd0lo2g1y0o7, com.docker.swarm.task.name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7)

See more details on the events command at: https://docs.docker.com/engine/reference/commandline/events/
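
The default event lines are long, so it can help to reduce them to a timestamp, container name and exit code. The sketch below is one possible way to do that with sed; summarize_die_events is a hypothetical name, and the pattern assumes the default text layout shown in the output above (attributes listed with exitCode before name). For anything more robust, docker events also accepts --format '{{json .}}' for structured output.

```shell
# Read `docker events` lines for "container die" events on stdin and print
# "<timestamp> <container name> exit=<code>" for each matching line.
# Assumption: the default text layout, with exitCode listed before name.
summarize_die_events() {
  sed -n 's/^\([^ ]*\) container die .*exitCode=\([0-9]*\),.* name=\([^,)]*\)[,)].*$/\1 \3 exit=\2/p'
}

# Usage:
#   docker events --filter event=die --since 15m --until 0s | summarize_die_events
```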

At larger-scale organizations, the best practice is to ship container logs to a central location (e.g. Elastic) and monitor metrics externally (e.g. Prometheus/Grafana).


Since you haven't added any configuration snippet or runtime commands to your post, I'll have to make some assumptions about your actual question.

My assumptions:

  • you are running multiple services using docker-compose
  • these services have memory limits configured (in the docker-compose.yml file)
  • you see them restarting once they hit the configured memory limit, and you want to prevent them from restarting

I assume your docker-compose.yml looks like the following:

    version: '2.1'
    services:
      service1:
        image: some/image
        restart: always
        mem_limit: 512m
      service2:
        image: another/image
        restart: always
        mem_limit: 512m

With this configuration, any of the service containers would be OOM-killed by the kernel when it tries to use more than 512 MB of memory. Docker would then automatically restart the killed container.

So to answer your first point: yes, it is. Just change restart to "no" (keep the quotes, because YAML would otherwise read a bare no as a boolean), or simply remove the line, since "no" is the default value for this parameter. As for your second point, simply look for service restarts in the Docker daemon logs.
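
Besides the daemon logs, Docker records a restart count and an OOM flag per container, which can be quicker to check. A minimal sketch follows; container_restart_info is a hypothetical helper name, and the container name you pass depends on your compose project. The oom_killed value describes the container's last recorded state, so it is most informative when read while the container is stopped.

```shell
# Print the restart count, OOM flag and last exit code for one container.
# container_restart_info is an illustrative name, not a Docker command.
container_restart_info() {
  docker inspect --format \
    '{{.Name}} restarts={{.RestartCount}} oom_killed={{.State.OOMKilled}} exit={{.State.ExitCode}}' \
    "$1"
}

# Usage (container name is a placeholder):
#   container_restart_info myproject_service1_1
```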

Yet if what you need is to keep your service up, this is not going to help: your service will still try to use more than its allowed memory, and it will still get killed, but it will no longer be restarted automatically.

It would be better to review the memory usage pattern of your services and understand why they are attempting to use more than the configured limit. Ultimately, the solution is either to configure your services to use less memory or to raise the mem_limit in your docker-compose.yml.
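
A simple starting point for that review is a one-off snapshot of each running container's memory use against its limit. This is only a sketch using docker stats; the function name memory_snapshot is illustrative.

```shell
# Print one snapshot of memory usage / limit and percentage for each
# running container, as a table, without streaming updates.
memory_snapshot() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
}

# Usage:
#   memory_snapshot
```

Repeating the snapshot under realistic load gives a rough idea of how close each service runs to its mem_limit.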

For example:

  • for a database service, configure the memory options to force the engine to not use more RAM than mem_limit (SGA and PGA under Oracle, various buffers and cache sizes for MySQL/MariaDB, ...)
  • for Java applications, set -Xmx sufficiently below the mem_limit (keeping in mind the need for non-heap memory). Alternatively, on JDK 8u131+ or 9, use -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap; JDK 10 and later (and 8u191+) detect the container limit by default, and that experimental flag was removed in JDK 11.
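
For the Java case, the heap size can be derived from the container limit instead of being hard-coded. The following is a small sketch of that arithmetic; jvm_heap_flag is a hypothetical helper, and the 75% default share is only an illustrative assumption that leaves the rest for non-heap memory, not a recommendation from this answer.

```shell
# Print a -Xmx flag sized as a percentage of the container memory limit.
#   $1: the memory limit in MiB (e.g. 512 for mem_limit: 512m)
#   $2: share of the limit given to the heap, in percent (default 75,
#       an illustrative assumption; tune it for your application)
jvm_heap_flag() {
  limit_mb="$1"
  heap_pct="${2:-75}"
  echo "-Xmx$(( limit_mb * heap_pct / 100 ))m"
}

# Usage (app.jar is a placeholder):
#   java "$(jvm_heap_flag 512)" -jar app.jar
```

With a 512 MiB limit and the default share, this yields -Xmx384m.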

I hope this will help you; to be more precise I would really need more context.