
Jenkins slave running in ECS cluster can not start container


Your problem is a very common one when you start and stop containers frequently, and the post you mentioned is all about that! It specifically says:

"The Amazon EC2 Container Service Plugin can launch containers on your ECS cluster that automatically register themselves as Jenkins slaves, execute the appropriate Jenkins job on the container, and then automatically remove the container/build slave afterwards"

The problem with this is that, if the stopped containers are not cleaned up, you eventually run out of disk space, as you have experienced. You can check this yourself if you ssh into the instance and run the following command:

docker ps -a

If you run this command when Jenkins is getting into trouble, you should see an almost endless list of stopped containers. You can delete them all by running the following command:

docker rm -f $(docker ps -aq -f status=exited)

However, doing this manually every so often is really not very convenient, so what you really want to do is include the following script in the userData parameter of your ECS instance configuration when you launch it:

echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_CLUSTER=<NAME_OF_CLUSTER> >> /etc/ecs/ecs.config
echo ECS_DISABLE_IMAGE_CLEANUP=false >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=10m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m >> /etc/ecs/ecs.config

This will instruct the ECS agent to enable a cleanup daemon that checks every 10 minutes (that is the lowest interval you can set) for images to delete, deletes containers 1 minute after the task has stopped, and deletes images which are 30 minutes old and no longer referenced by an active Task Definition. You can learn more about these variables here.
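If you want to confirm that the agent actually picked up these values, one quick check (assuming the ECS-optimized AMI, where the agent runs in a container named ecs-agent) is to inspect the agent container's environment:

# Sketch: verify the cleanup variables reached the ECS agent (container name assumes the ECS-optimized AMI)
docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' ecs-agent | grep ECS_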

In my experience, this configuration might not be enough if you start and stop containers very fast, so you may want to attach a decent volume to your instance in order to make sure you have enough space to carry on while the daemon cleans up the stopped containers.
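A quick way to judge whether the volume is big enough is to look at how much space Docker is actually using on the instance, for example (the second command assumes the devicemapper storage driver used by the ECS-optimized AMI):

# Sketch: check remaining space for containers on the instance
df -h
docker info | grep -i 'data space'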


Thanks Jose for the answer.

But this is the command that worked for me on Docker 1.12.*:

docker rm $(docker ps -aqf "status=exited")

The 'q' flag reduces the output to just the container IDs, which are then passed to docker rm for removal.
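To see what the flag changes, compare the two forms with the same filter:

# Full table output (CONTAINER ID, IMAGE, COMMAND, STATUS, ...)
docker ps -a -f "status=exited"
# With -q: only the container IDs, one per line, ready to be fed to docker rm
docker ps -aq -f "status=exited"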


If you upgrade to the latest ECS agent (or the latest ECS-optimized AMI, amzn-ami-2017.09.d-amazon-ecs-optimized or later), you can configure ECS automated cleanup of defunct images, containers and volumes in the ECS config of the EC2 hosts serving the cluster.

This cleans up after a node(label){} clause, but not after Docker executions during that build:

  • node container and its volumes - cleaned
  • docker images generated by steps executed upon that node - not cleaned

ECS is blind to what happens on that node. Given that the node containers themselves should be the largest consumers of space, ECS automated cleanup should reduce the need for a separate cleaning task to a minimum.
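If the images generated during the build do become a problem, a rough workaround is to prune them as a final shell step of the job itself (a sketch only; it assumes the job talks to the host's Docker daemon and that the host runs Docker 1.13+ for docker image prune):

# Sketch: last shell step of the Jenkins job, to drop images left over from this build
docker image prune -f || true
# More aggressive: also remove unreferenced images older than one hour
docker image prune -af --filter "until=1h" || true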