
Jenkins slave running in ECS cluster can not start container


Your problem is a very common one when you start and stop containers frequently, and the post you mentioned is all about that! It specifically says:

"The Amazon EC2 Container Service Plugin can launch containers on your ECS cluster that automatically register themselves as Jenkins slaves, execute the appropriate Jenkins job on the container, and then automatically remove the container/build slave afterwards"

The problem with this is that, if the stopped containers are not cleaned up, you eventually run out of disk space, as you have experienced. You can check this yourself if you ssh into the instance and run the following command:

docker ps -a

If you run this command when Jenkins is getting into trouble, you should see an almost endless list of stopped containers. You can delete them all by running the following command:

docker rm -f $(docker ps -aq -f status=exited)

However, doing this manually every so often is really not very convenient, so what you really want to do is include the following script in the userData parameter of your ECS instance configuration when you launch it:

echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_CLUSTER=<NAME_OF_CLUSTER> >> /etc/ecs/ecs.config
echo ECS_DISABLE_IMAGE_CLEANUP=false >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=10m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m >> /etc/ecs/ecs.config

This will instruct the ECS agent to enable a cleanup daemon that checks every 10 minutes (that is the lowest interval you can set) for images to delete, deletes containers 1 minute after the task has stopped, and deletes images which are 30 minutes old and no longer referenced by an active Task Definition. You can learn more about these variables here.
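If you want to confirm that the agent actually picked up these values, one quick check (assuming the ECS-optimized AMI, where the agent runs in a container named ecs-agent) is to inspect the agent container's environment:

# Sketch: verify the cleanup variables reached the ECS agent (container name assumes the ECS-optimized AMI)
docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' ecs-agent | grep ECS_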

In my experience, this configuration might not be enough if you start and stop containers very fast, so you may want to attach a decent volume to your instance in order to make sure you have enough space to carry on while the daemon cleans up the stopped containers.
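A quick way to judge whether the volume is big enough is to look at how much space Docker is actually using on the instance, for example (the second command assumes the devicemapper storage driver used by the ECS-optimized AMI):

# Sketch: check remaining space for containers on the instance
df -h
docker info | grep -i 'data space'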


Thanks Jose for the answer.

But this is the command that worked for me on Docker 1.12.*:

docker rm $(docker ps -aqf "status=exited")

The 'q' flag reduces the output to just the container IDs, which are then passed to docker rm for removal.
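To see what the flag changes, compare the two forms with the same filter:

# Full table output (CONTAINER ID, IMAGE, COMMAND, STATUS, ...)
docker ps -a -f "status=exited"
# With -q: only the container IDs, one per line, ready to be fed to docker rm
docker ps -aq -f "status=exited"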


If you upgrade to the latest ECS agent (or the latest ECS-optimized AMI, amzn-ami-2017.09.d-amazon-ecs-optimized or later), you can configure ECS automated cleanup of defunct images, containers and volumes in the ECS config of the EC2 hosts serving the cluster.

This cleans up after a node(label){} clause, but not after Docker executions during that build:

  • node container and its volumes - cleaned
  • docker images generated by steps executed upon that node - not cleaned

ECS is blind to what happens on that node. Given that the node containers themselves should be the largest consumers of space, ECS automated cleanup should reduce the need for a separate cleaning task to a minimum.
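If the images generated during the build do become a problem, a rough workaround is to prune them as a final shell step of the job itself (a sketch only; it assumes the job talks to the host's Docker daemon and that the host runs Docker 1.13+ for docker image prune):

# Sketch: last shell step of the Jenkins job, to drop images left over from this build
docker image prune -f || true
# More aggressive: also remove unreferenced images older than one hour
docker image prune -af --filter "until=1h" || true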