Server becomes unresponsive periodically, OOM Killer inactive?


The solution I provided apparently did not help the person who asked the question, but it might help someone else who stumbles upon this page. The following are the two things I suggested might be causing the problem.

Suggestion 1

I am guessing you are using the official Ruby Docker image, and that when you run the container, Ruby is running as PID 1 inside it.

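If Ruby is running as PID 1, the kernel treats it as the container's init process: signals it has not installed a handler for are ignored, and it is expected to reap orphaned child processes itself. My guess is that this is why the OOM killer does not appear to be cleaning it up, which would cause the problems you are seeing.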

To solve this problem you will have to make sure a proper init process runs as PID 1.

Docker 1.13 and above (API version 1.25+) has the --init option for the docker run command. This option makes sure that a proper init process handles the duties of PID 1, and it also forwards signals to your Ruby application.

https://docs.docker.com/engine/reference/commandline/run/

--init (API 1.25+): Run an init inside the container that forwards signals and reaps processes

The following is what Docker uses as the init: https://github.com/krallin/tini
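
To check whether Ruby really is PID 1 in its container, something like the following could be run on the Docker host. This is only a rough sketch: the function name container_init_processes is made up for illustration, and it assumes a Linux kernel new enough (4.1+) to expose the NSpid field in /proc/<pid>/status, plus enough permissions to read other processes' status files.

```python
import os


def container_init_processes(proc_root="/proc"):
    """Return (host_pid, command_name) pairs for processes that are PID 1
    inside a nested PID namespace (for example, inside a container)."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        name, ns_pids = None, []
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                for line in status:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("NSpid:"):
                        ns_pids = line.split()[1:]
        except OSError:
            continue  # process exited while we were scanning
        # NSpid lists the PID in each nested namespace, outermost first.
        # More than one entry ending in "1" means "PID 1 in a nested namespace".
        if len(ns_pids) > 1 and ns_pids[-1] == "1":
            found.append((int(entry), name))
    return sorted(found)


if __name__ == "__main__":
    for host_pid, name in container_init_processes():
        print(f"host PID {host_pid}: {name} is PID 1 in its namespace")
```

If ruby shows up in the output, the application itself is acting as init. After starting the container with --init, I would expect an init process to be listed there instead, with ruby running as one of its children.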

Suggestion 2

There is a known issue with the Amazon Linux AMI; the details can be found at the following link: https://github.com/aws/amazon-ecs-agent/issues/794. As of writing, I am not sure whether the problem with the AMI has been fixed.

So try a different AMI as suggested in that thread, such as the Ubuntu AMI.


I think you are assuming that the OOM killer will always target your Ruby application, but I don't think that is the case. Your log line shows it killed your tty connection instead. I am betting it is killing other processes before your Ruby process, which is why your machine seems unresponsive. Reading up on how the OOM killer works might help here. I would look specifically at your oom_scores and see what you find there.

http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
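
One way to look at the oom_scores is to read them straight from /proc on the host. The snippet below is a minimal sketch, not a polished tool: the helper names read_text and oom_ranking are made up for illustration, and it assumes the standard Linux files /proc/<pid>/oom_score, /proc/<pid>/oom_score_adj and /proc/<pid>/comm are readable. The process with the highest oom_score is the one the kernel is most likely to pick first.

```python
import os


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def oom_ranking(proc_root="/proc"):
    """Return (oom_score, oom_score_adj, pid, name) tuples, highest score first."""
    rows = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return rows  # no procfs available
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        score = read_text(os.path.join(base, "oom_score"))
        adj = read_text(os.path.join(base, "oom_score_adj"))
        name = read_text(os.path.join(base, "comm")) or "?"
        try:
            rows.append((int(score), int(adj), int(entry), name))
        except (TypeError, ValueError):
            continue  # process vanished or the file was unreadable
    # Highest score first; ties are ordered by PID for stable output.
    return sorted(rows, key=lambda row: (-row[0], row[2]))


if __name__ == "__main__":
    print(f"{'score':>6} {'adj':>6} {'pid':>7}  name")
    for score, adj, pid, name in oom_ranking()[:10]:
        print(f"{score:>6} {adj:>6} {pid:>7}  {name}")
```

If your Ruby process is not near the top of that list, other processes (your SSH session, for example) will probably be killed before it. An oom_score_adj of -1000 means the process is exempt from the OOM killer altogether.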

Good Luck