Server becomes unresponsive periodically, OOM Killer inactive?
Apparently the solution I provided didn't help the person who asked the question, but it might help someone else who stumbles upon this. The following are two things I suggested that might be causing the problem.
Suggestion 1
I am guessing you are using the official ruby Docker image, and when you run the container, ruby is running as PID 1 inside the container. If ruby is running as PID 1, then the OOM killer won't be able to kill it, causing all the problems you are seeing. To solve this problem, you will have to make sure a proper init process runs as PID 1.
Docker 1.13 (API 1.25) and above has the --init option for the docker run command. This option makes sure that a proper init handles the tasks of PID 1, and it will also forward all signals to your ruby application.
https://docs.docker.com/engine/reference/commandline/run/
--init (API 1.25+): Run an init inside the container that forwards signals and reaps processes
The following is the init that Docker uses:
https://github.com/krallin/tini
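If you are not sure whether ruby really is PID 1 in your container, a quick check from inside the app can tell you. This is a minimal sketch, assuming a Linux container with /proc mounted. The method names pid1_command and running_as_pid1? are hypothetical helpers made up for illustration, not part of any library.

```ruby
# Minimal sketch (hypothetical helpers): report which process is PID 1
# in the current PID namespace. Assumes Linux with /proc mounted.

# Returns the command name of PID 1, or nil if /proc/1/comm can't be read.
def pid1_command
  File.read("/proc/1/comm").strip
rescue SystemCallError
  nil
end

# True when this Ruby process is itself PID 1, e.g. when the container's
# command starts ruby directly and --init is not used.
def running_as_pid1?
  Process.pid == 1
end

if $PROGRAM_NAME == __FILE__
  if running_as_pid1?
    warn "ruby is running as PID 1; consider starting the container with --init"
  else
    puts "PID 1 is #{pid1_command.inspect}; ruby is PID #{Process.pid}"
  end
end
```

Running the same image with docker run --init should make the script report that ruby is no longer PID 1 and that some other process (the init) holds that slot.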
Suggestion 2
There is a known issue with the Amazon Linux AMI; the details can be found at the following link: https://github.com/aws/amazon-ecs-agent/issues/794. At the time of writing, I am not sure whether the problem with the AMI has been fixed.
So try a different AMI, such as the Ubuntu AMI, as suggested in that thread.
I think you are assuming that the OOM killer will always target your Ruby application, but I don't think that is the case. Your log line shows it killed your tty connection instead. I am betting it is killing other processes before your Ruby process, and this is why your machine seems unresponsive. Reading up on how the OOM killer works might help here. I would look specifically at your oom_score values and see what you find there.
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
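To see which processes the OOM killer is most likely to pick, you can sort them by the kernel's /proc/<pid>/oom_score, where a higher score means a more likely victim. Below is a minimal sketch, assuming Linux and Ruby 2.7+ (for filter_map). OomEntry, read_proc and oom_candidates are hypothetical names for illustration. The /proc files it reads (oom_score, oom_score_adj, comm) are standard kernel interfaces.

```ruby
# Minimal sketch: list processes by /proc/<pid>/oom_score, highest first.
# Linux only; processes that exit mid-scan are skipped. Names are hypothetical.

OomEntry = Struct.new(:pid, :score, :adj, :name)

# Reads /proc/<pid>/<file>, returning nil if the process is gone or the
# file can't be read.
def read_proc(pid, file)
  File.read("/proc/#{pid}/#{file}").strip
rescue SystemCallError
  nil
end

# Returns up to `limit` entries, sorted by descending oom_score.
def oom_candidates(limit = 10)
  entries = Dir.glob("/proc/[0-9]*").filter_map do |dir|
    pid = File.basename(dir).to_i
    score = read_proc(pid, "oom_score")
    next if score.nil? # process vanished or file unreadable
    OomEntry.new(pid, score.to_i, read_proc(pid, "oom_score_adj").to_i,
                 read_proc(pid, "comm"))
  end
  entries.sort_by { |e| -e.score }.first(limit)
end

if $PROGRAM_NAME == __FILE__
  printf("%8s %6s %6s  %s\n", "PID", "SCORE", "ADJ", "COMMAND")
  oom_candidates.each do |e|
    printf("%8d %6d %6d  %s\n", e.pid, e.score, e.adj, e.name)
  end
end
```

Run it on the host rather than inside the container: inside the container's PID namespace you would only see the container's own processes, not the sshd/tty processes the OOM killer may be choosing instead.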
Good luck