How can a process die in a way that Process.wait wouldn't notice?


How could the child process have died without the parent script knowing?

My guess is that the child process turned into a zombie and was missed by Process.waitall. Did you check whether the child processes are zombies when it happens?

The zombie: If you have zombie processes it means those zombies have not been waited for by their parent (check the PPID with ps -l). In the end you have three choices: Fix the parent process (make it wait); kill the parent; or get over it.
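As a rough illustration of the "make it wait" option, the sketch below forks a child that exits immediately and then reaps it from the parent. The helper name `reap_child` is hypothetical, and the code assumes a Unix-like platform where `fork` is available.

```ruby
# Hypothetical helper: fork a child that exits with `code`, then reap it.
# Between the child's exit and the call to Process.wait2, the child is a
# zombie (it would show state "Z" in `ps -l`).
def reap_child(code)
  pid = fork { exit!(code) }        # child terminates right away
  _, status = Process.wait2(pid)    # parent collects the exit status
  status                            # a Process::Status object
end

status = reap_child(3)
puts "exited=#{status.exited?} code=#{status.exitstatus}"
```

If the parent does not care about the exit status, `Process.detach(pid)` is another option: it starts a thread that waits for the child so it does not linger as a zombie.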

Could you check your list of signals and trap the relevant one?

You can list all available signals (the output below is from Windows):

Signal.list
=> {"EXIT"=>0, "INT"=>2, "ILL"=>4, "ABRT"=>22, "FPE"=>8, "KILL"=>9, "SEGV"=>11, "TERM"=>15}

Could you try to trap it, e.g. via INT? (Note: you can have only one trap handler per signal.)

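# Ruby 2.0+ reserves SEGV (trapping it raises ArgumentError), so INT is used here.
Signal.trap('INT') { throw :sigint }
catch :sigint do
  start_what_you_need   # placeholder for your own code
end
puts 'OMG! Got an INT!'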

Since your question is a general one, it is hard to give you a specific answer.


Zombies are not the only possible cause for this problem -- stopped children may not be reported for a variety of reasons.

The existence of a zombie typically means that the parent has not properly waited on it. The posted code looks OK, though, so unless there's a framework bug lurking somewhere I'd want to look beyond the zombie apocalypse to explain this problem.

In contrast to zombies, which have already exited and linger only until their parent reaps them, frozen processes are still alive but have stopped responding for some reason (waiting for an external process or I/O operation, memory problems, long or infinite looping, slow database operations, etc.).

On some platforms, Ruby lets you pass a flag that asks for stopped children that haven't been reported yet, using the following syntax:

Process.waitpid(pid, Process::WUNTRACED)

AFAIK waitall doesn't have a version that accepts flags, so you'd have to aggregate the results yourself. You can use pid = -1 to wait for any child process (the default if you omit pid), or pid = 0 to wait for any child with the same process group ID as the calling process.
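One way to do that aggregation is sketched here. The helper name `wait_all_with_flags` is hypothetical (it is not part of Ruby), and the code assumes a Unix-like platform where `fork` and `Process::WUNTRACED` are available. A stopped child is still alive, so the caller's block decides what to do with it; in this demo it is simply resumed with CONT.

```ruby
# Hypothetical helper: a waitall-style loop that passes flags to waitpid2,
# so stopped children are reported as well as terminated ones.
def wait_all_with_flags(flags = Process::WUNTRACED)
  results = []
  begin
    loop do
      pid, status = Process.waitpid2(-1, flags)   # -1 means "any child"
      results << [pid, status]
      yield pid, status if block_given?
    end
  rescue Errno::ECHILD
    # raised once no children remain; the loop ends here
  end
  results
end

# Demo: the child stops itself, the parent notices and resumes it.
fork { Process.kill('STOP', Process.pid); exit!(0) }

events = wait_all_with_flags do |pid, status|
  Process.kill('CONT', pid) if status.stopped?
end

events.each do |pid, status|
  puts "#{pid}: stopped=#{status.stopped?} exited=#{status.exited?}"
end
```

Without the block, the loop would report the stop and then block until something else resumes or kills the child, so a real script would probably want a timeout or a kill in that branch.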

See the Ruby documentation for Process.wait.