How should a ZeroMQ worker safely "hang up"?


You seem to be trying to avoid a “simple” race condition such as

    ... = zmq_recv(fd);
    do_something();
    zmq_send(fd, answer);
    /* Let's hope a new request does not arrive just now, please close it quickly! */
    zmq_close(fd);

but I think the real problem is that fair queuing (round-robin) makes things harder still: your worker might already have several requests queued. The sender does not wait for your worker to be free before sending it a new request; whenever it is that worker's turn, it gets one. So by the time you call zmq_send, other requests might already be waiting.
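
To make that concrete, here is a toy model of round-robin distribution in plain Python. It is not ZeroMQ code, and the function and worker names are made up for illustration; it only shows that requests are dealt out in turn whether or not a worker is busy.

```python
from collections import deque
from itertools import cycle

def round_robin(requests, worker_names):
    """Deal requests to per-worker queues in turn, ignoring whether a worker is busy."""
    queues = {name: deque() for name in worker_names}
    for request, name in zip(requests, cycle(worker_names)):
        queues[name].append(request)
    return queues

queues = round_robin(range(6), ["fast", "slow"])
# "slow" now holds requests 1, 3 and 5, even if it is still working on request 1
# and would like to hang up after answering it.
```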

In fact, it looks like you might have selected the wrong data direction. Instead of having a requests pool send requests to your workers (even when you would prefer not to receive new ones), you might want to have your workers fetch a new request from a requests queue, take care of it, then send the answer.

Of course, it means using XREP/XREQ (renamed ROUTER/DEALER in later ZeroMQ versions), but I think it is worth it.

Edit: I wrote some code implementing the other direction to explain what I mean.


I think the problem is that your messaging architecture is wrong. Your workers should use a REQ socket to send a request for work and that way there is only ever one job queued at the worker. Then to acknowledge completion of the work, you could either use another REQ request that doubles as ack for the previous job and request for a new one, or you could have a second control socket.
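
A rough sketch of that idea with pyzmq, assuming a ROUTER socket on the master side. The message kinds (READY, DONE, BYE, JOB, STOP, OK) and the function names are my own invention, not part of ZeroMQ. Each worker request doubles as the ack for its previous job, and a worker that wants to hang up sends BYE with its last result instead of asking for more.

```python
import threading
import zmq

def run_master(router, jobs, n_workers):
    """Give out jobs only on request; return (results, jobs never handed out)."""
    pending, results, active = list(jobs), [], n_workers
    while active:
        # A REQ peer shows up on ROUTER as [identity, empty delimiter, *frames].
        ident, _, kind, payload = router.recv_multipart()
        if kind in (b"DONE", b"BYE"):
            results.append(payload)          # result of that worker's last job
        if kind == b"BYE":                   # worker is hanging up: ack, send no job
            router.send_multipart([ident, b"", b"OK", b""])
            active -= 1
        elif pending:
            router.send_multipart([ident, b"", b"JOB", pending.pop(0)])
        else:                                # nothing left: tell the worker to stop
            router.send_multipart([ident, b"", b"STOP", b""])
            active -= 1
    return results, pending

def run_worker(ctx, endpoint, max_jobs):
    """Ask for work, and say BYE with the last result instead of asking again."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send_multipart([b"READY", b""])
    done = 0
    while True:
        kind, payload = sock.recv_multipart()
        if kind != b"JOB":                   # STOP or OK: nothing is in flight
            break
        done += 1
        result = payload.upper()             # stand-in for real work
        sock.send_multipart([b"BYE" if done >= max_jobs else b"DONE", result])
    sock.close()

def demo():
    """One worker quits after a single job, the other takes whatever is left."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind("inproc://jobs")             # hypothetical endpoint name
    threads = [threading.Thread(target=run_worker, args=(ctx, "inproc://jobs", n),
                                daemon=True) for n in (1, 100)]
    for t in threads:
        t.start()
    results, left = run_master(router, [b"a", b"b", b"c", b"d"], n_workers=2)
    for t in threads:
        t.join()
    router.close()
    ctx.term()
    return results, left
```

Because a worker never holds more than the one job it asked for, there is nothing queued behind it when it leaves. Waiting for the OK reply before closing is a design choice here: it lets the worker know the master has its last result.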

Some people do this using PUB/SUB for the control so that each worker publishes acks and the master subscribes to them.

You have to remember that with ZeroMQ there are zero message queues. None at all! There are just messages buffered in the sender or the receiver, depending on the socket type and on settings like the high-water mark. If you really do need message queues, then you need to write a broker app to handle that, or simply switch to AMQP, where all communication goes through a third-party broker.
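
For reference, this is roughly how the high-water marks are set from pyzmq. The helper name and the endpoint are placeholders. As far as I know the options only apply to connections made after they are set, and a small HWM limits buffering inside ZeroMQ but does not guarantee that only one message is in flight, since the OS network buffers can hold more.

```python
import zmq

def make_worker_socket(ctx, endpoint, hwm=1):
    """Create a REP socket that buffers as little as ZeroMQ allows (hypothetical helper)."""
    sock = ctx.socket(zmq.REP)
    # Set the limits before connecting so they apply to this connection.
    sock.setsockopt(zmq.RCVHWM, hwm)   # cap on inbound messages buffered per peer
    sock.setsockopt(zmq.SNDHWM, hwm)   # cap on outbound messages buffered per peer
    sock.connect(endpoint)
    return sock
```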


I've been thinking about this as well. You may want to implement a CLOSE message which notifies the customer that the worker is going away. You could then have the worker drain for a period of time before shutting down. Not ideal, of course, but might be workable.
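
Something along these lines, as a sketch only. It assumes the worker has a REP socket for work plus a separate PUSH socket for control messages; the CLOSE payload, the function name and the quiet period are all my own choices. It cannot guarantee that no request arrives after the quiet period ends, which is why it is not ideal.

```python
import zmq

def hang_up(work_sock, control_sock, handle, quiet_ms=200):
    """Announce CLOSE, keep answering requests already in flight, then close."""
    control_sock.send(b"CLOSE")          # ask the other side to stop sending work
    served = 0
    # poll() returns 0 when nothing arrived within quiet_ms, which ends the drain.
    while work_sock.poll(quiet_ms):
        work_sock.send(handle(work_sock.recv()))
        served += 1
    work_sock.close()
    return served
```

The caller stays responsible for closing the control socket once the CLOSE notice has been delivered.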