Recovering from zmq.error.ZMQError: Address already in use


Question 1:

If you run sudo netstat -ltnp on a Linux-type operating system, you will most probably see the process that owns the port. Kill it with kill <pid>, or with kill -9 <pid> if it refuses to exit.
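If you would rather check from Python before binding, a rough probe with the standard library could look like the sketch below. The function name port_is_free is made up for illustration, and the probe assumes a TCP endpoint on a POSIX-like system. It only reports whether a bind would succeed right now; it does not tell you which process holds the port, so netstat is still needed for that.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # SO_REUSEADDR keeps old connections lingering in TIME_WAIT from
        # being reported as "in use"; an active listener still blocks the bind.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # some other problem, e.g. permission denied
        return True
```

Calling port_is_free(5555) just before sock.bind("tcp://127.0.0.1:5555") narrows down whether the error comes from a stale process or from your own code binding twice.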

Question 2:

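When you exit the program, close your sockets and then terminate the context. In pyzmq that means calling .close() on every socket and then .term() on the context; in the C API it is zmq_close() followed by zmq_ctx_destroy() (named zmq_ctx_term() in newer libzmq releases). See http://zguide.zeromq.org/page:all#toc17 for more info.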


At this very moment:

reboot

Next:

Start wrapping your code in try: / except: / finally: blocks. They let you exit gracefully from all zmq allocations, including every Socket's .close() and the Context's .term(), without hanging orphans or memory leaks, even when a panic button or an unhandled exception interrupts your code and you would otherwise lose the references to instances that are still bound to the network hardware.
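A minimal sketch of that pattern with pyzmq might look like this. The helper name run_with_cleanup and its work callback are hypothetical, and the REP socket type is an arbitrary choice; adapt both to your application.

```python
import zmq


def run_with_cleanup(endpoint, work):
    """Bind a socket, run work(sock), and always release the port afterwards."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    try:
        sock.bind(endpoint)
        return work(sock)
    finally:
        # Runs on normal return and on any exception, including a failed bind.
        # linger=0 drops unsent messages so close() and term() cannot hang.
        sock.close(linger=0)
        ctx.term()
```

Because ctx.term() returns only after the context's sockets are closed, the port should be free again by the time the function returns, even if work raised.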


Sometimes another zeromq-using process is keeping the port in use, but netstat does not show that process as listening (so netstat -lntp won't list it). Instead, netstat shows an established connection on the port with the same host/port on both ends. After you kill that other process, the port becomes available for use again.

Reason #1: I've had this happen because I had the zeromq listening ports set up in the range of ephemeral ports (on Linux e.g. 32768-61000) that get used as the local side of outgoing connections, and my services need to connect to other services on the same box. A percentage of the time, an outgoing connection gets an ephemeral port that is the same as a listening port on the box, and the next bind to that port fails with "address already in use". I just moved all the listening ports down out of the way of the ephemeral port range and all of the "address already in use" issues went away.
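To check whether a listening port of yours sits inside that range, something like the following sketch may help. It assumes Linux, where the current range is exposed in /proc/sys/net/ipv4/ip_local_port_range; the function names are made up for illustration.

```python
from pathlib import Path

# Linux-only location of the ephemeral port range (two integers: low high).
RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text):
    """Parse the 'low high' contents of ip_local_port_range into two ints."""
    low, high = text.split()
    return int(low), int(high)


def read_ephemeral_range(path=RANGE_FILE):
    """Read the ephemeral port range of the running kernel."""
    return parse_port_range(Path(path).read_text())


def in_ephemeral_range(port, port_range):
    """Return True if port could be handed out to an outgoing connection."""
    low, high = port_range
    return low <= port <= high
```

For example, in_ephemeral_range(40000, read_ephemeral_range()) being True would suggest moving that listener to a lower port.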

Reason #2: Speculation: When I've run into similar problems with other Python network libraries, the offending process had been launched from the listening process using subprocess or similar, and the socket had leaked into the child process. If the parent process exited without closing the socket, the socket was left alive and owned by the child. Even though the child didn't really know anything about the socket, it still held it open, so other processes couldn't use the port.

If that's the issue, it might be fixable by tweaking the flags of the socket before the subprocess, e.g. (unix-specific):

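import fcntl
import zmq
# sock is your existing zmq socket; mark its file descriptor close-on-exec
fd = sock.get(zmq.FD)
old_flags = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, old_flags | fcntl.FD_CLOEXEC)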

Or maybe there's a way to more properly close the socket in the parent process.
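Another angle, still in the spirit of the speculation above, is to make sure the child never inherits the descriptor in the first place. The sketch below assumes a POSIX system and Python 3; the helper name is hypothetical. On Python 3.2+ close_fds=True is already the default on POSIX, so this mostly matters for older code or for callers that turned it off.

```python
import subprocess
import sys


def spawn_without_inherited_fds(argv):
    """Start a child that inherits only stdin, stdout and stderr."""
    # close_fds=True closes every other descriptor in the child before exec,
    # so a listening socket owned by the parent cannot leak into it.
    return subprocess.Popen(argv, close_fds=True)


if __name__ == "__main__":
    child = spawn_without_inherited_fds([sys.executable, "-c", "print('child ok')"])
    child.wait()
```

Whether this helps depends on how the child was actually started in your setup, so treat it as something to try rather than a confirmed fix.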