
Psycopg2 db connection hangs on lost network connection


After a long and brutal struggle, I think I fixed this issue by using the same keepalive strategy others are talking about, but passing the settings through the psycopg2 connect function itself:

from psycopg2 import connect

conn = connect(
    database=database,
    user=username,
    password=password,
    host=hostname,
    port=port,
    connect_timeout=3,
    # https://www.postgresql.org/docs/9.3/libpq-connect.html
    keepalives=1,
    keepalives_idle=5,
    keepalives_interval=2,
    keepalives_count=2)

I was seeing psycopg2 hang consistently on long-running queries, but now the issue seems to be fully resolved.

Note this may be new functionality, since this question is old.
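To get a feel for what those keepalive numbers mean, here is a rough sketch of the arithmetic, assuming the usual TCP keepalive behaviour: the first probe goes out after the idle period, and the connection is dropped once every probe has gone unanswered. The function name is made up for illustration, and the result is an approximation, not a guarantee.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time for keepalive to declare an idle connection dead.

    The first probe is sent after `idle` seconds of silence; each of the
    `count` probes then waits `interval` seconds for a reply.
    """
    return idle + interval * count


# Values taken from the connect() call above.
settings = {"keepalives_idle": 5, "keepalives_interval": 2, "keepalives_count": 2}

worst_case = keepalive_detection_seconds(
    settings["keepalives_idle"],
    settings["keepalives_interval"],
    settings["keepalives_count"],
)
print(worst_case)  # 9
```

So with these settings, a dead connection that is sitting idle should be noticed after roughly 9 seconds.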


I took a look at the socket timeout and, after reading this and this, these settings worked for me:

import socket
from math import ceil

# `connection` is an open psycopg2 connection; `time` is the idle time in seconds.
s = socket.fromfd(connection.fileno(),
                  socket.AF_INET, socket.SOCK_STREAM)
# Enable sending of keep-alive messages
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Time the connection needs to remain idle before start sending
# keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, int(ceil(time)))
# Time between individual keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)
# The maximum number of keepalive probes to send before dropping
# the connection
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)


OP's and Gabriel Salla's solutions, which configure only KEEPALIVE, are not complete. They work only if the connection is idle (no data was sent before the network went down) at the moment the network goes down.

If some data has already been sent over a network that is down, but the failure has not yet been detected by KEEPALIVE, there will be a hang. This happens because, once data has been sent and is waiting for acknowledgement, TCP uses the RTO (retransmission timeout) mechanism instead of KEEPALIVE.

To put a limit on the RTO retries, you must set the TCP_USER_TIMEOUT option (in milliseconds) on the socket.
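As an alternative to touching the socket directly, the same limit can possibly be passed as a connection parameter. This is only a sketch: it assumes libpq 12 or newer, where (as far as I know) `tcp_user_timeout` is accepted as a connection parameter; older client libraries will reject the unknown option. The helper names are made up for illustration.

```python
def with_user_timeout(connect_kwargs: dict, user_timeout_ms: int = 10000) -> dict:
    """Return a copy of the connect() keyword arguments with an RTO limit added."""
    kwargs = dict(connect_kwargs)
    # Assumption: libpq >= 12; the value is in milliseconds and is only
    # honoured on systems that support TCP_USER_TIMEOUT.
    kwargs["tcp_user_timeout"] = user_timeout_ms
    return kwargs


def connect_with_user_timeout(connect_kwargs: dict):
    """Open a psycopg2 connection using the augmented parameters."""
    import psycopg2  # imported lazily so the helper above works without it

    return psycopg2.connect(**with_user_timeout(connect_kwargs))


params = with_user_timeout({"host": "db.example.com", "keepalives": 1})
print(params["tcp_user_timeout"])  # 10000
```

The host name here is a placeholder. Building the dict separately from the connect call makes it easy to log or inspect exactly what is being passed.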

The complete solution is (both KEEPALIVE and RTO timeouts configured to 10 seconds):

import socket

# `conn` is an open psycopg2 connection.
s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 6)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)
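Not every platform exposes all of these constants (TCP_USER_TIMEOUT in particular is Linux-specific), so it may be worth wrapping the calls in a helper that skips the missing ones. A minimal sketch of that idea, where the function name and defaults are my own and the demonstration uses a plain unconnected socket instead of a real database connection:

```python
import socket


def apply_tcp_timeouts(sock, idle=6, interval=2, count=2, user_timeout_ms=10000):
    """Enable keepalive and an RTO limit on an existing TCP socket.

    Options that this platform's socket module does not define are skipped.
    Returns the names of the options that were applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]

    tcp_options = [
        ("TCP_KEEPIDLE", idle),                 # seconds of idle before the first probe
        ("TCP_KEEPINTVL", interval),            # seconds between probes
        ("TCP_KEEPCNT", count),                 # unanswered probes before dropping
        ("TCP_USER_TIMEOUT", user_timeout_ms),  # ms that sent data may stay unacknowledged
    ]
    for name, value in tcp_options:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied


# Demonstration on a plain socket; no database is needed for this part.
demo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = apply_tcp_timeouts(demo)
keepalive_on = demo.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
demo.close()
print(applied)
```

With psycopg2 you would pass in the socket obtained from `socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)`. `fromfd` duplicates the file descriptor, but both descriptors refer to the same underlying socket, so the options take effect on the database connection.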