
Read timeout using either urllib2 or any other http library


It's not possible for any library to enforce a total read timeout without using some kind of asynchronous timer, through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. What that actually does is explained in the POSIX documentation for the SO_RCVTIMEO socket option:

SO_RCVTIMEO

Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.

The key phrase is "without receiving additional data". A socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes, not a limit on the total read.
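This between-bytes behaviour is easy to demonstrate without any HTTP at all. The sketch below (Python 3, using a local socketpair rather than a real server) drips bytes across a connection slower than the total timeout but faster than the per-byte timeout, and the recv() never times out:

```python
# Minimal sketch of "timeout between bytes": a 1-second socket timeout never
# fires as long as each byte arrives within a second, even though the whole
# transfer takes longer than the timeout.
import socket
import threading
import time

a, b = socket.socketpair()
a.settimeout(1.0)

def drip():
    # send 5 bytes, one every 0.3 s: 1.5 s total, well over the 1 s timeout
    for ch in b'hello':
        time.sleep(0.3)
        b.sendall(bytes([ch]))
    b.close()

threading.Thread(target=drip).start()

received = b''
while len(received) < 5:
    received += a.recv(1024)  # each recv waits ~0.3 s, so no socket.timeout
a.close()
print(received)
```

The total transfer takes 1.5 seconds against a 1-second timeout, yet no socket.timeout is raised, because no single gap between bytes exceeds the timeout.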

A simple function using threading.Timer could be as follows.

import httplib
import socket
import threading

def download(host, path, timeout = 10):
    content = None

    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()

    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()

    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass

    timer.cancel() # cancel on triggered Timer is safe
    http.close()

    return content

>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False

Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception outside the function rather than inside it. That approach won't work, though, if the HTTP response doesn't have a Content-Length header.


I found in my tests (using the technique described here) that a timeout set in the urlopen() call also affects the read() call:

import urllib2 as u
c = u.urlopen('http://localhost/', timeout=5.0)
s = c.read(1<<20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 1298, in read
    return s + self._file.read(amt - len(s))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
socket.timeout: timed out

Maybe it's a feature of newer versions? I'm using Python 2.7 on a 12.04 Ubuntu straight out of the box.
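This isn't version-specific: the timeout argument is applied to the underlying socket, so every recv() during read() inherits it. The Python 3 sketch below (urllib2 became urllib.request) reproduces the behaviour against a local server that sends a partial body and then stalls; the handler and port are made up for the demo. Note that this is still only a between-bytes timeout, per the first answer:

```python
# Python 3 sketch: the timeout passed to urlopen() also applies to each
# later read(), because it is set on the underlying socket.
import http.server
import socket
import threading
import time
import urllib.request

class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', '10')
        self.end_headers()
        self.wfile.write(b'1234')  # partial body, then stall
        self.wfile.flush()
        time.sleep(3)

    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = 'http://127.0.0.1:%d/' % server.server_address[1]
resp = urllib.request.urlopen(url, timeout=1.0)
try:
    resp.read()        # blocks waiting for the remaining 6 bytes
    timed_out = False
except socket.timeout:
    timed_out = True   # the urlopen() timeout fired inside read()
server.shutdown()
print(timed_out)
```

The headers arrive immediately, so urlopen() itself succeeds; it is the later read() that hits the one-second timeout while waiting for bytes that never come.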


One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:

import socket
import urllib2

# timeout in seconds
socket.setdefaulttimeout(10)

# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)

However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
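The global scope of setdefaulttimeout() is exactly the problem: every socket created afterwards in the process picks it up. One mitigation, sketched below in Python 3 (the socket API is the same in Python 2), is to rely on per-socket settimeout() where you can, since it overrides the global default without touching it:

```python
# Sketch: setdefaulttimeout() is process-global, which is why it can leak
# into unrelated code (like a Celery worker); settimeout() is per-socket.
import socket

socket.setdefaulttimeout(10)

s1 = socket.socket()   # picks up the global default
s2 = socket.socket()
s2.settimeout(2)       # per-socket override; the global default is untouched

t1 = s1.gettimeout()
t2 = s2.gettimeout()
print(t1)  # 10.0
print(t2)  # 2.0

socket.setdefaulttimeout(None)  # restore, so other code is unaffected
s1.close()
s2.close()
```

This doesn't help directly with urllib2, which creates its sockets internally, but it shows why passing timeout= to urlopen() (which does a per-socket settimeout() under the hood) is safer than changing the module-wide default inside a Celery task.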

I'd be happy to hear any other solutions...