How do I gracefully interrupt urllib2 downloads?


There is no clean answer. There are several ugly ones.

Initially, I was putting rejected ideas in the question. As it became clear that there are no right answers, I decided to post the various sub-optimal alternatives as a list in this answer. Some of these are inspired by comments; thank you.

Library Support

An ideal solution would be if OpenerDirector offered a cancel operator.

It does not. Library writers take note: if you provide long slow operations, you need to provide a way to cancel them if people are to use them in real-world applications.

Reduce timeout

As a general solution, this may work for others. With a smaller timeout, the application would be more responsive to changes in circumstances. However, it will also cause downloads to fail if they don't finish within the timeout, so it is a trade-off. In my situation, it is untenable.
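For reference, urlopen accepts a timeout argument (since Python 2.6), implemented as a socket-level timeout. A minimal sketch of the trade-off at the socket level, using a socketpair as a hypothetical stand-in for a connection to a slow server (no real download involved):

```python
import socket

# Stand-in for a connection to a slow server: nothing is ever sent on `server`.
client, server = socket.socketpair()
client.settimeout(0.2)  # the same mechanism urlopen(url, timeout=...) uses

try:
    client.recv(1024)   # blocks until data arrives or the timeout elapses
    result = "received"
except socket.timeout:
    # Responsive to cancellation, but a slow-yet-working download dies here too.
    result = "timed out"

client.close()
server.close()
print(result)
```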

Read the download in chunks

Again, as a general solution, this may work. If the download consists of very large files, you can read them in small chunks, and abort after a chunk is read.

Unfortunately, if (as in my case) the delay is in receiving the first byte, rather than the size of the file, this will not help.
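A sketch of the chunked-read loop, with io.BytesIO standing in for the urllib2 response object (its read(n) interface is the same); should_abort is a hypothetical callback supplied by the caller:

```python
import io

def read_with_abort(resp, should_abort, chunk_size=8192):
    # Read the response in small chunks, checking the abort flag between chunks.
    chunks = []
    while True:
        if should_abort():
            return None          # caller treats None as "download cancelled"
        chunk = resp.read(chunk_size)
        if not chunk:            # empty read means EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Stand-in for resp = opener.open(url); read(n) behaves the same way.
body = read_with_abort(io.BytesIO(b"x" * 20000), should_abort=lambda: False)
print(len(body))  # 20000
```

Note that the abort check only runs between chunks; it cannot interrupt a read that is itself blocked waiting for the first byte.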

Kill the entire thread

While there are some aggressive techniques to kill threads, depending on the operating system, they are not recommended. In particular, they can cause deadlocks to occur. See Eli Bendersky's two articles (via @JBernardo).

Just be unresponsive

If the abort operation has been triggered by the user, it may be simplest to just be unresponsive, and not act on the request until the open operation has completed.

Whether this unresponsiveness is acceptable to your users (hint: no!), is up to your project.

It also continues to place a demand on the server, even if the result is known to be unneeded.

Let it peter out in another thread

If you create a separate thread to run the operation, and communicate with that thread in an interruptible manner, you can discard the blocked thread and start working on the next operation instead. Eventually, the thread will unblock, and then it can gracefully shut down.

The thread should be a daemon, so it doesn't block the overall shutdown of the application.

This gives the user responsiveness, but it means the server will need to keep servicing the request, even though the result is no longer needed.
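A sketch of this approach, with time.sleep standing in for the blocking urlopen call (Python 3 module names; on Python 2 use Queue.Queue and Queue.Empty):

```python
import threading
import time
from queue import Queue, Empty

def slow_open(result_q):
    time.sleep(5.0)            # stand-in for a blocking urlopen().read()
    result_q.put("response")   # eventually the thread unblocks and finishes

q = Queue()
worker = threading.Thread(target=slow_open, args=(q,))
worker.daemon = True           # don't let the blocked thread hold up shutdown
worker.start()

try:
    data = q.get(timeout=0.5)  # an interruptable wait: give up after 0.5s
except Empty:
    data = None                # abandon the thread; it peters out on its own
print(data)                    # None: we moved on without the result
```

The queue get() with a timeout is the interruptible communication channel; the abandoned thread holds no locks the main program needs, so discarding it is safe here.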

Rewrite the socket methods to be polling-based

As described in @Luke's answer, it may be possible to provide (fragile?, unportable?) extensions to the standard Python libraries.

His solution changes the socket operations from blocking to polling. Another might allow shutdown through the socket.shutdown() method (if that, indeed, will interrupt a blocked socket - not tested.)

A solution based on Twisted may be cleaner. See below.

Replace the sockets with asynchronous, non-thread-based libraries

The Twisted framework provides a replacement set of libraries for network operations that are event-driven. I understand this means that all of the different communications can be handled by a single thread with no blocking.

Sabotage

It may be possible to navigate the OpenerDirector, to find the baselevel socket that is blocking, and sabotage it directly (Will socket.shutdown() be sufficient?) to make it return.

Yuck.
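Whether socket.shutdown() actually unblocks a recv() in another thread is platform-dependent, but on Linux at least it does. A minimal demonstration, with a socketpair as a hypothetical stand-in for the socket the opener is blocked on:

```python
import socket
import threading
import time

a, b = socket.socketpair()     # stand-in for the connection urllib2 is blocked on
received = []

def blocked_reader():
    received.append(b.recv(1024))  # blocks: nothing will ever be sent

reader = threading.Thread(target=blocked_reader)
reader.start()
time.sleep(0.2)                # give the reader time to block in recv()

b.shutdown(socket.SHUT_RDWR)   # the sabotage: recv() returns b'' immediately
reader.join(timeout=2.0)

print(reader.is_alive(), received)
```

The hard part in practice is the navigation: digging the underlying socket object out of the OpenerDirector/httplib stack relies on undocumented internals.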

Put it in a separate (killable) process

The thread that reads the socket can be moved into a separate process, and interprocess communication can be used to transmit the result. This IPC can be aborted early by the client, and then the whole process can be killed.

Ask the Web Server to cancel

If you have control over the web-server being read, it could be sent a separate message asking it to close the socket. That should cause the blocked client to react.


I don't see any built-in mechanism to accomplish this. I would just move the OpenerDirector out into its own process so it would be safe to kill it.

Note: there is no way to 'kill' a thread in Python (thanks JBernardo). It may, however, be possible to raise an exception in the thread, but this likely won't work if the thread is blocked on a socket.


Here's a start for another approach. It works by extending part of the httplib stack to include a non-blocking check for the server response. You would have to make a few changes to implement this within your thread. Also note that it uses some undocumented bits of urllib2 and httplib, so the final solution for you will probably depend on the version of Python you are using (I have 2.7.3). Poke around in your urllib2.py and httplib.py files; they're quite readable.

    import urllib2, httplib, select, time

    class Response(httplib.HTTPResponse):
        def _read_status(self):
            ## Do non-blocking checks for server response until something arrives.
            while True:
                sel = select.select([self.fp.fileno()], [], [], 0)
                if len(sel[0]) > 0:
                    break
                ## <--- Right here, check to see whether thread has requested to stop
                ##      Also check to see whether timeout has elapsed
                time.sleep(0.1)
            return httplib.HTTPResponse._read_status(self)

    class Connection(httplib.HTTPConnection):
        response_class = Response

    class Handler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(Connection, req)

    h = Handler()
    o = urllib2.build_opener(h)
    f = o.open(url)
    print f.read()

Also note that there are many places in the stack that could potentially block; this example only covers one of them: the server has received the request but is taking a long time to respond.