How to get a faster speed when using multi-threading in python


The biggest thing you are doing wrong, that is hurting your throughput the most, is the way you are calling thread.start() and thread.join():

```python
for i in range(0, 10):
    thread = threading.Thread(target=current_post.post)
    thread.start()
    thread.join()
```

Each time through the loop, you create a thread, start it, and then wait for it to finish before moving on to the next one. You aren't doing anything concurrently at all!
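To see the cost of that pattern, here is a minimal Python 3 sketch. The `fake_post` function is a hypothetical stand-in for the blocking network call, so the timing is easy to reason about:

```python
import threading
import time

def fake_post():
    # hypothetical stand-in for the blocking POST request
    time.sleep(0.1)

# start() immediately followed by join(): each thread must finish
# before the next one is even created, so nothing overlaps
start = time.time()
for i in range(10):
    t = threading.Thread(target=fake_post)
    t.start()
    t.join()  # blocks right here -- no concurrency at all
serial_elapsed = time.time() - start

print("serialized: %.2f seconds" % serial_elapsed)  # roughly 10 * 0.1 = 1 second
```

Ten 0.1-second "requests" take about a full second, exactly as if no threads were involved.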

What you should probably be doing instead is:

```python
threads = []
# start all of the threads
for i in range(0, 10):
    thread = threading.Thread(target=current_post.post)
    thread.start()
    threads.append(thread)

# now wait for them all to finish
for thread in threads:
    thread.join()
```
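If you are on Python 3, the same start-everything-then-wait pattern is also available through `concurrent.futures`, which additionally caps how many threads are alive at once. This is a sketch under that assumption, again with `time.sleep` as a hypothetical stand-in for the POST call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_post():
    # hypothetical stand-in for the blocking POST request
    time.sleep(0.1)

start = time.time()
# submit all ten jobs; leaving the with-block joins the workers
with ThreadPoolExecutor(max_workers=10) as pool:
    for i in range(10):
        pool.submit(fake_post)
elapsed = time.time() - start

print("concurrent: %.2f seconds" % elapsed)  # ~0.1 s instead of ~1 s
```

All ten sleeps overlap, so the whole batch takes about as long as a single one.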


In many cases, Python's threading doesn't improve execution speed very well, and sometimes it makes things worse. For more information, see David Beazley's PyCon 2010 presentation on the Global Interpreter Lock, and his PyCon 2010 GIL slides. The presentation is very informative, and I highly recommend it to anyone considering threading.

Even though David Beazley's talk explains that network traffic improves the scheduling of the Python threading module, you should use the multiprocessing module. I included this as an option in your code (see the bottom of my answer).

Running this on one of my older machines (Python 2.6.6):

```
current_post.mode == "Process"  (multiprocessing)  --> 0.2609 seconds
current_post.mode == "Multiple" (threading)        --> 0.3947 seconds
current_post.mode == "Simple"   (serial execution) --> 1.650 seconds
```

I agree with TokenMacGuy's comment, and the numbers above include moving the .join() to a separate loop. As you can see, Python's multiprocessing is significantly faster than threading.


```python
from multiprocessing import Process
import threading
import time
import urllib
import urllib2

class Post:
    def __init__(self, website, data, mode):
        self.website = website
        self.data = data
        #mode is either:
        #   "Simple"      (Simple POST)
        #   "Multiple"    (Multi-thread POST)
        #   "Process"     (Multiprocessing)
        self.mode = mode
        self.run_job()

    def post(self):
        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)
        if self.mode == "Multiple":
            time.sleep(0.001)
        #read HTMLData
        HTMLData = open_url.read()
        #print "OK"

    def run_job(self):
        """This was refactored from the OP's code"""
        origin_time = time.time()
        if self.mode == "Multiple":
            #multithreading POST
            threads = list()
            for i in range(0, 10):
                thread = threading.Thread(target=self.post)
                thread.start()
                threads.append(thread)
            for thread in threads:
                thread.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        if self.mode == "Process":
            #multiprocessing POST
            processes = list()
            for i in range(0, 10):
                process = Process(target=self.post)
                process.start()
                processes.append(process)
            for process in processes:
                process.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        if self.mode == "Simple":
            #simple POST
            for i in range(0, 10):
                self.post()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        return time_interval

if __name__ == "__main__":
    for method in ["Process", "Multiple", "Simple"]:
        Post("http://forum.xda-developers.com/login.php",
             "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
             method)
```


Keep in mind that the only case where multi-threading can "increase speed" in Python is when you have operations like this one that are heavily I/O bound. Otherwise multi-threading does not increase "speed", since it cannot run on more than one CPU (no, not even if you have multiple cores; Python doesn't work that way). You should use multi-threading when you want two things to be done at the same time, not when you want two things to be parallel (i.e., two processes running separately).
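A small Python 3 sketch of that point: for pure-Python CPU-bound work, two threads typically finish no faster than doing the work serially, because the GIL lets only one thread execute bytecode at a time. The `count` function here is just an illustrative busy loop, not anything from the code above:

```python
import threading
import time

def count(n, results, idx):
    # pure-Python CPU work: under the GIL only one thread runs this at a time
    total = 0
    for i in range(n):
        total += i
    results[idx] = total

N = 1000000

# serial: do the work twice, one call after the other
t0 = time.time()
res_serial = [None, None]
count(N, res_serial, 0)
count(N, res_serial, 1)
serial_elapsed = time.time() - t0

# threaded: two threads, but the GIL serializes the bytecode anyway
t0 = time.time()
res_threaded = [None, None]
threads = [threading.Thread(target=count, args=(N, res_threaded, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_elapsed = time.time() - t0

print("serial:   %.3f s" % serial_elapsed)
print("threaded: %.3f s" % threaded_elapsed)  # typically no faster than serial
```

Both versions compute the same results; the threads buy you nothing here because there is no I/O wait to overlap.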

Now, what you're actually doing will not increase the speed of any single DNS lookup, but it will allow multiple requests to be fired off while you wait for the results of others. Be careful how many you issue at once, though, or you will make the response times even worse than they already are.
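One way to cap how many requests are in flight at once is a `threading.Semaphore`. This sketch (with `time.sleep` as a hypothetical stand-in for the request, and the `MAX_CONCURRENT` limit chosen arbitrarily) starts ten threads but never lets more than three run the "request" simultaneously:

```python
import threading
import time

MAX_CONCURRENT = 3                      # arbitrary cap on simultaneous requests
gate = threading.Semaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0
peak = 0

def fake_post():
    global active, peak
    with gate:                          # at most MAX_CONCURRENT get past here
        with lock:
            active += 1
            peak = max(peak, active)    # record the highest concurrency seen
        time.sleep(0.05)                # stand-in for the network call
        with lock:
            active -= 1

threads = [threading.Thread(target=fake_post) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)  # never more than 3
```

The semaphore gives you the throughput benefit of overlapping requests without flooding the server.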

Also, please stop using urllib2, and use Requests instead: http://docs.python-requests.org