How to get a faster speed when using multi-threading in python


The biggest thing you are doing wrong, that is hurting your throughput the most, is the way you are calling thread.start() and thread.join():

```python
for i in range(0, 10):
    thread = threading.Thread(target=current_post.post)
    thread.start()
    thread.join()
```

Each time through the loop, you create a thread, start it, and then wait for it to finish before moving on to the next one. You aren't doing anything concurrently at all!
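To see the cost of that pattern, here is a minimal Python 3 sketch. The `fake_post` function is a hypothetical stand-in for the blocking network call, so the timing is easy to reason about:

```python
import threading
import time

def fake_post():
    # hypothetical stand-in for the blocking POST request
    time.sleep(0.1)

# start() immediately followed by join(): each thread must finish
# before the next one is even created, so nothing overlaps
start = time.time()
for i in range(10):
    t = threading.Thread(target=fake_post)
    t.start()
    t.join()  # blocks right here -- no concurrency at all
serial_elapsed = time.time() - start

print("serialized: %.2f seconds" % serial_elapsed)  # roughly 10 * 0.1 = 1 second
```

Ten 0.1-second "requests" take about a full second, exactly as if no threads were involved.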

What you should probably be doing instead is:

```python
threads = []
# start all of the threads
for i in range(0, 10):
    thread = threading.Thread(target=current_post.post)
    thread.start()
    threads.append(thread)

# now wait for them all to finish
for thread in threads:
    thread.join()
```
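If you are on Python 3, the same start-everything-then-wait pattern is also available through `concurrent.futures`, which additionally caps how many threads are alive at once. This is a sketch under that assumption, again with `time.sleep` as a hypothetical stand-in for the POST call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_post():
    # hypothetical stand-in for the blocking POST request
    time.sleep(0.1)

start = time.time()
# submit all ten jobs; leaving the with-block joins the workers
with ThreadPoolExecutor(max_workers=10) as pool:
    for i in range(10):
        pool.submit(fake_post)
elapsed = time.time() - start

print("concurrent: %.2f seconds" % elapsed)  # ~0.1 s instead of ~1 s
```

All ten sleeps overlap, so the whole batch takes about as long as a single one.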


In many cases, Python's threading doesn't improve execution speed very well, and sometimes it makes things worse. For more information, see David Beazley's PyCon 2010 presentation on the Global Interpreter Lock, and his PyCon 2010 GIL slides. The presentation is very informative, and I highly recommend it to anyone considering threading.

Even though David Beazley's talk explains that network traffic improves the scheduling of the Python threading module, you should use the multiprocessing module. I included this as an option in your code (see the bottom of my answer).

Running this on one of my older machines (Python 2.6.6):

```
current_post.mode == "Process"  (multiprocessing)  --> 0.2609 seconds
current_post.mode == "Multiple" (threading)        --> 0.3947 seconds
current_post.mode == "Simple"   (serial execution) --> 1.650 seconds
```

I agree with TokenMacGuy's comment, and the numbers above include moving the .join() to a separate loop. As you can see, Python's multiprocessing is significantly faster than threading.


```python
from multiprocessing import Process
import threading
import time
import urllib
import urllib2

class Post:
    def __init__(self, website, data, mode):
        self.website = website
        self.data = data
        #mode is either:
        #   "Simple"      (Simple POST)
        #   "Multiple"    (Multi-thread POST)
        #   "Process"     (Multiprocessing)
        self.mode = mode
        self.run_job()

    def post(self):
        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)
        if self.mode == "Multiple":
            time.sleep(0.001)
        #read HTMLData
        HTMLData = open_url.read()
        #print "OK"

    def run_job(self):
        """This was refactored from the OP's code"""
        origin_time = time.time()
        if self.mode == "Multiple":
            #multithreading POST
            threads = list()
            for i in range(0, 10):
                thread = threading.Thread(target=self.post)
                thread.start()
                threads.append(thread)
            for thread in threads:
                thread.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        if self.mode == "Process":
            #multiprocessing POST
            processes = list()
            for i in range(0, 10):
                process = Process(target=self.post)
                process.start()
                processes.append(process)
            for process in processes:
                process.join()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        if self.mode == "Simple":
            #simple POST
            for i in range(0, 10):
                self.post()
            #calculate the time interval
            time_interval = time.time() - origin_time
            print "mode - {0}: {1}".format(self.mode, time_interval)
        return time_interval

if __name__ == "__main__":
    for method in ["Process", "Multiple", "Simple"]:
        Post("http://forum.xda-developers.com/login.php",
             "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
             method)
```


Keep in mind that the only case where multi-threading can "increase speed" in Python is when you have operations like this one that are heavily I/O bound. Otherwise multi-threading does not increase "speed", since it cannot run on more than one CPU (no, not even if you have multiple cores; Python doesn't work that way). You should use multi-threading when you want two things to be done at the same time, not when you want two things to be parallel (i.e., two processes running separately).
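A small Python 3 sketch of that point: for pure-Python CPU-bound work, two threads typically finish no faster than doing the work serially, because the GIL lets only one thread execute bytecode at a time. The `count` function here is just an illustrative busy loop, not anything from the code above:

```python
import threading
import time

def count(n, results, idx):
    # pure-Python CPU work: under the GIL only one thread runs this at a time
    total = 0
    for i in range(n):
        total += i
    results[idx] = total

N = 1000000

# serial: do the work twice, one call after the other
t0 = time.time()
res_serial = [None, None]
count(N, res_serial, 0)
count(N, res_serial, 1)
serial_elapsed = time.time() - t0

# threaded: two threads, but the GIL serializes the bytecode anyway
t0 = time.time()
res_threaded = [None, None]
threads = [threading.Thread(target=count, args=(N, res_threaded, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_elapsed = time.time() - t0

print("serial:   %.3f s" % serial_elapsed)
print("threaded: %.3f s" % threaded_elapsed)  # typically no faster than serial
```

Both versions compute the same results; the threads buy you nothing here because there is no I/O wait to overlap.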

Now, what you're actually doing will not increase the speed of any single DNS lookup, but it will allow multiple requests to be fired off while you wait for the results of others. Be careful how many you issue at once, though, or you will make the response times even worse than they already are.
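One way to cap how many requests are in flight at once is a `threading.Semaphore`. This sketch (with `time.sleep` as a hypothetical stand-in for the request, and the `MAX_CONCURRENT` limit chosen arbitrarily) starts ten threads but never lets more than three run the "request" simultaneously:

```python
import threading
import time

MAX_CONCURRENT = 3                      # arbitrary cap on simultaneous requests
gate = threading.Semaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0
peak = 0

def fake_post():
    global active, peak
    with gate:                          # at most MAX_CONCURRENT get past here
        with lock:
            active += 1
            peak = max(peak, active)    # record the highest concurrency seen
        time.sleep(0.05)                # stand-in for the network call
        with lock:
            active -= 1

threads = [threading.Thread(target=fake_post) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)  # never more than 3
```

The semaphore gives you the throughput benefit of overlapping requests without flooding the server.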

Also, please stop using urllib2, and use Requests instead: http://docs.python-requests.org