Why is a threaded version of this particular Perl script 200 times slower than its non-threaded counterpart?


Jay P. is right:

~$ strace -c ./threads.pl
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.80    0.116007       10546        11           futex
  0.20    0.000229           6        36           mmap2
  0.00    0.000000           0        31           read
  0.00    0.000000           0        49        13 open
  0.00    0.000000           0        36           close

Compare that with:

~$ strace -c ./no-threads.pl
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.62    0.000261         261         1           execve
  9.38    0.000027           0       167           write
  0.00    0.000000           0        12           read
  0.00    0.000000           0        38        13 open
  0.00    0.000000           0        25           close


I'm a Python guy, not Perl, so I only have a vague idea of what the code is doing. However, always be careful when you see Queues. Python has a thread-safe Queue, and it looks like Perl does too. They're fantastic in that they take care of thread-safety for you, but they typically involve lots of expensive locking and unlocking of the queue, which is probably where all your time is going.
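To make the queue point concrete, here is a rough Python sketch (not the asker's Perl script). The function name `sum_via_queue` and its `batch_size` parameter are made up for illustration. Every `put()` and `get()` on a thread-safe queue takes the queue's internal lock, so handing work over one item at a time means one lock round-trip per item, while batching pays that cost once per batch.

```python
import queue
import threading

def sum_via_queue(numbers, batch_size=1):
    """Sum `numbers` in a worker thread, handing them over in batches.

    Hypothetical helper for illustration only.
    """
    q = queue.Queue()
    totals = []

    def worker():
        total = 0
        while True:
            batch = q.get()        # each get() acquires the queue's lock
            if batch is None:      # sentinel: no more work
                break
            total += sum(batch)
        totals.append(total)

    t = threading.Thread(target=worker)
    t.start()
    for i in range(0, len(numbers), batch_size):
        q.put(numbers[i:i + batch_size])   # each put() acquires it too
    q.put(None)
    t.join()
    return totals[0]

numbers = list(range(10_000))
per_item = sum_via_queue(numbers, batch_size=1)      # 10,000 puts and gets
batched = sum_via_queue(numbers, batch_size=1_000)   # 10 puts and gets
```

Both calls produce the same sum; only the number of queue operations differs. I have not timed this, and how much the per-item version costs will depend on the machine and the runtime, so measure before drawing conclusions.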


How many processors do you have? In general, a calculation-intensive task will run slower when the number of threads exceeds the number of processors, because switching between threads (a "context switch") is expensive. A context switch involves stopping one thread, saving its context, then loading another thread's context onto the processor so it can run. And all for what? So thread A can calculate whether 12321 is divisible by 7 instead of thread B?

If you have 2 procs, I would bet that a version with 2 threads might be the fastest; with 4 procs, use 4 threads, etc.
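As a sketch of that rule of thumb in Python: `pick_worker_count` is a hypothetical helper, not part of any library, and the "one worker per processor" policy is the guess made above rather than a measured optimum.

```python
import os

def pick_worker_count(num_tasks, cpus=None):
    """Return a worker count capped at the number of processors.

    Hypothetical helper: `cpus` can be passed explicitly, otherwise
    it falls back to whatever os.cpu_count() reports.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1   # cpu_count() may return None
    # Never more workers than processors or tasks, and at least one.
    return max(1, min(num_tasks, cpus))

workers = pick_worker_count(num_tasks=100)
```

Note that in standard CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so there this count would more usefully size a pool of worker processes. Either way, try a few values and time them.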