
Quicksort sorts larger numbers faster?


I think this has to do with the choice of a pivot. Depending on how your partition step works, an input with many duplicate values can make the algorithm degenerate to quadratic behavior. For example, suppose that you're trying to quicksort this stream:

 [0 0 0 0 0 0 0 0 0 0 0 0 0]

If you aren't careful with how you do the partitioning step, this can degenerate quickly. For example, suppose you pick your pivot as the first 0, leaving you with the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

to partition. Your algorithm might say that the smaller values are the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

And the larger values are the array

 []

This is the case that causes quicksort to degenerate to O(n^2), since each recursive call is only shrinking the size of the input by one (namely, by pulling off the pivot element).

I noticed that in your code, your partitioning step does indeed do this:

    for items in array:
        if items <= pivotVal:
            smaller.append(items)
        else:
            greater.append(items)

Given a stream that's a whole bunch of copies of the same element, this will put all of them into one array to recursively sort.
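
To make this concrete, here is a minimal sketch of my own (not your code, but the same kind of two-way <= / > split), instrumented to count comparisons on an all-equal input:

    import sys

    def quicksort_two_way(array):
        # Naive out-of-place quicksort with the same <= / > partition as above.
        # Returns the sorted list and the number of comparisons performed.
        if len(array) <= 1:
            return array, 0
        pivot_val = array[0]
        smaller, greater = [], []
        for item in array[1:]:
            if item <= pivot_val:
                smaller.append(item)
            else:
                greater.append(item)
        left, left_cmp = quicksort_two_way(smaller)
        right, right_cmp = quicksort_two_way(greater)
        return left + [pivot_val] + right, left_cmp + right_cmp + len(array) - 1

    sys.setrecursionlimit(5000)
    _, comparisons = quicksort_two_way([0] * 1000)
    print(comparisons)  # 499500 = 1000 * 999 / 2, quadratic in the input size

Every call peels off only the pivot, so the total work grows as roughly n^2 / 2 rather than n*logn.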

Of course, this seems like a ridiculous case - how is this at all connected to reducing the range of values in the array? - but it actually does come up when you're sorting lots of elements that aren't distinct. In particular, after a few passes of the partitioning, you're likely to group together all the equal elements, which brings you into this case.

For a discussion of how to prevent this from happening, there's a really great talk by Bob Sedgewick and Jon Bentley about how to modify the partition step to work quickly when in the presence of duplicate elements. It's connected to Dijkstra's Dutch national flag problem, and their solutions are really clever.

One option that works is to partition the input into three groups - less, equal, and greater. Once you've broken the input up this way, you only need to sort the less and greater groups; the equal group is already sorted. The talk shows how to do this more or less in-place, but since you're already using an out-of-place quicksort the fix should be easy. Here's my attempt at it:

    for items in array:
        if items < pivotVal:
            smaller.append(items)
        elif items == pivotVal:
            equal.append(items)
        else:
            greater.append(items)

I've never written a line of Python in my life, BTW, so this may be totally illegal syntax. But I hope the idea is clear! :-)
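
Spelled out as a complete function (a sketch of the same idea; the function name and the middle-element pivot are my choices, not taken from the question's code):

    def quicksort_three_way(array):
        # Out-of-place quicksort with three-way partitioning: elements equal
        # to the pivot are set aside and never recursed on.
        if len(array) <= 1:
            return array
        pivotVal = array[len(array) // 2]
        smaller, equal, greater = [], [], []
        for items in array:
            if items < pivotVal:
                smaller.append(items)
            elif items == pivotVal:
                equal.append(items)
            else:
                greater.append(items)
        return quicksort_three_way(smaller) + equal + quicksort_three_way(greater)

    print(quicksort_three_way([0] * 13))                        # all duplicates: one partition pass, no deep recursion
    print(quicksort_three_way([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))  # [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]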


Things we know:

  1. The time complexity of quicksort on an unordered (random) array is O(n*logn) on average.
  2. If the array is already sorted, quicksort with a naive pivot choice (e.g. always the first element) degrades to O(n^2).
  3. These two cases are not discrete: the closer an array is to being sorted, the closer quicksort's running time gets to O(n^2), and conversely, the more shuffled it is, the closer it gets to O(n*logn). See the sketch below.
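
To see points 2 and 3 concretely, here is a small sketch (my own code, assuming the common first-element pivot) that counts comparisons on a sorted versus a shuffled input:

    import random, sys

    def comparisons(array):
        # Comparison count for a quicksort that always pivots on the first element
        # (the variant that statements 2 and 3 assume).
        if len(array) <= 1:
            return 0
        pivot = array[0]
        smaller = [x for x in array[1:] if x <= pivot]
        greater = [x for x in array[1:] if x > pivot]
        return len(array) - 1 + comparisons(smaller) + comparisons(greater)

    sys.setrecursionlimit(5000)
    data = list(range(1000))
    print(comparisons(data))  # already sorted: 499500 comparisons, about n^2 / 2
    random.shuffle(data)
    print(comparisons(data))  # shuffled: roughly 14000 on average, about n*logn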

Now, let's look at your experiment:

  • In all three cases you used the same number of elements, so our n (which you named x) is always 100000.
  • In your first experiment, you used numbers between 0 and 100000, so with a good random number generator you'd get mostly distinct numbers in a relatively unordered list, which fits the O(n*logn) case.
  • In your third experiment, you used numbers between 0 and 10 in a list of 100000 elements. That means there were a great many duplicates in your list, making it much closer to a sorted list than in the first experiment. So, in that case, the running time was much closer to O(n^2).

And for the same sufficiently large n, n^2 is far greater than n*logn, which is exactly the slowdown your experiment showed.
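
For a rough sense of scale (back-of-the-envelope figures, not exact operation counts):

    import math

    n = 100_000
    print(n * math.log2(n))             # about 1.7 million steps in the O(n*logn) case
    print(n ** 2)                       # 10 billion steps in the O(n^2) case
    print(n ** 2 / (n * math.log2(n)))  # roughly a 6000x gap at this n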


The quicksort algorithm has a known weakness: it is slower when the data is mostly sorted. When you have 100000 numbers between 0 and 10, they will be closer to being 'mostly sorted' than 100000 numbers in the range 0 to 100000.
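
A quick way to see how many repeated values each range produces (a sketch; randint and the variable names are assumptions on my part, since the question's exact generator isn't shown here):

    import random
    from collections import Counter

    narrow = [random.randint(0, 10) for _ in range(100000)]
    wide = [random.randint(0, 100000) for _ in range(100000)]

    print(len(Counter(narrow)))  # 11 distinct values, so ~9000 copies of each
    print(len(Counter(wide)))    # ~63000 distinct values, mostly unique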