Why can’t you use Hash Tables/Dictionaries in Counting Sort algorithm? Why can’t you use Hash Tables/Dictionaries in Counting Sort algorithm? python-3.x python-3.x

Why can’t you use Hash Tables/Dictionaries in Counting Sort algorithm?


You certainly can do something like this. The question is whether it’s worthwhile to do so.

Counting sort has a runtime of O(n + U), where n is the number of elements in the array and U is the maximum value. Notice that as U gets larger and larger the runtime of this algorithm starts to degrade noticeably. For example, if U > n and I add one more digit to U (for example, changing it from 1,000,000 to 10,000,000), the runtime can increase by a factor of ten. This means that counting sort starts to become impractical as U gets bigger and bigger, and so you typically run counting sort when U is fairly small. If you’re going to run counting sort with a small value of U, then using a hash table isn’t necessarily worth the overhead. Hashing items costs more CPU cycles than just doing standard array lookups, and for small arrays the potential savings in memory might not be worth the extra time. And if you’re using a very large value of U, you’re better off switching to radix sort, which essentially is lots of smaller passes of counting sort with a very small value of U.

The other issue is that the reassembly step of counting sort has amazing locality of reference - you simply scan over the counts array and the input array in parallel filling in values. If you use a hash table, you lose some of that locality because th elements in the hash table aren’t necessarily stored consecutively.

But these are more implementation arguments than anything else. Fundamentally, counting sort is less about “use an array” and more about “build a frequency histogram.” It just happens to be the case that a regular old array is usually preferable to a hash table when building that histogram.