Why is it impossible to find a specified value in a sorted array faster than O(log n)?


Let's imagine that you have an array of n items. If you perform a lookup in this array, then that lookup can return one of n + 1 values: either "the item isn't present," or "the item is present at index i" for any of the n indices.

Now, suppose that the only way your algorithm is allowed to work with the array is by asking questions of the form "is the item greater than or equal to the item at index i?" for some choice of i, and let's imagine that you ask some question of this form k total times. There are then 2^k possible ways that the comparisons could come back. To see why, note that there are two options for how the first comparison can go (either "yes" or "no"), two options for how the second comparison can go, two options for the third, and so on. Multiplying all those 2s together gives 2^k.

We now have two constraints:

  1. Any correct algorithm must be able to return one of n + 1 different options.
  2. With k comparisons, there are 2^k possible ways those comparisons can work out.

This means that we have to have n + 1 ≤ 2^k, since otherwise there aren't enough distinct outcomes from the comparisons to cover all n + 1 possible answers. Taking the log base two of both sides gives lg(n + 1) ≤ k, so the number of comparisons made must be Ω(log n).

Stated differently: if your algorithm makes too few comparisons, there aren't enough possible ways for those comparisons to come back to ensure that every possible answer can be produced.
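
As a quick sanity check of the arithmetic, here is a small Python snippet (the sample sizes are my own, purely illustrative) that computes the smallest k satisfying n + 1 ≤ 2^k for a few values of n:

    import math

    # Smallest number of yes/no comparisons k with n + 1 <= 2**k:
    # with fewer than this, the 2**k outcome patterns can't cover all
    # n + 1 possible answers of a correct search.
    for n in [10, 1000, 1_000_000]:
        k = math.ceil(math.log2(n + 1))
        print(f"n = {n:>9}: at least {k} comparisons (2**{k} = {2**k} >= {n + 1})")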


Of course, if you aren't in the comparison model, you can outperform searches in an array using hash tables. Some hash tables give expected O(1) lookups, while others (say, cuckoo hashing) give worst-case O(1) lookups.
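
For instance, a plain Python dict (a hash table) gives expected O(1) lookups once built; this sketch, with illustrative data of my own, maps each value to its index after O(n) preprocessing:

    # Outside the comparison model: hash the values once, then each
    # lookup is expected O(1) instead of O(log n).
    sorted_array = [2, 3, 5, 7, 11, 13]
    index_of = {value: i for i, value in enumerate(sorted_array)}

    print(index_of.get(7))    # 3    (found at index 3)
    print(index_of.get(8))    # None (not present)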

Moving outside the comparison model, there are algorithms that, subject to certain assumptions, have expected runtimes lower than O(log n). For example, interpolation search can find items in sorted arrays in expected time O(log log n), provided that the data are sampled from a uniform distribution. It works by making numeric estimates of where in the sequence to choose the next item to probe, and works well in practice.
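
Here is a minimal sketch of interpolation search for distinct integer keys (the function name and test data are my own):

    import random

    def interpolation_search(a, target):
        """Search a sorted list of numbers by estimating the target's position.

        Instead of always probing the midpoint, probe where the target
        'should' be if the values were spread evenly between a[lo] and
        a[hi]; on uniformly distributed data this takes expected
        O(log log n) probes.
        """
        lo, hi = 0, len(a) - 1
        while lo <= hi and a[lo] <= target <= a[hi]:
            if a[lo] == a[hi]:                 # constant range; avoid dividing by zero
                return lo if a[lo] == target else -1
            # Linear interpolation between the endpoint values.
            pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
            if a[pos] == target:
                return pos
            elif a[pos] < target:
                lo = pos + 1
            else:
                hi = pos - 1
        return -1

    data = sorted(random.sample(range(1_000_000), 10_000))
    assert interpolation_search(data, data[42]) == 42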

On the more theoretical side of things, fusion trees can perform searches in time O(log n / log w), where w is the machine word size, provided that the values are integers that fit into a single machine word. This can be improved down to the surprising runtime of O(sqrt(log n / log log n)). It's known that if the n values each fit into a single machine word, then the predecessor lower bound says you can't do better than the (very unusual) runtime of O(min{log w / log log w, sqrt(log n / log log n)}), where w is the machine word size. These algorithms beat the Ω(log n) lower bound by making multiple comparisons in parallel using creative operations on individual machine words.


To start with, be careful about using the word "faster" when talking about Big-O complexity, as is done in the question title. Big-O says nothing about how fast an algorithm is. Big-O only tells you how execution time changes when some variable N changes. Example:

    O(1) algorithm   : Doubling N will not change execution time
    O(N) algorithm   : Doubling N will double execution time (1 + 1 = 2)
    O(N^2) algorithm : Doubling N will quadruple execution time (2 * 2 = 4)

Also notice that for some values of N an O(N^2) algorithm may be faster than an O(N) algorithm. Big-O doesn't tell you anything about that. All we can say is that if we keep increasing N then sooner or later the O(N) algorithm will be faster than the O(N^2) algorithm, but Big-O doesn't tell you what that value of N is. It could be N=1, N=10, N=100, ... So be careful about "translating" Big-O complexity into "fast".
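
To make that concrete, here is a toy comparison with made-up cost functions (the constants are purely illustrative, not measurements):

    def cost_linear(n):
        return 100 * n    # hypothetical O(N) algorithm with a large constant factor

    def cost_quadratic(n):
        return n * n      # hypothetical O(N^2) algorithm with a small constant factor

    # The O(N^2) algorithm wins until the crossover point at N = 100.
    for n in [10, 50, 200, 1000]:
        l, q = cost_linear(n), cost_quadratic(n)
        winner = "O(N^2)" if q < l else "O(N)"
        print(f"N = {n:>4}: O(N) cost = {l:>7}, O(N^2) cost = {q:>7} -> {winner} is faster")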

Big-O is calculated as the number of times you need to perform some basic operation (an O(1) operation) as a function of the variable N. Further, Big-O (normally) looks at worst-case scenarios.

For an ordinary array, the only basic operation we can perform is to look up the value at some index i in the range 1..N.

In a sorted array the returned value can be used to tell us three things (see a later paragraph for exceptions):

  1. Is the value greater than the value we are searching for?
  2. Is the value less than the value we are searching for?
  3. Is the value equal to the value we are searching for?

Now remember that Big-O is about worst-case scenarios, so number 3 won't happen unless we are looking in a range with only one array element. So forget about number 3 for now.

Because the array is sorted the relevant answers can be translated into

  1. The search value is in range 1 .. i
  2. The search value is in range i+1 .. N

Now the question is: how do we select the best value of i for the first look-up?

Since Big-O is a worst-case calculation, we shall always use the larger of the two ranges for the next step. To make the larger range as small as possible, we need to make the two ranges the same size. To do that, we need i to equal N/2. This is simply the best we can do with the knowledge we get from the look-up.

By doing that we have

  1. The search value is in range 1 .. N/2
  2. The search value is in range N/2+1 .. N

So in the next step we need to look in a range with N/2 elements.

Now apply the same again (i.e. i = N/2/2) for the next search to get down to N/4 elements. Do it again to get to N/8 elements, and so on...

Repeat that until there is 1 element in the range - then we have the solution (number 3 above).
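
The procedure just described is exactly binary search. A minimal Python sketch (0-indexed, and instrumented with a look-up counter of my own so the bound below can be observed):

    def binary_search(a, target):
        """The halving procedure described above, with a look-up counter."""
        lookups = 0
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            lookups += 1
            mid = (lo + hi) // 2       # probe the middle of the range (i = N/2)
            if a[mid] == target:
                return mid, lookups    # number 3 above: found it
            elif a[mid] < target:
                lo = mid + 1           # target can only be in the right half
            else:
                hi = mid - 1           # target can only be in the left half
        return -1, lookups             # not present

    a = list(range(0, 2000, 2))        # 1000 sorted even numbers
    print(binary_search(a, 1998))      # (999, 10): 10 look-ups = log2(1024)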

So each look-up reduces the range to half of its original size, and k look-ups will reduce the range by a factor of 2^k. Our target is to get a range of size 1, so we need to solve:

N / 2^k = 1 <=> N = 2^k <=> k = log2(N)

So from the assumption that all we can learn from a look-up is whether our search value is to the left or to the right of the look-up position, and from the fact that Big-O is calculated for the worst case, we can see that a search in a sorted array can't do better than log(N) complexity.

The above covers the general array, but notice that for some arrays the assumption may not hold. Sometimes an algorithm can extract more information from the looked-up value at index i. For instance, if we know something about the distribution of values in the array and our search value is far from the looked-up value, an algorithm may benefit from probing somewhere other than N/2 in the next look-up, and thereby be more efficient.


If you have no prior information about the key distribution, then your search really is O(log(n)): each comparison extracts 1 bit of information and cuts the search area in half. But for practical cases you can often search a sorted array much faster. For instance, see interpolation search, which is just O(log(log(n))).