Given an array, can I find in O(n) the longest range, whose endpoints are the greatest values in the range? Given an array, can I find in O(n) the longest range, whose endpoints are the greatest values in the range? arrays arrays

Given an array, can I find in O(n) the longest range, whose endpoints are the greatest values in the range?


I believe that the following algorithm runs in O(n) time and produces an optimal solution. It's a bit tricky, so I'll split the logic into two cases, one in which we consider the case where there are no duplicate values in the array, and one in which we considered a case where duplicates are allowed.

The key data structure we'll be using is the Cartesian tree, which is a max-heap built out of a range of data with the property that an inorder walk of the nodes of the tree produce the values of the sequence in the order in which they appear. The critical detail is that it's possible to build a Cartesian tree for a sequence of elements in O(n) time.

As an example, given the sequence 4 1 0 3 2, we would get this Cartesian tree:

        4         \          3         / \        1   2         \          0

Notice that the inorder walk does indeed give back 4 1 0 3 2.

I now claim that the endpoints of the range we're looking for must be a parent/child pair in the Cartesian tree (that is, either the first node is a parent of the second node or vice-versa). Note that we are not looking for a node and any ancestor, but specifically a parent/child pair. To prove this, I prove the following two claims:

Claim 1: Any parent/child pair in the Cartesian tree defines a range of values in the original sequence that does not have any intermediary values greater than the endpoints.

Claim 2: Any range of values in the sequence that does not have any intermediary values greater than its endpoints must have those endpoints as a parent/child pair in the Cartesian tree.

Taken together, these collectively prove that the range we're looking for must be a parent/child pair in the Cartesian tree. This then gives us an easy linear-time algorithm for solving the problem when all the values are distinct. First, in O(n) time, build up a Cartesian tree for the range. Next, recursively explore the tree, and for each parent/child pair, find the distance between those pairs, taking the largest range we find. This second step also runs in O(n), since a Cartesian tree is a binary tree and so the number of edges is O(n).

Proof of claim one: A parent/child pair must look like either

 u  \   v  / \ A   B

or

    u   /  v / \A   B

Recall that a Cartesian tree stores its elements in a way that an inorder traversal of the elements of the tree yield the elements of the array used to produce the tree in the order in which they appear in the array. This means that in case (1), we're looking at a range of elements starting with u, containing all of A, and concluding with b. Similarly, in case (2), the range starts with v, then contains all of B, then terminates with u. We prove the claim that u and v have no values in-between them that are larger than either u or v by contradiction. Suppose this were the case and that the value w is bigger than either u or v. By the definition of a Cartesian tree, if w is in-between u and v in the original sequence, then in case (1) it must be in subtree A and in case (2) it must be in subtree B. But a Cartesian tree is a max-heap, and so in case (1) both u and v are bigger than everything in A, and in case (2) both u and v are bigger than everything in B, a contradiction. Thus the range of values between any parent/child pair must be no larger than either the parent or the child.

Proof of Claim 2: By contrapositive; we show that if there is a subsequence of the original array beginning with u and ending with v that contains an element larger w than either u or v, then u and v cannot be a parent/child or child/parent pair in the Cartesian tree. The proof is virtually identical to above - if u and v were a parent/child pair, then in case (1) above w would have to be in A and in case (2) w would have to be in B, in both cases violating the fact that a Cartesian tree is a max-heap.

The above proofs show us that if all the values are distinct, we can solve this problem in linear time by just building a Cartesian tree and doing a simple tree walk. But what happens if the elements are allowed to be duplicates, as in your original problem statement? In that case, we can use a modified Cartesian tree structure. We'll allow nodes to be combined into a "metanode" of multiple different Cartesian tree values that share the same value. For example, the sequence 2 7 1 7 8 0 3 8 2 would look like this:

                  [8 ------- 8]                  / \         \                 /   3         2                /   /               /   0              /          [7 -- 7]            / \           2   1

Here, the root is a metanode consisting of two nodes with value 8, and the first 8 metanode consists of two 7 metanodes.

For notational purposes, let the "leftest" node of a metanode be the node furthest to the left in the metanode, and let the "rightest" node of a metanode be the node furthest to the right in a metanode.

In this modified Cartesian tree, we can define an inorder walk of the nodes as one that visits all of the nodes in a metanode in left-to-right order, doing an inorder walk of each of the nodes it contains. We then enforce that the modified Cartesian tree a strict max-heap (every node must be strictly greater than any of its children) and that an inorder walk of the tree yields the values in the same order in which they appear in the original sequence.

Notice that this generalized construction contains our original solution as a special case - if all the values are distinct, nothing is different in this tree structure. I'll also state without proof that it's possible to build up one of these structures in O(n) by a straightforward modification of the standard Cartesian tree algorithm.

In this modified tree structure, a range of values with no intervening values at least as large as the endpoints corresponds to either:

  1. A parent and the rightest node of its left child.
  2. A parent and the leftest node of its right child.
  3. Two adjacent nodes in the same metanode.

The proof of the first two claims is pretty much a straightforward modification of the proof for the above case. The difference is that instead of the contradiction proof saying that it would violate the max-heap property, instead you would claim that it violates the strong max-heap property. You would also say that any equal values in the middle of the range would have to manifest themselves as nodes that would prevent the range endpoints from being leftest or rightest nodes in a metanode. The proof of the third claim is (roughly) that any nodes in-between the two nodes must be strictly smaller than the endpoints, and if there were an equal value in the middle then the two nodes wouldn't be adjacent metanodes.

Given this observation, we can solve the original problem in O(n) time as follows. First, build up a generalized Cartesian tree from the input range. Next, walk the tree and look at all of the indicated pairs of elements that might be the range, picking the largest one that you find. Since for each node only a constant number of other nodes need to be checked (its parent, left and right siblings, and children give at most five nodes to check), this can be done in O(n) time for a net runtime of O(n).

Whew! That was an awesome problem. I don't know if this is an ideal solution, but it at least proves that it's possible to do this in O(n) time. If someone comes up with a better answer, I'd love to see it.


A simple stack based solution. Iterate over the array left to right, with the stack holding elements (technically, indexes, but use values for comparisons) that are either:

  1. The largest from the left (i.e. no larger or equal element between the start of the array and the element)
  2. The largest since the previous element on the stack.

When processing the next element x, pop elements smaller than x as long as they are from the category 2. above, then push x on the stack. Obviously you need to keep the current max to be able to discern between category 2 and 1 in constant time.

The processing above is O(n) - each element can be pushed at most once and popped at most once. Having the final stack, just check neighboring pairs (on the stack) - one of the pairs is the solution. This too is O(n).

Here's a picture with what should stay on the stack (the fat rectangles) after the whole scan of the array:

Stack-bars

(There's a small mistake in the above picture: fourth bar from the left should either be thick or drawn shorter than the first one, sorry)

Why this works: the final stack contains all elements of the input array that are not between any two larger elements (I've skipped the case of an element between two equal elements) . The solution is obviously a neighboring pair of such elements.

Here's an implementation in Python:

from collections import namedtupleE = namedtuple('E', 'i x')def maxrange(iterable):    stack = [E(0, None)]*2 # push sentinel values    maxsofar = None    top = lambda: stack[-1] # peek at the top element on the stack    for i, x in enumerate(iterable):        while top().x < x and top().x < maxsofar:            stack.pop()        stack.append(E(i, x)) # push        maxsofar = max(maxsofar, x)    return max(b.i-a.i for a,b in zip(stack, stack[1:]))

Example:

>>> maxrange([2,1,3])2


Rafals solution is good, but we can do without the stack and so save some memory.Here is a short and efficient implementation in O(n) time:

def highDist(seq):    res, ltr, rtl = 0, 0, 0    for i in range(len(seq)):        if seq[i] >= seq[ltr]:            res = max(res, i-ltr)            ltr = i        if seq[-i-1] >= seq[-rtl-1]:            res = max(res, i-rtl)            rtl = i    return res

Run on the example input:

>>> print highDist([0, 10, 8, 9, 6, 7, 4, 10, 0])6>>> print highDist([0, 10, 8, 9, 6, 7, 4, 9, 0])4>>> print highDist([0, 10, 8, 9, 6, 7, 4, 7, 0])2>>> print highDist([])0

The trick is, that if we have two points a and b s.t. everything between them is smaller than them, then the maximum distance we are searching for, is either |b-a| or entirely outside the range. Hence if we partition the entire sequence that way, one of them is the range we are searching for.

We can make the partition easily be creating an upsequence from each end.