Is there a way to measure how sorted a list is? Is there a way to measure how sorted a list is? arrays arrays

Is there a way to measure how sorted a list is?


You can simply count the number of inversions in the list.

Inversion

An inversion in a sequence of elements of type T is a pair of sequence elements that appear out of order according to some ordering < on the set of T's.

From Wikipedia:

Formally, let A(1), A(2), ..., A(n) be a sequence of n numbers.
If i < j and A(i) > A(j), then the pair (i,j) is called an inversion of A.

The inversion number of a sequence is one common measure of its sortedness.
Formally, the inversion number is defined to be the number of inversions, that is,

definition

To make these definitions clearer, consider the example sequence 9, 5, 7, 6. This sequence has the inversions (0,1), (0,2), (0,3), (2,3) and the inversion number 4.

If you want a value between 0 and 1, you can divide the inversion number by N choose 2.

To actually create an algorithm to compute this score for how sorted a list is, you have two approaches:

Approach 1 (Deterministic)

Modify your favorite sorting algorithm to keep track of how many inversions it is correcting as it runs. Though this is nontrivial and has varying implementations depending on the sorting algorithm you choose, you will end up with an algorithm that is no more expensive (in terms of complexity) than the sorting algorithm you started with.

If you take this route, be aware that it's not as simple as counting "swaps." Mergesort, for example, is worst case O(N log N), yet if it is run on a list sorted in descending order, it will correct all N choose 2 inversions. That's O(N^2) inversions corrected in O(N log N) operations. So some operations must inevitably be correcting more than one inversion at a time. You have to be careful with your implementation. Note: you can do this with O(N log N) complexity, it's just tricky.

Related: calculating the number of “inversions” in a permutation

Approach 2 (Stochastic)

  • Randomly sample pairs (i,j), where i != j
  • For each pair, determine whether list[min(i,j)] < list[max(i,j)] (0 or 1)
  • Compute the average of these comparisons and then normalize by N choose 2

I would personally go with the stochastic approach unless you have a requirement of exactness - if only because it's so easy to implement.


If what you really want is a value (z') between -1 (sorted descending) to 1 (sorted ascending), you can simply map the value above (z), which is between 0 (sorted ascending) and 1 (sorted descending), to this range using this formula:

z' = -2 * z + 1


The traditional measure of how sorted a list (or other sequential structure) is, is the number of inversions.

The number of inversions is the number of pairs (a,b) st index of a < b AND b << a. For these purposes << represents whatever ordering relation you choose for your particular sort.

A fully sorted list has no inversions, and a completely reversed list has the maximum number of inversions.


You can use actual correlation.

Suppose that to each item in the sorted list, you assign an integer rank starting from zero. Note that a graph of the elements position index versus rank will look like dots in a straight line (correlation of 1.0 between the position and rank).

You can compute a correlation on this data. For a reverse sort you will get -1 and so on.