Skip to content

Measures of presortedness

Morwenn edited this page Jan 28, 2016 · 29 revisions

Also known as measures of disorder, the measures of presortedness are algorithms used to tell how a collection is already sorted, or how much disorder there is in a collection. Some adaptative sorting algorithms are known to take advantage of the order already present in a collection, and happen to be "optimal" with regard to some measures of presortedness.

In cpp-sort, measures of presortedness are implemented as instances of some specific function objects; they take a collection or a pair of iterators and return how much disorder there is in the collection according to the measure. Just like sorters, measures of presortedness can handle custom comparison and projection functions, and with the same degree of freedom when it comes to how they can be called:

using namespace cppsort;
auto a = probe::inv(collection);
auto b = probe::rem(vec.begin(), vec.end());
auto c = probe::ham(li, std::greater<>{});
auto d = probe::runs(integers, std::negate<>{});

Note however that these can be expensive algorithms. Using them before a sorting algorithm has no interest at all; these are meant to be profiling tools: when sorting is a critical part of your application, you can use these measures on typical data and check whether it is mostly sorted according to one measure or another, then you may be able to find a sorting algorithm known to be optimal with regard to this specific measure.

Every measure of presortedness lives in the subnamespace cppsort::probe. Even though all of them are available in their own header, it is possible to include all of them at once with the following include:

#include <cpp-sort/probes.h>

Measures of presortedness are pretty formalized, so the names of the functions are short and correspond to the ones used in the litterature. Moreover, as per this litterature, we will use the symbols X to represent the analyzed iterable, and n to represent the size of the iterable.

Dis

#include <cpp-sort/probes/dis.h>

Computes the maximum distance determined by an inversion. Its worst case returns n - 1 when X is sorted in reverse order.

Complexity Memory Iterators
1 Forward

Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.

Exc

#include <cpp-sort/probes/exc.h>

Computes the minimum number of exchanges required to sort X, which corresponds to n minus the number of cycles in the collection. A cycle corresponds to a number of elements in a collection that need to be rotated to be in their sorted position; for example, let {2, 4, 0, 6, 3, 1, 5} be a collection, the cycles are {0, 2} and {1, 3, 4, 5, 6} so Exc(X) = n - 2 = 5. There is always at least one cycle in any collection, so Exc has a worst case of n - 1 when every element in X is one element away from its sorted position.

Complexity Memory Iterators
n log n n Forward

Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.

Ham

#include <cpp-sort/probes/ham.h>

Computes the number of elements in X that are not in their sorted position, which corresponds to the Hamming distance between X and its sorted permutation. Its worst case returns n when every element in X is one element away from its sorted position.

Complexity Memory Iterators
n log n n Forward

Inv

#include <cpp-sort/probes/inv.h>

Computes the number of inversions in X, where an inversion corresponds to a pair (a, b) of elements not in order. For example, the collection {2, 1, 3, 0} has 4 inversions: (2, 1), (2, 0), (1, 0) and (3, 0). Its worst case returns n * (n - 1) / 2 when X is sorted in reverse order.

Complexity Memory Iterators
1 Forward

Max

#include <cpp-sort/probes/max.h>

Computes the maximum distance an element in X must travel to find its sorted position. Its worst case returns n - 1 when X is sorted in reverse order.

Complexity Memory Iterators
n log n n Forward

Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.

Osc

#include <cpp-sort/probes/osc.h>

Computes the Oscillation measure described by Levcopoulos and Petersson in Adaptative Heapsort. Its worst case returns (n * (n - 2) - 1) / 2 when the values in X are strongly oscillating.

Complexity Memory Iterators
1 Forward

Rem

#include <cpp-sort/probes/rem.h>

Computes the minimum number of elements that must be removed from X to obtain a sorted subsequence, which corresponds to n minus the size of the longest ascending run in X. Its worst case returns n - 1 when X is sorted in reverse order.

Complexity Memory Iterators
n 1 Forward

Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.

Runs

#include <cpp-sort/probes/runs.h>

Computes the number of ascending runs in X minus one. Its worst case returns n - 1 when X is sorted in reverse order.

Complexity Memory Iterators
n 1 Forward