Skip to content

1.14.0 — Tantanmen

Compare
Choose a tag to compare
@Morwenn Morwenn released this 18 Dec 19:29
29b593a

DOI

I am writing those lines shortly after coming back from the local ramen restaurant, where I had the pleasure to enjoy edamame and tantanmen for the first time in a while. After eight months and and two patch releases, the long-awaited cpp-sort 1.14.0 is finally out. One of the reasons already mentioned in previous release notes is the work on version 2.0.0, which is far from being ready, but already had a positive effect on the overall quality of the library.

Worry not, for this long time spent in the making was not in vain: this version brings more new features to the table than usual with a clear focus on orthogonality, allowing to make the most of existing features. Another highlight of this release is a focus on documentation: I tried to make it more amenable with new power words, a lots of small changes and also a new quickstart page.

New features

New adapters: split_adapter and drop_merge_adapter (#148)

split_adapter: the library has shipped split_sorter since version 1.4.0, a sorter base don an algorithm that works by isolating an approximation of a longest non-decreasing subsequence in a O(n) pass in the left part of the collection to sort, and the out-of-place elements in the right part of the collection. The right part is then sorted with pdq_sort, and both parts are merged together to yield a sorted collection.

split_adapter is basically the same, except that it accepts any bidirectional sorter to replace pdq_sort in the algorithm. As such it acts as a pre-sorting algorithm that can be plugged on top of any compatible sorter to make it Rem-adaptive.

drop_merge_adapter: this is the adapter version of drop_merge_sorter, allowing to pass in any sorter instead of relying on pdq_sort. Much like SplitSort, drop-merge sort works by isolating out-of-place elements, sorting them with another sorting algorithm, and merging them back together. Unlike SplitSort it places the out-of-place elements in a contiguous memory buffer, allowing to use any sorter to sort them.

While philosophically close from each other, the two algorithms use different heuristics to isolate out-of-place elements, leading to a different behaviour at runtime: for the same fallback algorithm, drop_merge_adapter is faster than split_sorter when there are few out-of-place elements in the collection to sort, but becomes slower when the number of out-of-place elements grows.

Graph showing the speed difference between heap_sort raw, then adapted with split_adapter and drop_merge_adapter, when the number of inversions in the vector<int> to sort increases

Graph showing the speed difference between quick_sort raw, then adapted with split_adapter and drop_merge_adapter, when the number of inversions in the list<int> to sort increases

As can be seen in the benchmarks above, split_adapter and drop_merge adapter offer different advantages: split_adapter takes a bit less advantage of existing presortedness than drop_merge_adapter, but its cost it very small even when there are lots of inversions. This can be attributed to drop_merge_adapter allocating memory for the out-of-place elements while split_adapter does everything in-place.

When sorting a std::list however, drop_merge_adapter becomes much more interesting because the allocated memory allows to perform the adapted sort into a contiguous memory buffer, which tends to be faster.

New utilities: sorted_indices and apply_permutation (#200)

utility::sorted_indices is a function object that takes a sorter and returns a new function object. This new function object accepts a collection and returns an std::vector containing indices of elements of the passed collection. These indices are ordered in such a way that the index at position N in the returned vector corresponds to the index where to find in the original collection the element that would be at position N in the sorted collection. It is quite similar to numpy.argsort.

std::vector<int> vec = { 6, 4, 2, 1, 8, 7, 0, 9, 5, 3 };
auto get_sorted_indices_for = cppsort::utility::sorted_indices<cppsort::smooth_sorter>{};
auto indices = get_sorted_indices_for(vec);
// Displays 6 3 2 9 1 8 0 5 4 7
for (auto idx: indices) {
    std::cout << *it << ' ';
}

utility::apply_permutation accepts a collection, and permutes it according to a collection of indices in linear time. It can be used together with sorted_indices to bring a collection in sorted order in two steps. It can even be used to apply the same permutation to several collections, allowing to deal with structure-of-array patterns where the information is split in several collections.

// Names & ages of persons, where the identity of the person is the
// same index in both collections
std::vector<std::string> names = { /* ... */ };
std::vector<int> ages = { /* ... */ };

auto get_sorted_indices_for = cppsort::utility::sorted_indices<cppsort::poplar_sorter>{};
auto indices = get_sorted_indices_for(names); // Get sorted indices to sort by name
// Bring persons in sorted order
cppsort::utility::apply_permutation(names, auto(indices));
cppsort::utility::apply_permutation(ages, indices);

Note the use of C++23 auto() copy above: apply_permutation mutates both of the collections it accepts, so the indices have to be copied in order to be reused.

New utilities: sorted_iterators and indirect (#42, #200)

utility::sorted_iterators is a function object that takes a sorter and returns a new function object. This new function object accepts a collection and returns an std::vector containing iterators to the passed collection in a sorted order. It can be thought of as a kind of sorted view of the passed collection - as long as said collection does not change. It can be useful when the order of the original collection must be preserved, but operations have to be performed on the sorted collection.

std::list<int> li = { 6, 4, 2, 1, 8, 7, 0, 9, 5, 3 };
auto get_sorted_iterators_for = cppsort::utility::sorted_iterators<cppsort::heap_sorter>{};
const auto iterators = get_sorted_iterators_for(li);
// Displays 0 1 2 3 4 5 6 7 8 9
for (auto it: iterators) {
    std::cout << *it << ' ';
}

The new projection indirect dereferences its parameter and returns the result. It can be used together with standard range algorithms to perform operations on the vector of iterators returned by sorted_iterators.

std::forward_list<int> fli = /* ... */;
auto get_sorted_iterators_for = cppsort::utility::sorted_iterators<cppsort::poplar_sorter>{};
const auto iterators = get_sorted_iterators_for(fli);
auto equal_bounds = std::ranges::equal_range(iterators, 42, {}, cppsort::utility::indirect{});

New sorter: d_ary_heap_sorter

d_ary_heap_sorter implements a heapsort variant based on a d-ary heap, a heap where each node has D children - instead of 2 for a binary heap. This number of children nodes can be passed as an integer template parameter to the sorter and its associated sort function. Storing more nodes makes the algorithm a bit more complicated but improves the locality of reference, which can affect execution speed.

std::vector<int> vec = { /* ... */ };
cppsort::d_ary_heap_sort<4>(vec); // d=4

The complexity is the same as that of the library's heap_sorter. It is however worth nothing that unlike its binary counterpart, d_ary_heap_sorter does not implement a bottom-up node searching strategy.

Benchmark showing the difference in cycles per elements when sorting a one-million element vector with either heap_sort or several flavours of d_ary_heap_sort with different values for D

The benchmark above shows how the value picked for D might influence the speed of the algorithm. It also shows that d_ary_heap_sort is currently much slower than heap_sort, though the former is not really optimized and has room for improvements.

Bug fixes

  • Fixed slab_sort when sorting collections where several elements compare equivalent to the median element (#211).
  • Fixed a potential use-after-move bug in quick_merge_sort.

Improvements

Algorithmic & speed improvements:

  • slab_sort now falls back to using insertion sort on small enough collections in its melsort pass. This change leads to speed improvements when sorting shuffled data that does not exhibit obvious patterns.
    Benchmark showing the speed of slab_sort for different data patterns of one million double elements in cpp-sort 1.13.2 vs 1.14.0
  • low_moves_sorter<n>: removed O(n) comparisons in the generic case.
    Benchmark showing the difference in speed of low_moves_sorter when sorting 1 to 30 shuffled elements in cpp-sort 1.13.2 vs 1.14.0

Other improvements:

  • Add more audit assertions around some internal merge operations. Audits are expensive assertions which can be enabled by defining the preprocessor macro CPPSORT_ENABLE_AUDITS.
  • When CPPSORT_ENABLE_AUDITS is defined, the library's internal assumptions (e.g. __builtin_assume) are now turned into assertions, providing additional checks to diagnose errors.
  • schwarz_adapter<schwarz_adapter<Sorter>> is now functionally equivalent to schwarz_adapter<Sorter>, save for the parameters accepted by the constructor. It makes it actually work for more than just a few accidentally valid cases.
  • operator| for projections was changed as follows: when one of the operators is utility::identity or std::identity, it returns the other projection directly.

Tooling

Documentation:

  • Added a quickstart page to the documentation (#203).
  • Added a note to the library nomenclature page, making it explicit that words that appear in italics all over the documentation refer to words or power described in that page.
  • Added new power words to the library nomenclature:
    • Unified sorting interface
    • Equivalent elements
  • Added notes about utility::size, utility::iter_move and utility::iter_swap being removed in cpp-sort 2.0.0.
  • Removed a note about probe::max being noticeably slower on bidirectional and forward iterators. This note was there because of calls to std::distance that could potentially be linear, potentially changing the documented complexity of the algorithm. I reviewed the algorithm and realized that those calls were always performed on random-access iterators and thus in O(1) time, making the note unneeded.
  • Regenerated the Small array sorters section of the benchmarks page, which did not reflect recent changes and additions to small array sorters.

Benchmarks:

  • Improvements to the small-array benchmark:
    • The labels can now contain spaces.
    • Compute and display cycles per element instead of total number of cycles.
    • Display the median of the results for a given collection size instead of the average.
    • Let matplotlib choose the best position for the legend to appear.
    • Generate results for merge_exchange_network_sorter.
    • Don't generate benchmarks for collections of size 0.
  • Small improvements to the inversions benchmark:
    • The labels can now contain spaces.
    • The legend preserves the order of the sorters used in the C++ file.
    • Move average computation to the python file.

Miscellaneous:

  • Improved forward compatibility of conanfile.py with Conan v2. The library now requires a more recent Conan version to be packaged.
  • Bump the Catch2 version downloaded by the test suite to v3.2.1.

Known bugs

I didn't manage to fix every bug I could find since the previous release, so you might want to check the list of known bugs.