Skip to content

1.4.0 — Tartan

Compare
Choose a tag to compare
@Morwenn Morwenn released this 09 Apr 10:32
e71ed9d

It was Tartan Day a few days ago, the perfect occasion to wear your kilt or anything tartan. Combined with a surprise carbonade flamande with a friend over the weekend, I had all the energy required to finally fix a remaining bug and push a new cpp-sort release.

This release brings a single new feature, split_sorter that we will describe later, and a bunch of bug fixes, tiny library improvements, and tooling improvements. The development was somewhat painful because benchmarking split_sort lead me to an existing bug in drop_merge_sort, so I analyzed it, fixed it, wrote a more general test and found bugs in other sorting algorithms, then in measures of presortedness, and even in tooling while I was doing further testing... The rabbit hole went ever deeper, but it eventually came to an end and I could finally produce that single-feature release.

New features

A split_sorter has been added to the library, implementing the "in-place" variant of the SplitSort algorithm described in Splitsort—an adaptive sorting algorithm by Levcopoulos and Petersson. split_sort is a comparison-based unstable sorting algorithm that might require O(n) extra memory to speed up an internal merge step, but works fine with O(1) extra memory nonetheless.

The algorithm is close from drop-merge sort: it computes an approximation of a longest non-decreasing subsequence, excludes the elements that are not part of it, sorts them (currently with pdqsort), and merges the two sequences. The graph below shows how it performs compared to drop-merge sort when sorting a collection with an increasing number of elements not in their place:

While it shares many similarities with drop-merge sort, they still have a few differences:

  • SplitSort only works with random-access iterators.
  • While it uses O(n) extra memory to merge some elements, it can run perfectly fine with O(1) extra memory. Drop-merge sort on the other hand won't work with O(1) extra memory.
  • The results above show that drop-merge sort is better when few elements aren't in place, but SplitSort has a lower overhead on random data while still performing better than most general-purpose sorting algorithms when the data is already somewhat sorted.

split_sort offers a tool similar to drop_merge_sort, but with different tradeoffs.

Bug fixes

  • The measures of presortedness exc, ham and max now correctly handle duplicate values (#143).
  • Fixed issues due to self-move in the following algorithms (#141): block_sort and drop_merge_sort.
  • Fixed sorting of non-SSO-optimized std::string with the following algorithms (#142): block_sort, drop_merge_sort and grail_sort.

Improvements

  • Nested hybrid_adapter now automatically unwraps (thanks to @notfoundry for this!). To understand why this feature is interesting, consider the following sorter:

    using sorter = cppsort::hybrid_adapter<
        cppsort::hybrid_adapter<
            forward_sorter,
            random_access_sorter
        >,
        bidirectional_sorter
    >;

    When called on forward or bidirectional iterators, sorter will use the appropriate sorter. However, without the unwrapping, sorter can never call random_access_sorter because the outer sorter considers the iterator category of the inner sorter, which in this case is std::forward_iterator_tag. With the new unwrapping ability, the sorter above should be equivalent to the following one:

    using sorter = cppsort::hybrid_adapter<
        forward_sorter,
        random_access_sorter
        bidirectional_sorter
    >;

    Therefore it will call random_access_sorter as expected when passed random-access iterators.

  • Avoid Clang -Wc++1z-extensions warnings when using [[fallthrough]] while not in C++17.

  • Small optimizations in block_sort, drop_merge_sort, grail_sort, schwartz_adapter and tim_sort.

  • schwartz_adapter now simply calls the adapted sorter when passed identity as a projection.

Tooling

Some efforts were made to improve the build times, notably because of the increasingly high build times of the continuous integration:

  • Use mtime_cache on the CI server to finally have proper incremental builds.
  • When building on Linux with Clang, the testsuite now uses LLD for linking.
  • Only download relevant packages needed for each build in Travis.

Other tooling changes:

  • The testsuite is now tested on Windows with a 64-bit MinGW-w64 on the CI server.
  • The testsuite doesn't use cotire anymore since it didn't seem to make a difference in build times.
  • We now define proper flags for warnings depending on the compiler when building the testsuite.
  • More tests to cover more potential issues.
  • Fix several issues with the CI reporting failing tests as passing (most notably with ubsan and Valgrind).
  • Ship a suppression file to avoid Valgrind false positives with OSX on the CI server.
  • Add a small tool - test_failing_sorter.cpp - which I often use and tweak to investigate failing sorters.

Known bugs

I didn't manage to fix every bug I could find since the previous release, so you might want to check the list of known bugs. I still don't have a great version hygiene for those, but I promise that it will improve with the time.