Skip to content

Writing a sorter

Morwenn edited this page Nov 21, 2015 · 34 revisions

Someday, you might want to use a specific sorting algorithm that does not exist in cpp-sort (I am pretty sure there are plenty of those) but still benefit from the tools avaiable in this library. In order to do so, you would have to wrap this algorithm into a sorter. In this tutorial, we will see the quirks involved in writing a good sorter to wrap a sorting algorithm. Writing a basic sorter is easy, but getting everything right to the details might be a bit tricky from time to time. This guide should help you to make the good choices (well, at least I hope so).

This tutorial is mostly a collection of good practice, tips, and design considerations. For a full step-by-step tutorial to write a sorter, you can read Writing a bubble_sorter.

Concepts

Without talking about C++ concepts, cpp-sort sorters imply the knowledge of a range of concepts, each of which implies design decisions and things to take into account. Basically, these concepts can be converted into a series of questions to answer before designing a sorter:

  • Is it a fixed-size sorter? Is it meant to sort a collection of arbitrary size or a fixed number of values?
  • Which category of iterators does it work with? Can it be used to sort an std::forward_list?
  • Does it implement a comparison sort? Does it handle arbitrary comparison functions?
  • If it is a comparison sorter, does it handle projections too?
  • Is it a type-specific sorter? If so, which types does it work with?
  • Is it stable?

This tutorial describes what should be taken into account when writing a sorter, depending on the answers given to the previous questions. Some things are obvious, some are not, some may be a bit too clever in the end. If something ever seems too clever, not obvious enough or outright broken - or simply if you think that it could use a bit more detail -, don't hesitate to submit an issue.

The most basic sorter

Let's put fixed-size sorters aside for a while since they are a really special breed of sorters and concentrate on sorters meant to sort collections of any size. Let's assume that we already have the following selection_sort algorithm, which implements a selection sort in the most straightforward way, taking a pair of iterators like many standard library algorithms, and sorting the corresponding range in-place:

template<typename ForwardIterator>
auto selection_sort(ForwardIterator first, ForwardIterator last)
    -> void
{
    for (auto it = first ; it != last ; ++it)
    {
        auto selection = std::min_element(it, last);
        std::iter_swap(selection, it); 
    }
}

In order to use it with cpp-sort, we will need to wrap it into a sorter object. Doing so is actually trivial, here is the sorter:

struct selection_sorter_impl
{
    template<typename ForwardIterator>
    auto operator()(ForwardIterator first, ForwardIterator last) const
        -> void
    {
        selection_sort(first, last);
    }
};

struct selection_sorter:
    cppsort::sorter_facade<selection_sorter_impl>
{};

We wrote what we call a sorter implementation and wrapped it into sorter_facade, a class template designed to provide additional features to a given sorter implementation. We only had to write an operator() that takes a pair of iterators and forwards it to selection_sort, yet the resulting selection_sorter also has an operator() overload which can be passed a full collection instead of a pair of iterators, and it is additionally convertible to function pointers of type void(*)(Iterator, Iterator) and void(*)(Iterable&). This is what sorter_facade gives you for even the most basic sorter, without any effort.

Now, let's define a set of rules to apply when writing sorters. These rules don't have to be enforced, but enforcing them will ensure that a sorter will work smoothly with every tool available from the library. In this tutorial, every section will define a small set of rules instead of defining all of them at once without introducing the relevant concepts first. Fortunately, the simpler the sorter, the simpler the rules.

Rule 1.1: for any sorter, std::is_sorted called without a comparison function on the resulting range shall return true (note that this is not exactly true: floating point numbers are an example of types that will almost always cause problems).

Rule 1.2: a sorter shall be callable with either a pair of iterators or an iterable.

Rule 1.3: a sorter implementation shall provide at least an overload of operator() that takes a pair of iterators. Overloads of operator() taking an iterable can be provided as well when they add value to the sorter (optimization, correctness, better error messages...) but should never totally replace the overload taking a pair of iterators.

Rule 1.4: sorters should be immutable and the overloads of operator() should explicitly be marked as const (make sure to check twice: forgetting to const-qualify them can cause hundreds of cryptic error messages).

Category of iterators

When writing a sorter, one of the most importants things to consider is the category of iterators it is meant to work with. It directly influences the kinds of collections that the sorter will be able to sort. Sorters implement in-place sorting algorithms, therefore they can only sort forward iterators or more specific types. cpp-sort does more than documentation the sorter category a sorter is supposed to work with: it actually embeds the information into spiecialized sorter traits. If we take the selection_sorter from the previous section, we can document its properties as follows:

namespace cppsort
{
    typename<>
    struct sorter_traits<::selection_sorter>
    {
        using iterator_category = std::forward_iterator_tag;
        static constexpr bool is_stable = false;
    };
}

The standard library's iterator tags are used to document the iterator category supported by the sorter (stability is also documented but we'll come back to that later). It is a bit useful for error messages, but some other tools from the library rely of this information. For example hybrid_adapter can take several sorters with different iterator categories and generate a new sorter that will call the appropriate sorter depending on the iterator category of the passed collection:

using sorter = cppsort::hybrid_adapter<
    forward_sorter,
    bidirectional_sorter,
    random_access_sorter
>;

sorter{}(std::vector<int>{});        // calls random_access_sorter
sorter{}(std::forward_list<int>{});  // calls forward_sorter
sorter{}(std::list<int>{});          // calls bidirectional_sorter

Say you have several sorters with the same iterator category (std::forward_iterator_tag) but you know that one of them is better with forward iterators while the other one is better when it comes to random-access and bidirectional iterators. You can use rebind_iterator_category so that the second sorter is considered as a bidirectional sorter instead:

using sorter = cppsort::hybrid_adapter<
    forward_sorter1,
    cppsort::rebind_iterator_category<
        forward_sorter2,
        std::bidirectional_iterator_tag
    >
>;

The iterator category supported by a given sorter is not only there for documentation. You have tools to play with it and tools that actually need the information, so make sure to get it right when designing your sorters.

Rule 2.1: the iterator category of a sorter shall correspond to the least specific iterator category that the sorter can handle.

Comparison sorters

Most sorting algorithms are actually comparison sorts. It means that, to sort the elements of a collection, they repeatedly use a comparison functions that returns whether two elements are aleady in order. The standard library's std::sort implicitly uses an ADL-found operator< to compare two elements, but it also provides an overload which takes a user-provided comparison function to compare two elements. cpp-sort follows this design and allows its sorters to take an additional parameter for user-provided comparison functions. Let's write a sorter implementation to wrap the three-parameter overload of std::sort:

struct std_sorter_impl
{
    template<
        typename RandomAccessIterator,
        typename Compare = std::less<>
    >
    auto operator()(RandomAccessIterator first, RandomAccessIterator last
                    Compare compare={}) const
        -> void
    {
        std::sort(first, last, compare);
    }
};

struct std_sorter:
    cppsort::sorter_facade<std_sorter_impl>
{};

Compared to the previous selection_sorter_impl, the only things we add to add was a template parameter defaulted to std::less<> and a default-contructed (when not provided) parameter to operator(). As usual, sorter_facade generates a bunch of additional features: it still adds the overload of operator() taking a single iterable, but also adds an overload taking an iterable and a comparison function. Basically, it makes sure that you always ony have to provide a single overload of operator() and generates all the other ones as well as all the corresponding conversions to function pointers.

This kind of comparison sorters help to compare things that don't have an overload operator< or to compare things differently. For example, passing std::greater<> to a sorter instead of std::less<> sorts a collection in descending order:

// Sort collection in reverse order with std::sort
cppsort::sort(collection, std_sorter{}, std::greater<>{});

The rules for comparison sorters are but an extension to the rules defined for basic sorters. You will see that they are very similar.

Rule 3.1: a comparison sorter is also a sorter, which means that it shall be called without a comparison function and shall obey all the rules defined for regular sorters.

Rule 3.2: for any comparison sorter called with a specific comparison function, std::is_sorted called with the same comparison function on the resulting collection shall return true.

Rule 3.3: calling a comparison sorter with std::less<>, with std::less<T> (where T is the type of the elements of the collection to sort) or without a comparison function shall be strictly equivalent: calling std::is_sorted without a comparison function on the resulting collection shall return true.

Rule 3.4: a comparison sorter which can be called with a collection and a comparison function shall also be callable with two corresponding iterators and the same comparison function.

Rule 3.5: the comparison parameter always comes after the parameters corresponding to the collection to sort.

Handling projections

The Ranges TS introduces the notion of callable projections, borrowed from the Adobe Source Libraries. A projection is a callable function that can be passed to an algorithm so that it "views" the value to be compared differently. Algorithms generally apply the projection on-the-fly to the values when they are compared. For example, std::negate<> could be used to sort a collection of integers in descending order. Let's assume that our selection_sort algorithm from a while has been given a fourth parameter to handle projections; here is the corresponding sorter implementation:

struct selection_sorter_impl
{
    template<
        typename ForwardIterator,
        typename Compare = std::less<>,
        typename Projection = cppsort::utility::identity,
        typename = std::enable_if_t<cppsort::is_projection_iterator<
            Projection, ForwardIterator, Compare
        >>
    >
    auto operator()(ForwardIterator first, ForwardIterator last,
                    Compare compare={}, Projection projection={}) const
        -> void
    {
        selection_sort(first, last, compare, projection);
    }
};

As you can see, extending the sorter to handle projections is similar to extending it to support comparisons. Note that this sorter is both a comparison sorter and a projection sorter, but a sorter could also handle projections without handling comparisons. While default comparison function object is std::less, the equivalent default function object for projections is utility::identity (and probably std::identity in a future revision of the C++ standard): it simply returns the passed value as is.

The only subtle trick in the example above is the use of is_projection_iterator: this trait is used disambiguate comparison functions from projection functions when a sorter can be called with both at once and is not needed when the sorter is only a projection sorter. It is actually some kind of concept check and should go away when concepts make their way to the standard.

Note that most of the algorithms (actually, every projection sorter provided by the library) transform the projection parameter with utility::as_function before actually using it. This small utility allows to use pointers to member data as projections; here is an example of how it can be used:

struct wrapper { int value; }
std::vector<wrapper> vec = { {5}, {9}, {6}, {1}, {2}, {8}, {3}, {0}, {7}, {4} };
cppsort::sort(vec, selection_sorter{}, &wrapper::value);

Thanks to that small trick, the selection_sorter will sort vec, using the member data wrapper::value instead of a full wrapper instance (which cannot be compared) to perform the comparisons on. The rules for projection sorters are really close to the ones for comparison sorters and there are additiona rules when a sorter is both at once. In the following section, assume a projection-enhanced std::is_sorted, even if such a function is not standard yet.

Rule 4.1: a projection sorter is also a sorter, which means that it shall be called without a projection function and shall obey all the rules defined for regular sorters.

Rule 4.2: a sorter can be both a comparison sorter and a projection sorter at once. Such a sorter shall also obey all the rules defined for sorters, for comparison sorters and for projection sorters.

Rule 4.3: for any projection sorter called with a specific projection function, std::is_sorted called with std::less<> and the same projection function on the resulting collection shall return true. If the projection sorter is also a comparison sorter, for any such sorter called with a specific pair of comparison and projection function, std::is_sorter called with the same pair of functions on the resulting collection shall return true.

Rule 4.4: calling a projection sorter with utility::identity or without a projection function shall be strictly equivalent: calling std::is_sorted without a comparison function and without a projection function on the resulting collection shall return true. If the projection sorter is also a comparison sorter, calling such a sorter with any valid combination of std::less<> and utility::identity, and calling it without any additional function should be strictly equivalent.

Rule 4.5: a projection sorter which can be called with a collection and a projection function shall also be callable with two corresponding iterators and the same projection function. If the projection sorter is also a comparison sorter, it shall also be callable with an iterable, a comparison function and a projection function.

Rule 4.6: the projection parameter always comes after the parameters corresponding to the collection to sort. If the projection sorter is also a comparison sorter, the projection parameter always comes after the comparison parameter.

Non-comparison sorters

TODO: partial comparison sorters and remind rules

Type-specific sorters

TODO: SFINAE on supported type and hybrid_adapter

Stability

TODO: more about sorter traits

Fixed-size sorters

TODO: rules, broken design and low_projections_sorter example


Old tutorial below, will be rewritten later.

Writing a generic sorter

Let's say you have the following selection_sort algorithm and you want to write a sorter to wrap it:

template<
    typename ForwardIterator,
    typename Compare = std::less<>
>
void selection_sort(ForwardIterator first, ForwardIterator last,
                    Compare compare={})
{
    for (auto it = first ; it != last ; ++it)
    {
        auto selection = std::min_element(it, last, compare);
        std::iter_swap(selection, it); 
    }
}

Wrapping this algorithm into a selection_sorter object to benefit from some of the power of cpp-sort is actually rather trivial. Here is the corresponding sorter:

struct selection_sorter
{
    template<
        typename ForwardIterator,
        typename Compare = std::less<>
    >
    auto operator()(ForwardIterator first, ForwardIterator last,
                    Compare compare={}) const
        -> void
    {
        selection_sort(first, last, compare);
    }
};

As you can see, you simply have to provide an operator() overload that forwards its values to the sorting function. If your sorter supports custom comparison functions, make sure that it can also be called without a comparison function. The reason is that a ComparisonSorter should also be usable wherever a Sorter is usable.

However, while somehow usable, the class in the the previous example does not satisfy the Sorter concept because it cannot be passed an iterable. Instead of adding another overload by hand, we will wrap the class into sorter_facade which will provide the missing operator() overloads. It additionally provides fancy conversions to function pointers.

struct selection_sorter_impl
{
    template<
        typename ForwardIterator,
        typename Compare = std::less<>
    >
    auto operator()(ForwardIterator first, ForwardIterator last,
                    Compare compare={}) const
        -> void
    {
        selection_sort(first, last, compare);
    }
};

struct selection_sorter:
    cppsort::sorter_facade<selection_sorter_impl>
{};

Note that you can still provide operator() overloads equivalent to those provided by sorter_facade. For example, if the sorting algorithm you are wrapping allows an additional size parameter as a hint (computing the size of a ForwardIterable may be needed but expensive), you can add the following operator() overload to your sorter, and it should properly hide the corresponding overload in sorter_facade:

template<
    typename ForwardIterable,
    typename Compare = std::less<>
>
auto operator()(ForwardIterable& iterable, Compare compare={}) const
    -> void
{
    selection_sort(iterable, compare, utility::size(iterable));
}

Sorter traits

While the selection_sorter we wrote should work with most of the library, a little bit of work is still needed so that it can benefit from hybrid_adapter, which is subjectively the one of the most interesting components in this library. hybrid_adapter::operator() needs to know the iterator category of the sorters it aggregates so that it can dispatch a call to the most suitable sorter. You can provide this information by specializing sorter_traits for your sorter:

namespace cppsort
{
    template<>
    struct sorter_traits<::selection_sorter>
    {
        using iterator_category = std::forward_iterator_tag;
        static constexpr bool is_stable = false;
    };
}

Note that sorter_traits also contains information about the stability of the sorting algorithm. This information is currently unused in the library but still provided for every available sorter and sorter adapter.

Writing a type-specific sorter

While most sorting algorithms are comparison sorts designed to work with any type for which a total order exists (given the appropriate comparison function, which tends to be std::less<> by default), it is also possible to write sorters for algorithms that do not implement a comparison sort. In this chapter, we will see how to make a sorter to wrap a counting sort.

A couting sort is not a comparison sort since it does not really care that much about which element is greater than another, but relies on integer an array arithmetics to perform the sort. That said, it being a non-comparison sort means that it cannot be given an arbitrary comparison function to perform the sort. The naive implementation for such a sorter remains trivial:

struct counting_sorter_impl
{
    template<typename ForwardIterator>
    auto operator()(ForwardIterator first, ForwardIterator last) const
        -> void
    {
        counting_sort(first, last);
    }
};

struct counting_sorter:
    cppsort::sorter_facade<counting_sorter_impl>
{};

The appropriate operations will still be generated by sorter_facade and you should not experiment a problem until you try to feed a comparison function to this sorter. That being said, non-comparison sorts tend to work only with collections of some specific types since they generally use the properties of these types to sort a collection. For example, our counting_sorter only works with built-in integer types, but this information currently does not appear anywhere in the interface.

While it could be possible to trigger a hard error with a static_assert, using SFINAE to trigger a soft error instead is actually more interesting:

struct counting_sorter_impl
{
    template<typename ForwardIterator>
    auto operator()(ForwardIterator first, ForwardIterator last) const
        -> std::enable_if_t<
            std::is_integral<
                typename std::iterator_traits<ForwardIterator>::value_type
            >::value
        >
    {
        counting_sort(first, last);
    }
};

struct counting_sorter:
    cppsort::sorter_facade<counting_sorter_impl>
{};

Using such a soft error mechanism instead of a hard one will make it possible to use your sorter together with hybrid_adapter (provided you specialized sorter_traits accordingly) so that you can make it fallback to a generic sorting algorithm when a collection not handled by your sorter is given to the resulting sorter:

// This sorter will use couting_sorter if a collection
// of integers is given to it, and will fallback to
// inplace_merge_sorter otherwise
using generic_sorter = cppsort::hybrid_adapter<
    counting_sorter,
    cppsort::inplace_merge_sorter
>;

Rules and guarantees

Enforcing everything from the library side is rather difficult and cpp-sort remains rather permissive. That said, if you don't want to be surprised when implementing your sorters, you should at least follow a small set of rules...

  • For any sorter called without a comparison function, std::is_sorted called without a comparison function on the resulting collection shall return true (note that this is not exactly true: floating point numbers are an example of types that will almost always cause problems).
  • For any sorter called with a specific comparison function, std::is_sorted called with the same comparison function on the resulting collection shall return true.
  • Calling a sorter with std::less<>, with std::less<T> (where T is the type of the elements of the collection to sort) or without a comparison function should be strictly equivalent: calling std::is_sorted without a comparison function on the resulting collection should return true.
  • Every sorter shall be callable without a comparison function.
  • Every sorter which can be called with a collection and a comparison function shall also be callable with two corresponding iterators and the same comparison function.

Some parts of the library may still work even if these rules are not enforced, but enforcing them should protect you against surprising behaviour.

Improving type-specific sorters

The rules above allow to safely break from the black & white world of comparison and non-comparison sorters, and to write sorters that accept only some specific comparison functions. To support that model, sorter_facade adds overloads to operator() which take an std::less<> or an std::less<T> instance to non-comparison sorters so that any non-comparison sorter can also be called with std::less<>, so you don't have to do that by yourself either.

Now, assume that our counting_sort algorithm also has an evil twin to which we will give the name of reverse_counting_sort. The rules defined in the previous section make it pretty clear that creating a reverse_counting_sorter would be a bad idea since sorters called without a comparison function are supposed to sort a collection in ascending order. Note however that, after having reverse-sorted a collection of integers, the following assertion should hold:

assert( std::is_sorted(std::begin(collection), std::end(collection), std::greater<>{}) );

It means that we can actually safely add reverse_counting_sort to counting_sorter by adding an overload that takes an instance of std::greater<> and still satisfy all the aforementioned rules. Here is the overload:

template<typename ForwardIterator>
auto operator()(ForwardIterator first, ForwardIterator last, std::greater<>) const
    -> std::enable_if_t<
        std::is_integral<
            typename std::iterator_traits<ForwardIterator>::value_type
        >::value
    >
{
    reverse_counting_sort(first, last);
}

For the sorter to be really complete, it would need to take an equivalent overload for std::greater<T>. Unfortunately, sorter_facade does not automate the generation of such overloads for you (yet?), so you are bound to write that boilerplate by yourself if you really want it. Now, the hybrid_adapter from a while ago just got a bit more powerful:

// This sorter will use couting_sorter if a random-access collection of
// integers is given to it with no comparison function, with std::less,
// or with std::greater and will fall back to inplace_merge_sorter in
// every other case
using generic_sorter = cppsort::hybrid_adapter<
    counting_sorter,
    cppsort::inplace_merge_sorter
>;