Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple sorter options #207

Open
Morwenn opened this issue Aug 19, 2022 · 0 comments
Open

Simple sorter options #207

Morwenn opened this issue Aug 19, 2022 · 0 comments
Labels
Milestone

Comments

@Morwenn
Copy link
Owner

Morwenn commented Aug 19, 2022

Lots of sorters in the library have constexpr variables corresponding to various options:

  • drop_merge_sort has double_comparison and recency.
  • Lots of sorters have a size threshold under which they use another algorithm to sort small collections.
  • spread_sort has a slew options configuring bucket size and whatnot

All those options currently exist as details and are forced at compile time, but users have no control over them. I would like to expose several of these options so that users can change them freely. A simple way would be to create foobar_sorter_options objects, simple structs that can be passed to sorters at compile time thanks to NTTP.

Here is an example of what it could look like for drop_merge_sort:

constexpr cppsort::drop_merge_sorter_options options = {
    .recency = 8,
    .double_comparison = true
};
auto sort = cppsort::drop_merge_sorter<options>{};

Alternatively:

constexpr cppsort::drop_merge_sorter_options options = ;
auto sort = cppsort::drop_merge_sorter<{
    .recency = 8,
    .double_comparison = true
}>{};

The combination of NTTP and designated initializers would allow to pass all simple options in an option structure while naming them, which is IMO a superior solution to a sea of unnamable template parameters.


List of affected sorters and options.:

  • adaptive_shivers_sort: ???
  • cartesian_tree_sort: ???
  • counting_sort: ???
  • drop_merge_sort
    • double_comparison (bool, default true): TODO
    • recency (int, default 8): TODO
  • float_spread_sort: ???
  • grail_sort: ???
  • heap_sort: ???
  • insertion_sort: ???
  • integer_spread_sort: ???
  • mel_sort: ???
  • merge_insertion_sort: ???
  • merge_sort: ???
  • pdq_sort:
    • insertion_sort_threshold (int, default 24): partitions below this size are sorted using insertion sort.
    • ninther_threshold (int, default 128): partitions above this size use Tukey's ninther to select the pivot.
    • partial_insertion_sort_limit (int, default 8): when we detect an already sorted partition, attempt an insertion sort that allows this amount of element moves before giving up.
    • block_size (int, default 64): must be multiple of 8 due to loop unrolling, and < 256 to fit in unsigned char.
    • cacheline_size (int; default 64): cacheline size, assumes power of two.
  • poplar_sort: ???
  • quick_merge_sort:
    • Size before switching to small sort (int)
  • quick_sort: ???
  • selection_sort: ???
  • ska_sort: ???
  • slab_sort: ???
  • smooth_sort: ???
  • spin_sort: ???
  • split_sort: ???
  • spread_sort: ???
  • std_sort: nothing
  • string_spread_sort: ???
  • tim_sort: ???
  • verge_sort: ???
  • wiki_sort: ???

The list in the previous section is almost empty at the time of writing these words, and yet it already highlights some issues and design decisions to take into account:

  • Several configuration constant are currently of type difference_type. This is a problem when passing options to the sorter because difference_type is only known when the iterator type is known, which happens after template instantiation. Passing options to the sorter itself means that these options should be agnostic of the iterator type. For now I think that using int for what should always be small constants should be consensual enough, especially since difference_type is assumed to be constructible from int.
  • Buffered sorters already have a template parameter, options would likely go second.
  • Some sorters have different size limits before switching to a small collection sort. Either just go with forward_small_sort_limit, bidirectional_small_sort_limit, or invent a mechanism accepting an iterator tag and with a default option (can always default to forward iterator tag I guess).
  • Spread sorters will likely have some compatible and some non-compatible options: have a shared options structure and more options structures inheriting from the main one.
  • Pass more complicated options to the sorter's constructor? Anything with mutable parameters should go down that road.
  • Sorter adapters could likely have options too when it makes sense. Options are embedded in the template parameter of the adapted sorter, so that problem is at least a solved one.
  • We can probably reuse some option objects to pass nested compile time options the following way (assuming pdq_sorter_options object is used):
    constexpr cppsort::drop_merge_sorter_options options = ;
    auto sort = cppsort::drop_merge_sorter<{
        .recency = 8,
        .double_comparison = true,
        .pdq_sort_options = {
            .insertion_sort_threshold = 24,
            .ninther_threshold = 128
        }
    }>{};
@Morwenn Morwenn added this to the 2.0.0 milestone Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant