
The PartialSort operator can be improved #840

Open · theodorzoulias opened this issue Jul 5, 2022 · 18 comments

@theodorzoulias commented Jul 5, 2022

Hi! I ran some tests with the PartialSort operator, and it seems that it uses the comparer more frequently than necessary. For most items it should be enough to check whether the item is smaller than the largest of the currently retained top items, but instead it invariably performs a binary-search operation for each item.

I have posted my findings on StackOverflow, in this answer. It seems that the implementation of PartialSort could be optimized.
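
For illustration only (this is not MoreLINQ's actual code, and PartialSortSketch is a made-up name), here is a sketch of the suggested optimization on top of a plain sorted List<T>, where most items are rejected with a single comparison before any binary search:

static IEnumerable<T> PartialSortSketch<T>(
    IEnumerable<T> source, int count, IComparer<T> comparer = null)
{
    comparer ??= Comparer<T>.Default;
    var top = new List<T>(count);
    foreach (var item in source)
    {
        // Cheap rejection: once the list is full, most items are not smaller than
        // the largest retained item, so a single comparison is enough to skip them.
        if (top.Count == count && comparer.Compare(item, top[count - 1]) >= 0)
            continue;

        // Only the few surviving items pay for the binary search and the O(n) insert.
        int i = top.BinarySearch(item, comparer);
        if (i < 0) i = ~i;
        top.Insert(i, item);
        if (top.Count > count) top.RemoveAt(count);
    }
    return top;
}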

@viceroypenguin (Contributor)

@theodorzoulias - Nice analysis.

Two things:

1. In the future, when doing benchmarks, I recommend using BenchmarkDotNet.
2. If you want to write a PR for this on my fork, SuperLinq, I'll gladly take it in. If not, I'll rely on this analysis to update the code myself shortly.

@theodorzoulias (Author)

@viceroypenguin thanks for the quick response. I know about BenchmarkDotNet, and I've used it once. I realized that I don't have the patience to wait half an hour for a simple benchmark, nor do I like the idea of my PC getting tortured while I'm twiddling my thumbs, so I gave up on it. I'm not aiming for precise measurements anyway; getting a rough idea of how something performs is usually enough for me. Which obviously makes me not the right person to write PRs for this high-profile project. 😃

@viceroypenguin (Contributor)

No worries! I'll write the changes myself then. Thanks again for the analysis! :)

@viceroypenguin (Contributor)

@theodorzoulias - For the record, SortedSet<T> does not accurately implement PartialSort(). PartialSort() works with the following call: new[] { 1, 1, 2, 2, 3, 3, 4 }.PartialSort(3) and returns [1, 1, 2]. However, SortedSet<T>, being a set (duplicate entries are ignored), would return [1, 2, 3]. I'll write some more benchmarks to figure out whether I should just switch to the LINQ methods or whether there is a better way.
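
For illustration, a plain SortedSet<int> built from that same input silently drops the duplicates:

var set = new SortedSet<int>();
foreach (var x in new[] { 1, 1, 2, 2, 3, 3, 4 })
    set.Add(x); // Add returns false for duplicates, so only one copy of each value is kept
Console.WriteLine(string.Join(", ", set.Take(3))); // prints "1, 2, 3" instead of "1, 1, 2"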

@viceroypenguin (Contributor)

After testing with BDN, it looks like the original version in MoreLINQ and the modified version in SuperLinq (changed to have a stable sort) are generally the fastest. Please see here for the benchmark results and the code used. Once N gets up to about 1000, the LINQ version does get faster.

@theodorzoulias (Author)

@viceroypenguin ohh, the SortedSet<T> doesn't accept duplicate values. That was dumb of me. I think it can be fixed by attaching an incrementing long to each T element, to make it unique. The implementation below should correctly replicate the existing behavior of the PartialSort operator.

public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    if (direction == OrderByDirection.Descending)
    {
        var ascendingComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => ascendingComparer.Compare(y, x));
    }
    SortedSet<(T Item, long Index)> bottom = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = Comparer<long>.Default.Compare(x.Index, y.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        index++;
        if (bottom.Count < count) { bottom.Add((item, index)); continue; }
        if (comparer.Compare(item, bottom.Max.Item) > 0) continue;
        bool removed = bottom.Remove(bottom.Max);
        bool added = bottom.Add((item, index));
        Debug.Assert(removed && added);
    }
    foreach (var entry in bottom) yield return entry.Item;
}
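
A quick (hypothetical) usage check that duplicates survive with this version:

var result = new[] { 1, 1, 2, 2, 3, 3, 4 }.PartialSort(3);
Console.WriteLine(string.Join(", ", result)); // prints "1, 1, 2"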

@theodorzoulias (Author)

@viceroypenguin in order to magnify the effect of doing more comparisons than necessary, you could use long string values as keys in the benchmarks. Strings don't have a trivial comparison cost like integers do, especially if you throw a StringComparer.InvariantCultureIgnoreCase into the mix!
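
For example, one hypothetical way to build such a data set (the names and sizes here are arbitrary):

var random = new Random(42);
var data = Enumerable.Range(0, 100_000)
    .Select(i => new string(Enumerable.Range(0, 50)
        .Select(j => (char)random.Next('a', 'z' + 1)).ToArray()))
    .ToList();

// A culture-aware, case-insensitive comparer makes every comparison noticeably more expensive.
var top = data.PartialSort(10, StringComparer.InvariantCultureIgnoreCase).ToList();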

@viceroypenguin (Contributor)

Good call on checking with a longer key comparison. Please check the gist now with updated code and results. You might specifically review the changes I made to the SortedSet() method compared to yours, since your version does not handle the keySelector overload of PartialSort(). It appears that with a longer comparison time, your SortedSet<T> version is quicker than all of them by far. I'll finish making modifications and import it into SuperLinq soon.

@viceroypenguin (Contributor) commented Jul 6, 2022

Fixed in SuperLinq (viceroypenguin/SuperLinq@32f0241)

@theodorzoulias (Author) commented Jul 7, 2022

@viceroypenguin Nice! Btw I don't know if you are allowed to use .NET 6 components in the library. If you are, the PriorityQueue<TElement, TPriority> might be preferable to the SortedSet<T>. Internally it's an array-backed heap, so it allocates less than the binary-tree-based SortedSet<T>. This difference comes into play especially when the count is large. I came up with this implementation:

public static IEnumerable<T> PartialSort<T>(
    this IEnumerable<T> source,
    int count,
    IComparer<T> comparer = default,
    OrderByDirection direction = OrderByDirection.Ascending)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    if (direction == OrderByDirection.Ascending)
    {
        var originalComparer = comparer;
        comparer = Comparer<T>.Create((x, y) => originalComparer.Compare(y, x));
    }
    PriorityQueue<bool, (T Item, long Index)> top = new(Comparer<(T Item, long Index)>.Create((x, y) =>
    {
        int result = comparer.Compare(x.Item, y.Item);
        if (result == 0) result = Comparer<long>.Default.Compare(y.Index, x.Index);
        return result;
    }));
    long index = 0;
    foreach (var item in source)
    {
        if (top.Count < count)
            top.Enqueue(default, (item, index++));
        else
            top.EnqueueDequeue(default, (item, index++));
    }
    List<T> topList = new(top.Count);
    while (top.TryDequeue(out _, out var entry)) topList.Add(entry.Item);
    for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}

It's not perfect, because the PriorityQueue<TElement, TPriority> carries a TElement along with the TPriority, which I didn't need in this implementation, so I put a dummy bool there. Also it doesn't offer a way to enumerate it in reverse order, so I had to extract its contents into a List<T> before yielding. But overall it should be slightly better than a SortedSet<T>.

The PriorityQueue<TElement, TPriority> allows elements with the same priority, so attaching a long is only needed if you want to make the sort stable. Otherwise you could just use a PriorityQueue<bool, T> top = new(comparer), and make it both simpler and faster.
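
A minimal sketch of that simpler, non-stable variant (PartialSortUnstable is a made-up name; ascending order only, with the same argument checks as above):

public static IEnumerable<T> PartialSortUnstable<T>(
    this IEnumerable<T> source, int count, IComparer<T> comparer = null)
{
    ArgumentNullException.ThrowIfNull(source);
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    comparer ??= Comparer<T>.Default;
    // Reverse the comparer so that the queue's "minimum" (the element removed by
    // EnqueueDequeue) is the largest of the retained items.
    var reversed = Comparer<T>.Create((x, y) => comparer.Compare(y, x));
    PriorityQueue<bool, T> top = new(reversed);
    foreach (var item in source)
    {
        if (top.Count < count)
            top.Enqueue(default, item);
        else
            top.EnqueueDequeue(default, item);
    }
    // The queue dequeues from largest to smallest, so buffer and yield in reverse.
    List<T> topList = new(top.Count);
    while (top.TryDequeue(out _, out var entry)) topList.Add(entry);
    for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}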

@viceroypenguin (Contributor)

Surprisingly, I found that PriorityQueue<,> is slower than the SortedSet<> implementation at all sizes, though within the margin of error. I think having to reverse the elements at the end is likely the cause. Regardless, I'm going to leave the current implementation as is for now. Thanks for the update! :)

@viceroypenguin (Contributor)

@theodorzoulias FYI: SuperLinq 4.0.0 has been fully released with this improvement.

@MisinformedDNA

My first benchmark:

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(5).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, 5).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, 5).ToList();
}

results in

| Method               |     Mean |    Error |   StdDev | Ratio | RatioSD |
|----------------------|---------:|---------:|---------:|------:|--------:|
| Native               | 26.52 ms | 0.524 ms | 1.009 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  | 17.13 ms | 0.341 ms | 0.379 ms |  0.64 |    0.04 |
| SuperLinqPartialSort | 19.94 ms | 0.396 ms | 0.472 ms |  0.75 |    0.04 |

@MisinformedDNA

With added randomness for the take count:

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private readonly List<int> data = new(N);
    private readonly int TakeCount;

    public PartialSortVsNative()
    {
        var random = new Random();
        for (int i = 0; i < N; i++)
        {
            data.Add(random.Next());
        }
        TakeCount = random.Next(N);
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}

// * Summary *

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19042.2006/20H2/October2020Update)
AMD Ryzen 5 PRO 4650U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.400
[Host] : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2

| Method               |       Mean |    Error |   StdDev | Ratio | RatioSD |
|----------------------|-----------:|---------:|---------:|------:|--------:|
| Native               |   177.9 ms |  4.91 ms | 14.48 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  |   839.3 ms | 15.65 ms | 15.37 ms |  4.33 |    0.37 |
| SuperLinqPartialSort | 1,223.1 ms | 24.08 ms | 53.87 ms |  6.94 |    0.59 |

@viceroypenguin (Contributor)

A couple of things:

  1. Nit: The proper way to do setup for a benchmark is a [GlobalSetup] public void Setup() method. That said, I'm not sure there is a practical difference between the two other than explicitness.
  2. Making the TakeCount random is not an apples-to-apples comparison, because BenchmarkDotNet runs the setup once for each method and then runs the method thousands of times to generate statistics. This means that TakeCount could be 5 for Native and 50_000 for SuperLinqPartialSort, and vice versa on the next benchmark.
  3. For similar reasons, it is important to use a fixed seed for Random, so that the same randomly generated values are used for each version of the test. Otherwise one run may be closer to sorted than the next, which would affect variances between the different tests. (A corrected sketch follows below.)
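
A sketch of the harness with those corrections applied (the seed and the fixed TakeCount below are arbitrary example values):

public class PartialSortVsNative
{
    private const int N = 1_000_000;
    private const int TakeCount = 5;     // fixed, so every benchmark sorts the same amount
    private List<int> data;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(12345);  // fixed seed: identical data for every benchmark
        data = new List<int>(N);
        for (int i = 0; i < N; i++)
            data.Add(random.Next());
    }

    [Benchmark(Baseline = true)]
    public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();

    [Benchmark]
    public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();

    [Benchmark]
    public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}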

With those updates applied, I get the following results:

// * Summary *

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22622.601)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2

Job=.NET 6.0 Runtime=.NET 6.0

| Method               |     Mean |    Error |   StdDev | Ratio | RatioSD |
|----------------------|---------:|---------:|---------:|------:|--------:|
| Native               | 22.95 ms | 0.377 ms | 0.504 ms |  1.00 |    0.00 |
| MoreLinqPartialSort  | 16.86 ms | 0.332 ms | 0.310 ms |  0.74 |    0.03 |
| SuperLinqPartialSort | 16.82 ms | 0.330 ms | 0.308 ms |  0.74 |    0.02 |

@viceroypenguin (Contributor)

PS: You can also look at my old benchmarks here

@viceroypenguin (Contributor)

@MisinformedDNA I did some more research and found a place where we were adding unnecessary overhead. Please see viceroypenguin/SuperLinq#67 for the fix, which is being released with SuperLinq v4.4.0. SuperLinq should now be as fast as MoreLinq in the int case, and faster in the string case.

@theodorzoulias (Author) commented Oct 21, 2022

> My first benchmark:

@MisinformedDNA my expectation is that MoreLinq's List<T>-based implementation should be quite performant when the count is small (less than 50 or so), and should become progressively slower as the count increases. That's because it calls the List<T>.Insert method, which is an O(n) operation. On the contrary, SortedSet<T>.Add (in viceroypenguin's SortedSet<T>-based implementation) is O(log n).
