Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PriorityQueue<T> to Collections #14032

Closed
ebickle opened this issue Jan 30, 2015 · 318 comments
Closed

Add PriorityQueue<T> to Collections #14032

ebickle opened this issue Jan 30, 2015 · 318 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Milestone

Comments

@ebickle
Copy link

ebickle commented Jan 30, 2015

See LATEST Proposal in corefxlab repo.

Second Proposal options

Proposal from https://github.com/dotnet/corefx/issues/574#issuecomment-307971397

Assumptions

Elements in priority queue are unique. If they are not, we would have to introduce 'handles' of items to enable their update/remove. Or the update/remove semantics would have to apply to first/all, which is weird.

Modeled after Queue<T> (MSDN link)

API

public class PriorityQueue<TElement, TPriority>
    : IEnumerable,
    IEnumerable<(TElement element, TPriority priority)>,
    IReadOnlyCollection<(TElement element, TPriority priority)>
    // ICollection not included on purpose
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }
    public bool IsEmpty { get; }

    public bool Contains(TElement element);

    // Peek & Dequeue
    public (TElement element, TPriority priority) Peek(); // Throws if empty
    public (TElement element, TPriority priority) Dequeue(); // Throws if empty
    public bool TryPeek(out TElement element, out TPriority priority); // Returns false if empty
    public bool TryDequeue(out TElement element, out TPriority priority); // Returns false if empty

    // Enqueue & Update
    public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate
    public void Update(TElement element, TPriority priority); // Throws if element does not exist
    public void EnqueueOrUpdate(TElement element, TPriority priority);
    public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
    public bool TryUpdate(TElement element, TPriority priority); // Returns false if element does not exist (does NOT add it)
    
    public void Remove(TElement element); // Throws if element does not exist
    public bool TryRemove(TElement element); // Returns false if element does not exist

    public void Clear();

    public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

//
// Selector part
//
    public PriorityQueue(Func<TElement, TPriority> prioritySelector);
    public PriorityQueue(Func<TElement, TPriority> prioritySelector, IComparer<TPriority> comparer);

    public Func<TElement, TPriority> PrioritySelector { get; }

    public void Enqueue(TElement element);
    public void Update(TElement element);
}

Open questions:

  1. Class name PriorityQueue vs. Heap
  2. Introduce IHeap and constructor overload? (Should we wait for later?)
  3. Introduce IPriorityQueue? (Should we wait for later - IDictionary example)
  4. Use selector (of priority stored inside the value) or not (5 APIs difference)
  5. Use tuples (TElement element, TPriority priority) vs. KeyValuePair<TPriority, TElement>
    • Should Peek and Dequeue rather have out argument instead of tuple?
  6. Is Peek and Dequeue throwing useful at all?

Original Proposal

Issue https://github.com/dotnet/corefx/issues/163 requested the addition of a priority queue to the core .NET collection data structures.

This post, while a duplicate, is intended to act the formal submission to the corefx API Review Process. The issue contents are the speclet for a new System.Collections.Generic.PriorityQueue type.

I will be contributing the PR, if approved.

Rationale and Usage

The .NET Base Class Libraries (BCL) currently lacks support for ordered producer-consumer collections. A common requirement of many software applications is the ability generate a list of items over time and process them in an order different from the order they were received in.

There are three generic data structures within the System.Collections hierarchy of namespaces that supported a sorted collection of items; System.Collections.Generic.SortedList, System.Collections.Generic.SortedSet, and System.Collections.Generic.SortedDictionary.

Of these, SortedSet and SortedDictionary are not appropriate for producer-consumer patterns that generate duplicate values. The complexity of SortedList is Θ(n) worst case for both Add and Remove.

A much more memory and time efficient data structure for ordered collections with producer-consumer usage patterns is a priority queue. Other than when capacity resizing is necessary, worse case insertion (enqueue) and remove top (dequeue) performance is Θ(log n) - far better than the existing options that exist in the BCL.

Priority queues have a wide degree of applicability across different classes of applications. The Wikipedia page on Priority Queues offers a list of many different well understand use cases. While highly specialized implementations may still require custom priority queue implementations, a standard implementation would cover a broad range of usage scenarios. Priority queues are particularly useful in scheduling the output of multiple producers, which is an important pattern in highly parallelized software.

It's worth noting that both the C++ standard library and Java offer priority queue functionality as part of their basic APIs.

Proposed API

namespace System.Collections.Generic
{
    /// <summary>
    /// Represents a collection of objects that are removed in a sorted order.
    /// </summary>
    /// <typeparam name="T">Specifies the type of elements in the queue.</typeparam>    
    [DebuggerDisplay("Count = {count}")]
    [DebuggerTypeProxy(typeof(System_PriorityQueueDebugView<>))]
    public class PriorityQueue<T> : IEnumerable<T>, ICollection, IEnumerable, IReadOnlyCollection<T>
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a default comparer.
        /// </summary>
        public PriorityQueue();

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that has the specified initial capacity.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        public PriorityQueue(int capacity);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a specified comparer.
        /// </summary>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a default comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="collection"/> is null.</exception>
        public PriorityQueue(IEnumerable<T> collection);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a specified comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException">
        /// <paramref name="collection"/> is null. -or-
        /// <paramref name="comparer"/> is null.
        /// </exception>
        public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class that is empty,
        /// has the specified initial capacity, and uses a specified comparer.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(int capacity, IComparer<T> comparer);

        /// <summary>
        /// Gets the <see cref="IComparer{T}"/> for the <see cref="PriorityQueue{T}"/>. 
        /// </summary>
        /// <value>
        /// The <see cref="T:System.Collections.Generic.IComparer{T}"/> that is used when
        /// comparing elements in the <see cref="PriorityQueue{T}"/>. 
        /// </value>
        public IComparer<T> Comparer 
        { 
            get;
        }

        /// <summary>
        /// Gets the number of elements contained in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <value>The number of elements contained in the <see cref="PriorityQueue{T}"/>.</value>
        public int Count 
        { 
            get;
        }

        /// <summary>
        /// Adds an object to the into the <see cref="PriorityQueue{T}"/> by its priority.
        /// </summary>
        /// <param name="item">
        /// The object to add to the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        public void Enqueue(T item);

        /// <summary>
        /// Removes and returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>The object with the lowest priority that is removed from the <see cref="PriorityQueue{T}"/>.</returns>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Dequeue();

        /// <summary>
        /// Returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Peek();

        /// <summary>
        /// Removes all elements from the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        public void Clear();

        /// <summary>
        /// Determines whether an element is in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <param name="item">
        /// The object to add to the end of the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        /// <returns>
        /// true if item is found in the <see cref="PriorityQueue{T}"/>;  otherwise, false.
        /// </returns>
        public bool Contains(T item);

        /// <summary>
        /// Copies the elements of the <see cref="PriorityQueue{T}"/> to an  <see cref="T:System.Array"/>, 
        /// starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="arrayIndex">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException">
        /// <paramref name="arrayIndex"/> is less than zero. -or- 
        /// <paramref name="arrayIndex"/> is equal to or greater than the length of the <paramref name="array"/>
        /// </exception>
        /// <exception cref="ArgumentException">
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>.
        /// </exception>
        public void CopyTo(T[] array, int arrayIndex);

        /// <summary>
        /// Copies the elements of the <see cref="T:System.Collections.ICollection"/> to an 
        /// <see cref="T:System.Array"/>, starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="index">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="index"/> is less than zero.</exception>
        /// <exception cref="ArgumentException">
        /// <paramref name="array"/> is multidimensional. -or-
        /// <paramref name="array"/> does not have zero-based indexing. -or-
        /// <paramref name="index"/> is equal to or greater than the length of the <paramref name="array"/> -or- 
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>. -or- 
        /// The type of the source <see cref="T:System.Collections.ICollection"/> cannot be cast automatically 
        /// to the type of the destination <paramref name="array"/>.
        /// </exception>
        void ICollection.CopyTo(Array array, int index);

        /// <summary>
        /// Copies the elements stored in the <see cref="PriorityQueue{T}"/> to a new array.
        /// </summary>
        /// <returns>
        /// A new array containing a snapshot of elements copied from the <see cref="PriorityQueue{T}"/>.
        /// </returns>
        public T[] ToArray();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        public Enumerator GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        IEnumerator<T> IEnumerable<T>.GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>An <see cref="T:System.Collections.IEnumerator"/> that can be used to iterate through the collection.</returns>
        IEnumerator IEnumerable.GetEnumerator();

        /// <summary>
        /// Sets the capacity to the actual number of elements in the <see cref="PriorityQueue{T}"/>, 
        /// if that number is less than than a threshold value.
        /// </summary>
        public void TrimExcess();

        /// <summary>
        /// Gets a value that indicates whether access to the <see cref="ICollection"/> is 
        /// synchronized with the SyncRoot.
        /// </summary>
        /// <value>true if access to the <see cref="T:System.Collections.ICollection"/> is synchronized
        /// with the SyncRoot; otherwise, false. For <see cref="PriorityQueue{T}"/>, this property always
        /// returns false.</value>
        bool ICollection.IsSynchronized
        {
            get;
        }

        /// <summary>
        /// Gets an object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </summary>
        /// <value>
        /// An object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </value>
        object ICollection.SyncRoot
        {
            get;
        }

        public struct Enumerator : IEnumerator<T>
        {
            public T Current { get; }
            object IEnumerator.Current { get; }
            public bool MoveNext();
            public void Reset();
            public void Dispose();
        }
    }
}

Details

  • Implementation data structure will be a binary heap. Items with a greater comparison value will be returned first. (descending order)
  • Time complexities:
Operation Complexity Notes
Construct Θ(1)
Construct Using IEnumerable Θ(n)
Enqueue Θ(log n)
Dequeue Θ(log n)
Peek Θ(1)
Count Θ(1)
Clear Θ(N)
Contains Θ(N)
CopyTo Θ(N) Uses Array.Copy, actual complexity may be lower
ToArray Θ(N) Uses Array.Copy, actual complexity may be lower
GetEnumerator Θ(1)
Enumerator.MoveNext Θ(1)
  • Additional constructor overloads that take the System.Comparison delegate were intentionally omitted in favor of a simplified API surface area. Callers can use Comparer.Create to convert a function or Lambda expression to an IComparer interface if necessary. This does require the caller to incur a one-time heap allocation.
  • Although System.Collections.Generic is not yet part of corefx, I propose that this class be added to corefxlab in the meantime. It can be moved to the primary corefx repository once System.Collections.Generic are added and there is consensus that its status should be elevated from experimental to an official API.
  • An IsEmpty property was not included, since there is no additional performance penalty calling Count. The majority of the collection data structures do not include IsEmpty.
  • The IsSynchronized and SyncRoot properties of ICollection were implemented explicitly as they are effectively obsolete. This also follows the pattern used for the other System.Collection.Generic data structures.
  • Dequeue and Peek throw an InvalidOperationException when the queue is empty to match the established behavior of System.Collections.Queue.
  • IProducerConsumerCollection was not implemented as its documentation states that it is only intended for thread-safe collections.

Open Questions

  • Is avoiding an additional heap allocation during calls to GetEnumerator when using foreach a strong enough rationale for including the nested public enumerator structure?
  • Should CopyTo, ToArray, and GetEnumerator return results in prioritized (sorted) order, or the internal order used by the data structure? My assumption is that the internal order should be returned, as it doesn't incur any additional performance penalties. However, this is a potential usability issue if a developer thinks of the class as a "sorted queue" rather a priority queue.
  • Does adding a type named PriorityQueue to System.Collections.Generic cause a potentially breaking change? The namespace is heavily used, and could cause a source compatibility problem for projects that include their own priority queue type.
  • Should items be dequeued in ascending or descending order, based on the output of IComparer? (my assumption is ascending order, to match the normal sorting convention of IComparer).
  • Should the collection be 'stable'? In other words, should two items with equal IComparison results be dequeued in the exact same order they are enqueued in? (my assumption is this isn't needed)

Updates

  • Fixed complexity of 'Construct Using IEnumerable' to Θ(n). Thanks @svick.
  • Added another option question regarding whether the priority queue should be ordered in ascending or descending order compared to the IComparer.
  • Removed NotSupportedException from explicit SyncRoot property to match behavior of other System.Collection.Generic types instead of using the newer pattern.
  • Made the public GetEnumerator method return a nested Enumerator struct instead of IEnumerable, similar to the existing System.Collections.Generic types. This is an optimization to avoid a heap (GC) allocation when using a foreach loop.
  • Removed ComVisible attribute.
  • Changed complexity of Clear to Θ(n). Thanks @mbeidler.
@svick
Copy link
Contributor

svick commented Jan 30, 2015

Operation Complexity
Construct Using IEnumerable Θ(log n)

I think this should be Θ(n). You need to at least iterate the input.

@ghost
Copy link

ghost commented Jan 31, 2015

+1

@anaisbetts
Copy link

Rx has a highly production-tested priority queue class:

https://github.com/Reactive-Extensions/Rx.NET/blob/master/Rx.NET/Source/System.Reactive.Core/Reactive/Internal/PriorityQueue.cs

@justinvp
Copy link
Contributor

Should a nested public enumerator structure be used to avoid an additional heap allocation during calls to GetEnumerator and when using foreach? My assumption is no, since enumerating over a queue is an uncommon operation.

I'd lean towards using the struct enumerator to be consistent with Queue<T> which uses a struct enumerator. Also, if a struct enumerator isn't used now, we won't be able to change PriorityQueue<T> to use one in the future.

@benaadams
Copy link
Member

Also perhaps a method for batch inserts? Can always then sort and continue from previous insertion point rather than starting at the beginning if that would help?:

    public void Enqueue(List<T> items) {
         items.Sort(_comparer);
         ... insertions ...
    }

@ebickle
Copy link
Author

ebickle commented Feb 7, 2015

I've tossed a copy of the initial implementation here. Test coverage is far from complete, but if anyone is curious please take a look and let me know what you think. I've tried to follow the existing coding conventions from the System.Collections classes as much as possible.

@justinvp
Copy link
Contributor

justinvp commented Feb 7, 2015

Cool. Some initial feedback:

  • Queue<T>.Enumerator implements IEnumerator.Reset explicitly. Should PriorityQueue<T>.Enumerator do the same?
  • Queue<T>.Enumerator uses _index == -2 to indicate the enumerator has been disposed. PriorityQueue<T>.Enumerator has the same comment but has an extra _disposed field. Consider getting rid of the extra _disposed field and use _index == -2 to indicate it's been disposed to make the struct smaller and to be consistent with Queue<T>
  • I think the static _emptyArray field can be removed and usage replaced with Array.Empty<T>() instead.

@justinvp
Copy link
Contributor

justinvp commented Feb 7, 2015

Also...

  • Other collections that take a comparer (e.g. Dictionary<TKey, TValue>, HashSet<T>, SortedList<TKey, TValue>, SortedDictionary<TKey, TValue>, SortedSet<T>, etc.) allow null to be passed in for the comparer, in which case Comparer<T>.Default is used.

@justinvp
Copy link
Contributor

justinvp commented Feb 7, 2015

Also...

  • ToArray can be optimized by checking for _size == 0 before allocating the new array, in which case just return Array.Empty<T>().

@ebickle
Copy link
Author

ebickle commented Feb 7, 2015

@justinvp Great feedback, thanks!

  • I implemented Enumerator.Reset explicitly since it's core, non-deprecated functionality of an enumerator. Whether or not it's exposed seems inconsistent across the collection types, and only a few use the explicit variant.
  • Removed the _disposed field in favor of _index, thanks! Tossed that in at the last minute that night and missed the obvious. Decided to keep the ObjectDisposedException for correctness with the newer collection types, even though the old System.Collections.Generic types don't use it.
  • Array.Empty is a F# feature, so can't use it here sadly!
  • Modified the comparer parameters to accept null, good find!
  • The ToArray optimization is tricky. Technically speaking arrays are mutable in C#, even when they have a length of zero. In reality you're right, the allocation isn't needed and it can be optimized. I'm leaning towards a more cautious implementation, in case there are side-effects I'm not thinking of. Semantically the caller will still expect that allocation, and it's a minor one.

@akoeplinger
Copy link
Member

@ebickle

Array.Empty is a F# feature, so can't use it here sadly!

Not anymore: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1060-L1069

@ebickle
Copy link
Author

ebickle commented Feb 27, 2015

I've migrated the code over to ebickle/corefx in the issue-574 branch.

Implemented the Array.Empty() change and plugged everything into the regular build pipeline. One slight temporary kludge I had to introduce was having the System.Collections project depend on the System.Collections nuget package, as Comparer isn't open source yet.

Will be fixed once issue dotnet/corefx#966 is complete.

One key area I'm looking for feedback is how ToArray, CopyTo, and the enumerator should be handled. Currently they're optimized for performance, which means the target array is a heap and not sorted by the comparer.

There are three options:

  1. Leave it as-is, and document that the returned array is not sorted. (it's a priority queue, not a sorted queue)
  2. Modify the methods to sort the returned items/arrays. They will no longer be O(n) operations.
  3. Remove the methods and enumerable support altogether. This is the "purist" option but removes the ability to quickly grab the remaining queued items when the priority queue is no longer needed.

Another thing I'd like feedback on is whether the queue should be stable for two items with the same priority (compare result of 0). Generally priority queues make no guarantees that two items with the same priority will be dequeued in the order they were enqueued, but I noticed that internal priority queue implementation used in System.Reactive.Core took some additional overhead to ensure this property. My preference would be to not do this, but I'm not completely sure which option is better in terms developers' expectations.

@mbeidler
Copy link

I happened upon this PR because I too was interested in adding a priority queue to .NET. Glad to see someone went to the effort of making this proposal :). After reviewing the code, I noticed the following:

  • When the IComparer ordering is not consistent with Equals, the behavior of this Contains implementation (which uses the IComparer) may be surprising to some users, as it is essentially contains an item with equal priority.
  • I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.
  • Should the Enqueue method accept null arguments?
  • I think the complexity of Clear should be Θ(n), since that is the complexity of System.Array.Clear, which it uses. https://msdn.microsoft.com/en-us/library/system.array.clear%28v=vs.110%29.aspx

@svick
Copy link
Contributor

svick commented Mar 23, 2015

I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.

Queue<T> and Stack<T> also don't shrink their arrays (based on Reference Source, they're not in CoreFX yet).

@ebickle
Copy link
Author

ebickle commented Mar 23, 2015

@mbeidler I'd considered adding some form of automatic array shrinking in dequeue, but as @svick pointed out it doesn't exist in the reference implementations of similar data structures. I'd be curious to hear from anyone on the .NET Core/BCL team whether there's any particular reason they chose that style of implementation.

Update: I checked List, Queue, Queue, and ArrayList - none of them shrink the size of the internal array on a remove/dequeue.

Enqueue should support nulls, and is documented as allowing them. Did you run into a bug? I can't remember how robust the unit tests area in the area yet.

@mbeidler
Copy link

mbeidler commented Apr 1, 2015

Interesting, I noticed in the reference source linked by @svick that Queue<T> has an unused private constant named _ShrinkThreshold. Perhaps that behavior existed in a previous version.

Concerning the use of IComparer instead of Equals in the implementation of Contains, I wrote up the following unit test, which would fail currently: https://gist.github.com/mbeidler/9e9f566ba7356302c57e

@ebickle
Copy link
Author

ebickle commented Apr 1, 2015

@mbeidler Good point. According to MSDN, IComparer / IComparable only guarantees that a value of zero has the same sort order.

However, it looks as though the same issue exists in the other collection classes. If I modify the code to operate on SortedList<Patient, Patient>, the test case still fails on ContainsKey. The implementation of SortedList<TKey, TValue>.ContainsKey calls Array.BinarySearch, which relies on IComparer to check for equality. The same holds true for SortedSet.

Arguably it's a bug in the existing collection classes as well. I'll dug through the rest of the collections classes and see if there are any other data structures that accept an IComparer but test for equality separately. You're right though, for a priority queue you would expect custom ordering behavior that's completely independent from equality.

@ebickle
Copy link
Author

ebickle commented Apr 1, 2015

Committed a fix and the test case into my fork branch. The new implementation of Contains is based directly on the behavior of List.Contains. Since List doesn't accept an IEqualityComparer, the behavior is functionally equivalent.

When I get some time later today, I'll likely submit bug reports for the other built-in collections. The probably can't be fixed due to regression behavior, but at very least the documentation needs to be updated.

@mbeidler
Copy link

mbeidler commented Apr 3, 2015

I think it makes sense for ContainsKey to use the IComparer<TKey> implementation, since that is what specifies the key. However, I think it would be more logical for ContainsValue to use Equals instead of IComparable<TValue> in its linear search; though in this case the scope is significantly reduced since a type's natural ordering is much less likely to be inconsistent with equals.

It appears that in the MSDN documentation for SortedList<TKey, TValue>, the remarks section for ContainsValue does indicate that the TValue's sort ordering is used in lieu of equality.

@ebickle
Copy link
Author

ebickle commented May 10, 2015

@terrajobst How do you feel about the API proposal so far? Do you feel this a good fit for CoreFX?

@ddevault
Copy link
Contributor

ddevault commented Jul 2, 2015

👍

@terrajobst
Copy link
Member

Thanks for filing this. I believe we've enough data to make a formal review of this proposal, hence I labelled it as 'ready for api review'

@andrewgmorris
Copy link

As Dequeue and Peek are throwing methods the caller needs to check count before each call. Would it make sense to instead (or in addition) provide TryDequeue and TryPeek following the pattern of the concurrent collections? There are issues open to add non-throwing methods to existing generic collections so adding a new collection which doesn't have these methods seems counterproductive.

@benaadams
Copy link
Member

@andrewgmorris related https://github.com/dotnet/corefx/issues/4316 "Add TryDequeue to Queue"

@weshaggard
Copy link
Member

We had a basic review of this and we agree that we want a ProrityQueue in the framework. However, we need to get someone to help drive the design and implementation for it. Whoever grabs the issue can work @terrajobst on finalizing the API.

@shmuelie
Copy link
Contributor

So what work is left on the API?

@justinvp
Copy link
Contributor

This is missing from the API proposal above: PriorityQueue<T> should implement IReadOnlyCollection<T> to match Queue<T> (Queue<T> now implements IReadOnlyCollection<T> as of .NET 4.6).

@BrannonKing
Copy link

I don't know that array-based priority queues are best. Memory allocation in .NET is really fast. We don't have the same search-small-blocks issue that the old malloc dealt with. You are welcome to use my priority queue code from here: https://github.com/BrannonKing/Kts.Astar/tree/master/Kts.AStar

@pgolebiowski
Copy link
Contributor

pgolebiowski commented Oct 21, 2020

It would be helpful if you could elaborate more, or just say what you're trying to say. I'd need to disambiguate, there would be ping-pong, and that would turn into a lengthy discussion. Alternatively, we could arrange a call.

@stephentoub
Copy link
Member

stephentoub commented Oct 21, 2020

I'm saying we want to avoid any Enqueue operation requiring an allocation, whether on the part of the caller or the part of the implementation (ammortized internal allocation is fine, e.g. to expand an array used in the implementation). I'm trying to understand how that's possible with a node-based heap (e.g. if those node objects are exposed to the caller, that prohibits pooling by the implementation due to concerns around inappropriate reuse / aliasing). I want to be able to write:

pq.Enqueue(42, 84);

and have that not allocate. How do the implementations you refer to achieve that?

or just say what you're trying to say

I thought I was.

@pgolebiowski
Copy link
Contributor

pgolebiowski commented Oct 21, 2020

we want to avoid any Enqueue operation requiring an allocation [...] I want to be able to write: pq.Enqueue(42, 84); and have that not allocate.

Where does this desire come from? It's a nice to have side effect of a solution, not a requirement that 99.9% customers need to have satisfied. I don't see why you would choose this low-impact dimension to make design choices between solutions.

We're not making design choices based on optimizations for the 0.1% of customers if these negatively impact 12% of the customers in another dimension. "caring about no allocations" + "dealing with two value types" is an edge case.

I find the dimension of supported behavior/functionality far more important, especially when designing a general-purpose versatile data structure for a wide audience and a variety of use cases.

@stephentoub
Copy link
Member

stephentoub commented Oct 21, 2020

Where does this desire come from?

From wanting the core collection types to be usable in scenarios that care about performance. You say the node-based solution would support 100% of use cases: that is not the case if every enqueue allocates, just as List<T>, Dictionary<TKey, TValue>, HashSet<T>, and so on would become unusable in many situations if they allocated on every Add.

Why do you believe it's only "0.1%" that care about the allocation overheads of these methods? From where is that data coming?

"caring about no allocations" + "dealing with two value types" is an edge case

It is not. It's also not just "two value types". As I understand it, the proposed solution would require either a) an allocation on every enqueue regardless of the Ts involved, or b) would require the element type to derive from some known base type which in turn prohibits a large number of possible uses to avoid extra allocation.

@TimLovellSmith
Copy link
Contributor

TimLovellSmith commented Oct 21, 2020

@eiriktsarpalis
So you don't forget any options, I think there is a feasible option 4 to add to options 1, 2, and 3, in your list which is a compromise:
4. an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

We might still decide that because there's extra performance available to the 88% or 12% from an implementation which doesn't need an updatable data structure, or is optimized for one in the first place, its better to provide options 2 and 3, than option 4. But I thought we should not forget another option exists.

[Or I suppose you could just see this as a better option 1 and update the description of 1 to say that bookkeeping is not forced, but lazy, and correct equatable behavior is only required when updates are used...]

@pgolebiowski
Copy link
Contributor

pgolebiowski commented Oct 21, 2020

@stephentoub That's exactly what I had in mind about saying simply what you want to say, thank you :)

Why do you believe it's only 0.1% that care about the allocation overheads of these methods? From where is that data coming?

From intuition, i.e. the same source based on which you believe it's more important to prioritize "no additional allocations" over "ability to conduct updates". At least for the update mechanism, we have the data that 11-12% customers do need to have this behavior supported. I don't think remotely close customers would care about "no additional allocations".

In either case, you are for some reason choosing to fixate on the memory dimension, forgetting about other dimensions, e.g. raw speed, which is another trade-off for your preferred approach. An array-based implementation providing "no additional allocations" would be slower than a node-based implementation. Again, I think it's arbitrary here to prioritize memory over speed.

Let's take a step back and focus on what the customers want. We have a design choice that may or may not make the data structure unusable for 12% of the customers. I think we would need to be very careful with providing reasons for why we would choose not to support these.

@stephentoub
Copy link
Member

stephentoub commented Oct 21, 2020

An array-based implementation providing "no additional allocations" would be slower than a node-based implementation.

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion. Theoretical papers are certainly valuable but they are only one small piece of the puzzle. The more important thing is when the rubber meets the road, factoring in the details of the given platform and given implementations, and you're able to validate on the specific platform with the specific implementation and typical/expected data sets / usage patterns. It could very well be that your assertion is correct. It also may not be. I'd like to see the implementations / data to understand better.

@pgolebiowski
Copy link
Contributor

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion

This is a valid point, the paper I quote only compares and benchmarks implementations in C++. It conducts multiple benchmarks with different data sets and usage patterns. I am quite confident this would be transferable to C#, but if you believe this is something we need to double down, I think the best course of action would be for you to ask a colleague to conduct such a study.

@eiriktsarpalis
Copy link
Member

@pgolebiowski I would be interested in better understanding the nature of your objection. The proposal advocates for two separate types, would that not cover your requirements?

@eiriktsarpalis
Copy link
Member

  1. an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

I would probably classify that as a performance optimization for option 1, however I do see a couple of issues with that particular approach:

  • Update now becomes O(n), which can result in unpredictable performance depending on usage patterns.
  • The lookup table is also needed for validating uniqueness. Enqueueing the same element twice before calling Update would be accepted and arguably bring the queue to an inconsistent state.

@TimLovellSmith
Copy link
Contributor

@eiriktsarpalis Its only O(n) once, and O(1) afterwards, which is O(1) amortized. And you can postpone validating uniqueness until the first update. But maybe this is overly clever. Two classes is easier to explain.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 23, 2020

I've spent the last few days prototyping two PriorityQueue implementations: a basic implementation without update support and an implementation that supports updates using element equality. I've named the former PriorityQueue and the latter, for lack of a better name, PrioritySet. My goal is to gauge API ergonomics and compare performance.

The implementations can be found in this repo. Both classes are implemented using array-based quad heaps. The updatable implementation also uses a dictionary that maps elements to internal heap indices.

Basic PriorityQueue

namespace System.Collections.Generic
{
    public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>
    {
        // Constructors
        public PriorityQueue();
        public PriorityQueue(int initialCapacity);
        public PriorityQueue(IComparer<TPriority>? comparer);
        public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);

        // Properties
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }

        // O(log(n)) push operation
        public void Enqueue(TElement element, TPriority priority);
        // O(1) peek operations
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        // O(log(n)) pop operations
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        // Combined push/pop, generally more efficient than sequential Enqueue();Dequeue() calls.
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        public void Clear();

        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Here's a basic sample using the type

var queue = new PriorityQueue<string, int>();

queue.Enqueue("John", 1940);
queue.Enqueue("Paul", 1942);
queue.Enqueue("George", 1943);
queue.Enqueue("Ringo", 1940);

Assert.Equal("John", queue.Dequeue());
Assert.Equal("Ringo", queue.Dequeue());
Assert.Equal("Paul", queue.Dequeue());
Assert.Equal("George", queue.Dequeue());

Updatable PriorityQueue

namespace System.Collections.Generic
{
    public class PrioritySet<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)> where TElement : notnull
    {
        // Constructors
        public PrioritySet();
        public PrioritySet(int initialCapacity);
        public PrioritySet(IComparer<TPriority> comparer);
        public PrioritySet(int initialCapacity, IComparer<TPriority>? priorityComparer, IEqualityComparer<TElement>? elementComparer);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer, IEqualityComparer<TElement>? elementComparer);

        // Members shared with baseline PriorityQueue implementation
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }
        public void Enqueue(TElement element, TPriority priority);
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        // Update methods and friends
        public bool Contains(TElement element); // O(1)
        public bool TryRemove(TElement element); // O(log(n))
        public bool TryUpdate(TElement element, TPriority priority); // O(log(n))
        public void EnqueueOrUpdate(TElement element, TPriority priority); // O(log(n))

        public void Clear();
        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Performance Comparison

I wrote a simple heapsort benchmark that compares the two implementations in their most basic application. I also included a sort benchmark that uses Linq for comparison:

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
AMD EPYC 7452, 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
Method Size Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
LinqSort 30 1.439 us 0.0072 us 0.0064 us 1.00 0.00 0.0095 - - 672 B
PriorityQueue 30 1.450 us 0.0085 us 0.0079 us 1.01 0.01 - - - -
PrioritySet 30 2.778 us 0.0217 us 0.0192 us 1.93 0.02 - - - -
LinqSort 300 24.727 us 0.1032 us 0.0915 us 1.00 0.00 0.0305 - - 3912 B
PriorityQueue 300 29.510 us 0.0995 us 0.0882 us 1.19 0.01 - - - -
PrioritySet 300 47.715 us 0.4455 us 0.4168 us 1.93 0.02 - - - -
LinqSort 3000 412.015 us 1.5495 us 1.3736 us 1.00 0.00 0.4883 - - 36312 B
PriorityQueue 3000 491.722 us 4.1463 us 3.8785 us 1.19 0.01 - - - -
PrioritySet 3000 677.959 us 3.1996 us 2.4981 us 1.64 0.01 - - - -
LinqSort 30000 5,223.560 us 11.9077 us 9.9434 us 1.00 0.00 93.7500 93.7500 93.7500 360910 B
PriorityQueue 30000 5,688.625 us 53.0746 us 49.6460 us 1.09 0.01 - - - 2 B
PrioritySet 30000 8,124.306 us 39.9498 us 37.3691 us 1.55 0.01 - - - 4 B

As can be expected, the overhead of tracking element locations adds a significant performance hit, roughly 40-50% slower compared to the baseline implementation.

@pgolebiowski
Copy link
Contributor

I appreciate all the effort, I see it took a lot of time and energy.

  1. I really don't see though the reason for 2 almost identical data structures, where one is an inferior version of another.
  2. Further, even if you would want to have such 2 versions of a priority queue, I don't see how the "superior" version is better than Priority queue proposal (v2.2) from 20 days ago.

@miyu
Copy link

miyu commented Oct 25, 2020

tl;dr:

  • This proposal is exciting!! But it doesn't yet fit my high-perf use-cases.
  • >90% of my computational geometry/motion planning performance load is PQ enqueue/dequeue because that's the dominant N^m LogN in an algorithm.
  • I'm in favor of separate PQ implementations. I usually don't need priority updates, and 2x worse perf is unacceptable.
  • PrioritySet is a confusing name and isn't discoverable vs PriorityQueue
  • Storing priority twice (once in my element, once in the queue) feels expensive. Struct copies & memory usage.
  • If computing priority were expensive, I would simply store a tuple (priority: ComputePriority(element), element) into a PQ, and my priority-getter function would simply be tuple => tuple.priority.
  • Performance should be benchmarked per-operation or alternatively on real-world use-cases (e.g. optimized multi-start multi-end search on a graph)
  • Unordered enumeration behavior is unexpected. Prefer Queue-like Dequeue()-order semantics?
  • Consider supporting Clone operation & merge operation.
  • Basic operations must be 0-alloc in steady-state usage. I will pool these queues.
  • Consider supporting EnqueueMany which performs heapify to aid pooling.

I work in high-performance search (motion-planning) & computational geometry code (e.g. sweepline algorithms) that is relevant to both robotics and games, I use a lot of custom hand-rolled priority queues. The other common use-case I have is a Top-K query where updatable priority is not useful.

Some feedback regarding the two-implementations (yes vs no update support) debate.

Naming:

  • PrioritySet implies set semantics, but the queue doesn't implement ISet.
  • You're using the UpdatablePriorityQueue name which is easier to discover if I search PriorityQueue.

Performance:

  • Priority Queue performance is nearly always my performance bottleneck (>90%) in my geometry / planning algorithms
  • Consider passing a Func<TElement, TPriority> or Comparison to ctor rather than copying TPriority around (expensive!). If computing priority is expensive, I will insert (priority, element) into a PQ and pass a comparison which looks at my cached priority.
  • A significant number of my algorithms do not need PQ updates. I'd consider using a built-in PQ that does not support updates, but if something has a 2x perf cost to support a feature I don't need (updating) then it is useless to me.
  • For performance analysis / tradeoffs, it would be important to know the relative wall-time cost of enqueue/dequeue per operation @eiriktsarpalis -- "tracking is 2x slower" isn't enough to evaluate whether a PQ is useful.
  • I was glad to see your constructors perform Heapify. Consider a constructor which takes an IList, List, or Array to avoid enumeration allocations.
  • Consider exposing an EnqueueMany which performs Heapify if the PQ is initially empty, since in high-perf it is common to pool collections.
  • Consider making Clear not zero array if elements do not contain references.
  • Allocations in enqueue/dequeue are unacceptable. My algorithms are zero-alloc for performance reasons, with pooled thread-local collections.

APIs:

  • Cloning a priority queue is a trivial operation with your implementations and frequently useful.
    • Related: Should enumerating a Priority Queue have queue-like semantics? I would expect a dequeue-order collection, similar to what Queue does. I expect new List(myPriorityQueue) to not mutate the priorityQueue, but to work as I just described.
  • As mentioned above, it is preferable to take in a Func<TElement, TPriority> rather than insert with priority. If computing priority is expensive, I can simply insert (priority, element) and provide a func tuple => tuple.priority
  • Merging two priority queues is sometimes useful.
  • It is strange that Peek returns a TItem but Enumeration & Enqueue have (TItem, TPriority).

That being said, for a significant number of my algorithms, my priority queue items contain their priorities, and storing that twice (once in the PQ, once in the items) sounds inefficient. This is especially the case if I am ordering by numerous keys (similar use-case to OrderBy.ThenBy.ThenBy). This API also cleans a lot of the inconsistencies where Insert takes a priority but Peek doesn't return it.

Finally, it is worth noting that I often insert indices of an array into a Priority Queue, rather than array elements themselves. This is supported by all APIs discussed so far, though. As an example, if I am processing the beginning/ends of intervals on a x-axis line, I might have priority queue events (x, isStartElseEnd, intervalId) and order by x then by isStartElseEnd. This is often because I have other data structures that map from an index to some computed data.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 26, 2020

@pgolebiowski I took the liberty of including your proposed pairing heap implementation to the benchmarks, just so we can compare instances of all three approaches directly. Here are the results:

Method Size Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
PriorityQueue 10 774.7 ns 3.30 ns 3.08 ns 773.2 ns - - - -
PrioritySet 10 1,643.0 ns 3.89 ns 3.45 ns 1,642.8 ns - - - -
PairingHeap 10 1,660.2 ns 14.11 ns 12.51 ns 1,657.2 ns 0.0134 - - 960 B
PriorityQueue 50 6,413.0 ns 14.95 ns 13.99 ns 6,409.5 ns - - - -
PrioritySet 50 12,193.1 ns 35.41 ns 29.57 ns 12,188.3 ns - - - -
PairingHeap 50 13,955.8 ns 193.36 ns 180.87 ns 13,989.2 ns 0.0610 - - 4800 B
PriorityQueue 150 27,402.5 ns 76.52 ns 71.58 ns 27,410.2 ns - - - -
PrioritySet 150 48,485.8 ns 160.22 ns 149.87 ns 48,476.3 ns - - - -
PairingHeap 150 56,951.2 ns 190.52 ns 168.89 ns 56,953.6 ns 0.1831 - - 14400 B
PriorityQueue 500 124,933.7 ns 429.20 ns 380.48 ns 124,824.4 ns - - - -
PrioritySet 500 206,310.0 ns 433.97 ns 338.81 ns 206,319.0 ns - - - -
PairingHeap 500 229,423.9 ns 3,213.33 ns 2,848.53 ns 230,398.7 ns 0.4883 - - 48000 B
PriorityQueue 1000 284,481.8 ns 475.91 ns 445.16 ns 284,445.6 ns - - - -
PrioritySet 1000 454,989.4 ns 3,712.11 ns 3,472.31 ns 455,354.0 ns - - - -
PairingHeap 1000 459,049.3 ns 1,706.28 ns 1,424.82 ns 459,364.9 ns 0.9766 - - 96000 B
PriorityQueue 10000 3,788,802.4 ns 11,715.81 ns 10,958.98 ns 3,787,811.9 ns - - - 1 B
PrioritySet 10000 5,963,100.4 ns 26,669.04 ns 22,269.86 ns 5,950,915.5 ns - - - 2 B
PairingHeap 10000 6,789,719.0 ns 134,453.01 ns 265,397.13 ns 6,918,392.9 ns 7.8125 - - 960002 B
PriorityQueue 1000000 595,059,170.7 ns 4,001,349.38 ns 3,547,092.00 ns 595,716,610.5 ns - - - 4376 B
PrioritySet 1000000 1,592,037,780.9 ns 13,925,896.05 ns 12,344,944.12 ns 1,591,051,886.5 ns - - - 288 B
PairingHeap 1000000 1,858,670,560.7 ns 36,405,433.20 ns 59,815,170.76 ns 1,838,721,629.0 ns 1000.0000 - - 96000376 B

Key takeaways

  • The pairing heap implementation asymptotically performs much better than its array-backed counterparts. However it can be up to 2x slower for small heap sizes (< 50 elements), catches up at around 1000 elements, but is up to 2x faster for heaps of size 10^6.
  • As expected, the pairing heap produces a significant amount of heap allocations.
  • The "PrioritySet" implementation is consistently slow the slowest of all three contenders, so we might not want to pursue that approach after all.

In light of the above I still believe there are valid trade-offs between the baseline array heap and the pairing heap approach.

EDIT: updated the results after a bugfix in my benchmarks, thanks @VisualMelon

@VisualMelon
Copy link

VisualMelon commented Oct 26, 2020

@eiriktsarpalis Your benchmark for the PairingHeap is, I think, wrong: the parameters to Add are the wrong way around. When you swap them around, it's a different story: https://gist.github.com/VisualMelon/00885fe50f7ab0f4ae5cd1307312109f

(I did exactly the same thing when I first implemented it)

Note that this doesn't mean the pairing heap is faster or slower, rather it seems to depend heavily on the distribution/order of the data supplied.

@TimLovellSmith
Copy link
Contributor

@eiriktsarpalis re: the usefulness of PrioritySet...
We shouldn't expect updatable to be anything other than slower for heapsort, since it has no priority updates in the scenario. (Also for heapsort, its likely you even want to keep duplicates, a set is just not appropriate.)

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 26, 2020

Thanks @VisualMelon, I updated my results and comments after your suggested fix.

rather it seems to depend heavily on the distribution/order of the data supplied.

I believe it might have benefited from the fact that the enqueued priorities were monotonic.

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

@TimLovellSmith my goal here was to measure performance for the most common PriorityQueue application: rather than measure the performance of updates, I wanted to see the impact for the case where updates are not needed at all. It might however make sense to produce a separate benchmark that compares pairing heap with "PrioritySet" updates.

@miyu thanks for your detailed feedback, it is very much appreciated!

@eiriktsarpalis
Copy link
Member

@TimLovellSmith I wrote a simple benchmark that uses updates:

Method Size Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
PrioritySet 10 1.052 us 0.0106 us 0.0099 us 1.055 us - - - -
PairingHeap 10 1.055 us 0.0042 us 0.0035 us 1.055 us 0.0057 - - 480 B
PrioritySet 50 7.394 us 0.0527 us 0.0493 us 7.380 us - - - -
PairingHeap 50 8.587 us 0.1678 us 0.1570 us 8.634 us 0.0305 - - 2400 B
PrioritySet 150 27.522 us 0.0459 us 0.0359 us 27.523 us - - - -
PairingHeap 150 32.045 us 0.1076 us 0.1007 us 32.019 us 0.0610 - - 7200 B
PrioritySet 500 109.097 us 0.6548 us 0.6125 us 109.162 us - - - -
PairingHeap 500 131.647 us 0.5401 us 0.4510 us 131.588 us 0.2441 - - 24000 B
PrioritySet 1000 238.184 us 1.0282 us 0.9618 us 238.457 us - - - -
PairingHeap 1000 293.236 us 0.9396 us 0.8789 us 293.257 us 0.4883 - - 48000 B
PrioritySet 10000 3,035.982 us 12.2952 us 10.8994 us 3,036.985 us - - - 1 B
PairingHeap 10000 3,388.685 us 16.0675 us 38.1861 us 3,374.565 us - - - 480002 B
PrioritySet 1000000 841,406.888 us 16,788.4775 us 15,703.9522 us 840,888.389 us - - - 288 B
PairingHeap 1000000 989,966.501 us 19,722.6687 us 30,705.8191 us 996,075.410 us - - - 48000448 B

@ericsampson
Copy link

On a separate note, has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecases?

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 27, 2020

has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecase

None of the implementations guarantee stability, however it should be pretty straightforward for users to obtain stability by augmenting the ordinal with a timestamp:

var pq = new PriorityQueue<string, (int priority, DateTime timestamp)>();

foreach (string element in elements)
{
    int priority = 42;
    pq.Enqueue(element, (priority, DateTime.Now));
}

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 28, 2020

To summarize some of my previous posts, I've been trying to identify what a popular .NET priority queue would look like, so I went through the following data:

  • Common priority queue usage patterns in .NET source code.
  • PriorityQueue implementations in core libraries of competing frameworks.
  • Benchmarks of various .NET priority queue prototypes.

Which yielded the following takeaways:

  • 90% of priority queue use cases do not require priority updates.
  • Supporting priority updates results in a more complicated API contract (requiring either handles or element uniqueness).
  • In my benchmarks, implementations that support priority updates are 2-3x slower compared to ones that don't.

Next Steps

Going forward, I propose we take the following actions for .NET 6:

  1. Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates. A prototype of the implementation can be found here. I will be creating a separate issue detailing the API proposal shortly.

  2. We recognize the need for heaps that support efficient priority updates, so will keep working towards introducing a specialized class that addresses this requirement. We are evaluating a couple of prototypes [1,2] each with their own sets of trade-offs. My recommendation would be to introduce this type at a later stage, since more work is needed to finalize the design.

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

@pgolebiowski
Copy link
Contributor

Going forward, I propose we take the following actions for .NET 6: [...]

Sounds good :)

Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates.

If we have the decision by the codebase owners that this direction is approved and desired, can I continue leading the API design for that bit and provide the final implementation?

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

It was quite a journey :D

@eiriktsarpalis
Copy link
Member

The API for System.Collections.Generic.PriorityQueue<TElement, TPriority> has just been approved. I've created a separate issue to continue our conversation on a potential heap implementation that supports priority updates.

I'm going to close this issue, thank you all for your contributions!

System.Collections Area Ownership automation moved this from In progress to Done Nov 18, 2020
@xied75
Copy link
Contributor

xied75 commented Nov 18, 2020

Maybe someone can do a write up on this journey! A whole 6 years for one API. :) Any chance to win a Guinness?

@dotnet dotnet locked as resolved and limited conversation to collaborators Jan 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Projects
No open projects
Development

No branches or pull requests