Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: SimplePriorityQueue overloads that expose priority when trivial, for performance and thread safety #54

Open
JohannesMP opened this issue Jul 25, 2021 · 1 comment

Comments

@JohannesMP
Copy link

JohannesMP commented Jul 25, 2021

Overview

When working with priority queues it's often useful to be able to access the priority of the head node as you dequeue it. This is especially common in many pathfinding and graph traversal algorithms.

When using GenericPriorityQueue or FastPriorityQueue you manipulate the nodes directly and so have access to their respective Priority properties.

When using SimplePriorityQueue the nodes are not publicly exposed and so it is currently necessary to first get the priority of the head element, and only then dequeue it.

A naïve user implementation might look like this:

public static class SimplePriorityQueueExtensions
{
    public static TItem Dequeue<TItem, TPriority>(
        this SimplePriorityQueue<TItem, TPriority> queue,
        out TPriority priority)
    {
        priority = queue.GetPriority(queue.First);
        return queue.Dequeue();
    }
    
    public static bool TryDequeue<TItem, TPriority>(
        this SimplePriorityQueue<TItem, TPriority> queue,
        out TItem first, 
        out TPriority priority)
    {
        // Can't use TryDequeue because by that point we've already lost the priority value stored :(
        if (!queue.TryFirst(out first))
        {
            priority = default;
            return false;
        }
        
        priority = queue.GetPriority(first);
        queue.Dequeue(); // Already have 'first' assigned, so no need to store it again.
        return true;
    }
}

This achieves the desired result using SimplePriorityQueue.GetPriority (added as a result of related issue #27), but this implementation is suboptimal (even if for a moment we ignore the fact that it is not thread safe).

The Problem

In its current implementation SimplePriorityQueue.Dequeue() retrieves an internal SimpleNode , which contains both the Data as well as the Priority properties we need, but only returns the data to the caller:

public TItem Dequeue()
{
    lock(_queue)
    {
        if(_queue.Count <= 0)
        {
            throw new InvalidOperationException("Cannot call Dequeue() on an empty queue");
        }

        SimpleNode node =_queue.Dequeue();
        RemoveFromNodeCache(node);
        return node.Data;
    }
}

So by performing Dequeue/TryDequeue we actually already have access the priority value of the head node that we want! It's just not exposed to the caller.

They instead have to make an additional GetPriority call, which in the general case means a _itemToNodesCache lookup since it is intended to work for any node, that could be avoided altogether.

The Proposal

It would be trivial to add a Dequeue and TryDequeue overload to SimplePriorityQueue that each have a out TPriority parameter, since we get that value for free in both of the existing implementations.

For example, the overload for Dequeue could be:

/// <summary>
/// Removes the head of the queue (node with minimum priority; ties are broken by order of insertion), 
/// and returns it along with its priority.
/// If queue is empty, throws an exception
/// O(log n)
/// </summary>
public TItem Dequeue(out TPriority priority)
{
    lock(_queue)
    {
        if(_queue.Count <= 0)
        {
            throw new InvalidOperationException("Cannot call Dequeue() on an empty queue");
        }

        SimpleNode node =_queue.Dequeue();
        priority = node.Priority; // <---- added
        RemoveFromNodeCache(node);
        return node.Data;
    }
}

And the overload for TryDequeue could be:

/// <summary>
/// Removes the head of the queue (node with minimum priority; ties are broken by order of insertion), 
/// and sets it to first along with its priority.
/// Useful for multi-threading, where the queue may become empty between calls to Contains() and Dequeue()
/// or between GetProprity() and TryDequeue()
/// Returns true if successful; false if queue was empty
/// O(log n)
/// </summary>
public bool TryDequeue(out TItem first, out TPriority priority)
{
    if (_queue.Count > 0)
    {
        lock (_queue)
        {
            if (_queue.Count > 0)
            {
                SimpleNode node = _queue.Dequeue();
                first = node.Data;
                priority = node.Priority; // <---- added
                RemoveFromNodeCache(node);
                return true;
            }
        }
    }
    
    first = default(TItem);
    priority = default(TPriority); // <---- added
    return false;
}

They are identical to the existing implementations, except for assigning the priority out parameter. There should be no performance overhead other than one additional assignment, which is trivial (and would still happen with (Try)GetPriority anyway).

This would even have the additional benefit of being thread-safe, unlike the naïve user implementation provided as an example at the start.

If it was deemed necessary to keep the function names unique and not overload them they could be named DequeueWithPriority and TryDequeueWithPriority respectively, or something comparable. 2nd hardest problem in computer science and all that...

Keeping things 'Simple'

The argument could be made that if you care enough about optimizing Dequeue with priority you should just use GenericPriorityQueue or FastPriorityQueue, and that the public signature of SimplePriorityQueue should be kept, well, 'simple'.

That is valid, but given how trivial this implementation would be to add it just seems like such a missed opportunity, especially for saving an additional hash for an operation that as I understand it is very common in some of the main applications of Priority Queues.

More importantly SimplePriorityQueue promises to be thread-safe, and requiring the user to make three calls means we can no longer guarantee that the operation is thread-safe - it's now up to the user to ensure that they correctly lock access to the container for the duration of their First (or TryFirst), GetPriority and Dequeue calls. Yuck.

It just seems like the equivalent of creating a SimpleDictionary implementation that promises to be "thread-safe" , but only has Contains and GetValue and does not provide a TryGetValue:

// Needs to hash key twice
// Cannot guarantee thread-safe value access
if(SimpleDictionary.Contains(key))
{
    Value value = SimpleDictionary.GetValue(key);
    // Use value here ...
}


// Needs to hash key once
// Can guarantee thread-safe value access
if(SimpleDictionary.TryGetValue(key, out Value value))
{
    // Use value here ...
}

The primary purpose of this library is performance (it's in the name after all), so I'd like to make the argument that even its 'simple' implementation would benefit from this addition, while (especially if we use unique names) remaining backwards compatible.

The proposed addition would make a common access pattern safer (one less place the user has to worry about thread safety) and faster, at no extra cost.

Relevant issues

  • Retrieve priority in Peek/Dequeue #27 : The first discussion of this issue, which resulted in the addition of SimplePriorityQueue.GetPriority. As outlined here that is sub-optimal in this specific case where we already have the node and so don't need to hash to retrieve it. It also requires the caller to manage their own thread safety which the proposed change also throws in for free.
@JohannesMP
Copy link
Author

JohannesMP commented Jul 26, 2021

I've made a pull requests that adds a variation of the proposal discussed in this issue.

I opted to just add the proposed functionality as overloads to SimplePriorityQueue.TryFirst, SimplePriorityQueue.TryRemove and SimplePriorityQueue.TryDequeue, consistent with the intent that the Try* methods are for multithreading.

In each case if the caller wants the priority of the head/head-to-be-removed/node-to-be-removed, they can now get it without an additional (Try)GetPriority call, making it easier to remain thread-safe while also avoiding an extra _itemToNodesCache lookup.

@JohannesMP JohannesMP changed the title Proposal: SimplePriorityQueue.Dequeue/TryDequeue overloads that also expose priority Proposal: SimplePriorityQueue overloads that expose priority when trivial, for performance and thread safety Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant