Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: SortedDict Range Deletion #180

Open
ThomasShaw1 opened this issue Nov 20, 2021 · 7 comments
Open

Feature request: SortedDict Range Deletion #180

ThomasShaw1 opened this issue Nov 20, 2021 · 7 comments

Comments

@ThomasShaw1
Copy link

ThomasShaw1 commented Nov 20, 2021

Range key deletion feature is currently missing for SortedDict. If I want to delete multiple keys within an range, I have to do them one by one using a for loop which is very inefficient in terms of time complexity. See my example below.

# program that removes the subdict from 2 to 4
d = SortedDict({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
lower_bound_index = d.bisect_left(2)
upper_bound_index = d.bisect_left(4)
for _ in range(lower_bound_index, upper_bound_index + 1):
   d.popitem(lower_bound_index)

This is very inefficient both in terms of usage and in terms of time complexity. Time complexity for this one is O(k log n) where k is the range size. Other languages such as C++ and Java only needs an time complexity of O(k + log n) to accomplish the job with the API mentioned below.

In C++, you have the option of calling erase(iterator first, iterator last) for their map data structure. Similarly, in Java, you can call subMap(first, last).clear() for their TreeMap data structure. Something similar should be available for SortedDict in my opinion.

@ThomasShaw1
Copy link
Author

Here's another hacky way of doing it. Not sure if it is better in terms of performance. But still, an API would be nice.

def play():
    sd = SortedDict({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
    low = sd.bisect_left(2)
    high = sd.bisect_left(4)
    for key in sd._list[low:high + 1]:
        super(SortedDict, sd).pop(key)
    del sd._list[low:high + 1]
    print(sd)

@grantjenks
Copy link
Owner

Having a kind of range_delete makes sense to me. There’s a variety of optimizations that could be done inside the data structure too.

Note that today you can use irange instead of bisect to avoid using the position index.

@grantjenks
Copy link
Owner

Here's the best I can come up with today (without internal optimizations):

$ python -m IPython
Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec  7 2020, 12:10:52) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import sortedcontainers

In [2]: class SortedDict(sortedcontainers.SortedDict):
   ...:     def range_del(self, minimum=None, maximum=None, inclusive=(True, True)):
   ...:         iterator = self.irange(minimum, maximum, inclusive)
   ...:         keys = list(iterator)
   ...:         for key in keys:
   ...:             del self[key]
   ...: 

In [3]: sd = SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

In [4]: sd
Out[4]: SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

In [5]: sd.range_del('b', 'd')

In [6]: sd
Out[6]: SortedDict({'a': 1, 'e': 5})

@ThomasShaw1
Copy link
Author

This is great in terms of simplicity. But I would still urge you to take a look at the hack implementation I wrote above. The main idea is that slice deletion for SortedList is already internally optimized to O(log n), which is used for keys, so why not just use it as it is and then take care of the dictionary deletion one by one. That way, the time complexity is O(k + log n) instead of O(k log n), which is inline with Sorted Maps in Java and C++ and other languages.

@grantjenks
Copy link
Owner

The performance of the two will differ depending on the size of the dictionary and how much is deleted. But I generally agree that an optimized version would be a good addition to the API.

The best improvement to your version would be to avoid building the positional index. If you like at the implementation of SortedList.irange, you can see how to do so.

@ThomasShaw1
Copy link
Author

But how would I achieve slice deletion if I don't have the positional indices? SortedList.__delitem__ only takes index or slice. It is only through slice deletion that we achieve internal optimization.

@grantjenks grantjenks changed the title Missing Feature Range Deletion Feature request: SortedDict Range Deletion Oct 9, 2022
@MacherelR
Copy link

Hello,
I'm currently facing the exact same problem and I was wondering whether any of you (@ThomasShaw1) had found some optimized solution ?
I've tried the range_del @grantjenks wrote but it doesn't seem really optimized as it doesn't accelerate the deletion.

Best regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants