Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unpickling very slow #203

Open
ChrisJefferson opened this issue Sep 1, 2022 · 5 comments
Open

unpickling very slow #203

ChrisJefferson opened this issue Sep 1, 2022 · 5 comments

Comments

@ChrisJefferson
Copy link

ChrisJefferson commented Sep 1, 2022

Hi,

I have some code where unpickling SortedLists and SortedSets takes a significant amount of the runtime of my program (I have a lot of them!)

The problem is, I think, that as __reduce__ just says that set and key should be passed to init, then the iterable values then get fed into self._update one at a time.

Would it be possible to speed things up, as the iterable is already known to be sorted? I wasn't sure of the best way to do this, one option would be to add a constructor method issorted, but then that might get used by other people (and maybe incorrectly), and you might not want that?

@ChrisJefferson
Copy link
Author

One option (based on my limited Python experience) would be to switch to using setstate and getstate, and then set up the special methods in setstate?

@grantjenks
Copy link
Owner

The update method has a fast path for when it’s initially empty.

Please provide a simple benchmark to reproduce the issue.

@ChrisJefferson
Copy link
Author

Here is a simple benchmark (I put a class in just to check if classes aren't always slow). The results are get are:

1.4263453999956255 0.07709820000309264 0.10215669999888632

So it is 20 times slower to unpickle a SortedSet than a set, and 14 times slow to unpickle a set with a single member. However, that might be considered a reasonable overhead -- in my case it is annoying as I'm using pickle to store a very large datastructure, and my program is "unpickle data structure, do some AI with it, close program".

Code:

from sortedcontainers import SortedSet 
import pickle
import timeit

class Stupid:
    def __init__(self,l):
        self.a = list(l)

s1 = list([SortedSet(range(i, i+10)) for i in range(10000)])
s2 = list([set(range(i, i+10)) for i in range(10000)])
s3 = list([Stupid(range(i, i+10)) for i in range(10000)])

p1 = pickle.dumps(s1)
p2 = pickle.dumps(s2)
p3 = pickle.dumps(s3)

def func1():
    pickle.loads(p1)

def func2():
    pickle.loads(p2)

def func3():
    pickle.loads(p3)

time1 = timeit.Timer(func1).timeit(number=10)
time2 = timeit.Timer(func2).timeit(number=10)
time3 = timeit.Timer(func3).timeit(number=10)

print(f"{time1} {time2} {time3}")

@kPsarakis
Copy link

Hi,

Thanks a lot for this project!

I face a similar issue when loading a pickled SortedSet. I agree with Chris that the issue is that during loading, it recreates the SortedSet, which is an expensive operation.

Are you open to a PR?

@grantjenks
Copy link
Owner

I can’t promise it’ll be merged but I’m certainly happy to review a solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants