Error with strides deserializing complex NumPy array #449

Open
vnmabus opened this issue Jun 28, 2023 · 4 comments
Labels
bug extensions issues affecting numpy/pandas/etc

Comments

@vnmabus

vnmabus commented Jun 28, 2023

I have found a case in which deserialization raises the following error with the current development version from the repo (3.0.1.dev40+gef95ebc):

ValueError: strides is incompatible with shape of requested array and size of buffer

The following minimal example reproduces it:

```python
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

import numpy as np
import jsonpickle
import jsonpickle.ext.numpy

jsonpickle.ext.numpy.register_handlers()  # enable jsonpickle's NumPy handlers

classifier = KNeighborsClassifier()
grid = GridSearchCV(
    classifier,
    {"n_neighbors": [1, 3]},
    cv=RepeatedStratifiedKFold(),  # defaults: 5 splits x 10 repeats
)
grid.fit(np.eye(10), np.zeros(10))

json = jsonpickle.dumps(grid.cv_results_)
data = jsonpickle.loads(json)
```

The full traceback is:

```
ValueError                                Traceback (most recent call last)
<ipython-input-36-e3acff02f24e> in <module>
     17 
     18 json = jsonpickle.dumps(grid.cv_results_)
---> 19 data = jsonpickle.loads(json)

.../site-packages/jsonpickle/unpickler.py in decode(string, backend, context, keys, reset, safe, classes, v1_decode, on_missing)
     86     )
     87     data = backend.decode(string)
---> 88     return context.restore(data, reset=reset, classes=classes)
     89 
     90 

.../site-packages/jsonpickle/unpickler.py in restore(self, obj, reset, classes)
    360         if classes:
    361             self.register_classes(classes)
--> 362         value = self._restore(obj)
    363         if reset:
    364             self._swap_proxies()

.../site-packages/jsonpickle/unpickler.py in _restore(self, obj)
    342         else:
    343             restore = self._restore_tags(obj)
--> 344         return restore(obj)
    345 
    346     def restore(self, obj, reset=True, classes=None):

.../site-packages/jsonpickle/unpickler.py in _restore_dict(self, obj)
    827                     str_k = k
    828                 self._namestack.append(str_k)
--> 829                 data[k] = self._restore(v)
    830                 self._namestack.pop()
    831         return data

.../site-packages/jsonpickle/unpickler.py in _restore(self, obj)
    342         else:
    343             restore = self._restore_tags(obj)
--> 344         return restore(obj)
    345 
    346     def restore(self, obj, reset=True, classes=None):

.../site-packages/jsonpickle/unpickler.py in _restore_object(self, obj)
    767             proxy = _Proxy()
    768             self._mkref(proxy)
--> 769             instance = handler(self).restore(obj)
    770             proxy.reset(instance)
    771             self._swapref(proxy, instance)

.../site-packages/jsonpickle/ext/numpy.py in restore(self, data)
    332             ), "Current implementation assumes base is C or F contiguous"
    333 
--> 334             arr = np.ndarray(
    335                 buffer=base.data,
    336                 dtype=self.restore_dtype(data).newbyteorder(data.get('byteorder', '|')),

ValueError: strides is incompatible with shape of requested array and size of buffer
```

@vnmabus
Author

vnmabus commented Jun 28, 2023

While debugging, I found that base at line 334 corresponds to a 0-dim array whose element is a jsonpickle.unpickler._Proxy object.
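To see why a 0-dim base produces this particular message, here is a minimal sketch of rebuilding a strided view with np.ndarray(buffer=...). The helper rebuild_view and the shape/strides values are hypothetical (the traceback cuts off the handler's remaining arguments), and a float64 0-dim array stands in for the object array holding the _Proxy. On a typical 64-bit build both hold a single 8-byte element, far smaller than the view needs.

```python
import numpy as np


def rebuild_view(base, dtype, shape, strides, offset=0):
    """Hypothetical helper: build an array that reuses `base`'s memory,
    similar in spirit to the np.ndarray(buffer=base.data, ...) call above."""
    return np.ndarray(
        shape=shape,
        dtype=dtype,
        buffer=base.data,
        offset=offset,
        strides=strides,
    )


# Expected case: the base owns 10 float64 values (80 bytes), enough for the view.
full = np.arange(10, dtype=np.float64)
view = rebuild_view(full, np.float64, shape=(5,), strides=(16,))  # every other element
print(view)  # [0. 2. 4. 6. 8.]

# Failing case: a 0-dim base holds one 8-byte element, but the view needs
# 4 * 16 + 8 = 72 bytes, so NumPy rejects the shape/strides combination.
wrong_base = np.zeros((), dtype=np.float64)
message = None
try:
    rebuild_view(wrong_base, np.float64, shape=(5,), strides=(16,))
except ValueError as exc:
    message = str(exc)
print(message)
```

Because strides are passed explicitly, NumPy validates them against the buffer size and raises the same ValueError as in the report, which points at the wrong base object rather than at the dtype or strides themselves.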

@vnmabus
Author

vnmabus commented Jun 28, 2023

It looks to me like the reference is wrong because of an off-by-one error in either encoding or decoding. Could someone familiar with the codebase take a look?

@Theelx
Contributor

Theelx commented Jul 5, 2023

Hey, sorry for the late response! I took off the week before July 4th to spend time with family, so I'll try to look at this issue today.

@Theelx
Contributor

Theelx commented Jul 8, 2023

There is indeed an off-by-one error in the encoding step, where py/id is 17 instead of 16. This is weird, because if I recall correctly, there was a similar off-by-one error with numpy serialization that was fixed a year or two ago, so maybe the bug was reintroduced somehow.
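The effect of such an off-by-one can be illustrated with a toy model of id-based back-references. This is not jsonpickle's implementation: encode, decode and the skew parameter are hypothetical, only plain lists are tracked, and ids are 1-based in first-appearance order. A repeated object is emitted as {"py/id": n}, and the decoder resolves n against objects in the order it rebuilds them. An id that is one too high therefore does not fail; it silently resolves to the neighbouring object, which is consistent with the unexpected 0-dim base reported above.

```python
def encode(obj, seen=None, skew=0):
    """Toy encoder: replace repeated lists with {"py/id": n}.

    `skew` (hypothetical) shifts emitted ids to mimic an encoder off-by-one.
    """
    if seen is None:
        seen = {}
    if isinstance(obj, list):
        if id(obj) in seen:
            return {"py/id": seen[id(obj)] + skew}
        seen[id(obj)] = len(seen) + 1  # 1-based, first-appearance order
        return [encode(item, seen, skew) for item in obj]
    return obj


def decode(data, objs=None):
    """Toy decoder: rebuild lists, resolving {"py/id": n} to the n-th list seen."""
    if objs is None:
        objs = []
    if isinstance(data, dict) and "py/id" in data:
        return objs[data["py/id"] - 1]
    if isinstance(data, list):
        out = []
        objs.append(out)  # register before recursing, mirroring the encoder
        for item in data:
            out.append(decode(item, objs))
        return out
    return data


shared = [1, 2]
doc = [shared, [3], shared]

good = decode(encode(doc))          # back-reference resolves to `shared`
bad = decode(encode(doc, skew=1))   # id is one too high
print(good, good[2] is good[0])  # [[1, 2], [3], [1, 2]] True
print(bad)                       # [[1, 2], [3], [3]]
```

Nothing in the decoder can detect the mismatch; the wrong object only causes a failure later, when code (here, the NumPy handler) makes assumptions about its type or size.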

@Theelx Theelx added bug extensions issues affecting numpy/pandas/etc labels Mar 11, 2024