Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZnJSON: I wrote a similar package before I found out about jsonpickle #377

Open
PythonFZ opened this issue Jan 10, 2022 · 5 comments
Open
Labels
core enhancement not-a-bug Something that seems like a bug but is intended behavior performance

Comments

@PythonFZ
Copy link

I wanted to write something that achieves the same goals as jsonpickle and came up with the https://github.com/zincware/ZnJSON package. Then recently I discovered jsonpickle on Stackoverflow.

I decided for a slightly different implementation, overwriting json.JSONEncoder and having an API like follows:

import numpy as np
import json
import znjson

znjson.register(
    znjson.converter.NumpyConverter
)

data = json.dumps(
    obj={"data_np": np.arange(2), "data": [x for x in range(10)]},
    cls=znjson.ZnEncoder,
    indent=4
)
_ = json.loads(data, cls=znjson.ZnDecoder)

I'm now struggling if to keep ZnJSON in my project, move over to jsonpickle or add jsonpickle as a converter to ZnJSON.

Primarily, there are two features I would be interested to see how they are handled with jsonpickle, with respect to custom serializers.
With ZnJSON they look like:

import pathlib

from znjson import ConverterBase


class PathlibConverter(ConverterBase):
    instance = pathlib.Path   #  will be checked with isinstance(obj, instance), can be overwritten in __eq__
    representation = "pathlib.Path"
    order = 0  # determines what to try first

    def _encode(self, obj: pathlib.Path):
        return obj.as_posix()

    def _decode(self, value):
        return pathlib.Path(value)

I can easily add a serializer for a custom datatype and use znjson.register to add it and I can set an order parameter which e.g. can be useful for small data. A small Numpy array can be serialized into a list but a larger one is stored as binary.

@Theelx
Copy link
Contributor

Theelx commented Jan 10, 2022

I think there's a nice middle ground between switching to jsonpickle and only using jsonpickle, and adding jsonpickle as a converter to ZnJSON.

It looks like ZnJSON's handlers aren't fully done, because the code is very small and minimal (though I could just be overestimating the amount of code needed for a full handler). If you could adapt your custom handlers for numpy/pandas/pathlib to jsonpickle, that could actually be quite useful, as it looks like it'd allow jsonpickle to be far more extensible/maintainable than it currently is. I'm not entirely familiar with jsonpickle's extension code for numpy and pandas, but what it looks like to me is that jsonpickle's numpy implementation is bloated and unnecessarily complicated, whereas your numpy implementation looks simple and easy to use. I don't believe davvid and I have extensive knowledge about the current converters' logic flow (I definitely don't know it, but davvid might), so a rewrite would help make it more maintainable for upcoming releases of numpy/pandas.

Basically, would you consider adding your ZnJSON converters to jsonpickle? They'd need to be adapted to work with jsonpickle's API, but if they work and you're willing/able to contribute them, I think that would be a huge plus to everyone who uses jsonpickle.

@Theelx
Copy link
Contributor

Theelx commented Jan 10, 2022

Regarding jsonpickle's current custom serialization handlers:
Here's an example of a current numpy handler

class NumpyBaseHandler(BaseHandler):

Here's the function that users can call to enable numpy extension handlers for jsonpickle
def register_handlers():

It is a bit clunky currently, because when using the public API, people have to enable all of them or none of them. Also, the numpy extension does have a mechanism to store arrays greater than size_threshold (defaults to 16), just like yours. jsonpickle actually stores it as a compressed base64 string though, rather than binary data.

@Theelx Theelx added core not-a-bug Something that seems like a bug but is intended behavior labels Jan 10, 2022
@PythonFZ
Copy link
Author

Basically, would you consider adding your ZnJSON converters to jsonpickle? They'd need to be adapted to work with jsonpickle's API, but if they work and you're willing/able to contribute them, I think that would be a huge plus to everyone who uses jsonpickle.

I would be interested to look into it. So the idea would be to have the flexibility of the znjson.ConverterBase but inside jsonpickle?

It looks like ZnJSON's handlers aren't fully done

They do work and are fully functional. Because I'm the only user so far, I guess there might be a lot of open issues though.

@PythonFZ
Copy link
Author

I added a very simple base64 encoder (looking a bit at your code) for numpy and ran a quick benchmark of ZnJSON vs jsonpickle, which you can have a look at here: https://github.com/zincware/ZnJSON/blob/add_base64_converter/example/compare_to_jsonpickle.ipynb

Judging from your quotes

that could actually be quite useful, as it looks like it'd allow jsonpickle to be far more extensible/maintainable than it currently is [...] looks like to me is that jsonpickle's numpy implementation is bloated and unnecessarily complicated, whereas your numpy implementation looks simple and easy to use [...] so a rewrite would help make it more maintainable for upcoming releases of numpy/pandas

I was thinking, maybe jsonpickle could benefit from some structural changes. I'm aware that everything has to be fully backwards compatible. One way to solve this, could be to keep everything as is, but extend it by e.g.

jsonpickle.register(jsonpickle.converters.NumpyConverter)

which would then use the ZnJSON API and log some mild deprecation information when calling the current method, that a newer and more flexible version is available.

import jsonpickle.ext.numpy as jsonpickle_numpy
jsonpickle_numpy.register_handlers()

What is your experienced opinion on this topic? Do you think this could be helpful or do you have ideas on how to improve this?

@Theelx
Copy link
Contributor

Theelx commented Jan 11, 2022

So the idea would be to have the flexibility of the znjson.ConverterBase but inside jsonpickle?

Yup yup, that's exactly what I was thinking.

What is your experienced opinion on this topic?

I want to clarify that even though I have a basic knowledge of most of jsonpickle, the vast majority of jsonpickle's code was written by other people, I only started contributing to certain parts of jsonpickle around a year ago because it turns out that I'm good at speeding up existing code, and jsonpickle was a bottleneck in my other project. So, I wouldn't consider myself experienced with most of the code.

One way to solve this, could be to keep everything as is, but extend it by e.g.

jsonpickle.register(jsonpickle.converters.NumpyConverter)

Yeah, that would most likely work. If jsonpickle.converters is introduced though, there could be a conflict between handlers and converters, so I think it'd be best to just introduce this as a new NumpyHandler. One issue is that currently, handlers have to inherit from jsonpickle.handlers.BaseHandler, so we'd need to figure out if we can remove that requirement, or otherwise make youyr NumpyHandler inherit from BaseHandler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement not-a-bug Something that seems like a bug but is intended behavior performance
Projects
None yet
Development

No branches or pull requests

2 participants