[issues-912] optimize serialization size for primitive arrays - IntOnly for now #915

Open
wants to merge 2 commits into
base: master

Conversation

@jhsenjaliya jhsenjaliya commented Sep 15, 2022

PR for #912 (currently limited to int primitive arrays only, to showcase the feature as per the guidelines).

Motivation:
Primitive arrays are often used for efficient code (in both size and execution time), but the required size of the array cannot always be determined accurately beforehand.
Serializing such over-allocated primitive arrays can produce a large serialized object that consists mostly of blank (NUL) bytes.
Since Kryo is focused on efficiency, it should provide a configuration option to optimize this behavior further and store only the necessary values whenever there is an opportunity to do so.
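
To make the motivation concrete, here is a minimal sketch in plain JDK Java. It assumes a fixed 4 bytes per element plus a 4-byte length prefix, which is a simplification and not necessarily how Kryo lays out an int array. `TrailingZeroStats`, `effectiveLength`, and `fixedWidthBytes` are hypothetical names used only for illustration; they are not part of Kryo or of this PR.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Kryo or this PR.
final class TrailingZeroStats {
    /** Index just past the last non-zero element (0 for an all-zero or empty array). */
    static int effectiveLength(int[] values) {
        int end = values.length;
        while (end > 0 && values[end - 1] == 0) end--;
        return end;
    }

    /** Bytes needed under the assumed layout: 4-byte length prefix + 4 bytes per element. */
    static int fixedWidthBytes(int elementCount) {
        return 4 + 4 * elementCount;
    }

    public static void main(String[] args) {
        int[] buffer = new int[1024];      // over-allocated up front
        Arrays.fill(buffer, 0, 10, 7);     // only the first 10 slots hold data
        int used = effectiveLength(buffer);
        System.out.println("full:    " + fixedWidthBytes(buffer.length) + " bytes");
        // The trimmed form also has to record the original length (one extra 4-byte int).
        System.out.println("trimmed: " + (fixedWidthBytes(used) + 4) + " bytes");
    }
}
```

Under these assumptions the over-allocated array costs 4100 bytes when written in full and 48 bytes when only the used prefix is stored.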

Please refer to the description on #912 for more info on the approach.

Thanks

@theigl
Collaborator

theigl commented Sep 23, 2022

@jhsenjaliya: Thanks for this PR. Your approach is interesting, but the changes required to support it for all types are quite invasive. I will keep this PR open for now to see if anyone else is interested in this optimization.

@jhsenjaliya
Author

jhsenjaliya commented Sep 26, 2022

Sure, I will let you think this through.
My observation has been that optimizations will always be tricky, but achieving the smallest possible storage size is not only worth the effort, it also adds a lot of value to Kryo.
Thanks for the review!

@NathanSweet
Member

I agree the feature can be useful. I've used skipping zeros at the beginning or end in my projects, where it makes sense.

I don't think we want a setting that changes the behavior of all Input/Output. The feature can be entirely self-contained within a serializer. Where you want this, which is unlikely to be everywhere, you would use the serializer.

It could make sense for Kryo to provide such a serializer, though there are many use-case-specific serializers that could be provided. We don't try to provide them all, especially when the implementation is relatively trivial.

If you really wanted to do it everywhere, you could extend Input/Output, but I don't think it makes sense for Kryo to provide that as it's too application specific.
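
As a rough illustration of how small a self-contained version can be, the sketch below trims trailing zeros using only JDK streams. `TrimmedIntArrayCodec`, `encode`, and `decode` are hypothetical names, and the wire layout (full length, retained count, retained values) is an assumption, not the format used in this PR. In Kryo itself the same logic would presumably sit in the `write` and `read` methods of a custom `Serializer<int[]>`, using Kryo's `Output` and `Input` instead of the JDK streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec for illustration; the names are not from Kryo or this PR.
final class TrimmedIntArrayCodec {
    /** Writes the full length, the count of retained values, then only those values. */
    static byte[] encode(int[] values) {
        int kept = values.length;
        while (kept > 0 && values[kept - 1] == 0) kept--;   // drop trailing zeros

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            out.writeInt(kept);
            for (int i = 0; i < kept; i++) out.writeInt(values[i]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the array at its original length; slots not read keep Java's default 0. */
    static int[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] values = new int[in.readInt()];
            int kept = in.readInt();
            for (int i = 0; i < kept; i++) values[i] = in.readInt();
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the trimming lives entirely in the codec, it applies only to the fields or types registered with it, and nothing about the general read/write path changes.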

@jhsenjaliya
Author

jhsenjaliya commented Dec 14, 2022

@NathanSweet, thanks for providing that input.
I believe such optimizations are better suited to settings/configs than to an entirely new serializer/deserializer.
Also, this config is OFF by default, so existing behavior does not change. It would be turned ON only when a user wants additional improvements like this.

I also like your idea of doing this for consecutive default values (zeros) at the beginning instead of just at the end. Maybe there could be a setting/config for all 3 cases: optimize_continuous_zeros_in_starting_only, optimize_continuous_zeros_in_end_only, and optimize_continuous_zeros (for both).
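
The three cases could be sketched as a single function that computes which slice of the array needs to be stored. `ZeroTrimMode` and `ZeroTrimWindow` are hypothetical names that loosely mirror the three proposed settings; nothing like them exists in Kryo today.

```java
// Hypothetical names mirroring the three proposed settings; not an existing Kryo API.
enum ZeroTrimMode { LEADING_ONLY, TRAILING_ONLY, BOTH }

final class ZeroTrimWindow {
    /** Returns {start, end}: the half-open range [start, end) of values that must be stored. */
    static int[] window(int[] values, ZeroTrimMode mode) {
        int start = 0;
        int end = values.length;
        if (mode != ZeroTrimMode.LEADING_ONLY) {
            // Trailing zeros are skipped for TRAILING_ONLY and BOTH.
            while (end > start && values[end - 1] == 0) end--;
        }
        if (mode != ZeroTrimMode.TRAILING_ONLY) {
            // Leading zeros are skipped for LEADING_ONLY and BOTH.
            while (start < end && values[start] == 0) start++;
        }
        return new int[] {start, end};
    }
}
```

A writer would then store the original length, `start`, and `end`, followed by the values in that range; the reader allocates the full array and fills only the range.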

I can imagine a lot of storage savings with this. I hope more people find this a useful feature/optimization when needed.

3 participants