Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VersionedRecordSerializer? #884

Open
JanecekPetr opened this issue Feb 14, 2022 · 11 comments
Open

VersionedRecordSerializer? #884

JanecekPetr opened this issue Feb 14, 2022 · 11 comments

Comments

@JanecekPetr
Copy link
Contributor

Is your feature request related to a problem? Please describe.
I'm trying to serialize a Record that evolves - new components are being added to it.

Describe the solution you'd like
Would it make sense to create a VersionedRecordSerializer akin to VersionedFieldSerializer with the same functionality, the @Since annotation?

Describe alternatives you've considered
An obvious workaround is to use a traditional class instead of a record.

@theigl
Copy link
Collaborator

theigl commented Feb 16, 2022

@JanecekPetr: The RecordSerializer was a contribution from Oracle. When I accepted it, I didn't really think about aligning it with FieldSerializer and its compatible variants, but it definitely makes sense to be able to evolve records in a similar way to classes.

I don't have to time work on this at the moment but I'd be happy to accept a PR.

@JanecekPetr
Copy link
Contributor Author

JanecekPetr commented Feb 17, 2022

Okay, I actually started looking into this.

  1. The eclipse project setup is outdated - JUnit 5 was introduced, but the setup doesn't know about it yet. I'm not yet sure I want to provide a fix for that too, I'll see.
  2. The current RecordSerializer is sloooooow as it does a lot of reflection all the time and no component caching. I started with a caching version (no ASM yet) and the speedup of writing is 20-25× (no read/roundtrip benchmarks yet, I'll do those next week):
    Benchmark                                               Mode  Cnt     Score    Error  Units
    RecordSerializerBenchmark.cachingRecordSerializer       avgt    5    85.466 ±  5.142  ns/op
    RecordSerializerBenchmark.recordSerializer              avgt    5  2035.110 ± 42.174  ns/op
    
    This is annoying as it's largely unfixable without breaking changes as there's now only new RecordSerializer() constructor and no new RecordSerializer(Class<T> type) constructor. What should I do? The new VersionRecordSerializer will for sure cache things and I'd be happy to fix the RecordSerializer, too, but how? What do you think is the best approach to the design now?
  3. There is almost no configuration of the RecordSerializer, there's only fixedFieldTypes. I haven't yet looked into what other configuration might be necessary / useful as I do not yet fully understand all of them. Would you mind taking a look and helping me with the choice? If there is more config that'd be useful, I'll add that too. That, at least, can easily be done retroactively.

I'm happy to open new tickets and continue each thread elsewhere if needed. There's apparently a lot incoming :).

(I'm also not a fan of serializing records alphabetically as it complicates the code unnecessarily, but that ship has sailed for sure. At least it's consistent with FieldSerializer in that you can reorder the components and can rename then as long as the alphabetical order doesn't change. Okay I guess.)

@JanecekPetr
Copy link
Contributor Author

This is annoying as it's largely unfixable without breaking changes as there's now only new RecordSerializer() constructor and no new RecordSerializer(Class<T> type) constructor.

Actually we could still internally just do an internal ClassValue lookup and store the cached component accessors and metadata there. Hm. Inconsistent with FieldSerializer, again, but would work without changing the API.

Either way, some design work needs to be done before I actually start writing some more complex code.

@theigl
Copy link
Collaborator

theigl commented Feb 18, 2022

@JanecekPetr: Thanks a lot for looking into this! I'll think about these issues and will get back to you asap.

@theigl
Copy link
Collaborator

theigl commented Feb 21, 2022

@JanecekPetr: Thanks again for your valuable input!

The eclipse project setup is outdated - JUnit 5 was introduced, but the setup doesn't know about it yet. I'm not yet sure I want to provide a fix for that too, I'll see.

Right, I'm using IntelliJ and haven't checked the Eclipse config. If you can, please provide a fix.

The current RecordSerializer is sloooooow as it does a lot of reflection all the time and no component caching. I started with a caching version (no ASM yet) and the speedup of writing is 20-25× (no read/roundtrip benchmarks yet, I'll do those next week)
This is annoying as it's largely unfixable without breaking changes as there's now only new RecordSerializer() constructor and no new RecordSerializer(Class type) constructor. What should I do? The new VersionRecordSerializer will for sure cache things and I'd be happy to fix the RecordSerializer, too, but how? What do you think is the best approach to the design now?

We will definitely find a solution for this. Your benchmark results make a good case, even for a breaking change. What I would suggest is to add the the new constructor and cache the results there. Then we deprecate the default constructor and cache the components on first access if they haven't been cached from the new constructor. WDYT?

There is almost no configuration of the RecordSerializer, there's only fixedFieldTypes. I haven't yet looked into what other configuration might be necessary / useful as I do not yet fully understand all of them. Would you mind taking a look and helping me with the choice? If there is more config that'd be useful, I'll add that too. That, at least, can easily be done retroactively.

I'd suggest we start with the simplest configuration possible and do not add any new options for now. As you said, this can easily be done retroactively, should anybody ask for it.

Having said that, my biggest concern with the whole idea of a VersionedRecordSerializer is that before long, people will ask for a CompatibleRecordSerializer as well. Then we have a base class RecordSerializer and two sub-classes that hook into serialization and modify what fields are read. There will be a large amount of duplication between these serializers and the existing variants of FieldSerializer. Furthermore, users will have to configure both a VersionedFieldSerializer and a VersionedRecordSerializer if they want to evolve their data structures.

Ideally, we could abstract FieldSerializer in such a way, that it can deal with both records and normal classes. The main difficulty with this approach is that FieldSerializer currently instantiates an empty class and then reads one field after another and sets the field's value. For records, it would have to read all fields into an array, and then finally invoke the canonical constructor. This solution would support all current serializer options, including all annotations for customizing serializers on a per-field basis.

So to sum up: We can move ahead with improving the record serializer by adding caching and extending it to support versioning, but the ideal solution would be to let field serializer and its variant handle records. I'm just not sure it can actually be implemented.

@JanecekPetr
Copy link
Contributor Author

I agree overall, the fact that FieldSerializer can't pick a canonical constructor if there is one is a shame that could be optionally alleviated (at least since Java 8 where there sometimes are parameter names accessible to reflection). But that's another story and another ticket.

For now the RecordSerializer exists and what you propose makes sense. I'll look into this and should have a POC this or next week.

@JanecekPetr
Copy link
Contributor Author

JanecekPetr commented Mar 3, 2022

I didn't forget about this, I have a preliminary almost-working POC with tests and some benchmarks, but it requires more work and a cleanup pass.
I haven't had very much free time lately outside work because of the current situation war in Europe as we're trying to help people fleeing Ukraine as well as the ones staying there. I'll get back to this ... I don't know. As soon as possible.

@theigl
Copy link
Collaborator

theigl commented Sep 9, 2022

@JanecekPetr: Did you make any progress on this issue?

I started preliminary work on refactoring the FieldSerializer to support records, but I don't have enough free time to finish this at the moment. Your suggested performance improvements for the existing serializer would be a great intermediate step.

@JanecekPetr
Copy link
Contributor Author

Oh my, indeed I have!
I'm currently away from the world, but I'll post the code with my progress during the weekend so we can finish it off.

@theigl
Copy link
Collaborator

theigl commented Nov 24, 2022

@JanecekPetr: Ping ;)

I'd like to do another Kryo 5.x release in the coming weeks. Any chance you can create a PR with your caching improvements?

@theigl
Copy link
Collaborator

theigl commented Dec 2, 2022

@JanecekPetr: I created a PR with caching for the RecordSerializer here: #927

Is this the same approach you took? I'm also seeing a 20x speed-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants