Specification of behavior with changing data types (versioning / schema evolution) #302

Open
bluenote10 opened this issue Apr 9, 2022 · 2 comments

@bluenote10

I'm looking for a highly compact serialization format that allows for a certain backwards compatibility. rmp-serde looks like a very interesting candidate, because its non-named representation is very concise indeed. What I couldn't answer from studying the docs is the aspect of backwards compatibility, i.e., how rmp-serde handles changes to data types -- in particular when using the compact non-named serialization.

Imagine serializing a struct with fields a and b and storing it on disk. Later an optional field c (with a default) is added. Will that work, or will the data become unreadable? What about removing or renaming fields? Does the order of fields matter, i.e., will the data become unreadable when swapping the field order to b and a? Does that depend on whether a and b have the same or different types?

It would be great if the documentation could specify what kind of assumptions users can make regarding changing data types.

@kornelski
Collaborator

It's undocumented, because it hasn't been carefully considered and tested. I don't mind committing to keeping specific data representations and compatibility.

If you'd like to rely on some things, please contribute unit tests that ensure they keep working.

AFAIK currently:

  • order does matter when using non-named serialization.
  • names of struct fields don't matter when using non-named serialization.
  • adding default fields at the end of structs should be fine.
  • removal of fields can break non-named serialization. May be fine for named.
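
To make the points above concrete, here is a toy model of a positional ("non-named") encoding. It is a sketch only, not rmp-serde's implementation: the functions `encode_v1`, `decode_v2`, `decode_swapped`, the struct `V2`, and the default value `DEFAULT_C` are all hypothetical names made up for illustration. It assumes values are written in declaration order with no field names, which is why order matters, names do not, and a trailing field with a default can be filled in when the stored data is shorter.

```rust
// Toy model of a positional encoding. NOT rmp-serde's actual code;
// every name here is a hypothetical stand-in.

/// Old schema `{ a, b }`: values are written in declaration order, no names.
fn encode_v1(a: u32, b: u32) -> Vec<u32> {
    vec![a, b]
}

/// New schema `{ a, b, c }`, where `c` was appended and has a default.
#[derive(Debug, PartialEq)]
struct V2 {
    a: u32,
    b: u32,
    c: u32,
}

/// Assumed default for the appended field.
const DEFAULT_C: u32 = 7;

/// Positional decode: slot 0 is `a`, slot 1 is `b`, slot 2 is `c`.
/// A missing trailing slot falls back to the default; a missing
/// required slot makes the data unreadable (`None`).
fn decode_v2(data: &[u32]) -> Option<V2> {
    let a = *data.get(0)?;
    let b = *data.get(1)?;
    let c = data.get(2).copied().unwrap_or(DEFAULT_C);
    Some(V2 { a, b, c })
}

/// The same bytes read by a struct whose fields were reordered to `(b, a)`.
/// Returns `(a, b)` as the reader sees them: decoding succeeds, but the
/// values land in the wrong fields because both have the same type.
fn decode_swapped(data: &[u32]) -> Option<(u32, u32)> {
    let b = *data.get(0)?;
    let a = *data.get(1)?;
    Some((a, b))
}
```

In this model, `decode_v2(&encode_v1(1, 2))` yields `a = 1, b = 2, c = 7`, while `decode_swapped` on the same data silently swaps the two values. With fields of different types, a real decoder would more likely report a type error instead of swapping silently.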

@bluenote10
Author

Thanks, this kind of information already helps a lot!

Background: I was surveying options for highly compact (which implies schema-based / non-self-describing) serialization, combined with some way of dealing with schema evolution. After understanding serde better and playing around with various binary non-self-describing formats, I've come to the conclusion that serde is not quite the right tool for the job (for reference on versioning, see serde-rs/serde#1137). All non-self-describing serializers I tried suffer from order dependence, breaking field removals, and a lack of support for certain "named-only" serde features like skip_serializing_if. That more or less has to be the case, since serde lacks protobuf-like field offset annotations or cereal/boost-serialization-like "class versions". There is probably no way to fix this on the rmp_serde side without storage overhead, so documenting the behavior should be sufficient.
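
For comparison, a minimal sketch of what protobuf-like field annotations buy, and what they cost. This is a toy model under stated assumptions, not protobuf's wire format and not anything serde or rmp_serde provides: `Record`, `TAG_A`, `TAG_B`, `encode_tagged`, and `decode_tagged` are hypothetical names. Each value is stored with a small numeric tag, so the reader can match fields regardless of order and skip tags it does not know, at the price of one extra tag per field.

```rust
// Toy model of tagged fields. All names are hypothetical.

#[derive(Debug, PartialEq)]
struct Record {
    a: u32,
    b: u32,
}

// Stable numeric identifiers assigned to each field.
const TAG_A: u8 = 1;
const TAG_B: u8 = 2;

/// Writes every field together with its tag: this is the storage overhead.
fn encode_tagged(r: &Record) -> Vec<(u8, u32)> {
    vec![(TAG_A, r.a), (TAG_B, r.b)]
}

/// Reads fields by tag. Order does not matter, absent fields keep their
/// default (0 here), and unknown tags from other schema versions are skipped.
fn decode_tagged(fields: &[(u8, u32)]) -> Record {
    let mut r = Record { a: 0, b: 0 };
    for &(tag, value) in fields {
        match tag {
            TAG_A => r.a = value,
            TAG_B => r.b = value,
            _ => {} // field removed or not yet known in this version
        }
    }
    r
}
```

Here `decode_tagged(&[(2, 20), (1, 10)])` recovers `a = 10, b = 20` despite the swapped order, and data containing an unknown tag 3 or lacking tag 2 still decodes.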
