Specification of behavior with changing data types (versioning / schema evolution) #302

bluenote10 · 2022-04-09T09:36:26Z

I'm looking for a highly compact serialization format that allows for a certain backwards compatibility. rmp-serde looks like a very interesting candidate, because its non-named representation is very concise indeed. What I couldn't answer from studying the docs is the aspect of backwards compatibility, i.e., how rmp-serde handles changes to data types -- in particular when using the compact non-named serialization.

Imagine serializing a struct with fields a and b and storing it on disk. Later an optional field c (with a default) is added. Will that work, or will the data become unreadable? What about removing or renaming fields? Does the order of fields matter, i.e., will the data become unreadable when swapping the field order to b and a? Does that depend on whether a and b have the same or different types?

It would be great if the documentation could specify what kind of assumptions users can make regarding changing data types.


kornelski · 2022-04-11T16:37:00Z

It's undocumented, because it hasn't been carefully considered and tested. I don't mind committing to keeping specific data representations and compatibility.

If you'd like to rely on some things, please contribute unit tests that ensure they keep working.

AFAIK currently:

- Order does matter when using non-named serialization.
- Names of struct fields don't matter when using non-named serialization.
- Adding default fields at the end of structs should be fine.
- Removal of fields can break non-named serialization. It may be fine for named.
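
To make the points above concrete, here is a minimal toy model of positional ("non-named") versus named decoding. It uses only the standard library and is not rmp-serde's implementation or wire format; `Value`, `Record`, `decode_positional`, and `decode_named` are hypothetical names invented for this sketch.

```rust
// Toy model only: this is NOT rmp-serde's implementation or wire format.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

// Current version of the struct: `c` was added later, with a default of 0.
#[derive(Debug, PartialEq)]
struct Record {
    a: i64,
    b: String,
    c: i64,
}

/// Positional decoding: a value's meaning comes only from its position.
fn decode_positional(fields: &[Value]) -> Result<Record, String> {
    let a = match fields.get(0) {
        Some(Value::Int(n)) => *n,
        other => return Err(format!("field 0: expected Int, got {:?}", other)),
    };
    let b = match fields.get(1) {
        Some(Value::Text(s)) => s.clone(),
        other => return Err(format!("field 1: expected Text, got {:?}", other)),
    };
    // A missing trailing element falls back to the default.
    let c = match fields.get(2) {
        Some(Value::Int(n)) => *n,
        None => 0,
        other => return Err(format!("field 2: expected Int, got {:?}", other)),
    };
    Ok(Record { a, b, c })
}

/// Named decoding: values are looked up by key, so order is irrelevant.
fn decode_named(fields: &[(&str, Value)]) -> Result<Record, String> {
    let (mut a, mut b, mut c) = (None, None, 0);
    for (key, value) in fields {
        match (*key, value) {
            ("a", Value::Int(n)) => a = Some(*n),
            ("b", Value::Text(s)) => b = Some(s.clone()),
            ("c", Value::Int(n)) => c = *n,
            _ => {} // unknown key (e.g. a removed field) or unexpected type: skipped
        }
    }
    Ok(Record { a: a.ok_or("missing a")?, b: b.ok_or("missing b")?, c })
}

fn main() {
    let expected = || Record { a: 1, b: "x".to_string(), c: 0 };

    // Data written before `c` existed still decodes; `c` takes its default.
    let old = vec![Value::Int(1), Value::Text("x".to_string())];
    assert_eq!(decode_positional(&old), Ok(expected()));

    // Swapping the field order breaks positional decoding (types differ here).
    let swapped = vec![Value::Text("x".to_string()), Value::Int(1)];
    assert!(decode_positional(&swapped).is_err());

    // Named data tolerates reordering and an extra key left by a removed field.
    let named = vec![
        ("b", Value::Text("x".to_string())),
        ("gone", Value::Int(9)),
        ("a", Value::Int(1)),
    ];
    assert_eq!(decode_named(&named), Ok(expected()));
}
```

In this toy model, swapping two fields of the same type would not fail at all: the values would silently end up in the wrong fields. Whether rmp-serde behaves the same way in every case is exactly what the unit tests mentioned above would have to pin down.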

bluenote10 · 2022-04-15T17:21:35Z

Thanks, this kind of information already helps a lot!

Background: I was basically skimming over possibilities for highly compact (which implies schema-based / non-self-describing) serialization, combined with some way of dealing with schema evolution. After understanding serde better and playing around with various binary non-self-describing formats, I've come to the conclusion that serde is not quite the right tool for the job (for reference on versioning, see serde-rs/serde#1137). All non-self-describing serializers I tried suffer from order dependence, breaking field removals, and lack of support for certain "named only" serde features like skip_serializing_if. This more or less has to be the case, since serde lacks protobuf-like field offset annotations or cereal/boost-serialization-like "class versions". There is probably no way to fix it on the rmp_serde side without storage overhead, so documenting the behavior should be sufficient.
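
For illustration, a "class version" scheme could look roughly like the sketch below: a hand-rolled byte layout with a leading version byte. This is an assumption-laden toy, not something serde or rmp-serde provides; `Point`, `encode_v2`, and `decode` are hypothetical names.

```rust
// Hypothetical sketch of a "class version" prefix; not part of serde or rmp-serde.
#[derive(Debug, PartialEq)]
struct Point {
    x: u8,
    y: u8,
    z: u8, // added in version 2
}

/// Writes the current layout: version byte 2, then the fields in order.
fn encode_v2(p: &Point) -> Vec<u8> {
    vec![2, p.x, p.y, p.z]
}

/// Reads either layout by dispatching on the leading version byte.
fn decode(bytes: &[u8]) -> Result<Point, String> {
    match bytes {
        // Version 1 stored only x and y; z gets a default.
        [1, x, y] => Ok(Point { x: *x, y: *y, z: 0 }),
        [2, x, y, z] => Ok(Point { x: *x, y: *y, z: *z }),
        [v, ..] => Err(format!("unsupported version or length (version byte {})", v)),
        [] => Err("empty input".to_string()),
    }
}

fn main() {
    // Old version-1 bytes remain readable after the struct gained `z`.
    assert_eq!(decode(&[1, 7, 8]), Ok(Point { x: 7, y: 8, z: 0 }));

    // Current data round-trips.
    let p = Point { x: 1, y: 2, z: 3 };
    assert_eq!(decode(&encode_v2(&p)), Ok(p));
}
```

The extra version byte per record is the kind of storage overhead referred to above.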
