Explicit Type Remapping & Anonymous Functions #316

Closed
wants to merge 16 commits into from

JonasIsensee
Collaborator

@JonasIsensee JonasIsensee commented May 16, 2021

This PR finally implements what is needed to store anonymous functions using JLD2.
Most of the Julia side of things is borrowed from BSON, but additional trickery
was needed to integrate all this with JLD2.

AFAICT the memory layout of functions / typenames / methods has changed from Julia 1.5 to 1.6,
and this PR only supports 1.6.

As a side effect, this PR also implements explicit type remapping, which allows renaming types on load.
This can be useful when working with multiple versions of the same struct (e.g. when the file still contains the old one).

Explicit Type Remapping

Sometimes you store data using structs that you defined yourself or that
ship with some package, and weeks later, when you want to
load the data, the structs have changed.

using JLD2
struct A
    x::Int
end

jldsave("example.jld2"; a = A(42))

Loading the file after the struct definition has changed results in warnings
and sometimes even errors, as demonstrated here.

julia> using JLD2

julia> struct A{T}
            x::T
       end

julia> load("example.jld2")
┌ Warning: read type A is not a leaf type in workspace; reconstructing
└ @ JLD2 ~/.julia/dev/JLD2/src/data/reconstructing_datatypes.jl:273
Dict{String, Any} with 1 entry:
  "a" => var"##A#257"(42)

As of JLD2 version v0.4.5 there is a fix. The JLDFile struct contains a type_map
dictionary that allows for explicit type remapping. Now you can define a struct
that matches the old definition and load your data.

julia> struct A_old
            x::Int
       end

julia> f = jldopen("example.jld2","r")
JLDFile /home/jonas/.julia/dev/JLD2/example.jld2 (read-only)
 └─🔢 a

julia> f.type_map["Main.A"] = A_old
A_old

julia> f["a"]
A_old(42)
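
To illustrate the idea behind such a type map, here is a minimal sketch in plain Julia. It is not JLD2's actual implementation: `resolve_type` and `known_types` are hypothetical names introduced only for this example, and only the `"Main.A" => A_old` entry mirrors the session above.

```julia
# Hypothetical sketch of name-based type remapping; not JLD2's actual code.
struct A_old
    x::Int
end

# Maps the type name recorded in the file to the type to use on load.
type_map = Dict{String, Any}("Main.A" => A_old)

# Stand-in for "types the current session already knows" (assumed for illustration).
known_types = Dict{String, Any}("Main.B" => Float64)

# Consult the explicit mapping first, then fall back to the known types.
# Returns `nothing` if the stored name cannot be resolved either way.
function resolve_type(stored_name::AbstractString, type_map, known_types)
    haskey(type_map, stored_name) && return type_map[stored_name]
    return get(known_types, stored_name, nothing)
end

T = resolve_type("Main.A", type_map, known_types)
a = T(42)   # build an instance from the stored field value
```

The point of the design is that an explicit entry always wins over whatever the session would otherwise resolve, so an old on-disk layout can be routed to a differently named struct.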

closes #208
closes #191
closes #175
closes #288
todo
storing typeof(anonfun) #37

@codecov

codecov bot commented May 16, 2021

Codecov Report

Merging #316 (e3f52a7) into master (a9c62a6) will increase coverage by 0.29%.
The diff coverage is 98.03%.

@@            Coverage Diff             @@
##           master     #316      +/-   ##
==========================================
+ Coverage   89.88%   90.18%   +0.29%     
==========================================
  Files          27       28       +1     
  Lines        2720     2813      +93     
==========================================
+ Hits         2445     2537      +92     
- Misses        275      276       +1     
Impacted Files                          Coverage Δ
src/file_header.jl                      78.57% <ø> (ø)
src/data/anonymous_functions.jl         96.66% <96.66%> (ø)
src/JLD2.jl                             90.85% <100.00%> (+1.14%) ⬆️
src/data/reconstructing_datatypes.jl    76.36% <100.00%> (+2.36%) ⬆️
src/data/writing_datatypes.jl           96.96% <100.00%> (+0.38%) ⬆️
src/backwards_compatibility.jl          62.50% <0.00%> (-12.50%) ⬇️
src/dataio.jl                           98.44% <0.00%> (-0.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@CarloLucibello

What is missing from this PR? AFAIK, after this JLD2 would be a better candidate than BSON for serializing Flux's models in basically any situation.

@JonasIsensee
Collaborator Author

There are two things that are missing:

  • Proper review. Currently, I appear to be the only one familiar enough with JLD2 internals and willing to implement stuff like this. Since JLD2 is used by a lot of people, I was hesitant to just merge this without outside opinions.
  • I'd really like to resolve Tuple-of-type-of-closure cannot be reloaded #37, but this problem is quite deeply embedded in JLD2 and not fixable without "breaking" changes. If I merge this PR before fixing #37, then I will have to implement even more legacy stuff to not break anyone's files.

The issue with #37 is this:
For every dataset, JLD2 essentially stores the following:

  • content
  • description of content (e.g. memory layout on disk)
  • name of datatype

This works well for data, but for datatypes JLD2 is hardcoded to use the datatype signature as the content.
Thus, if the signature of a stored datatype is not known in a new Julia session, the datatype is impossible to reconstruct.

The fix:

  1. Change serialization of datatypes to contain a description of their (instance) layout.
  2. Change deserialization to create a new datatype from the description when the loaded datatype is not known.
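
The two steps above can be sketched roughly as follows. This is only an illustration in plain Julia under stated assumptions: `describe_layout` and `reconstruct` are hypothetical helpers, not JLD2 functions, and a `NamedTuple` type stands in for the "new datatype" that would be created on load.

```julia
# Hypothetical helpers illustrating the idea; not part of JLD2.
struct Point
    x::Float64
    y::Int
end

# Step 1: alongside the type's name, record its field names and field types.
describe_layout(T::DataType) =
    (name = string(nameof(T)), fields = fieldnames(T), types = fieldtypes(T))

# Step 2: if the original type is unknown on load, build a stand-in type
# from the description and fill it with the stored field values.
function reconstruct(desc, values::Tuple)
    NT = NamedTuple{desc.fields, Tuple{desc.types...}}
    return NT(values)
end

desc = describe_layout(Point)
p = reconstruct(desc, (1.5, 2))   # usable even if `Point` were not defined
```

Because the description carries the layout itself rather than just a signature, the stored values stay accessible by field name even when no matching type exists in the session.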

@babaq

babaq commented Apr 17, 2022

Is this branch workable for anonymous functions now? I tried the current release version: it correctly saves and loads a dataset containing anonymous functions within a single Julia session, but when I start a new Julia session and load the same packages, it loads everything except the anonymous functions.

I also tried BSON, JLD, and JLSO. BSON failed when saving, probably because my dataset contains NamedTuples of different types. JLD could save but failed to load. JLSO behaves like JLD2: it can save and load within a single session, but cannot load in a new session.

@JonasIsensee
Collaborator Author

Hi @babaq,
I'm afraid it is not. I built this at some point and got it working partially. However, there have been significant changes to how this works between e.g. Julia 1.6 and 1.7,
so it is very difficult to get working reliably.
Something else you could try out is #377.
This is a pathfinder PR that would, in principle, allow us to write objects (e.g. anonymous functions) as binary blobs using the Julia Serialization stdlib.
It still needs work, but I hope that this is more doable.
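
As a rough illustration of the binary-blob idea (not the code in #377), the Serialization stdlib can turn a closure into a byte vector and back. The helper `make_scaler` is a hypothetical name used only for this sketch, and the round trip shown here happens within a single Julia session.

```julia
using Serialization

# Hypothetical helper that returns a closure capturing `k`.
make_scaler(k) = x -> k * x
f = make_scaler(3)

# Serialize the closure into an in-memory buffer and extract the raw bytes.
io = IOBuffer()
serialize(io, f)
blob = take!(io)                  # Vector{UInt8}

# Read the closure back from the bytes.
g = deserialize(IOBuffer(blob))

# `invokelatest` avoids world-age problems if this runs inside a function.
result = Base.invokelatest(g, 5)
```

A byte vector like `blob` is ordinary data, so in principle it could be stored in a file like any other array. Whether the blob can be read back in a different session or Julia version depends on the Serialization format, so this should be treated as an assumption to verify rather than a guarantee.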

@giordano giordano deleted the anonfunctions branch September 20, 2023 13:33