Variables named with modifiers (like \hat) are incorrectly saved #466

Closed
wsshin opened this issue Mar 30, 2023 · 7 comments · Fixed by #561

wsshin commented Mar 30, 2023

I can save variables named with modifiers (like \hat) with no problem:

julia> â = 1;  # â is entered by a\hat[tab]

julia> jldsave("foo.jld2"; â)

However, if I try to load the saved variable, an error is generated complaining that it cannot find the variable name:

julia> load("foo.jld2", "â")  # â is entered by a\hat[tab]
Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2}, String}("foo.jld2").

Fatal error:
ERROR: KeyError: key "â" not found
[...]

Strangely, if I simply load the file without specifying the variable name, the result shows that the variable is actually loaded:

julia> load("foo.jld2")
Dict{String, Any} with 1 entry:
  "â" => 1

It turns out that the loaded "â" has a different byte representation than the saved "â":

julia> codeunits("â")  # â is entered by a\hat[tab]
3-element Base.CodeUnits{UInt8, String}:
 0x61
 0xcc
 0x82

julia> codeunits("â")  # â is entered by copy-and-pasting the output of load("foo.jld2") above
2-element Base.CodeUnits{UInt8, String}:
 0xc3
 0xa2

So it seems that something in jldsave() changes the byte representation of "â".
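The two byte sequences above correspond to the decomposed and precomposed Unicode forms of the same character. A minimal sketch that reproduces the difference without relying on how the character was typed; the variable names `decomposed` and `precomposed` are illustrative only, and the strings are written with escapes so the bytes are unambiguous:

```julia
using Unicode

decomposed  = "a\u0302"   # 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT
precomposed = "\u00e2"    # U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX

# The two strings render identically but are different byte sequences.
@show collect(codeunits(decomposed))    # bytes 0x61, 0xcc, 0x82
@show collect(codeunits(precomposed))   # bytes 0xc3, 0xa2
@show decomposed == precomposed         # false

# NFC composes and NFD decomposes; in a common form the strings agree.
@show Unicode.normalize(decomposed, :NFC) == precomposed   # true
@show Unicode.normalize(precomposed, :NFD) == decomposed   # true
```

The 3-byte string is what `a\hat[tab]` inserts, and the 2-byte string is what comes back from the file.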

Here is the version info:

julia> VERSION
v"1.9.0-rc1"

(@v1.9) pkg> st JLD2
Status `~/.julia/environments/v1.9/Project.toml`
  [033835bb] JLD2 v0.4.31

@JonasIsensee
Collaborator

Hi @wsshin,

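I'm afraid this isn't really a problem restricted to JLD2 but a more general one with Unicode.
There are two different Unicode representations of the "same" visual symbol: the precomposed character U+00E2, and the letter a followed by the combining circumflex U+0302.
Julia produces a different one depending on how you create it: identifiers and symbol literals are normalized by the parser, while string literals keep the bytes as typed.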

julia> :â
:â

julia> string(:â)
"â"

julia> string(:â) == "â"
false

julia> codeunits(string(:â))
2-element Base.CodeUnits{UInt8, String}:
 0xc3
 0xa2

julia> codeunits("â")
3-element Base.CodeUnits{UInt8, String}:
 0x61
 0xcc
 0x82

julia> Symbol(string(:â)) == :â
true

julia> string(Symbol("â")) == "â"
true

julia> Symbol("â") == :â
false

julia> string(:â) == "â"
false
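The REPL results above are consistent with the parser normalizing identifiers to NFC while leaving string contents alone. A small sketch of that behavior, using escapes instead of typed characters; it assumes the parser behaves as the session above shows, and the variable names are illustrative:

```julia
using Unicode

decomposed  = "a\u0302"   # what a\hat[tab] inserts
precomposed = "\u00e2"    # NFC form of the same character

# Parsing source text that contains the decomposed identifier...
parsed = Meta.parse(decomposed)

# ...yields a Symbol whose name is in the precomposed form.
@show parsed == Symbol(precomposed)
@show String(parsed) == decomposed

# Symbol(::String) performs no normalization, so these two Symbols differ.
@show Symbol(decomposed) == Symbol(precomposed)
```

This would explain the original report: the keyword name in `jldsave("foo.jld2"; â)` is an identifier and so is stored in its NFC form, whereas the string passed to `load` keeps the decomposed bytes.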

@wsshin
Author

wsshin commented Mar 31, 2023

Thanks @JonasIsensee. I reported the issue to JuliaLang/julia.

@oscardssmith

The correct solution here is for JLD2 to apply normalization before doing the lookup.
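A rough sketch of what "normalize before the lookup" could look like. The `Dict` is only a stand-in for a group's name table and `lookup_normalized` is a hypothetical helper, not JLD2's actual API; the sketch also assumes the stored keys are already in NFC form:

```julia
using Unicode

# Hypothetical stand-in for a name => value table; not JLD2's real data structure.
entries = Dict("\u00e2" => 1)   # key stored in precomposed (NFC) form

# Hypothetical helper: normalize the requested name to NFC before indexing.
lookup_normalized(d::AbstractDict, name::AbstractString) =
    d[Unicode.normalize(name, :NFC)]

query = "a\u0302"   # decomposed form, as typed via a\hat[tab]

@show haskey(entries, query)              # false: the raw bytes differ
@show lookup_normalized(entries, query)   # 1
```

Until such a change lands, the same idea should work on the caller's side by passing `Unicode.normalize(name, :NFC)` as the variable name when loading.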

@JonasIsensee
Collaborator

The place to edit is here, I think:

function lookup_offset(g::Group, name::AbstractString)

The correct function for comparing strings is Unicode.isequal_normalized(). One could also consider always adding a normalization step with Unicode.normalize() before saving (or when loading the whole file).
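For illustration, the comparison function in Julia's Unicode standard library is `Unicode.isequal_normalized` (available in Julia 1.8 and later). The sketch below uses it to match a requested name against stored names; `find_equivalent_name` and the `stored` vector are hypothetical and say nothing about how `lookup_offset` is really implemented:

```julia
using Unicode

# Hypothetical helper: return the stored name that is canonically
# equivalent to `name`, or `nothing` if there is no such entry.
function find_equivalent_name(names, name::AbstractString)
    idx = findfirst(n -> Unicode.isequal_normalized(n, name), names)
    return idx === nothing ? nothing : names[idx]
end

stored = ["x", "\u00e2"]   # second entry is the precomposed form

@show find_equivalent_name(stored, "a\u0302")   # matches the precomposed entry
@show find_equivalent_name(stored, "b")         # nothing
```

Comparing with `isequal_normalized` costs a scan over the names, whereas normalizing every name once at save time keeps exact-match lookups possible. Which trade-off suits JLD2 is a design decision for the maintainers.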

@ggebbie

ggebbie commented May 24, 2023

I hit this with a variable named G\bar on Julia 1.9.0. I will rename the variable to G for now. Thanks for documenting this issue.

@wsshin
Author

wsshin commented Mar 26, 2024

@JonasIsensee, is there any reason why your #466 (comment) cannot be implemented? I am still experiencing this problem a year after my initial report, and it has not been resolved yet.

@JonasIsensee
Collaborator

Oh, it should be relatively easy to fix.
You are welcome to submit a PR.
