Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-cli creates broken git repository #2236

Open
balat opened this issue Apr 8, 2023 · 10 comments
Open

irmin-cli creates broken git repository #2236

balat opened this issue Apr 8, 2023 · 10 comments

Comments

@balat
Copy link
Contributor

balat commented Apr 8, 2023

With 3.6.1 (or current dev version):

$ mkdir /tmp/test
$ cd /tmp/test
$ irmin set a b
$ git checkout main
error: inflate: data stream error (incorrect data check)
error: corrupt loose object '7a2bc67366d825510257eb58fbb6adfd1d1ebbc4'
fatal: loose object 7a2bc67366d825510257eb58fbb6adfd1d1ebbc4 (stored in .git/objects/7a/2bc67366d825510257eb58fbb6adfd1d1ebbc4) is corrupt

I can repoduce the behaviour with the following minimal program, as long as I link with checkseum.ocaml instead of checkseum.c:

open Lwt.Syntax

module Store = Irmin_git_unix.FS.KV(Irmin.Contents.String)

let main =
  let* repo = Store.Repo.v (Irmin_git.config ~bare:true "/tmp/irmin/bro3") in
  let* t = Store.main repo in
  Store.set_exn t ~info:(Irmin_git_unix.info ~author:"toto" "Updating foo/bar") ["foo"; "bar"] "testing 123"

let _ = Lwt_main.run main

irmin-pack requires explicitely checkseum.ocaml which make irmin-cli linked with checkseum.ocaml.
Is it safe to switch to checkseum.c for irmin-pack until checkseum.ocaml is fixed?

@dinosaure
Copy link
Member

Can you precise the version of checkseum and optint? Several versions was broken in the past due to the representation of integers.

@balat
Copy link
Contributor Author

balat commented Apr 8, 2023

$ opam show checkseum

<><> checkseum: information on all versions <><><><><><><><><><><><><><><><><><>
name                   checkseum
all-installed-versions 0.5.0 [default]
all-versions           0.0.1  0.0.2  0.0.3  0.0.9  0.1.0  0.1.1  0.1.1-1  0.2.0  0.2.1  0.3.0  0.3.1  0.3.2  0.3.3  0.3.4  0.4.0  0.5.0

<><> Version-specific details <><><><><><><><><><><><><><><><><><><><><><><><><>
version      0.5.0

$ opam show optint

<><> optint: information on all versions ><><><><><><><><><><><><><><><><><><><>
name                   optint
all-installed-versions 0.3.0 [default]
all-versions           0.0.1  0.0.2  0.0.3  0.0.4  0.0.5  0.1.0  0.2.0  0.3.0

@dinosaure
Copy link
Member

That's indeed unfortunate, I will try to reproduce next week and see what is wrong with checkseum‧ocaml

@dinosaure
Copy link
Member

See mirage/checkseum#80

@dinosaure
Copy link
Member

FIxed by ocaml/opam-repository#23657.

@metanivek
Copy link
Member

@dinosaure thanks for the quick fix!

We use adler32 for creating a checksum for one of our files in irmin-pack, which is used by Tezos. They updated their opam repository to 0.5.0 about 4 weeks ago and have had a couple of releases since then.

Thankfully, we haven't heard any issues of this checksum failing in these releases. What is your advice? Will these checksums continue to work with 0.5.1? Should Tezos release a new version with 0.5.1 when it releases?

@dinosaure
Copy link
Member

Thankfully, we haven't heard any issues of this checksum failing in these releases. What is your advice? Will these checksums continue to work with 0.5.1? Should Tezos release a new version with 0.5.1 when it releases?

As far as, by default, you use checkseum‧c, you should not get any problems (as @balat said, you must explicitely depends on checkseum‧ocaml to use the buggy implementation). If you generated an adler32 with checkseum‧ocaml (v0.5.0), the checksum is just wrong and they will not continue to work with checkseum.0.5.1. The best is to move as soon as you can to checkseum.0.5.1 and fix "by hands" wrong checksums generated by checkseum‧ocaml. On my side, I will disallow the usage of checkseum.0.5.0 on opam-repository (and no one will be able to use it after this release).

Sorry for this inconvenience and this regression.

@metanivek
Copy link
Member

metanivek commented Apr 11, 2023

Unfortunately, we do depend on checkseum.ocaml. We will need to come up with some way to "fix" this, probably by disregarding existing checksums and creating new ones. Not ideal but we'll figure something out!

Even though they are wrong in 0.5.0 will these checksums continue to work as long as the surrounding binary is not updated to 0.5.1? Is it consistently wrong for encode/decode? 😅 If so, my guess is that we will want to advise Tezos to not upgrade until we have some fix in irmin-pack that can properly handle the broken checksums waves hands somehow.

@dinosaure
Copy link
Member

Even though they are wrong in 0.5.0 will these checksums continue to work as long as the surrounding binary is not updated to 0.5.1?

So, it will work if you use only Checkseum.Adler32.* (v0.5.0). If you use another implementation of Adler32 (a C one - like checkseum.c), it will not work (even if it's the same version). So you must keep checkseum.ocaml (v0.5.0)...

Is it consistently wrong for encode/decode?

This part is done by optint/repr, so decoding/encoding does not change between v0.5.0/v0.5.1, only the expected value change.

If so, my guess is that we will want to advise Tezos to not upgrade until we have some fix in irmin-pack that can properly handle the broken checksums waves hands somehow.

It's probably a good way indeed, you should increment the version of your pack file and say that for previous version, you should not calculate/verify the checksum and move then to checkseum.0.5.1.

I don't know what Tezos uses for packages (if they use their OPAM overlay or not) and you should ask them about that due to the available: false added for checkseum.0.5.0. We can figure out about a time before to add the available: false (may be a note for other users) and let you and Tezos do the migration - I'm not sure about all that.

@metanivek
Copy link
Member

They do use their own opam repository/overlay. It appears that they update it periodically, but I will give them a heads up to not update checkseum to 0.5.1 until we are ready for it. I think it's fine (and right!) for you to change the availability flag in the official repo.

We have a new release coming up that bumps the version of our on-disk data (of the file we checksum! 🎉), so I think we can use that to handle this properly. An unfortunate situation but perhaps fortunate timing of events and discovery!

Thanks again for the quick fix and information to understand the scope of things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants