
Kellnr calculates wrong checksum [CRITICAL] #311

Closed
alexthe2 opened this issue May 2, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@alexthe2

alexthe2 commented May 2, 2024

Since last week (with no particular triggering event), Kellnr has been calculating wrong checksums for the crates pushed to it.

This results in cargo not being able to verify checksums.

failed to verify the checksum of `v12-data v1.17.1 (registry `kellnr`)`

For example, for this crate:

sqlite> select * from crate_index where name="v12-data" AND vers="1.17.1";
id|name|vers|deps|cksum|features|yanked|links|v|crate_fk
35|v12-data|1.17.1|...|90392cdc01079ee500391dd0fd409059158b83b170567361d2e3552b76450f6c|{}|0||1|5

The checksum stored here does not match the output of running sha256sum on the .crate file. When we ran an UPDATE crate_index ... with the sha256sum result, cargo accepted that particular crate again.
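The check cargo performs is to compare the SHA-256 of the downloaded `.crate` file with the `cksum` stored in the index. A minimal sketch for spotting affected versions, assuming you have a local copy of the Kellnr SQLite database and of the `.crate` file: the `crate_index` table and its `name`, `vers` and `cksum` columns come from the query above, while the helper names `file_sha256` and `verify_crate` and the file paths are hypothetical.

```python
import hashlib
import sqlite3
from contextlib import closing
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 of a file, streamed in 64 KiB chunks until EOF."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_crate(db_path: str, name: str, vers: str, crate_file: Path) -> bool:
    """Compare the cksum stored in crate_index with the actual SHA-256 of the .crate file."""
    # Hypothetical helper: db_path and crate_file are whatever local copies you have.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT cksum FROM crate_index WHERE name = ? AND vers = ?",
            (name, vers),
        ).fetchone()
    if row is None:
        raise LookupError(f"{name} {vers} not found in crate_index")
    stored, actual = row[0], file_sha256(crate_file)
    print(f"stored: {stored}\nactual: {actual}")
    return stored == actual
```

For example, `verify_crate("kellnr.db", "v12-data", "1.17.1", Path("v12-data-1.17.1.crate"))` (hypothetical paths) returns `False` for a version whose stored checksum is wrong, and prints both digests so the `actual` value can be written back.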

We're happy to help resolve this issue as soon as possible, as it's a critical bug for us, but we would need some guidance on where it could be originating from.

We're running the latest Kellnr, 5.2.1, with Rust 1.77.2 on the build server (Jenkins) that pushes to Kellnr, and Rust 1.75+ locally.

@secana secana added the bug Something isn't working label May 2, 2024
@secana
Contributor

secana commented May 2, 2024

Hi @alexthe2! Thanks for reporting the issue. Does this happen only with one specific crate or under specific circumstances? I'll try to reproduce the issue.

@alexthe2
Author

alexthe2 commented May 2, 2024

It happens with two of our crates (from what I can see right now): v12-data and v12-terra_converters. v12-terra_converters depends on v12-data.

@secana
Contributor

secana commented May 3, 2024

Can you provide logs from Kellnr in "trace" mode when the issue occurs? I tried to replicate the wrongly computed hash, but so far without success.

@alexthe2
Author

alexthe2 commented May 3, 2024

Should I set KELLNR_LOG__LEVEL or KELLNR_LOG__LEVEL_WEB_SERVER?

@secana
Contributor

secana commented May 3, 2024

KELLNR_LOG__LEVEL should be enough.

@alexthe2
Author

alexthe2 commented May 3, 2024

It's not producing any logs 😢. I verified that the level is really set to trace (updated the Helm chart and deleted the pod to force a restart).

@secana
Contributor

secana commented May 9, 2024

I'm still trying to debug the issue but have no idea why the SHA-256 is computed incorrectly. Can you try disabling the cache, so we can rule out a caching issue?

KELLNR_REGISTRY__CACHE_SIZE=0

@asymmetry

Hi @secana, we are experiencing the same issue. I could provide one more data point.

Here is the checksum stored in the db for a corrupted crate compared with its actual sha256sum:
In db: (screenshot)
Actual: (screenshot)

And I am already running with KELLNR_REGISTRY__CACHE_SIZE=0 (screenshot).

I am running Kellnr 5.2.2 with the released docker image.

Thanks!

@asymmetry

If I delete the corrupted version from the web UI as an admin user, restart the Docker container, and then publish the same crate again, that fixes the problem.

@alexthe2
Copy link
Author

alexthe2 commented May 10, 2024 via email

secana added a commit that referenced this issue May 11, 2024
@secana
Contributor

secana commented May 11, 2024

Thanks for the input. So far, the problem seems to be the SHA-256 computation, as the crate itself appears to be fine on disk. I released a debug version of Kellnr with much more debug output for this specific issue. Would you be so kind as to run it and post the logs here?

Kellnr version: 5.2.3-debug-311
Helm chart version: 3.2.3-debug-311

All logs are at the debug level and prefixed with #311 so they are easy to identify. For the debug version, I added an additional crate to compute the SHA-256, so it can be compared with the current implementation. Hopefully the debug output points us in the right direction to finally find and fix the issue.
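The cross-check idea, computing the checksum with a second implementation and logging both under the `#311` prefix, can be sketched as below. This is not Kellnr's Rust code: the two "implementations" are stand-ins (a one-shot `hashlib` call and a chunked update), and the logger name and function names are hypothetical.

```python
import hashlib
import logging

log = logging.getLogger("cksum-debug")  # hypothetical logger name


def sha256_one_shot(data: bytes) -> str:
    """Implementation A: hash the whole buffer in a single call."""
    return hashlib.sha256(data).hexdigest()


def sha256_chunked(data: bytes, chunk_size: int = 4096) -> str:
    """Implementation B: feed the same buffer to the hasher in fixed-size chunks."""
    digest = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        digest.update(data[start:start + chunk_size])
    return digest.hexdigest()


def checked_cksum(data: bytes) -> str:
    """Compute the checksum both ways, log both with a #311 prefix, flag disagreement."""
    a = sha256_one_shot(data)
    b = sha256_chunked(data)
    # Log the input length too: if the input itself is truncated,
    # both implementations agree on the same wrong digest.
    log.debug("#311 input_len=%d one_shot=%s chunked=%s", len(data), a, b)
    if a != b:
        log.error("#311 checksum implementations disagree for %d bytes", len(data))
    return a


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    checked_cksum(b"abc")
```

Comparing two hash implementations only catches a faulty hasher; logging the input length is what exposes a problem with the bytes being hashed.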

@asymmetry
Copy link

Thanks @secana! I have deployed this test version with logs enabled. I published a test crate 10 times (with a slight modification each time) and the logs look good to me. My team will continue to use this version and I will post the logs here if the issue happens again.

Thanks for the help!

@secana
Contributor

secana commented May 17, 2024

A new version, Kellnr 5.3.2, is out with improved SHA-256 computation. This error should be fixed in that version.

@secana secana closed this as completed May 17, 2024
@alexthe2
Author

A final update from our side: we got the debug version running two days ago, with no issues so far. We'll now switch to the new 5.3.2.

@asymmetry

asymmetry commented May 21, 2024

Hi @secana, my team has had the debug version running for almost 10 days, and we captured this issue again. Here is the log (screenshot):

It seems the error happens when the crate saved to disk is read back into memory for the SHA-256 calculation: only 4096 bytes are read.
However, it seems you have already switched to using the in-memory data to calculate the checksum, so I don't think this will happen again. We will switch to the new 5.3.2.
Thanks a lot for the help!
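We don't know exactly how the pre-5.3.2 code read the file back, but one common way to end up with exactly 4096 bytes is a single fixed-size read instead of reading until end of file. The Python sketch below (hypothetical function names, not Kellnr's code) contrasts that truncating pattern with a full read and with hashing the uploaded bytes in memory, which is what the comment above suggests 5.3.2 does.

```python
import hashlib
import os
import tempfile


def cksum_single_read(path: str, buf_size: int = 4096) -> str:
    # BUG (illustrative): one read() call returns at most buf_size bytes,
    # so everything past the first buffer is silently ignored.
    with open(path, "rb") as f:
        data = f.read(buf_size)
    return hashlib.sha256(data).hexdigest()


def cksum_read_all(path: str) -> str:
    # Correct: read() with no size argument reads until EOF.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def cksum_in_memory(upload: bytes) -> str:
    # Hash the uploaded bytes directly, with no round trip through the disk.
    return hashlib.sha256(upload).hexdigest()


if __name__ == "__main__":
    upload = os.urandom(10_000)  # a "crate" larger than 4096 bytes
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(upload)
    try:
        print(cksum_single_read(tmp.name) == cksum_in_memory(upload))  # False
        print(cksum_read_all(tmp.name) == cksum_in_memory(upload))     # True
    finally:
        os.remove(tmp.name)
```

In Rust, the analogous trap is a single `Read::read` call into a fixed-size buffer, which returns at most the buffer's length; `Read::read_to_end` or `std::fs::read` keep reading until EOF. Crates smaller than the buffer hash correctly either way, which would explain why only some crates were affected.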


3 participants