Add support for repodata.json.zst #675

Draft: wants to merge 15 commits into main

beenje
Contributor

@beenje beenje commented Nov 30, 2023

Adding support for repodata.json.zst (fix #573).

  • repodata.json.zst is now created alongside the .bz2 and .gz versions.
  • Added a test checking that the compressed files decompress to the same content as repodata.json.
  • repodata.json is now updated for proxy channels when a compressed file is requested. Before this fix, if repodata.json.zst was requested and existed locally, it was never updated. Even with the support added in the first commit, it would only be updated when someone requested the repodata.json file (which triggers the creation of all compressed files). Note that I didn't find an easy way to write a test for that; one would be nice to have.
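
As an illustration of the first two points, here is a minimal sketch of producing the compressed variants and checking that each one decompresses back to the original bytes. It is not the code from this PR: the helper names `compress_variants` and `decompress` are hypothetical, and the use of the third-party `zstandard` package is an assumption (the sketch skips `.zst` when that package is not installed).

```python
import bz2
import gzip
import json

try:
    import zstandard  # third-party package; assumed here, not confirmed by the PR text
except ImportError:
    zstandard = None


def compress_variants(raw: bytes) -> dict:
    """Return {suffix: compressed bytes} for every codec that is available."""
    variants = {
        ".gz": gzip.compress(raw),
        ".bz2": bz2.compress(raw),
    }
    if zstandard is not None:
        variants[".zst"] = zstandard.ZstdCompressor().compress(raw)
    return variants


def decompress(suffix: str, blob: bytes) -> bytes:
    """Inverse of compress_variants for a single suffix."""
    if suffix == ".gz":
        return gzip.decompress(blob)
    if suffix == ".bz2":
        return bz2.decompress(blob)
    if suffix == ".zst":
        return zstandard.ZstdDecompressor().decompress(blob)
    raise ValueError(f"unknown suffix: {suffix}")


# Tiny stand-in for a real repodata.json.
repodata = json.dumps({"info": {"subdir": "linux-64"}, "packages": {}}).encode()
variants = compress_variants(repodata)
roundtrip_ok = all(decompress(s, blob) == repodata for s, blob in variants.items())
```

Comparing decompressed bytes rather than the compressed files themselves avoids depending on codec-specific headers such as the timestamp gzip embeds.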

rattler is very efficient to download repodata
serve_repo_data fixture copied from rattler
dummy_remote_session_object wasn't cleaning after itself
(using return instead of yield)
@beenje
Contributor Author

beenje commented Dec 4, 2023

With the tests added in #677 I could easily add a test in this MR.

@beenje beenje marked this pull request as draft December 7, 2023 07:57
@beenje
Contributor Author

beenje commented Dec 7, 2023

Doing some tests locally, I noticed that this change makes the download of the repodata.json from a proxy channel very slow (when the file is big).

The initial download from the remote repo is actually not the biggest issue, contrary to what I thought in #660. The problem is the compression, which is quite slow for big files. Compressing the conda-forge/linux-64/repodata.json file to gz, then bz2 and now zst takes several seconds. The download of the file is blocked during that time, which explains the time-out I saw on the client side.

I'll look into whether the compression can be done in the background, and maybe add options to disable it (when using quetz as an internal conda server, the network between clients and server is usually fast).
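
To see how much time the sequential compression adds, something like the following sketch could be used. It only times the standard-library codecs on a synthetic payload; the helper name `time_compression` and the payload are made up for illustration, and a real conda-forge repodata.json is far larger than this stand-in.

```python
import bz2
import gzip
import time


def time_compression(raw: bytes) -> dict:
    """Return the seconds spent in each codec, run one after another."""
    timings = {}
    for name, compress in (("gz", gzip.compress), ("bz2", bz2.compress)):
        start = time.perf_counter()
        compress(raw)
        timings[name] = time.perf_counter() - start
    return timings


# Synthetic stand-in payload of roughly 1 MB.
payload = b'{"name": "pkg", "version": "1.0", "depends": ["python"]}\n' * 20_000
timings = time_compression(payload)

# A client waiting on the response is blocked for the sum of all codecs.
total_blocked = sum(timings.values())
```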

@ivergara
Collaborator

ivergara commented Dec 7, 2023

You might want to look at how to use the asynchronous capabilities of the package store. I fixed (#626) an oversight in how packages were uploaded some time ago. It was using synchronous "filesystem" calls, and it was blocking for too long for big files.
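
For readers unfamiliar with the pattern, the sketch below shows one generic way to keep an event loop responsive while a synchronous call runs, using `asyncio.to_thread` (Python 3.9+). It does not use the package store API mentioned above; `compress_blocking` and `heartbeat` are hypothetical stand-ins.

```python
import asyncio
import gzip


def compress_blocking(raw: bytes) -> bytes:
    """Stand-in for a synchronous, CPU- or disk-bound call."""
    return gzip.compress(raw)


async def heartbeat(ticks: list) -> None:
    """Other work that should keep running while the blocking call is busy."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0)


async def main() -> tuple:
    ticks: list = []
    # The blocking call runs in a worker thread; the loop stays free for heartbeat().
    compressed, _ = await asyncio.gather(
        asyncio.to_thread(compress_blocking, b"x" * 1_000_000),
        heartbeat(ticks),
    )
    return compressed, ticks


compressed, ticks = asyncio.run(main())
```

This keeps other requests from stalling, but the caller that awaits the result still waits for the compression to finish.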

New version of mamba requests repodata.json.zst first.
The compressed files are created locally when downloading the non compressed version.
Quetz should always check if the repodata.json file needs to be re-downloaded
so that all files stay consistent.
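
The idea in this commit message can be sketched as follows: whatever variant is requested, the freshness check is made against the uncompressed file. The function names and the `is_stale` callback are hypothetical; how quetz actually decides that a file is out of date is not shown here.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

COMPRESSED_SUFFIXES = (".bz2", ".gz", ".zst")


def base_repodata_path(requested: str) -> str:
    """Strip one compression suffix so the check targets the base file."""
    path = PurePosixPath(requested)
    if path.suffix in COMPRESSED_SUFFIXES:
        return str(path.with_suffix(""))
    return requested


def file_to_refresh(requested: str, is_stale: Callable[[str], bool]) -> Optional[str]:
    """Return the base file to re-download, or None if it is still fresh."""
    base = base_repodata_path(requested)
    return base if is_stale(base) else None


# A request for the .zst variant triggers a check on repodata.json itself.
target = file_to_refresh("linux-64/repodata.json.zst", lambda path: True)
```

Refreshing the base file and regenerating every variant from it is what keeps the compressed files consistent with each other.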
@beenje
Contributor Author

beenje commented Dec 10, 2023

I don't think async will help in this case. The compression is done in add_static_file (and add_temp_static_file), which aren't async. They are called in the background by update_indexes, so there is no issue there.
But for a proxy channel, we download the remote repodata.json, compress it and then serve it. The client has to wait during that time, and it would be the same if it were async.
Compression should probably be done in the background for proxy channels as well.

In the meantime, I added a new compression section to the config to enable/disable bz2, gz and zst compression.
By default, zst is disabled and the two others are enabled, to keep the same behaviour as today.
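
The defaults described above can be modelled as in the sketch below. The class and field names are assumptions made for illustration and may not match the option names in the actual config section; only the default values (bz2 and gz on, zst off) come from this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionConfig:
    # Field names are hypothetical; the defaults follow the comment above.
    gz_enabled: bool = True
    bz2_enabled: bool = True
    zst_enabled: bool = False

    def enabled_suffixes(self) -> list:
        """Suffixes of the compressed files that should be generated."""
        flags = (
            (".bz2", self.bz2_enabled),
            (".gz", self.gz_enabled),
            (".zst", self.zst_enabled),
        )
        return [suffix for suffix, enabled in flags if enabled]


default_suffixes = CompressionConfig().enabled_suffixes()
with_zst = CompressionConfig(zst_enabled=True).enabled_suffixes()
```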

Note that for the tests, I reused what I implemented in #677, so I rebased this PR on that PR's branch. Hopefully #677 can be merged soon.

@codecov-commenter

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (0b49467) 83.61% compared to head (f0d7927) 83.90%.
Report is 3 commits behind head on main.

Files                  Patch %  Lines
quetz/tasks/mirror.py  84.21%   6 Missing ⚠️
quetz/tasks/common.py  75.00%   1 Missing ⚠️
quetz/utils.py         98.18%   1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #675      +/-   ##
==========================================
+ Coverage   83.61%   83.90%   +0.28%     
==========================================
  Files          79       79              
  Lines        6233     6324      +91     
==========================================
+ Hits         5212     5306      +94     
+ Misses       1021     1018       -3     



Successfully merging this pull request may close these issues.

Add repodata.json.zst
3 participants