
conda fetch_latest_path interface may still read repodata into memory #440

Open
dholth opened this issue Feb 15, 2024 · 2 comments
Labels
type::bug describes erroneous operation, use severity::* to classify the type

Comments

@dholth
Contributor

dholth commented Feb 15, 2024

Checklist

  • I added a descriptive title
  • I searched open reports and couldn't find a duplicate

What happened?

This will require collaboration with the conda repository; I'm reporting it here because this project can benefit from lower in-Python memory usage, whereas classic-solver conda will always load repodata into Python anyway.

We see that fetch_latest_path can still load repodata into memory here: https://github.com/conda/conda/blob/main/conda/gateways/repodata/__init__.py#L862C1-L862C68

Could this function be patched with a "don't load" flag?

The logic that saves the cache state is part of that convoluted function, so an attempt to bypass it and call a lower-level API directly might not correctly fetch from the local cache on a second request.
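As a rough illustration of the idea, a "don't load" flag might look something like the sketch below; the class and parameter names here are hypothetical, not conda's actual API:

```python
import json
from pathlib import Path


class RepodataFetchSketch:
    """Hypothetical illustration of a "don't load" flag; the names and
    signature here are not conda's actual API."""

    def __init__(self, cache_path: Path):
        self.cache_path = cache_path

    def fetch_latest_path(self, *, parse: bool = True):
        # In the real method, the cache-state bookkeeping (ETag headers,
        # conditional fetch, refreshing cache-control data) happens here.
        path = self.cache_path
        if not parse:
            # Proposed behavior: hand back only the path, never reading
            # the repodata file into memory.
            return path, None
        # Current behavior: the cached repodata is parsed into Python objects.
        return path, json.loads(path.read_text())
```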

Conda Info

No response

Conda Config

No response

Conda list

No response

Additional Context

No response

@dholth dholth added the type::bug describes erroneous operation, use severity::* to classify the type label Feb 15, 2024
@jaimergp
Contributor

jaimergp commented Feb 15, 2024

Thanks @dholth, yes, I'd love a method or a flag that guarantees the minimal amount of I/O needed to obtain the JSON cache path while guaranteeing it is up to date.

I thought that fetch_latest_path() would do that, but it ends up calling fetch_latest(), which does return raw_repodata, so... maybe we just need a separate code path? I'm not sure how much this would increase the maintenance burden, or whether it's a simple change.

I see that one of the reasons to load the JSON is to refresh the cache size, but... if it's unchanged, I don't see how the cache info would be different. Even if we really want to do that, the size info could come from the stat payload, no?

Ideally we would have a function that minimally achieves the following (a rough sketch follows the list):

  • If the remote reports changes, stream it to disk, update the cache, return the path
  • If no changes are reported, the cache is valid: update the last modification time and return the path
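A minimal sketch of that shape, assuming a requests-based conditional fetch keyed on an ETag (illustrative only, not conda's internals):

```python
import os
import requests


def fetch_repodata_path(url: str, cache_path: str, etag: str | None = None) -> str:
    """Illustrative sketch only, not conda's API: return the path to an
    up-to-date repodata.json without parsing it into Python objects."""
    headers = {"If-None-Match": etag} if etag else {}
    response = requests.get(url, headers=headers, stream=True)

    if response.status_code == 304:
        # Remote reports no changes: the cache is valid, just refresh its mtime.
        os.utime(cache_path)
        return cache_path

    response.raise_for_status()
    # Remote changed: stream the body straight to disk, never building the
    # full JSON document in memory.
    with open(cache_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    return cache_path
```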

@dholth
Contributor Author

dholth commented Feb 15, 2024

It's a complicated method that is also supposed to maintain backwards compatibility. Hopefully, now that we've found a reasonable lower-level API (necessary for more advanced repodata fetching), this can be simplified.

Yes, the necessary cache information can be found with a stat call. I would also like to move the cache metadata down a layer; right now the old RepoInterface knows less about the cache, and the new RepoInterface knows more about the cache, as is necessary for jlap support.
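As a small sketch of that point (not tied to conda's actual cache classes), the size and modification time can come straight from filesystem metadata:

```python
import os


def cache_stat_info(cache_path: str) -> tuple[int, float]:
    """Sketch: read the cached repodata.json size and mtime from filesystem
    metadata instead of loading the file into memory."""
    st = os.stat(cache_path)
    return st.st_size, st.st_mtime
```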
