New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy index #13880
base: main
Are you sure you want to change the base?
Lazy index #13880
Conversation
CodSpeed Performance ReportMerging #13880 will not alter performanceComparing Summary
|
How does this compare to conda/conda/core/subdir_data.py Line 92 in be5da40
|
@dholth, I added a bit of explanation to the PR description. Basically, it is inspired by Tracking memory usage and time with
and the new lazy approach 10s and
i.e. a reduction by 40s and 600MB, though a bunch of integration tests are currently failing. |
That's phenomenal. |
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
pre-commit.ci autofix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very promising, but IMO could go a little further in subsuming the supplement functions into Index
instance methods (not static nor class methods).
The get_reduced_index
could also use the Index
internally instead of creating an ad-hoc dict instead.
for prefix_record in PrefixData(prefix).iter_records(): | ||
if isinstance(index, Index): | ||
return | ||
if isinstance(prefix, PrefixData): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(prefix, PrefixData): | |
elif isinstance(prefix, PrefixData): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally, it doesn't matter, of course, due to the return in the first if branch. Semantically, the first "if" deals with the index argument, the second if-else with the prefix argument, so on balance, I'd prefer to keep it as is. Does that work for you?
), | ||
} | ||
subdir = PLATFORMS[(platform.system(), platform.machine())] | ||
index = get_index(channel_urls=["conda-forge"], platform=subdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this passing main as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to test the prepend
feature, analogous to the other test_get_index_xxx_platform
tests.
def _supplement_index_with_prefix( | ||
index: Index | dict[Any, Any], | ||
prefix: str | PrefixData, | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, this would be ideal to move into an instance method Index._supplement_prefix(prefix)
and deprecate the function. Same for _supplement_index_with_cache
and the other _supplement_*
functions.
Co-authored-by: Jannis Leidel <jannis@leidel.info>
Thanks for the quick review, @jezdez!
Dang, I already had the deprecations in place 😄 I removed them again because this is designed now as a drop-in replacement and it can be difficult to know what other code might rely on functions like these. For example, we have https://github.com/conda/conda-libmamba-solver/blob/1ac0aee96c4618c29ae2d2491fdc2ccd2b3305d4/conda_libmamba_solver/state.py#L186. But if you are happy to remove them, then so am I. We could even go further. Iterating an index is always expensive, particularly for large channels, so why not deprecate that? With the
|
To be clear, we should still provide stub functions that are properly deprecated per CEP-9 that call
Migrating from dict with these methods to UserDict implies the methods remain, so not sure if deprecating the iterator API makes sense
|
@@ -0,0 +1,20 @@ | |||
### Enhancements | |||
|
|||
* Add `conda.core.index.Index` as drop-in replacement of realized dictionary index. (#13880) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Add `conda.core.index.Index` as drop-in replacement of realized dictionary index. (#13880) | |
* Add `conda.core.index.Index` as a faster drop-in replacement of realized dictionary index. (#13880) |
It would make sense to mention why the replacement is better.
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
self.system_packages = { | ||
( | ||
rec := _make_virtual_package( | ||
f"__{package.name}", package.version, package.build | ||
) | ||
): rec | ||
for package in context.plugin_manager.get_virtual_packages() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a property or a method. Doing this in initialization regardless the value of add_system
might be expensive (e.g. the __cuda
package has a non negligible overhead).
To give a bit of an update: It took me some time last week to make this work with |
Description
This was triggered by conda/conda-build#5154.
The goal is to introduce an Index API as a clean interface for the index, so as to allow well-defined interactions and avoid direct manipulation of Solver._index.
Currently, the index is handled as a simple dict, which is manipulated in various places to update with changed state.
This is particularly relevant in conda-build, where the index needs to be updated after the creation of new artifacts.
Introducing a standardized API makes this reloading more explicit.
The class
Index
also follows the spirit ofPackageRecordList
to avoid instantiatingPackageRecord
s where not necessary, yielding both memory and time savings.Additionally, this introduces memray into the testing facilities. With this PR, it becomes possible to use a
pytest.mark.memray
marker, which will make the test run in thelinux-memray
job (in addition to any other runs that may occur) under the observation of the pytest-memray plugin, which allows us to track memory usage.Currently, this only leads to a little summary blob in the test logs; a future PR may expand upon this with persistent tracking of memory requirements similar to the benchmarking we do with codspeed.
Checklist - did you ...
news
directory (using the template) for the next release's release notes?