Lazy index #13880

zklaus · 2024-05-01T11:43:27Z

Description

This was triggered by conda/conda-build#5154.
The goal is to introduce an Index API as a clean interface for the index, so as to allow well-defined interactions and avoid direct manipulation of Solver._index.
Currently, the index is handled as a simple dict, which is manipulated in various places to update with changed state.
This is particularly relevant in conda-build, where the index needs to be updated after the creation of new artifacts.
Introducing a standardized API makes this reloading more explicit.
The class Index also follows the spirit of PackageRecordList to avoid instantiating PackageRecords where not necessary, yielding both memory and time savings.
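As a rough illustration of the lazy idea (not the PR's actual implementation), the sketch below keeps the raw metadata and builds a record object only when a key is first looked up. `LazyIndex` and `Record` are hypothetical stand-ins for `Index` and `PackageRecord`, and the package data is made up.

```python
from collections import UserDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for PackageRecord."""

    name: str
    version: str


class LazyIndex(UserDict):
    """Hypothetical sketch: build Record objects only on first lookup."""

    def __init__(self, raw):
        super().__init__()
        self._raw = dict(raw)  # cheap, unparsed metadata
        self.instantiated = 0  # counts how many records were built

    def __getitem__(self, key):
        if key not in self.data:
            info = self._raw[key]  # raises KeyError for unknown keys, like a dict
            self.data[key] = Record(**info)
            self.instantiated += 1
        return self.data[key]

    def __contains__(self, key):
        # Membership is answered from the raw metadata without building a record.
        return key in self._raw


raw = {f"pkg{i}": {"name": f"pkg{i}", "version": "1.0"} for i in range(1000)}
index = LazyIndex(raw)
record = index["pkg42"]  # only this one record is instantiated
```

Of the 1000 entries, only the one that was looked up is turned into an object; membership checks do not instantiate anything.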

Additionally, this introduces memray into the testing facilities. With this PR, it becomes possible to use a pytest.mark.memray marker, which will make the test run in the linux-memray job (in addition to any other runs that may occur) under the observation of the pytest-memray plugin, which allows us to track memory usage.
Currently, this only leads to a little summary blob in the test logs; a future PR may expand upon this with persistent tracking of memory requirements similar to the benchmarking we do with codspeed.

Checklist - did you ...

Add a file to the news directory (using the template) for the next release's release notes?
Add / update necessary tests?
Add / update outdated documentation?

codspeed-hq · 2024-05-01T12:22:39Z

CodSpeed Performance Report

Merging #13880 will not alter performance

Comparing zklaus:lazy-index (a4fa702) with main (eb45954)

Summary

✅ 21 untouched benchmarks

dholth · 2024-05-02T15:00:19Z

How does this compare to

conda/conda/core/subdir_data.py

Line 92 in be5da40

class PackageRecordList(UserList):

zklaus · 2024-05-06T13:58:58Z

@dholth, I added a bit of explanation to the PR description. Basically, it is inspired by PackageRecordList and tries to bring the lazy PackageRecord instantiation from there (and the related SubdirData and friends) to the index.

Tracking memory usage and time with pytest-memray, the current dict approach uses 50s and

Allocation results for tests/core/test_index.py::test_get_index_lazy at the high watermark

         📦 Total memory allocated: 1.8GiB
         📏 Total allocations: 104
         📊 Histogram of allocation sizes: |█▇▂▇▁|
         🥇 Biggest allocating functions:
                - raw_decode:/opt/conda/lib/python3.12/json/decoder.py:353 -> 1.1GiB
                - __hash__:/workspaces/conda-workspace/conda/conda/models/records.py:300 -> 182.5MiB
                - __set__:/workspaces/conda-workspace/conda/conda/auxlib/entity.py:441 -> 124.0MiB
                - join:/workspaces/conda-workspace/conda/conda/common/url.py:315 -> 83.0MiB
                - _pkey:/workspaces/conda-workspace/conda/conda/models/records.py:293 -> 43.1MiB

and the new lazy approach 10s and

Allocation results for tests/core/test_index.py::test_get_index_lazy at the high watermark

         📦 Total memory allocated: 1.2GiB
         📏 Total allocations: 91
         📊 Histogram of allocation sizes: |▁█▂▁ |
         🥇 Biggest allocating functions:
                - raw_decode:/opt/conda/lib/python3.12/json/decoder.py:353 -> 822.1MiB
                - decode:<frozen codecs>:322 -> 243.6MiB
                - raw_decode:/opt/conda/lib/python3.12/json/decoder.py:353 -> 146.5MiB
                - join:/workspaces/conda-workspace/conda/conda/common/url.py:315 -> 5.0MiB
                - _process_raw_repodata:/workspaces/conda-workspace/conda/conda/core/subdir_data.py:489 -> 2.0MiB

i.e. a reduction by 40s and 600MB, though a bunch of integration tests are currently failing.

dholth · 2024-05-06T18:15:34Z

That's phenomenal.

.github/workflows/tests.yml

Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>

zklaus · 2024-05-09T16:28:45Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

jezdez

This looks very promising, but IMO could go a little further in subsuming the supplement functions into Index instance methods (not static nor class methods).

The get_reduced_index could also use the Index internally instead of creating an ad-hoc dict.

jezdez · 2024-05-10T20:39:27Z

conda/core/index.py

-    for prefix_record in PrefixData(prefix).iter_records():
+    if isinstance(index, Index):
+        return
+    if isinstance(prefix, PrefixData):


Suggested change

-    if isinstance(prefix, PrefixData):
+    elif isinstance(prefix, PrefixData):

Functionally, it doesn't matter, of course, due to the return in the first if branch. Semantically, the first "if" deals with the index argument, the second if-else with the prefix argument, so on balance, I'd prefer to keep it as is. Does that work for you?

jezdez · 2024-05-10T20:41:47Z

tests/core/test_index.py

+        ),
+    }
+    subdir = PLATFORMS[(platform.system(), platform.machine())]
+    index = get_index(channel_urls=["conda-forge"], platform=subdir)


Why isn't this passing main as well?

This is to test the prepend feature, analogous to the other test_get_index_xxx_platform tests.

jezdez · 2024-05-10T20:59:20Z

conda/core/index.py

+def _supplement_index_with_prefix(
+    index: Index | dict[Any, Any],
+    prefix: str | PrefixData,
+) -> None:


As an example, this would be ideal to move into an instance method Index._supplement_prefix(prefix) and deprecate the function. Same for _supplement_index_with_cache and the other _supplement_* functions.

Co-authored-by: Jannis Leidel <jannis@leidel.info>

zklaus · 2024-05-13T08:35:57Z

Thanks for the quick review, @jezdez!

> This looks very promising, but IMO could go a little further in subsuming the supplement functions into Index instance methods (not static nor class methods).

Dang, I already had the deprecations in place 😄

I removed them again because this is designed now as a drop-in replacement and it can be difficult to know what other code might rely on functions like these. For example, we have https://github.com/conda/conda-libmamba-solver/blob/1ac0aee96c4618c29ae2d2491fdc2ccd2b3305d4/conda_libmamba_solver/state.py#L186.

But if you are happy to remove them, then so am I.

We could even go further. Iterating an index is always expensive, particularly for large channels, so why not deprecate that? With the Index class in place we could do that by deprecating .__iter__, .keys, .values, .items.

> The get_reduced_index could also use the Index internally instead of creating an ad-hoc dict.

Sure. Let me add that.

jezdez · 2024-05-13T09:21:09Z

> Thanks for the quick review, @jezdez!
>
> > This looks very promising, but IMO could go a little further in subsuming the supplement functions into Index instance methods (not static nor class methods).
>
> Dang, I already had the deprecations in place 😄
>
> I removed them again because this is designed now as a drop-in replacement and it can be difficult to know what other code might rely on functions like these. For example, we have https://github.com/conda/conda-libmamba-solver/blob/1ac0aee96c4618c29ae2d2491fdc2ccd2b3305d4/conda_libmamba_solver/state.py#L186.

To be clear, we should still provide stub functions that are properly deprecated per CEP-9 that call Index behind the scenes. But for CLS specifically we maintain the API, so this is relatively painless by wrapping it into a try/except or similar techniques of inspecting conda's version and using a different import, or wrapping function. There are a few ways to do this.
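One way to sketch the try/except import fallback mentioned here, using only the standard library. The helper name `first_importable` and the module `hypothetical_fast_json` are made up for illustration; a downstream project would list the new and old import locations instead:

```python
import importlib


def first_importable(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of {candidates!r} could be imported")


# Prefer a newer location and fall back to an older one.
# The first entry does not exist, so the stdlib fallback is used.
loads = first_importable(
    [
        ("hypothetical_fast_json", "loads"),
        ("json", "loads"),
    ]
)
```

The same shape works for choosing between a new method and a deprecated function, at the cost of keeping both code paths tested.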

> But if you are happy to remove them, then so am I.
>
> We could even go further. Iterating an index is always expensive, particularly for large channels, so why not deprecate that? With the Index class in place we could do that by deprecating .__iter__, .keys, .values, .items.

Migrating from dict with these methods to UserDict implies the methods remain, so I'm not sure deprecating the iterator API makes sense.
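To make the trade-off concrete, here is a small sketch (the class name `WarnOnIterIndex` is hypothetical) showing that a `UserDict` subclass inherits `keys`, `values`, and `items`, and that these views all go through `__iter__`, so a warning placed there would reach every one of them:

```python
import warnings
from collections import UserDict


class WarnOnIterIndex(UserDict):
    """Hypothetical sketch: warn whenever the mapping is iterated."""

    def __iter__(self):
        warnings.warn(
            "iterating the index is expensive",
            DeprecationWarning,
            stacklevel=2,
        )
        return super().__iter__()


idx = WarnOnIterIndex({"a": 1, "b": 2})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    keys = list(idx.keys())  # the inherited view iterates via __iter__
```

Lookups such as `idx["a"]` stay silent, while any full pass over keys, values, or items triggers the warning.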

> > The get_reduced_index could also use the Index internally instead of creating an ad-hoc dict.
>
> Sure. Let me add that.

dholth · 2024-05-13T13:56:09Z

news/13880-lazy-index

@@ -0,0 +1,20 @@
+### Enhancements
+
+* Add `conda.core.index.Index` as drop-in replacement of realized dictionary index. (#13880)


Suggested change

-* Add `conda.core.index.Index` as drop-in replacement of realized dictionary index. (#13880)
+* Add `conda.core.index.Index` as a faster drop-in replacement of realized dictionary index. (#13880)

It would make sense to mention why the replacement is better.

zklaus · 2024-05-15T17:53:37Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

jaimergp · 2024-05-21T05:36:45Z

conda/core/index.py

+        self.system_packages = {
+            (
+                rec := _make_virtual_package(
+                    f"__{package.name}", package.version, package.build
+                )
+            ): rec
+            for package in context.plugin_manager.get_virtual_packages()
+        }


I think this could be a property or a method. Doing this in initialization regardless of the value of add_system might be expensive (e.g. the __cuda package has a non-negligible overhead).
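A minimal sketch of the property variant, assuming `functools.cached_property` is acceptable here. `detect_virtual_packages` is a hypothetical stand-in for the plugin query, its return values are made up, and the `Index` class is reduced to the relevant part:

```python
from functools import cached_property


def detect_virtual_packages():
    """Hypothetical stand-in for an expensive virtual package query."""
    detect_virtual_packages.calls += 1
    return {"__cuda": "12.4", "__glibc": "2.39"}  # made-up values


detect_virtual_packages.calls = 0


class Index:
    """Hypothetical, simplified stand-in for conda.core.index.Index."""

    def __init__(self, add_system=False):
        self.add_system = add_system  # nothing expensive happens here

    @cached_property
    def system_packages(self):
        # Computed on first access only, then cached on the instance.
        return detect_virtual_packages()


index = Index()
calls_after_init = detect_virtual_packages.calls
first = index.system_packages
second = index.system_packages
calls_after_access = detect_virtual_packages.calls
```

Construction stays cheap, and the detection runs once per instance no matter how often the attribute is read.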

zklaus · 2024-05-23T07:03:21Z

To give a bit of an update: it took me some time last week to make this work with get_reduced_index, which uncovered some bugs and rough edges. It works now, but as you can see from the codspeed comment above, this results in an unacceptable slowdown.
Figuring out why, it ultimately comes down to the combination of two things: the solver iterates over the packages a lot (for example, Resolve.__init__ performs no fewer than 5 full iterations over the index), and the lazy index is large. Since the lazy Index behaves exactly like the current realized-dictionary index when iterated, iterating the full index is of course much slower than iterating the reduced index.
I spent some time trying to remove the iteration from the solver, but ultimately gave up on that because it is too much work, out of scope for this PR, and possibly too invasive a change.
Let's try to add a reduction to the Index class instead.
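One possible reading of "reduction", sketched under the assumption that it means keeping only the packages reachable from the requested names through their dependencies. The function `reduce_index`, the `depends` layout, and the toy package data are all hypothetical and much simpler than real repodata:

```python
from collections import deque


def reduce_index(index, requested):
    """Keep only packages reachable from `requested` via `depends` (hypothetical)."""
    keep = {}
    queue = deque(requested)
    while queue:
        name = queue.popleft()
        if name in keep or name not in index:
            continue
        keep[name] = index[name]
        queue.extend(index[name]["depends"])
    return keep


# Toy data for illustration only.
full_index = {
    "numpy": {"depends": ["python", "libblas"]},
    "python": {"depends": ["openssl"]},
    "libblas": {"depends": []},
    "openssl": {"depends": []},
    "unrelated": {"depends": []},
}
reduced = reduce_index(full_index, ["numpy"])
```

Every later full iteration in the solver then touches only the reduced set, which is where the time is saved.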

zklaus added 6 commits May 1, 2024 13:29

Add Index class

1fcbe3f

Remove debugging remnant

1a70467

Add better __repr__

71bfc03

Refactor __getitem__

c127a89

Add update from cache

fb3d720

Add update from track_features

ae234dd

conda-bot added the cla-signed label May 1, 2024

zklaus changed the title ~~Lazy index~~ [WIP] Lazy index May 1, 2024

zklaus added 6 commits May 6, 2024 12:14

Add lazy get_index test

d156f02

Fix PrefixData default bug

03a32a8

Raise PackagesNotFoundError instead of generic KeyError

aa7d030

Add __contains__ method for lazy in check

3f54a1e

Add expanded channels

e5c9e29

Add memray marker for test

8e6b474

Add memray tests to ci

8b4011f

jaimergp reviewed May 7, 2024

zklaus and others added 6 commits May 8, 2024 09:06

Update .github/workflows/tests.yml

ce283dc

Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>

Add prefix, cache, and feature handling to realized index

28267cf

Use realized index if available

1ca6933

Fix prefix lookup bug

a2e7e19

Revert to KeyError for dict compatibility

06ce30f

Allow Index and PrefixData for supplement with prefix

b1bd203

pre-commit-ci bot and others added 3 commits May 9, 2024 16:29

[pre-commit.ci] auto fixes from pre-commit.com hooks

74591f1

for more information, see https://pre-commit.ci

Adapt test for platforms

d0dd669

Fix platform detection

6a7606c

zklaus added 6 commits May 10, 2024 09:52

Fix shadowing

d9bb5b0

Fix

3ffc14a

Fix test

cf1c3f9

Add proper conda-forge test packages

f89cd54

Remove superfluous code

6f91afa

Add deprecations

0a148ca

zklaus changed the title ~~[WIP] Lazy index~~ Lazy index May 10, 2024

Add news entry

9b8bc3f

zklaus marked this pull request as ready for review May 10, 2024 13:48

zklaus requested a review from a team as a code owner May 10, 2024 13:48

jezdez requested changes May 10, 2024

zklaus and others added 3 commits May 13, 2024 08:06

Convert dict calls to literals

273cd6a

Move sample package dictionaries outside of test method

24a14a1

Apply suggestions from code review

2543931

Co-authored-by: Jannis Leidel <jannis@leidel.info>

Fix typo

33a5f1b

dholth reviewed May 13, 2024

zklaus added 4 commits May 15, 2024 14:40

Add subdirs as platform alternative

d6b2334

Base get_reduced_index on Index

4d58d26

Turn expanded_channels into list to deal with odd __hash__ in Channel

786e7cb

Add support for virtual system packages

f593245

[pre-commit.ci] auto fixes from pre-commit.com hooks

c894e1c

for more information, see https://pre-commit.ci

jaimergp reviewed May 21, 2024

zklaus added 4 commits May 23, 2024 07:37

Treat Index in supplement with system

affe5f7

Add ReducedIndex subclass

ca483f8

Fix formatting

3636615

Fix track feature handling

a4fa702
