
find_matches is underspecified - duplicate candidates #32

Open
pfmoore opened this issue Mar 12, 2020 · 4 comments

@pfmoore
Collaborator

pfmoore commented Mar 12, 2020

The specification of the provider's find_matches method doesn't include any information about whether candidates need to be "unique". To give an example, consider two requirements, pip >= 19.0 and pip >= 20.0. The candidate pip-20.0-py3-none-any.whl satisfies both of these.

When a client implements find_matches on a provider, is it necessary that the same candidate is returned in both calls, or is it enough that "equivalent" candidates are returned? (To be honest, I'm not even clear what it means to be the "same" candidate here - is object identity enough?)

Reasons this matters:

  1. If methods on the candidate object are expensive to calculate (for pip, identify could involve building the project to get the project name), we want to avoid doing this multiple times if it's not needed.
  2. If different candidate objects are returned, will resolvelib (potentially) consider both of them? That would duplicate work (again, particularly expensive if we need to build projects as part of calling methods on the candidate). Or will "equivalent" candidates get merged?
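To make both concerns concrete, here is a minimal sketch. It uses hypothetical `Candidate` and `NaiveProvider` classes rather than resolvelib's API, and for brevity `find_matches()` takes a bare project name instead of a full requirement. The provider builds a fresh candidate object on every call, so the lookups for `pip >= 19.0` and `pip >= 20.0` return two equivalent but distinct objects, and the expensive name computation runs once per object.

```python
class Candidate:
    """Hypothetical candidate whose project name is expensive to compute
    (a stand-in for pip having to build a project to learn its name)."""

    builds = 0  # counts how often the expensive step runs, across all objects

    def __init__(self, filename):
        self.filename = filename
        self._name = None  # cached per object only

    def identify(self):
        if self._name is None:
            Candidate.builds += 1  # placeholder for an expensive build
            self._name = self.filename.split("-")[0]
        return self._name


class NaiveProvider:
    """Hypothetical provider that creates new candidate objects on every call."""

    def __init__(self, index):
        self.index = index  # maps project name -> list of wheel filenames

    def find_matches(self, requirement_name):
        return [Candidate(f) for f in self.index[requirement_name]]


provider = NaiveProvider({"pip": ["pip-20.0-py3-none-any.whl"]})
first = provider.find_matches("pip")[0]   # e.g. the lookup for "pip >= 19.0"
second = provider.find_matches("pip")[0]  # e.g. the lookup for "pip >= 20.0"

print(first is second)   # False: equivalent, but not the same object
first.identify()
second.identify()
print(Candidate.builds)  # 2: the expensive step ran twice
```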

I can look at the existing code to determine how things work, but this should be documented so that the implementation isn't constrained to keep internal details the same because clients rely on them.

@uranusjr
Member

> is it necessary that the same candidate is returned in both calls, or is it enough that "equivalent" candidates are returned?

The resolver does not compare candidates with each other, exactly because of the reason you raised: it does not (cannot) assume this is a sensical thing to do. So yes, it will result in duplicated work if equivalent (whatever this means) candidates are returned by find_matches().

I would be inclined to treat this as an optimisation problem: we should be conservative right now and return some equivalent candidates if we're not sure, and slowly figure out how to eliminate them. I also feel this would not be a very big problem in practice for pip, since PackageFinder already eliminates a lot of the duplicates. The only sources of duplication would be direct URLs and local source directories, and neither is used very much currently AFAICT, since the current legacy resolver does not handle them very well.

@pradyunsg
Contributor

And, one (nice?) thing about the separation of concerns in this API design, is that the optimization can/should happen on the Provider side, which is best positioned to correctly identify and cache "equivalent" candidates.
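A minimal sketch of what that provider-side optimisation could look like, under stated assumptions: the `Candidate` and `CachingProvider` classes are hypothetical (not resolvelib's API), `find_matches()` takes a bare project name for brevity, and "equivalent" is assumed to mean "same wheel filename". The provider keeps one object per key, so every `find_matches()` call hands back the same candidate and its expensive metadata is computed only once.

```python
class Candidate:
    """Hypothetical candidate with an expensive-to-compute project name."""

    builds = 0  # counts how often the expensive step runs

    def __init__(self, filename):
        self.filename = filename
        self._name = None

    def identify(self):
        if self._name is None:
            Candidate.builds += 1  # placeholder for an expensive build
            self._name = self.filename.split("-")[0]
        return self._name


class CachingProvider:
    """Hypothetical provider that shares one object per equivalent candidate."""

    def __init__(self, index):
        self.index = index  # maps project name -> list of wheel filenames
        self._cache = {}    # filename -> Candidate (the equivalence key is an assumption)

    def _get_candidate(self, filename):
        # Reuse the existing object if this file has been seen before.
        if filename not in self._cache:
            self._cache[filename] = Candidate(filename)
        return self._cache[filename]

    def find_matches(self, requirement_name):
        return [self._get_candidate(f) for f in self.index[requirement_name]]


provider = CachingProvider({"pip": ["pip-20.0-py3-none-any.whl"]})
first = provider.find_matches("pip")[0]
second = provider.find_matches("pip")[0]
first.identify()
second.identify()
print(first is second)   # True: the provider returned the same object
print(Candidate.builds)  # 1: the expensive step ran once
```

Keying the cache on the wheel filename is only an illustration. Recognising that a direct URL and a local source directory refer to the same thing is harder, which is exactly why the provider, rather than the resolver, is the natural place to make that call.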

@pfmoore
Collaborator Author

pfmoore commented Mar 12, 2020

Cool, I'm happy with that. But just to be clear, if I follow the logic in the code:

  1. The first requirement with a given identify() value (the requirement's "name") has find_matches() called for it.
  2. Subsequent requirements are merged - we never even call find_matches() (maybe except if we backtrack, I never checked that code yet).

So the question of "multiple copies of the same candidate" never even crops up in the resolution code.
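The behaviour described in the two steps above can be modelled roughly as follows. This is a simplified sketch of that reading, not resolvelib's actual code: it ignores backtracking, assumes later requirements only narrow the existing candidate list, and uses hypothetical tuple-based requirements `(name, minimum_version)` and candidates `(name, version)`.

```python
def collect_candidates(requirements, identify, find_matches, is_satisfied_by):
    """Call find_matches() once per identifier; later requirements with the
    same identifier only filter the candidates that are already known."""
    candidates = {}
    for req in requirements:
        key = identify(req)
        if key not in candidates:
            candidates[key] = list(find_matches(req))
        else:
            candidates[key] = [c for c in candidates[key] if is_satisfied_by(req, c)]
    return candidates


AVAILABLE = [("pip", (19, 0)), ("pip", (19, 3)), ("pip", (20, 0))]
calls = []  # records every find_matches() call


def find_matches(req):
    calls.append(req)
    name, minimum = req
    return [c for c in AVAILABLE if c[0] == name and c[1] >= minimum]


reqs = [("pip", (19, 0)), ("pip", (20, 0))]  # pip >= 19.0, pip >= 20.0
result = collect_candidates(
    reqs,
    identify=lambda r: r[0],
    find_matches=find_matches,
    is_satisfied_by=lambda r, c: c[1] >= r[1],
)
print(result)      # {'pip': [('pip', (20, 0))]}
print(len(calls))  # 1: find_matches() was never called for pip >= 20.0
```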

IMO, at some point this should be added to the docs, as a clarification. But for now I'm happy to simply have this issue as a reference.

It's easy to lose track of this when writing Requirement and Candidate objects that have the provider methods delegated to them (like the pip prototype does at the moment). I'm wondering whether it was a mistake to do that. Cue rewrite number 20 of the pip integration code 😉

@pradyunsg
Contributor

I'm honestly a little concerned about the delegating that we're doing in our implementation, since it feels like it will mean more refactoring work later to clean up responsibilities. But, yea, it's not a major concern, more of a back-of-the-head thought atm.


3 participants