Installation of packages happens twice when a specific version is requested which is also a dependency of a locked/builtin packaged #103

Open
maartenbreddels opened this issue Mar 31, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@maartenbreddels

🐛 Bug

  1. I want to install typing-extensions==4.10.0. No problem: import micropip; await micropip.install(["typing-extensions==4.10.0"]) fetches it from PyPI.
  2. I add pydantic, so I execute import micropip; await micropip.install(["typing-extensions==4.10.0", "pydantic"]). In the dev console I first see 4.10.0 fetched from PyPI, then pydantic from the CDN, and then (the bug) an older version of typing-extensions fetched from the CDN (https://cdn.jsdelivr.net/pyodide/v0.25.1/full/typing_extensions-4.7.1-py3-none-any.whl).

It seems that Pyodide's loadPackage installs pydantic's dependencies without realizing that 4.10.0 is already installed. I also see that the older install overwrites the 4.10.0 one.
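
To make the expected precedence concrete, here is a minimal sketch of a plan merge in which an explicit pin always beats a lockfile version. It is only an illustration: `normalize`, `merge_plan`, and the `"X.Y"` version for pydantic are hypothetical placeholders, not micropip or Pyodide APIs or real lockfile data.

```python
# Hypothetical sketch: none of these names are micropip or Pyodide APIs.

def normalize(name: str) -> str:
    """Treat typing-extensions and typing_extensions as the same project."""
    return name.lower().replace("_", "-").replace(".", "-")


def merge_plan(user_pins: dict[str, str], locked_deps: dict[str, str]) -> dict[str, str]:
    """Combine lockfile dependencies with explicit pins; pins always win."""
    plan = {normalize(n): v for n, v in locked_deps.items()}
    # Apply pins last so a locked version can never replace a requested one.
    plan.update({normalize(n): v for n, v in user_pins.items()})
    return plan


plan = merge_plan(
    user_pins={"typing-extensions": "4.10.0"},
    # "X.Y" is a placeholder; 4.7.1 is the version seen in the CDN URL above.
    locked_deps={"pydantic": "X.Y", "typing_extensions": "4.7.1"},
)
print(plan)
```

Normalizing names matters here because the pin is spelled typing-extensions while the wheel on the CDN is spelled typing_extensions.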

To Reproduce

Go to https://pyodide.org/en/stable/console.html

and execute import micropip; await micropip.install(["typing-extensions==4.10.0", "pydantic"])

Expected behavior

I should not get typing_extensions 4.7.1.

Environment

  • Pyodide Version 0.25.0

A possible workaround would be to install one package at a time, but I don't know whether that leads to different behaviour.

@maartenbreddels maartenbreddels added the bug Something isn't working label Mar 31, 2024
@ryanking13 ryanking13 transferred this issue from pyodide/pyodide Apr 2, 2024
@ryanking13
Member

(transferred the issue to micropip)

Thanks for opening the issue. Yes, this is a known bug, and it is a bit tricky to fix. Currently, micropip computes dependencies at runtime while it is installing packages, in order to reduce the installation time. To fix this, installation should start only after all dependencies have been computed.

I could probably add some sort of staging step to the install process that double-checks the dependencies before the packages are finally written to the filesystem, but we would have to check how much this affects runtime performance.

@maartenbreddels
Author

Thanks, I wasn't sure it was a micropip issue, because I think loadPackage is JS, which lives in pyodide.

I was thinking about performance as well, and you probably want to download packages as soon as you can (and async/parallel). What about separating the two stages: 1) downloads/resolve, 2) install
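
A rough sketch of that two-stage idea, assuming an in-memory stand-in for the package index: stage 1 fetches metadata concurrently and only builds a plan, stage 2 installs from the finished plan. `FAKE_INDEX`, `fetch_metadata`, `resolve`, `install`, and the `"X.Y"` version are all hypothetical and not part of micropip.

```python
import asyncio

# Hypothetical in-memory index: name -> (version, dependency names).
# "X.Y" is a placeholder version, not real lockfile data.
FAKE_INDEX = {
    "pydantic": ("X.Y", ["typing-extensions"]),
    "typing-extensions": ("4.7.1", []),
}


async def fetch_metadata(name: str) -> tuple[str, list[str]]:
    """Stand-in for a network request; a real one would hit PyPI or the CDN."""
    await asyncio.sleep(0)
    return FAKE_INDEX[name]


async def resolve(pins: dict[str, str], roots: list[str]) -> dict[str, str]:
    """Stage 1: walk the dependency graph and build a plan; install nothing."""
    plan = dict(pins)
    pending = [r for r in roots if r not in plan]
    while pending:
        # Fetch one level of the graph concurrently.
        results = await asyncio.gather(*(fetch_metadata(n) for n in pending))
        next_level = []
        for name, (version, deps) in zip(pending, results):
            plan.setdefault(name, version)
            next_level += [d for d in deps if d not in plan and d not in next_level]
        pending = next_level
    return plan


def install(plan: dict[str, str]) -> list[str]:
    """Stage 2: touch the filesystem once per package, from the finished plan."""
    return [f"{name}=={version}" for name, version in sorted(plan.items())]


plan = asyncio.run(
    resolve({"typing-extensions": "4.10.0"}, ["typing-extensions", "pydantic"])
)
installed = install(plan)
print(installed)
```

Because the pin is already in the plan when pydantic's dependencies are read, typing-extensions is never scheduled a second time.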

@hoodmane
Member

hoodmane commented Apr 2, 2024

What about separating the two stages: 1) downloads/resolve, 2) install

This does sound like the way to go. Currently, if we have a diamond dependency, whether we succeed or fail depends on the order in which the requests resolve:

Suppose A depends on B and C, B depends on D < 2 and C depends on D. We download A and look at its dependencies. We see B and C, pick the most recent version of each of them and start downloading them. If C finishes first, we see it depends on D and so we lock the most recent version of D, version 2. Then B finishes and we see that we have a conflict so we bail. OTOH if B finishes first, we get D version 1 and then C is okay with that.
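
The order dependence in this example can be reproduced with a small simulation. This is a toy model of the scenario just described, not micropip's actual code; `D_VERSIONS`, `CONSTRAINTS`, and `greedy_resolve` are made-up names.

```python
# Made-up index for the A/B/C/D example: available versions of D, newest first.
D_VERSIONS = [2, 1]
# Constraint each package places on D, as a predicate on the version number.
CONSTRAINTS = {
    "B": lambda v: v < 2,   # B depends on D < 2
    "C": lambda v: True,    # C depends on any D
}


def greedy_resolve(finish_order: list[str]) -> str:
    """Lock D when the first dependent finishes, then check later ones against it."""
    locked_d = None
    for pkg in finish_order:
        allowed = CONSTRAINTS[pkg]
        if locked_d is None:
            # First arrival picks the newest version it accepts.
            locked_d = next(v for v in D_VERSIONS if allowed(v))
        elif not allowed(locked_d):
            return f"conflict: {pkg} rejects D=={locked_d}"
    return f"ok: D=={locked_d}"


print(greedy_resolve(["C", "B"]))  # C's download finishes first
print(greedy_resolve(["B", "C"]))  # B's download finishes first
```

The same inputs give a conflict in one order and a valid install in the other, which is the nondeterminism described above.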

We don't want to backtrack because micropip's performance is bad enough as it is. Maybe we could locate diamonds like this and deterministically fail if they exist, with a report like "mismatched requirements in diamond, please specify D's version explicitly".
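
One way such a check might look, as a sketch only: collect every dependent's specifier for each package in a single pass over the graph and report any package whose dependents disagree, unless the user pinned it. The graph literal and `find_mismatched_diamonds` are hypothetical, and comparing specifier strings is deliberately conservative (differing specifiers are reported even when they might be compatible).

```python
from collections import defaultdict

# Hypothetical requirement graph: package -> {dependency: specifier string}.
# An empty specifier means "any version".
REQUIREMENTS = {
    "A": {"B": "", "C": ""},
    "B": {"D": "<2"},
    "C": {"D": ""},
    "D": {},
}


def find_mismatched_diamonds(graph, pinned=()):
    """Return report lines for packages requested with differing specifiers."""
    specs = defaultdict(dict)
    for parent, deps in graph.items():
        for dep, spec in deps.items():
            specs[dep][parent] = spec
    reports = []
    for dep in sorted(specs):
        by_parent = specs[dep]
        if dep in pinned or len(set(by_parent.values())) <= 1:
            continue  # explicit pin, or all dependents agree
        wanted = ", ".join(
            f"{p} wants {dep}{s or ' (any)'}" for p, s in sorted(by_parent.items())
        )
        reports.append(
            f"mismatched requirements in diamond ({wanted}); "
            f"please specify {dep}'s version explicitly"
        )
    return reports


for line in find_mismatched_diamonds(REQUIREMENTS):
    print(line)
```

The result does not depend on download order, and passing `pinned={"D"}` silences the report, which matches the suggested escape hatch of specifying D's version explicitly.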

@maartenbreddels
Author

Would it be interesting to look into uv and/or pixi? Resolving dependencies is hard, and since both teams use Rust, we might also benefit on the performance side.

@hoodmane
Member

hoodmane commented Apr 2, 2024

on the performance side.

Well, the performance we're looking at is mostly time spent waiting on requests, so the performance of the actual computation shouldn't matter because it's completely dwarfed by the network. Though it's possible that with PyPI supporting PEP 643 this won't be the case anymore, or at least it will be less dramatic.

Resolving dependencies is hard

Agreed, there's a lot to be said in favor of using existing libraries for correctness reasons.

We would really like to keep the backtracking to a minimum by default. A lot of these packages use exponential-time algorithms to find the best solution or prove that none exists. By default, it would be nice to have a linear-time algorithm that either finds a solution or reports what confused it. People shouldn't be using micropip on every page load with a fixed package set, but they probably will unless we have a good way to guide them away from it.

And some users would benefit from an exponential time resolver that finds a solution if it exists.

@ryanking13
Member

And some users would benefit from an exponential time resolver that finds a solution if it exists.

Yes, I am +1 for providing an option to use a backtracking resolver (not by default, but optionally). But we should find a way to integrate it with loadPackage.


3 participants