Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NumPy 2 bringup #1997

Open
9 of 10 tasks
jakirkham opened this issue Aug 22, 2023 · 62 comments
Open
9 of 10 tasks

NumPy 2 bringup #1997

jakirkham opened this issue Aug 22, 2023 · 62 comments
Labels

Comments

@jakirkham
Copy link
Member

jakirkham commented Aug 22, 2023

NumPy is currently working on a NumPy 2.0 release, which is planned to come out later this year. Here are the current (draft) release notes. Also here's the upstream tracking issue ( numpy/numpy#24300 ), and ecosystem compatibility tracker.

Some questions worth discussing:

Todos:

cc @conda-forge/core

@h-vetinari
Copy link
Member

@ocefpaf
Copy link
Member

ocefpaf commented Aug 22, 2023

Can we merge these two issues just to make it easier to track them?

@jakirkham
Copy link
Member Author

The issue Axel raises seems like a subpoint of this issue (depending on what we decide). Namely do we want to opt-in to this newer/slimmer ABI and how does that fit into NumPy 2

@ocefpaf
Copy link
Member

ocefpaf commented Aug 22, 2023

Sure. IMO Axel issue is a subset of this one. I don't have strong opinions on which one to keep, or if you want to keep both, but I also don't want to get lost on two mega-threads 😬

@jakirkham
Copy link
Member Author

Added Axel's item to the list above

@h-vetinari
Copy link
Member

Handling the ABI is the key point here (that and current packages missing a <2). I updated the added item because the summary was not accurate.

Normally I'd say we do a dual migration (keep 1.x; add 2.0), but numpy has around 5000** dependents in conda-forge, so that would be a pretty substantial CI impact, especially if it takes a while to drop 1.x.

**

>mamba repoquery whoneeds numpy -c conda-forge -p linux-64 > tmp
># edit to remove header
>python
>>> q = open("tmp", "r").readlines()
>>> p = {x.strip().split(" ")[0] for x in q} - {""}
>>> len(p)
4898

Obviously not all of them are compiling against numpy, but still...

@jakirkham
Copy link
Member Author

I updated the added item because the summary was not accurate.

Thanks Axel! 🙏

Anyone should feel free to update the issue as needed 🙂

@jakirkham
Copy link
Member Author

Following up on our discussion earlier about improving the visibility of NPY_FEATURE_VERSION, started this NumPy PR ( numpy/numpy#24861 ) to message how the value is set

Also include a note about one approach we might take to ensure that value is embedded in built binaries. Though maybe there are better approaches for that portion

@jakirkham
Copy link
Member Author

There should now be a string baked into binaries built with NumPy to notate what kind of NumPy compatibility they have

numpy/numpy#25948

@jakirkham
Copy link
Member Author

It is worth noting that thanks to Axel and others we now have NumPy 2.0.0rc1 packages: conda-forge/numpy-feedstock#311

Also ecosystem support of NumPy 2 is being tracked in this issue Ralf opened: numpy/numpy#26191

We are now in a good spot to start testing building packages with NumPy 2

@h-vetinari
Copy link
Member

h-vetinari commented Apr 1, 2024

I discussed this with @rgommers recently and one important point that he brought up is the situation with pin_compatible, which we'll have to fix as part of any migration effort, probably with a piggyback migrator, since we'll need to rewrite the recipes.

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:. However this will be doubly incorrect in the new setup - for one as our NPY_FEATURE_VERSION (which forms the lower bound) will be lower than the one at build time, and second because the upper bound should be something like <2.{{ numpy.split(".")[1] | int + 3 }} (for a project that's free of deprecation warnings; anything else might be deprecated in 2.{N + 1} and removed after two releases in 2.{N + 3}).

@h-vetinari
Copy link
Member

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export [...]

Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well. But then even moreso, we'd need a piggyback.

@rgommers
Copy link
Contributor

rgommers commented Apr 2, 2024

Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well.

That doesn't sound good to me as a custom conda-forge split. If we want anything like that, let's do this properly and create a numpy-headers package that's officially supported by NumPy and that can be used by anyone (unless they need a static library or numpy.f2py) with a build-time dependency on the NumPy C API. We actually discussed this in a NumPy community meeting, and it seems feasible.

@isuruf
Copy link
Member

isuruf commented Apr 2, 2024

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:

We do have a run_export on numpy.

@h-vetinari
Copy link
Member

Yeah... Clearly I shouldn't be writing these comments from a phone 😅

I misremembered that part, but in that case the task becomes easier - we just set up the right run export in numpy itself, and then remove pin_compatible in feedstocks that compile against numpy. Right?

@h-vetinari
Copy link
Member

Another question we have to answer soon: what mechanism do we want to use for setting NPY_FEATURE_VERSION... Perhaps the easiest would be an activation script in numpy, but that's a fairly big hammer, as it persists beyond build time and into all user environments.

Right now I'm thinking of setting NPY_FEATURE_VERSION in the global pinning (cleanly overrideable per feedstock where necessary), and then using that in conda-forge-ci-setup to populate the environment variable that numpy will pick up (and if necessary, in the compiler activation feedstocks, e.g. for CFLAGS).

The only issue there is that the run-export on numpy is not dynamic, in the sense that it gets fixed to the value of NPY_FEATURE_VERSION at the build time of numpy, and not the (potentially different) one in play when building something else against numpy. Unless I'm overlooking something, we'd therefore need to transform rather than remove the existing uses of pin_compatible("numpy") with something like

- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}

while the upper bound (<2.{N + 3}) would be set by the run-export on numpy.

@jakirkham
Copy link
Member Author

What if we had something like...?

{% set version = "2.0.0" %}

package:
  name: numpy
  version: {{ version }}

...

build:
  ...
  run_exports:
    - {{ pin_subpackage("numpy", lower_bound=os.environ.get("NPY_FEATURE_VERSION", version)) }}

That way we can defer this environment variable setting to packages

If they don't set something, we can provide a sensible default (either version or something else we decide)

We could also consider whether conda-build could allow NPY_FEATURE_VERSION to be a pass through environment variable or if we handle that within conda-forge with some recipe changes to pass it through ourselves. This would let us set us use a global setting (as you suggest)

@rgommers
Copy link
Contributor

rgommers commented Apr 3, 2024

I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for. So all produced extension modules will work with all possible numpy versions that can actually be installed.

Hence, doing nothing should be the right default here, trying to change it from NumPy's default will likely only be the cause of extra complexity/confusion, and perhaps bugs.

@h-vetinari
Copy link
Member

That way we can defer this environment variable setting to packages

I'd be surprised if it works like that. AFAIU, that os.environ call will be resolved while building numpy.

I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for.

Leaving aside NEP29, this is a quantity we have to be able to control IMO. Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks). I don't see how we can reasonably avoid making NPY_FEATURE_VERSION explicit in conda-forge in some way.

@jakirkham
Copy link
Member Author

I very well could be wrong. It is easy to test

Really we just need more ideas to sample from. It's more important that we have a large variety before selecting one. So feel free to propose more

@rgommers
Copy link
Contributor

rgommers commented Apr 3, 2024

Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks).

It won't be wrong. The metadata that is in the upstream releases (i.e. the dependencies key in pyproject.toml files) is going to be updated by package authors, and that's the thing that should be relied on by conda-forge builds. The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.

@h-vetinari
Copy link
Member

I'm not sure I follow. Say a project has numpy >=1.24,<2.3 in its pyproject.toml, is there some sort of hook that populates NPY_FEATURE_VERSION to 1.24? If so, how would that constraint arrive in the metadata of the packages we build?

Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.

Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly (unless they really need to), or extremely slowly (i.e. going back to 1.19 adds about 2 years on top on top of what NEP29 foresees w.r.t. being able to use a given ABI feature).

The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.

In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).

@rgommers
Copy link
Contributor

rgommers commented Apr 3, 2024

Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.

This. And it's not doubtful, it is guaranteed to work. The whole point is to take away build-time version as a thing that dynamically overrides the declares runtime dependency range.

Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly

No, that does not make sense. If a package has numpy>=x.y in its constraints, you cannot just ignore that. The package author bumped the lower version for some reason, so if you tweak the metadata to say numpy>=x.y-N instead, you will allow a broken combination of packages.

In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).

No, and no. The lower bound is whatever dependencies=numpy... in pyproject.toml says, or it's a bug in the package (even if the package internally sets NPY_FEATURE_VERSION, which should be quite rare).

What the conda-forge tooling should do is check that the meta.yaml and pyproject.toml metadata is consistent - and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.

@h-vetinari
Copy link
Member

No, that does not make sense.

You chopped off the part of my quote that accounts for the scenario you describe.

The lower bound is whatever dependencies=numpy... in pyproject.toml says

I'm not saying we should disregard runtime constraints. I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.

What the conda-forge tooling should do is check that the environment.yml and pyproject.toml metadata is consistent

I don't see this happening soon enough to be available for the 2.0 transition, it would need work on conda-build AFAICT.

and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages? Anything else that has a run-export (to my knowledge) uses the build-time version as a lower bound. That's precisely the issue that requires attention here, because of the very unusual situation how building against numpy 2.0 produces something compatible with >=1.x.

@rgommers
Copy link
Contributor

rgommers commented Apr 3, 2024

I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.

What I am trying to explain is that that stacking is not doing anything, because

  • numpy will never set the feature version in a way that allows for this to have any effect, and
  • if a package does this internally by setting NPY_FEATURE_VERSION to something higher than what it says in its pyproject.toml, that's a bug in the package and should be fixed there by fixing its dependencies= metadata.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages?

Ah, I did mean this since I remember dependencies being flagged on PRs - but it may not be ready indeed, since it's marked as experimental:

image image

So it's still mostly manual then, depending on the feedstock maintainers to keep pyproject.toml and meta.yaml in sync?

@h-vetinari
Copy link
Member

h-vetinari commented Apr 20, 2024

Maybe I'm misunderstanding. Do we want to publish packages built using NumPy 2.0.0rc1 to a different label (say to do additional testing on them before releasing them in the wild)? Or do we just want to publish them to main?

So there's been a fair amount of relevant discussion in conda-forge/conda-forge-repodata-patches-feedstock#712 that should probably be moved here, but for completeness:

What we did for python 3.12.0rc was that we pulled python itself from a label (conda-forge/label/python_rc), but the packages built against 3.12 were published to main. Still, they couldn't be installed, because they depended on a python-build that was only available by explicitly opting in with -c conda-forge/label/python_rc. However, because of the ABI stability guarantee of CPython 3.12.0rc vs. GA, the moment we published python 3.12.0 to main, all those builds became installable (we had roughly ~1000 feedstocks done by the GA release).

Numpy gives the same guarantees about ABI stability between rc & GA, and so we could do exactly the same thing. I.e. packages built against numpy publish to main, but depend on a numpy that requires an opt-in channel. That would give us time to roll thing out in advance, but then the only way to roll that out over a period of ~weeks before GA release is if we do the 2.0 builds additively.

@seberg
Copy link

seberg commented Apr 20, 2024

packages built against numpy publish to main, but depend on a numpy that requires an opt-in channel

Except that there is no runtime dependency on a newer NumPy: That package will be installable right away since it also works with old NumPy versions.
The logic you should need is basically:

  • Use NumPy 1.22 for Python 3.8 (as newer version don't support 3.8, so they may not compile, even if the headers would probably be OK).
  • Use the newest available NumPy version (right now 2.0) to compile for all other Python versions.
  • It may be nice to honor an additional runtime/buildtime pin, but TBH, I don't think it matters: Even if a package only supports 1.x at runtime as long as compilation succeeds it'll be fine to compile it with NumPy 2. (I.e. it may be sensible to try with 2.0 and add the constraint if it fails.)

(Quite likely, I am misreading, but it looked a bit like that try wanted to build two versions, which would be fine but seems unnecessarily complicated.)

@h-vetinari
Copy link
Member

h-vetinari commented Apr 20, 2024

Except that there is no runtime dependency on a newer NumPy: That package will be installable right away since it also works with old NumPy versions.

You're right about that, the lower run-time dependence changes the calculus (unless we add another constraint on packages built during the RC phase, which I would have ended up with, see below).

Use NumPy 1.22 for Python 3.8

That's a good point actually; I think we should seriously consider dropping 3.8, rather than complicating the numpy 2.0 setup into a before/after 3.9 thing. should be OK iff we build everything else against 2.0rc1 🤷

Even if a package only supports 1.x at runtime as long as compilation succeeds it'll be fine to compile it with NumPy 2.

My assumption was we don't yet want builds against the RC to immediately be generally available, but if people are fine with that, why not?

@seberg
Copy link

seberg commented Apr 20, 2024

but if people are fine with that, why not?

I would say this is a gamble that may be fine (assuming the build succeeds). It seems pretty unlikly to go bad and if it fails would also seem to point to critical NumPy issues?

Basically, it isn't even a gamble: SciPy, matplotlib, pandas, etc. etc. already do this for wheels!

The one issue I can think of is weirder constructs that we saw in dependencies of eigenpy: eigenpy depedenecies may use NumPy through eigenpy which effectively makes them a single compilation unit from NumPy's perspective. That could be problematic unless they are transitioned all in one go (pinning each others versions). Likely, that isn't even a problem for conda, though.

@jakirkham
Copy link
Member Author

My assumption was we don't yet want builds against the RC to immediately be generally available, but if people are fine with that, why not?

Either approach seems potentially reasonable

We could start by putting a few packages built with the NumPy in a special label and test them out for a bit. Then, once we are comfortable, flip over to building and publishing them to main

Had thought maybe we want to do this with a few core ones like SciPy, Matplotlib, Pandas, some of the scikit-*s. That said, no strong feelings on this approach

@h-vetinari
Copy link
Member

I updated conda-forge/conda-forge-pinning-feedstock#5790 to build against 2.0.0rc1 directly; it looks like it still needs a smithy fix, see conda-forge/conda-smithy#1911

@h-vetinari
Copy link
Member

Migration is a-go! :)

@h-vetinari h-vetinari pinned this issue May 4, 2024
@h-vetinari
Copy link
Member

Oh, you know, just a little migration... *checks notes*... 2800 affected feedstocks 🫣😅

@h-vetinari
Copy link
Member

OK, it looks like that number is not correct - one of the packages that stood out to me was importlib_metadata, which doesn't depend on numpy at all. It's probably not a great sign that this is being picked up.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented May 4, 2024

Just so it doesn't get lost in the notifications, something seems to be off with the python 3.12 pinning. I know it is confusing to do dual migrations at the same time so I'm not sure there is an easy answer
conda-forge/conda-forge-pinning-feedstock#5790 (comment)

@h-vetinari
Copy link
Member

Yeah, thanks for the note @hmaarrfk & @xhochy. In previous numpy bumps, we always did them for the open python 3.x & pypy migrations as well, and this wasn't an issue. Here however, bumping the 3.12 migrator gets immediately picked up by a given feedstock rerendering and so runs into breakage if that package isn't yet numpy 2.0-compatible (basically all of them).

I undid the changes to the 312 migrator to unbreak this, but I'm kind of unsure how to square this circle now. I think for now the least bad option is to just accept that we don't get numpy 2.0 builds for 3.12. Alternatives might be:

  1. trying to override the whole numpy/python/python_impl zip in the numpy2 migrator to include 3.12 directly.
    • I tested this and it seems to work
    • Downside there is that this will open PRs to all python-related feedstocks, because the migrator bot sees python: as being updated.
  2. Close Python 3.12 migration (which would be waaaaay earlier than we usually do this)

With 1., the numpy2 migrator would look like

numpy:
  - 1.22  # no py38 support for numpy 2.0
  - 2.0
  - 2.0
  - 2.0
  - 2.0
python:
  - 3.8.* *_cpython
  - 3.9.* *_cpython
  - 3.10.* *_cpython
  - 3.11.* *_cpython
  - 3.12.* *_cpython
python_impl:
  - cpython
  - cpython
  - cpython
  - cpython
  - cpython

while the python312 one would stay untouched.

@h-vetinari
Copy link
Member

Next snag: bot isn't opening PRs 😑

@jakirkham
Copy link
Member Author

We are 87% of the easy through the Python 3.12 migration. Maybe we are close to closing this out? Perhaps with a little effort?

Wonder if there are other configuration options available to us if we make the NumPy 2 migration depend on the Python 3.12 migration? IOW the "no ice cream until you eat your vegetables" approach 😉

@h-vetinari
Copy link
Member

Wonder if there are other configuration options available to us if we make the NumPy 2 migration depend on the Python 3.12 migration?

Yeah, there's wait_for_migrators, but this is not exactly well tested and it would still run into the issue that we'd have to override the whole zip, which would then open PRs for all non-noarch python feedstocks.

@h-vetinari
Copy link
Member

We are 87% of the easy through the Python 3.12 migration.

The migrator status page is somewhat misleading on this - it reports the percentage of "PRs opened", not "PRs closed". So we're actually at ~67% with the 3.12 migration, with about ~900 feedstocks still to go. That's IMO quite a bit too early to close?

I guess the only nice alternative to points 1. & 2. above would be to implement something in cf-scripts like skip_unless_present:, where we could say that a feedstock should only be migrated if it depends on numpy, in order to force it to ignore the feedstocks that wrongly get picked up due to the presence of the python: key (that we need to override the zip) if we do point 1.

I guess there aren't really any other zips that are affected in this way, so it'd be a bit tailor-made for the numpy/python situation. From that POV, maybe we could also just deal with the hassle of closing the superfluous PRs this one time, because once we're on numpy 2.0 (and have dropped 3.8 around October), we can completely remove numpy from the python zip1 going forward.

Footnotes

  1. except PyPy, but that's a separate discussion for various reasons.

@h-vetinari
Copy link
Member

FWIW, I think we need to migrate 3.12 as well, as we won't simply be able to switch this one later (due to lots of missing dependencies then).

So if we don't do it as part of the overall numpy 2.0 migration, we'd need a separate migrator (now or later) that catches up 3.12 specifically. I feel that a second such migration wastes more time than it would do to deal with the overzealous PRs to python-but-not-numpy-related feedstocks (which by themselves aren't harmful even if merged).

@h-vetinari
Copy link
Member

I've paused the numpy2 migration for now until we decide how we want to handle this (as all PRs that are merged now would have to be cleaned up again later).

I've also opened a PR proposing to include python 3.12 in the current migrator.

@jakirkham
Copy link
Member Author

After some discussion, we decided to soft-close the Python 3.12 migration. Meaning that the Python 3.12 migration is still running, but we now also include Python 3.12 in the global pinning. As a result, re-rendering a feedstock could add Python 3.12 (even if its dependencies are not available on Python 3.12 yet). Meaning the feedstock would get Python 3.12 builds that would fail. These could simply be ignored for now. Based on the Python 3.12 migration status, this is only relevant for ~11% of feedstocks (admittedly that is ~500 feedstocks). The rest have either already been migrated or have a migration PR

With Python 3.12 now addressed, we restarted the NumPy 2 migration today ( conda-forge/conda-forge-pinning-feedstock#5851 ). Here is the status of NumPy 2 migration. It may take it a bit to start adding PRs to all the feedstocks that can now take a PR. So please be patient 🙂

Thanks all! 🙏

@bashtage
Copy link

statsmodels still seeling some NumPy 1.22 builds on my NumPy 2 migration. Have these not been migrated yet? Or will they never be migrated (and so need to be skipped (and if so, any pointers would be helpful)).

image

These all fail. All of the NumPy 2 targeted build pass (as do a few NumPy 1.22, although most of these should be disabled when the recipe is rerendered).

statsmodels NumPY 2 PR

@h-vetinari
Copy link
Member

h-vetinari commented May 17, 2024

These all fail.

Those are all the PyPy builds, which is not participating in the numpy 2.0 migration (IOW, they should run just as before, but have broken for unrelated reasons). I haven't opened an issue yet, but PyPy support is (unfortunately!) being wound down, so don't hesitate to skip those. I did just that in the statsmodels PR even before seeing your comment here... :)

@jakirkham
Copy link
Member Author

JFYI NumPy is planning to release 2.0.0 on June 16th

xref: numpy/numpy#24300 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

8 participants