NumPy 2.0 support #11066

Open
jakirkham opened this issue Apr 25, 2024 · 14 comments
Labels
needs triage Needs a response from a contributor

Comments

@jakirkham
Member

jakirkham commented Apr 25, 2024

Problem description

NumPy 2.0 is coming out soon (numpy/numpy#24300). NumPy 2.0.0rc1 conda packages and wheels came out 3 weeks ago (numpy/numpy#24300 (comment))

Feature description

To prepare for NumPy 2.0, it might be worthwhile to start testing Dask against NumPy 2 in CI

Also, as NumPy is tracking ecosystem support for NumPy 2.0, it would be helpful to share Dask's current support status (with any plans) in the tracking issue: numpy/numpy#26191

NumPy has put out a migration guide. More details are in the release notes. As Dask doesn't have C/C++ usage of NumPy, only the Python changes would be relevant

Additional information

Maybe Distributed needs similar considerations

Also Dask is both a consumer of NumPy and a producer of its own NumPy-like Array API. So there are two things to consider

  1. Adding support for NumPy 2.0 itself with Dask
  2. Updating Dask's Array API to more closely match NumPy 2

github-actions bot added the "needs triage" (Needs a response from a contributor) label Apr 25, 2024

@jakirkham
Member Author

jakirkham commented Apr 25, 2024

Running the dask.array test suite locally, I am seeing a few test failures:

FAILED dask/array/tests/test_array_core.py::test_broadcast_arrays - assert () == []
...
FAILED dask/array/tests/test_routines.py::test_atleast_nd_no_args[atleast_3d] - assert () == []
...
FAILED dask/array/tests/test_routines.py::test_atleast_nd_two_args[shape10-shape20-atleast_3d] - AssertionError: assert <class 'tuple'> is <class 'list'>

These happen across the parameter variations of each test, so they are likely not specific to those parameters

Edit: Probably due to upstream changes in PR numpy/numpy#25570. This likely needs fixes similar to those in PR #10929
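The failure messages above (`assert () == []` and `<class 'tuple'> is <class 'list'>`) suggest that the affected functions now hand back a tuple where the tests expect a list. A minimal sketch of one way a test could tolerate either container is below. The helper name `as_list` and the `demo` function are hypothetical and are not taken from Dask or from PR #10929.

```python
def as_list(result):
    """Return a list for a list-or-tuple result; wrap anything else in a one-item list."""
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


def demo():
    # Imported lazily so the helper above has no hard NumPy dependency.
    import numpy as np

    x = np.ones((3, 1))
    y = np.ones((1, 4))
    # The container type of `out` is exactly what the failing assertions disagree on.
    out = np.broadcast_arrays(x, y)
    # Comparing normalized lists sidesteps the list-versus-tuple difference.
    return [a.shape for a in as_list(out)]
```

Normalizing before comparing keeps the assertion about the contents rather than the container. Whether Dask's own functions should instead follow the new return type is a separate design question.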

@jrbourbeau
Member

Thanks @jakirkham. We have a nightly upstream CI job that tests against the nightly dev version of numpy (and other packages). Unfortunately it's failing right now (example job) due to an ImportError being raised during the test suite setup.

Likely need similar fixes as PR ( #10929 )

Thanks for tracking that down

@mrocklin
Member

@quasiben maybe this is something your team can help resolve?

@phofl
Collaborator

phofl commented Apr 30, 2024

There are a couple of things that break our upstream ci:

  • numexpr
  • numba
  • sparse

None of these work with NumPy 2, and each makes the build fail at import time. Uninstalling them at least gets the tests running, though we still have some failures

context #11086
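To see which optional packages fail at import time in a given environment, a small probe like the one below may help. It is only a sketch: `probe_imports` is a hypothetical helper, and the package list is the one named in this comment.

```python
import importlib


def probe_imports(names):
    """Try to import each name and record 'ok' or the exception that was raised."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # binary mismatches can surface as several exception types
            status[name] = f"{type(exc).__name__}: {exc}"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for pkg, result in probe_imports(["numexpr", "numba", "sparse"]).items():
        print(f"{pkg}: {result}")
```

Catching `Exception` broadly is deliberate here, since the point is to report every failure instead of stopping at the first one.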

@jakirkham
Member Author

jakirkham commented Apr 30, 2024

Regarding NumPy 2 support for these packages

  • numexpr

Done in 2.10.0. Please see PR: pydata/numexpr#478

  • numba

Support to work with NumPy 2 is tentatively planned for Numba 0.60.0. Please see issue: numba/numba#9544

Semantic support for NumPy 2 is tentatively planned for Numba 0.61.0. Please see issue: numba/numba#9540

  • sparse

This depends on Numba. Looks like no major issues are expected here
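A quick way to check an environment against the versions mentioned above is sketched below. The minimums are copied from this comment (the Numba one is only tentatively planned), and the helper names are hypothetical. The parser deliberately ignores pre-release suffixes, so `0.60.0rc1` counts as `0.60.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the comment above; the Numba entry is tentative.
MINIMUMS = {"numexpr": (2, 10, 0), "numba": (0, 60, 0)}


def release_tuple(version_string, width=3):
    """Parse the leading numeric release, e.g. '0.60.0rc1' -> (0, 60, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no numeric release in {version_string!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (width - len(parts))  # pad so (2, 9) compares as (2, 9, 0)
    return tuple(parts[:width])


def meets_minimum(installed, minimum):
    """Compare numerically, since the strings '2.9' and '2.10' sort the wrong way."""
    return release_tuple(installed) >= minimum


def check_environment(minimums=MINIMUMS):
    """Report 'ok', 'too old (...)', or 'not installed' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report
```

Calling `check_environment()` returns a dictionary keyed by package name, which can be printed before running the test suite.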

@phofl
Collaborator

phofl commented Apr 30, 2024

We'd need numexpr 2.10 on conda-forge (only 2.9 is available at the moment)

@jakirkham
Member Author

jakirkham commented Apr 30, 2024

We'd need numexpr 2.10 on conda-forge (only 2.9 is available at the moment)

Please see PR: conda-forge/numexpr-feedstock#62

Looks like there are build issues that may need addressing: conda-forge/numexpr-feedstock#62 (comment)


That said, conda-forge is still working on NumPy 2 bringup (conda-forge/conda-forge-pinning-feedstock#5790), so conda-forge packages built with NumPy 2 support are not yet available

Sounds like that is a requirement here as well?

@phofl
Collaborator

phofl commented Apr 30, 2024

No, we are using nightlies for NumPy and a few other packages

@jakirkham
Member Author

jakirkham commented Apr 30, 2024

How are those nightlies installed? Pip?

If so, maybe numexpr can be installed the same way?

@phofl
Collaborator

phofl commented May 2, 2024

I care mostly about getting the upstream build running again; we still need a volunteer to fix the tests. Feel free to push to my PR if interested, or open a new one on top of it

@quasiben
Member

quasiben commented May 2, 2024

I started poking at this and it's a little challenging to assemble an environment, given that many Dask dependencies are still at the release-candidate stage. Below is how I got things going:

mamba env create -f ./continuous_integration/environment-3.11.yaml
mamba uninstall --force numpy pandas scipy numexpr numba sparse
python -m pip install pandas==2.2.2
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ \
    --prefer-binary --pre 'pyarrow==17.*'
conda install -c conda-forge -c conda-forge/label/numpy_rc 'numpy==2.*' --force
python -c 'import numpy;print(numpy.__version__);import pandas;print(pandas.__version__);import pyarrow;print(pyarrow.__version__)'

2.0.0rc1
2.2.2
17.0.0.dev59

I started off small with a single test file, dask/array/tests/test_array_core.py: only 9 failures! These seem mostly to do with h5py. Need to look into h5py NumPy 2 support...

FAILED dask/array/tests/test_array_core.py::test_asarray_h5py[True-asarray] - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_asarray_h5py[True-asanyarray] - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_asarray_h5py[False-asarray] - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_asarray_h5py[False-asanyarray] - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_h5py_newaxis - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_h5py_tokenize - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
FAILED dask/array/tests/test_array_core.py::test_auto_chunks_h5py - ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
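
All of the failures listed above carry the same "size changed, may indicate binary incompatibility" message, which points at a mismatch between a compiled extension and the installed NumPy, not at Dask's own code. When triaging a long failure log, it can help to pull out the type name and the two sizes. The sketch below does that with a regular expression written against the message text shown here; `parse_size_mismatch` is a hypothetical helper, not part of pytest, NumPy, or h5py.

```python
import re

# Pattern written against the message text in the failures listed above.
_SIZE_CHANGED = re.compile(
    r"(?P<type>[\w.]+) size changed, may indicate binary incompatibility\. "
    r"Expected (?P<expected>\d+) from C header, got (?P<got>\d+) from PyObject"
)


def parse_size_mismatch(message):
    """Return (type_name, expected_size, actual_size), or None if the text does not match."""
    match = _SIZE_CHANGED.search(message)
    if match is None:
        return None
    return match["type"], int(match["expected"]), int(match["got"])


def count_abi_failures(log_lines):
    """Count the lines in a pytest summary that look like binary-incompatibility errors."""
    return sum(1 for line in log_lines if parse_size_mismatch(line) is not None)
```

Failures that match this pattern usually call for rebuilding or reinstalling the affected extension package, which is consistent with the later report in this thread that the h5py tests pass once h5py is installed from source.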

@jakirkham
Member Author

Thanks Ben! 🙏

What happens if you pip install h5py?

@quasiben
Member

quasiben commented May 4, 2024

The h5py tests pass when h5py is installed from source
