Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI/BLD: fail by default if no BLAS/LAPACK, add 32-bit Python on Windows CI job #24279

Merged
merged 5 commits into from Jul 29, 2023

Conversation

rgommers
Copy link
Member

The change of default in build system behavior follows up on the discussion in gh-24200. It avoids large accidental regression in performance. If users really want to build with fallback routines, it's a matter of passing a single -Dallow-noblas to opt into that.

The 32-bit Python on Windows CI job was removed from Azure when switching to Meson. This re-introduces it, on GitHub Actions because it's much easier to debug there. This should be low-maintenance, and exercise the -Dallow-noblas flag at the same time. It addressed one of the TODO's in gh-23981.

There was one failing test on this config, not surprising because it hasn't been tested before. Those were for floating point exceptions, which already had a note about issues with MSVC + 32-bit Python. Given that the behavior is going to change when SIMD support comes back, I simply disabled the test for now.

@rgommers
Copy link
Member Author

Well that is smoking out a bunch of issues ...

I bet there's a whole bunch of projects building NumPy in CI without BLAS support, either by accident or because it doesn't matter to them. I think we need to have this default though, because it affects the pip install numpy from source and most users are going to want BLAS/LAPACK support.

@rgommers rgommers force-pushed the ci-win-py32bit branch 2 times, most recently from 91f335a to bb3d188 Compare July 28, 2023 16:10
@rgommers rgommers marked this pull request as draft July 28, 2023 16:37
@rgommers rgommers force-pushed the ci-win-py32bit branch 2 times, most recently from 4571d11 to 401dca5 Compare July 28, 2023 19:27
Disable using BLAS in the PyPy job on Azure because it was broken.
Before this PR, it uses to silently not find OpenBLAS and continue,
now we have to be explicit about it.
@@ -28,26 +28,33 @@ steps:
mkdir C:/opt/openblas/openblas_dll
mkdir C:/opt/32/lib/pkgconfig
mkdir C:/opt/64/lib/pkgconfig
# TBD: support 32 bit testing
$target=$(python -c "import tools.openblas_support as obs; plat=obs.get_plat(); ilp64=obs.get_ilp64(); target=f'openblas_{plat}.zip'; obs.download_openblas(target, plat, ilp64);print(target)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is ugly :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this stuff is awful. I'm really looking forward to the day we have OpenBLAS in a wheel and get rid of this.

@charris
Copy link
Member

charris commented Jul 29, 2023

LGTM. Just for confirmation

  • 64 bit BLAS is not required unless specified.
  • 32 bit OpenBLAS is not working?

@rgommers
Copy link
Member Author

64 bit BLAS is not required unless specified.

Indeed, that's how it has always been - you get 32-bit by default, and have to opt in to ILP64.

32 bit OpenBLAS is not working?

It's working fine and is used in a bunch of places. Only the PyPy build is changed here, it was silently failing to find OpenBLAS ILP64 before (so passing with lapack-lite), and when I fixed the detection logic it started failing with a "can't import _multiarray_umath error". So that needs investigating, but on a Windows machine - debugging that via Azure isn't really possible. Cc @mattip for visibility.

@mattip
Copy link
Member

mattip commented Jul 29, 2023

it started failing with a "can't import _multiarray_umath error". So that needs investigating

I am not sure what you meant by "it" in the context of PyPy failing. It seems all the CI is passing here?

@rgommers
Copy link
Member Author

@mattip I had to add DISABLE_BLAS and -Dallow-noblas to the PyPy job to make it pass. There never was a working CI job that linked against openblas64_, and the "fail instead of silently proceeding" change in this PR smoked that out.

@charris charris merged commit 452d3ad into numpy:main Jul 29, 2023
75 checks passed
@charris
Copy link
Member

charris commented Jul 29, 2023

Let's get this in. Thanks Ralf.

@rgommers rgommers deleted the ci-win-py32bit branch July 30, 2023 08:21
@rgommers
Copy link
Member Author

@charris I think we can backport all this to 1.26.x, but set the default for allow-noblas to true. That way it keeps the code between the two branches in sync (which is useful), and allows using the flag to test whether our wheel builds and CI jobs indeed pick up OpenBLAS. While at the same time not changes the behavior for 1.26.0.

@rgommers rgommers added the 09 - Backport-Candidate PRs tagged should be backported label Jul 30, 2023
@charris
Copy link
Member

charris commented Aug 4, 2023

This seems already done in 1.26 except main isn't building pp39.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 4, 2023
@rgommers
Copy link
Member Author

rgommers commented Aug 4, 2023

This seems already done in 1.26 except main isn't building pp39.

Except the default is still false there, which is fine for now, but as I commented here, we should decide before the final release whether that is what we'd like to do.

@mattip
Copy link
Member

mattip commented Aug 6, 2023

Except the default is still false there, which is fine for now

I think we should allow building without OpenBLAS by default on 1.26. That has been the policy up to now. The time to change it is in the move to 2.0, not in a 1.26 interim release which is mainly meant to enable building with meson for python 3.12.

@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2023

@mattip if it were only about the policy and there were no side effects, it would be a simple decision indeed and I'd agree we should only consider it for 2.0. Why I am hesitant (either way) is that the BLAS/LAPACK detection mechanisms changed. E.g. someone who had a ~/site.cfg file pointing to their library of choice will now, if the default is true, silently get a slow lapack-lite build. I'd expect even Linux distro packagers may get this wrong. E.g., they have OpenBLAS or ATLAS in the right place, but no pkg-config installed when they build a numpy 1.26.0 package. Usually they just look at their build automation that tells them a new release was uploaded to PyPI, trigger a build, and call it good if it builds and the test suite passes.

So the trade-off we deal with here is:

  • allow-noblas=true: likely silent performance regressions for a lot of users
  • allow-noblas=false: likely a lot more build failures early on, and making people who don't care about performance explicitly add -C-Dallow-noblas=true to their build/install command.

Given the above, would you still prefer allow-noblas=true?

@mattip
Copy link
Member

mattip commented Aug 6, 2023

Hmm. Crazy thought: could we port just enough of the system_info discovery system to give a clear error before using lapack-lite if

  • meson does not find a blas/lapack
  • the pre-meson discovery techniques would have found a blas/lapack (i.e. setup.cfg is a valid file, or the NPY_BLAS_LIBS/NPY_LAPACK_LIBS/NPY_CBLAS_LIBS env variables are set)

@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2023

My first thought was "even more work, don't really wanna". Second thought: "yeah, I think this is actually feasible and may be appreciated by our users".

There's two ways of doing that:

  1. stripping down system_info.py and running it as a separate script called from numpy/meson.build through a run_command call,
  2. reimplementing a bunch of the logic here in Meson's DSL

Neither is going to be 100% complete without a lot of effort, but it's doable to catch most of the common cases with maybe half a day of work.

The trickiest situation is also the most common one: when the user did nothing special like setting an env var or creating a site.cfg file, but some library would have been auto-detected in one of the hardcoded paths in system_info.py. That case is easier to handle with (1), so I think I'd go for that approach.

@charris
Copy link
Member

charris commented Aug 6, 2023

I'd be happy just to be able to continue using something like .numpy-site.cfg. I read the Redhat thread on the missing pkgconf files for OpenBLAS, and I think they had a point about the difficulty in handling all the different OpenBLAS options, not to mention other BLAS versions. Which is to say, automatic detection may not be sufficient. IIRC, there was also a builtin priority in the way that was done before. Having a simple way to just specify which library to use would be helpful, along with platform specific documentation.

All this makes me appreciate Pearu's work way back in the beginning.

@rgommers
Copy link
Member Author

rgommers commented Aug 6, 2023

@charris you can use a custom file to specify the location, compile flags, etc. of BLAS/LAPACK. It has changed from a numpy-specific and ad-hoc format to a pkg-config file, as documented at http://scipy.github.io/devdocs/building/blas_lapack.html#using-pkg-config-to-detect-libraries-in-a-nonstandard-location.

IIRC, there was also a builtin priority in the way that was done before.

Yep, that part is still in the works (also for SciPy). I should have that done by the end of September; recreating all of system_info.py has turned out to be a little more challenging than expected.

All this makes me appreciate Pearu's work way back in the beginning.

definitely!

@charris
Copy link
Member

charris commented Aug 6, 2023

you can use a custom file to specify the location, compile flags

I ended up downloading a tarball from the OpenBLAS nightlies and installing it in /usr. Note that meson does not recognize PKG_CONFIG_PATH, so the install locations are limited to the defaults.

We could upgrade tools/openblas_support.py to make such an install, in Fedora 38 it currently installs in /var/tmp. That module could also use more documentation.

@rgommers
Copy link
Member Author

rgommers commented Aug 7, 2023

I ended up downloading a tarball from the OpenBLAS nightlies and installing it in /usr. Note that meson does not recognize PKG_CONFIG_PATH, so the install locations are limited to the defaults.

That's not how that works. You can put a .pc file anywhere, and then the PKG_CONFIG_PATH environment variable should be set to point to the location of the .pc file. The include and library directories are given inside the .pc files, and those can be any absolute path.

We could upgrade tools/openblas_support.py to make such an install, in Fedora 38 it currently installs in /var/tmp. That module could also use more documentation.

We will get rid of that file as soon as possible, so please don't spend more than the minimum needed effort on it. The plan is to change our custom OpenBLAS build from a .tar.gz to a wheel, and at that point the install command is pip install openblas and the install locations are then inside or relative to site-packages.

rgommers added a commit to rgommers/numpy that referenced this pull request Aug 11, 2023
This xfail was added in numpygh-24279 for 32-bit Python + MSVC when switching
to Meson and having temporarily no SIMD support. That is back now, so
this test passes again.

[skip circle] [skip cirrus] [skip travis]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants