BUG: numpy.any returns True given a boolean array of all False with the intel compiler #26197

Closed
AgilentGCMS opened this issue Apr 2, 2024 · 39 comments · Fixed by #26281
Labels: 00 - Bug, component: SIMD

@AgilentGCMS

Describe the issue:

np.any on a boolean array where all elements are False should return False. This has been the case for the past decade or so, across numerous numpy versions. However, I recently installed numpy 1.26.2, and that is not working as expected. For a 1D boolean array of 63 or fewer elements, all False, np.any returns False. However, if the length is 64 or more, np.any returns True.

Reproduce the code example:

import numpy as np
Y = np.zeros(63, bool) ; print(np.any(Y)) # prints "False", as expected
Y = np.zeros(64, bool) ; print(np.any(Y)) # prints "True"
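
To pin down the length at which the result flips, a sweep over array lengths can help. The following is a minimal sketch: `first_false_positive`, `any_fn`, `make_zeros` and `buggy_any` are hypothetical names chosen here, and the callables are passed in so the helper runs with `np.any`/`np.zeros` or with a plain-Python stand-in.

```python
def first_false_positive(any_fn, make_zeros, max_len=256):
    """Return the smallest length n in 1..max_len for which any_fn reports
    True on an all-False array of n elements, or None if it never does."""
    for n in range(1, max_len + 1):
        if bool(any_fn(make_zeros(n))):
            return n
    return None


# Stand-in that mimics the reported symptom (wrong answer from 64 elements on).
def buggy_any(arr):
    return len(arr) >= 64


print(first_false_positive(any, lambda n: [False] * n))        # None
print(first_false_positive(buggy_any, lambda n: [False] * n))  # 64
```

With NumPy installed, the call would be `first_false_positive(np.any, lambda n: np.zeros(n, bool))`. Going by the report above, an affected build would be expected to return 64 and a healthy one `None`.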

Error message:

No response

Python and NumPy Versions:

import sys; print(sys.version)
3.11.6 (main, Dec 11 2023, 15:56:58) [GCC Intel(R) C++ gcc 11.3.1 mode]
print(np.__version__)
1.26.2

Runtime Environment:

[{'numpy_version': '1.26.2',
  'python': '3.11.6 (main, Dec 11 2023, 15:56:58) [GCC Intel(R) C++ gcc 11.3.1 '
            'mode]',
  'uname': uname_result(system='Linux', node='hercules-login-1.hpc.msstate.edu', release='5.14.0-162.12.1.el9_1.0.2.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon Jan 30 22:14:42 UTC 2023', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': ['AVX512_KNL']}},
 {'filepath': '/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/2023.1.0/lib/intel64/libmkl_rt.so.2',
  'internal_api': 'mkl',
  'num_threads': 80,
  'prefix': 'libmkl_rt',
  'threading_layer': 'intel',
  'user_api': 'blas',
  'version': '2023.1-Product'},
 {'filepath': '/apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/compiler/2023.1.0/linux/compiler/lib/intel64_lin/libiomp5.so',
  'internal_api': 'openmp',
  'num_threads': 80,
  'prefix': 'libiomp',
  'user_api': 'openmp',
  'version': None}]
None

Context for the issue:

np.any() on an array of False elements returning False is a pretty basic functionality of numpy, and behaves as expected in multiple other versions of numpy across platforms. I rely on that to filter data regularly. So the fact that this breaks in 1.26.2 is very strange and unexpected.

@ngoldbaum
Member

I can't reproduce this with numpy 1.26.4:

In [1]: import numpy as np

In [2]: Y = np.zeros(64, bool) ; print(np.any(Y))
False

In [3]: np.__version__
Out[3]: '1.26.4'

@mhvk
Contributor

mhvk commented Apr 3, 2024

On a debian installation with a virtual environment in which I install 1.26.2, I also cannot reproduce this:

>>> import numpy as np
>>> np.__version__
'1.26.2'
>>> Y = np.zeros(63, bool) ; print(np.any(Y))
False
>>> Y = np.zeros(64, bool) ; print(np.any(Y))
False
>>> import sys; print(sys.version)
3.11.8 (main, Feb  7 2024, 21:52:08) [GCC 13.2.0]

Given that the break occurs at 64, one would suspect some kind of vectorization issue -- very weird!

@ngoldbaum
Member

Could also be an issue building numpy with the intel compilers.

@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Apr 3, 2024
@seberg
Member

seberg commented Apr 3, 2024

@AgilentGCMS any chance you can upgrade to 1.26.4, just to see if there was a recent related fix? Also, I am curious where you installed numpy from (might be important, but not sure).

You can also try defining:

export NPY_DISABLE_CPU_FEATURES="AVX512F"

or similar (add only features lower in the list you gave above, or add AVX2 if it still fails). That should disable using the loops that appear to be faulty here.

@seiko2plus does this issue ring a bell? This certainly must be SIMD dependent.

@AgilentGCMS
Author

@seberg I installed numpy by compiling from source because I wanted to link to Intel MKL. I tried disabling different CPU features, and found out that the following

export NPY_DISABLE_CPU_FEATURES="AVX512_SKX"

solves this problem. I tried all the way up to arrays of length 2147483648 (2^32), and np.any() on an array of all False still gave False. So the next question is, why and how to solve this? For the moment I can disable AVX512_SKX instructions as above, but there has to be a more elegant solution.
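
Since the variable has to be set before NumPy is imported, one way to try several feature sets from a single session is to run the check in a fresh interpreter each time. This is a minimal sketch, not an official NumPy tool: `env_with_disabled`, `run_check` and `CHECK` are hypothetical names, and joining the features with commas is an assumption about the accepted separator.

```python
import os
import subprocess
import sys

# The 64-element check from the report, run in a child interpreter.
CHECK = "import numpy as np; print(np.any(np.zeros(64, bool)))"


def env_with_disabled(features, base_env=None):
    """Copy an environment and set NPY_DISABLE_CPU_FEATURES to the given features."""
    env = dict(os.environ if base_env is None else base_env)
    env["NPY_DISABLE_CPU_FEATURES"] = ",".join(features)
    return env


def run_check(features):
    """Run the check with the given features disabled; return the child's stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHECK],
        env=env_with_disabled(features),
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()
```

For example, `run_check(["AVX512_SKX"])` corresponds to the export shown above. If the child fails to import NumPy or rejects the variable, stdout is empty and the error text is on `proc.stderr`, which this sketch does not return.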

@seberg
Member

seberg commented Apr 3, 2024

Thanks for trying that, I think that is useful.

I don't know how to solve this, but if it has to do with AVX512_SKX that helps. As far as I understand you also used the intel compiler?

but there has to be a more elegant solution.

Well, not without figuring out the root cause and the root cause might even be a compiler bug at this point. (No idea if that is likely, but it does seem like a bug that the test suite should normally notice.)

Also ping @rdevulap.

@AgilentGCMS
Author

I don't know how to solve this, but if it has to do with AVX512_SKX that helps. As far as I understand you also used the intel compiler?

That is correct, I used the Intel compiler.

@ngoldbaum
Member

I installed numpy by compiling from source because I wanted to link to Intel MKL

Have you tried building numpy with gcc instead of the intel compilers? I don't think we test with the intel compilers in our CI. That would also help narrow down where the issue is.

@AgilentGCMS
Author

@seberg I tried updating to numpy 1.26.4, however I could not perform the same test of excluding AVX512_SKX. Somehow it became part of the "baseline" optimizations,

[{'numpy_version': '1.26.4',
  'python': '3.11.6 (main, Dec 11 2023, 15:56:58) [GCC Intel(R) C++ gcc 11.3.1 '
            'mode]',
  'uname': uname_result(system='Linux', node='hercules-08-13', release='5.14.0-162.6.1.el9_1.0.1.x86_64', version='#1 SMP PREEMPT_DYNAMIC Mon Nov 28 18:44:09 UTC 2022', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE',
                                   'SSE2',
                                   'SSE3',
                                   'SSSE3',
                                   'SSE41',
                                   'POPCNT',
                                   'SSE42',
                                   'AVX',
                                   'F16C',
                                   'FMA3',
                                   'AVX2',
                                   'AVX512F',
                                   'AVX512CD',
                                   'AVX512_SKX',
                                   'AVX512_CLX',
                                   'AVX512_CNL',
                                   'AVX512_ICL'],
                      'found': [],
                      'not_found': ['AVX512_KNL']}},
 {'filepath': '/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/2023.1.0/lib/intel64/libmkl_rt.so.2',
  'internal_api': 'mkl',
  'num_threads': 1,
  'prefix': 'libmkl_rt',
  'threading_layer': 'intel',
  'user_api': 'blas',
  'version': '2023.1-Product'},
 {'filepath': '/apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/compiler/2023.1.0/linux/compiler/lib/intel64_lin/libiomp5.so',
  'internal_api': 'openmp',
  'num_threads': 1,
  'prefix': 'libiomp',
  'user_api': 'openmp',
  'version': None}]

and when I tried to disable it I got the error

RuntimeError: During parsing environment variable: 'NPY_DISABLE_CPU_FEATURES':
You cannot disable CPU feature 'AVX512_SKX', since it is part of the baseline optimizations:
(SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL).

On numpy 1.26.4 with intel compilers, the test with np.any also fails for arrays of length >= 64.
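
The error above reflects the split between baseline features, which are compiled in unconditionally, and dispatched features, which are selected at runtime and can therefore be switched off. A minimal sketch of that rule, using the two `simd_extensions` listings quoted in this thread; `can_disable` is a hypothetical helper, not a NumPy API.

```python
def can_disable(feature, simd_extensions):
    """True if the feature is dispatched at runtime ('found') rather than
    compiled in as part of the baseline."""
    if feature in simd_extensions["baseline"]:
        return False  # baseline code paths cannot be turned off at runtime
    return feature in simd_extensions["found"]


# Listing from the original 1.26.2 report.
build_1_26_2 = {
    "baseline": ["SSE", "SSE2", "SSE3"],
    "found": ["SSSE3", "SSE41", "POPCNT", "SSE42", "AVX", "F16C", "FMA3",
              "AVX2", "AVX512F", "AVX512CD", "AVX512_SKX", "AVX512_CLX",
              "AVX512_CNL", "AVX512_ICL"],
    "not_found": ["AVX512_KNL"],
}

# Listing from the 1.26.4 build, where everything ended up in the baseline.
build_1_26_4 = {
    "baseline": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT", "SSE42",
                 "AVX", "F16C", "FMA3", "AVX2", "AVX512F", "AVX512CD",
                 "AVX512_SKX", "AVX512_CLX", "AVX512_CNL", "AVX512_ICL"],
    "found": [],
    "not_found": ["AVX512_KNL"],
}

print(can_disable("AVX512_SKX", build_1_26_2))  # True
print(can_disable("AVX512_SKX", build_1_26_4))  # False
```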

@AgilentGCMS
Author

Who/what decides which features are part of baseline optimizations?

@AgilentGCMS
Author

OK I figured out how to move the AVX512 features from baseline to found. The long and short of it is that this bug still exists in numpy 1.26.4, and the workaround of disabling AVX512_SKX still works.

@AgilentGCMS
Author

Have you tried building numpy with gcc instead of the intel compilers? I don't think we test with the intel compilers in our CI. That would also help narrow down where the issue is.

I have not. You mean compile with gcc but still link to MKL, or not link to MKL at all?

@ngoldbaum
Member

Both would provide useful debugging information probably.

@AgilentGCMS
Author

AgilentGCMS commented Apr 3, 2024

I also noted that disabling AVX512_SKX actually slows down the evaluation of np.any. I ran the following script to test:

import time
import numpy as np

# All-False array of 2**32 elements (about 4 GiB of memory)
Y = np.zeros(2**32, bool)
t1 = time.time()
for i in range(50):
    _ = np.any(Y)
t2 = time.time()
print('Time for 50 calls to np.any on an array length 2**32 = %.3f s'%(t2-t1))

and the times were significantly different:

Time for 50 calls to np.any on an array length 2**32 = 1.488 s # with AVX512_SKX enabled
Time for 50 calls to np.any on an array length 2**32 = 7.002 s # with AVX512_SKX disabled

So disabling gives the correct answer but is slower. I haven't checked if other operations like matmul are also affected.
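
For a comparison that is less sensitive to one-off noise, the loop can be repeated and the fastest run kept. This is a minimal sketch using only the standard library; `best_of` and `slowdown` are hypothetical helper names, and the two figures passed to `slowdown` are the timings reported above.

```python
import time


def best_of(fn, repeats=5, calls=50):
    """Return the fastest wall-clock time in seconds for `calls` invocations
    of fn, taken over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(enabled_s, disabled_s):
    """Ratio of the time with the feature disabled to the time with it enabled."""
    return disabled_s / enabled_s


# Ratio of the two timings quoted in this comment.
print(f"{slowdown(1.488, 7.002):.1f}x")  # 4.7x
```

With NumPy, the measured callable would be something like `lambda: np.any(Y)` for the array `Y` from the script above.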

@r-devulap
Member

r-devulap commented Apr 3, 2024

I don't see this error with 1.26.2 or 1.26.4 when I run with numpy installed via pip (built with gcc). This looks like a bug in the Intel compiler. Let me try building with the Intel compiler.

@r-devulap r-devulap self-assigned this Apr 3, 2024
@AgilentGCMS
Author

I don't see this error with 1.26.2 or 1.26.4 when I run with numpy installed via pip (built with gcc). This looks like a bug in the Intel compiler. Let me try building with the Intel compiler.

Thanks! I am not sure if this is specific to a particular version of the Intel compiler. I am using a more recent version of oneAPI than I would like because of this issue, which precludes me from compiling scipy with an older oneAPI version.

@AgilentGCMS
Author

I've been so far compiling with intel oneapi 2023.1.0. I switched to oneapi 2022.2.1, recompiled python and numpy, and I still see the same bug (and the same workaround, i.e., disabling AVX512_SKX).

I understand that switching to GCC would probably solve this problem, but unfortunately I can't afford to do that. I do a lot of linear algebra operations for my work, and using architecture-optimized linear algebra libraries (i.e., MKL) makes a huge difference in execution times.

@ngoldbaum
Member

Can’t you compile numpy with gcc and still link against MKL?

@AgilentGCMS
Author

Can’t you compile numpy with gcc and still link against MKL?

In principle yes. In practice, I don't know how to set compiler and linker flags when I compile numpy with gcc. As in, I know what those flags should be, I just don't know how to tell numpy that since it's not an autoconf/automake build.

@mattip
Member

mattip commented Apr 3, 2024

Editing the issue title to reflect it uses the intel compiler.

Maybe related to #25044, which is about building NumPy on Windows with the Intel compiler.

The BLAS/LAPACK detection machinery should JustWork to detect MKL at build time. You can give meson hints if the wrong BLAS/LAPACK are chosen.

@mattip mattip changed the title BUG: numpy.any returns True given a boolean array of all False BUG: numpy.any returns True given a boolean array of all False with the intel compiler Apr 3, 2024
@r-devulap
Member

@AgilentGCMS could you list the steps to build with Intel compiler? I am using icpx (Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)) and keep seeing a missing symbol error:

ImportError: /np/build-install/usr/lib/python3/dist-packages/numpy/_core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so: undefined symbol: atan8_h

@AgilentGCMS
Author

@AgilentGCMS could you list the steps to build with Intel compiler? I am using icpx (Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)) and keep seeing a missing symbol error:

I have found that if I build with icx and icpx, the resulting numpy library has none of the architecture-optimized instruction sets, even when compiled with -xHost. So I've fallen back on compiling with icc, which is available in the intel oneapi as well. Here are the steps:

  1. Edit site.cfg to make sure the MKL include and library folders are set correctly. For me they are

     [mkl]
     library_dirs = /apps/spack-managed/oneapi-2022.2.1/intel-oneapi-mkl-2022.2.1-klrcilzymbsllrr6wmepfg2cfzem5ekd/mkl/2022.2.1/lib/intel64
     include_dirs = /apps/spack-managed/oneapi-2022.2.1/intel-oneapi-mkl-2022.2.1-klrcilzymbsllrr6wmepfg2cfzem5ekd/mkl/2022.2.1/include
     mkl_libs = mkl_rt
     lapack_libs =

  2. Check numpy/distutils/intelccompiler.py and numpy/distutils/fcompiler/intel.py to make sure that icc and ifort are being used. Also check the optimizer flags. I have -fPIC -fp-model strict -O3 -diag-disable=10441 -fomit-frame-pointer -qopenmp. The diag-disable flag is to stop the compiler from reminding me to use icx instead of icc. I tried -xHost as well, but that includes AVX512_SKX in the base optimizations, which means it can no longer be disabled at runtime.
  3. Compile and install with python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install.

@AgilentGCMS
Author

AgilentGCMS commented Apr 4, 2024

I should also say that I've been able to reproduce this on another computer with intel oneapi 2022.1 compilers. The same workaround of disabling AVX512_SKX works.

@AgilentGCMS
Author

AgilentGCMS commented Apr 4, 2024

The BLAS/LAPACK detection machinery should JustWork to detect MKL at build time. You can give meson hints if the wrong BLAS/LAPACK are chosen.

@mattip I tried your suggestion. I do have several pkgconfig files for the MKL libraries, so I tried a couple. Unfortunately they all fail without telling me why.

python -m pip install . -C-Dallow-noblas=false -Csetup-args=-Dblas=mkl-dynamic-lp64-gomp -Csetup-args=-Dlapack=mkl-dynamic-lp64-gomp
Processing /work2/noaa/co2/sbasu/packages/sources/numpy-1.26.4
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: -11
  ╰─> [17 lines of output]
      Collecting Cython<3.1,>=0.29.34
        Using cached Cython-3.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
      Collecting meson-python<0.16.0,>=0.15.0
        Using cached meson_python-0.15.0-py3-none-any.whl.metadata (4.1 kB)
      Collecting meson>=0.63.3 (from meson-python<0.16.0,>=0.15.0)
        Using cached meson-1.4.0-py3-none-any.whl.metadata (1.8 kB)
      Collecting pyproject-metadata>=0.7.1 (from meson-python<0.16.0,>=0.15.0)
        Using cached pyproject_metadata-0.7.1-py3-none-any.whl.metadata (3.0 kB)
      Collecting packaging>=19.0 (from pyproject-metadata>=0.7.1->meson-python<0.16.0,>=0.15.0)
        Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
      Using cached Cython-3.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
      Using cached meson_python-0.15.0-py3-none-any.whl (25 kB)
      Using cached meson-1.4.0-py3-none-any.whl (935 kB)
      Using cached pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)
      Using cached packaging-24.0-py3-none-any.whl (53 kB)
      Installing collected packages: packaging, meson, Cython, pyproject-metadata, meson-python
      Successfully installed Cython-3.0.10 meson-1.4.0 meson-python-0.15.0 packaging-24.0 pyproject-metadata-0.7.1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: -11
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Segmentation fault (core dumped)

I have to say I find the new-fangled meson build process quite frustrating. At least with distutils I could look at the console output and get a clue as to what failed and why. This question on reddit sums up my feelings perfectly.

@mattip
Member

mattip commented Apr 4, 2024

We should probably be pushing spin build more in the build documentation. You might find this spin-based workflow in a clean virtualenv more friendly:

python3 -m venv /tmp/venv3
source /tmp/venv3/bin/activate
pip install -r requirements/build_requirements.txt
spin build -- -Dallow-noblas=false -Dblas=mkl-dynamic-lp64-gomp -Dlapack=mkl-dynamic-lp64-gomp

@mattip
Member

mattip commented Apr 4, 2024

There is this section about choosing a blas/lapack implementation, which uses spin build

@AgilentGCMS
Author

@mattip Thanks, but that route failed too. At least now I have an error message:

../numpy/core/include/numpy/npy_common.h:391:2: error: npy_cdouble definition is not compatible with C99 complex definition !         Please contact NumPy maintainers and give detailed information about your         compiler and platform
#error npy_cdouble definition is not compatible with C99 complex definition ! \
 ^
../numpy/core/include/numpy/npy_common.h:398:2: error: npy_cfloat definition is not compatible with C99 complex definition !         Please contact NumPy maintainers and give detailed information about your         compiler and platform
#error npy_cfloat definition is not compatible with C99 complex definition ! \
 ^
../numpy/core/include/numpy/npy_common.h:405:2: error: npy_clongdouble definition is not compatible with C99 complex definition !         Please contact NumPy maintainers and give detailed information about your         compiler and platform
#error npy_clongdouble definition is not compatible with C99 complex definition ! \

@mattip
Member

mattip commented Apr 5, 2024

I think that echoes the failures fixed in the WIP PR #25044. What version of NumPy are you building?

@AgilentGCMS
Author

I think that echoes the failures fixed in the WIP PR #25044. What version of NumPy are you building?

1.26.4, the latest I could download from pypi. Also, I realized that the error snippet might not give the full picture, so I'm attaching the output of spin build.
build.log

@r-devulap
Member

I was able to build it with icpx and don't see this failure. My oneAPI basekit installation does not seem to include icc though :/

@AgilentGCMS
Author

@r-devulap It's my turn to ask you how you built with icpx/icx :-) Perhaps I can follow those same steps on my cluster.

@r-devulap
Member

hah, it was chaotic to say the least. Let me try and reproduce the steps and document it. Might need a day or two to get to it.

@AgilentGCMS
Author

@r-devulap Thanks! Also, more importantly, once you compiled successfully, did numpy show that AVX512_SKX was available and in use? It depends on the processor type on which you compiled and ran, so if AVX512_SKX instructions were not used, it would make sense that the test succeeded.

@r-devulap
Member

Found a nice minimal way to build it, but with the caveat that I had to disable building SVML (looks like the icpx compiler fails to build SVML correctly, which is very ironic). Here are the steps:

  1. Used a docker with the oneAPI basekit pre-installed: intel/oneapi-basekit:devel-ubuntu22.04
  2. install and upgrade pip: apt install python3-pip and pip install --upgrade pip
  3. Set env CC and CXX to icx and icpx respectively.
  4. clone numpy main branch and build/install with: pip install . -Csetup-args=-Ddisable-svml=true

NumPy show config does show that it supports AVX512_SKX extensions:

>>> import numpy as np
>>> np.show_config()
Build Dependencies:
  blas:
    detection method: system
    found: true
    include directory: unknown
    lib directory: unknown
    name: mkl
    openblas configuration: unknown
    pc file directory: unknown
    version: 2023.0.0
  lapack:
    detection method: internal
    found: true
    include directory: unknown
    lib directory: unknown
    name: dep140496258242336
    openblas configuration: unknown
    pc file directory: unknown
    version: 2.1.0.dev0
Compilers:
  c:
    commands: icx
    linker: ld.bfd
    name: intel-llvm
    version: 2023.0.0
  c++:
    commands: icpx
    linker: ld.bfd
    name: intel-llvm
    version: 2023.0.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.10
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /usr/bin/python3
  version: '3.10'
SIMD Extensions:
  baseline:
  - SSE
  - SSE2
  - SSE3
  found:
  - SSSE3
  - SSE41
  - POPCNT
  - SSE42
  - AVX
  - F16C
  - FMA3
  - AVX2
  - AVX512F
  - AVX512CD
  - AVX512_SKX
  not found:
  - AVX512_KNL
  - AVX512_CLX
  - AVX512_CNL
  - AVX512_ICL

@r-devulap
Member

See #26257 for tracking problems with SVML/ICPX compiler

@r-devulap
Member

export CFLAGS=-fveclib=none fixes that problem. Will update meson to use this flag while building with intel-llvm compiler.

@AgilentGCMS
Author

@r-devulap Sadly, that process did not work for me :-(

pip install . -Csetup-args=-Ddisable-svml=true
Processing /work2/noaa/co2/sbasu/packages/sources/numpy
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: -11
  ╰─> [17 lines of output]
      Collecting meson-python>=0.15.0
        Using cached meson_python-0.15.0-py3-none-any.whl.metadata (4.1 kB)
      Collecting Cython>=3.0.6
        Using cached Cython-3.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
      Collecting meson>=0.63.3 (from meson-python>=0.15.0)
        Using cached meson-1.4.0-py3-none-any.whl.metadata (1.8 kB)
      Collecting pyproject-metadata>=0.7.1 (from meson-python>=0.15.0)
        Using cached pyproject_metadata-0.7.1-py3-none-any.whl.metadata (3.0 kB)
      Collecting packaging>=19.0 (from pyproject-metadata>=0.7.1->meson-python>=0.15.0)
        Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
      Using cached meson_python-0.15.0-py3-none-any.whl (25 kB)
      Using cached Cython-3.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
      Using cached meson-1.4.0-py3-none-any.whl (935 kB)
      Using cached pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)
      Using cached packaging-24.0-py3-none-any.whl (53 kB)
      Installing collected packages: packaging, meson, Cython, pyproject-metadata, meson-python
      Successfully installed Cython-3.0.10 meson-1.4.0 meson-python-0.15.0 packaging-24.0 pyproject-metadata-0.7.1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: -11
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Segmentation fault (core dumped)

Since this does not show exactly what went wrong, I tried spin build, which ended with the error

Checking for type "complex float" : NO

numpy/_core/meson.build:176:4: ERROR: Problem encountered: "complex.h" header does not include complex type complex float

Here's the full output of spin build

CFLAGS='-fveclib=none' spin build -- -Dallow-noblas=false -Dblas=mkl-dynamic-lp64-iomp -Dlapack=mkl-dynamic-lp64-iomp -Ddisable-svml=true
!! Could not load command `lldb` from file `spin.cmds.meson`.

$ /tmp/venv3/bin/python vendored-meson/meson/meson.py setup build --prefix=/usr -Dallow-noblas=false -Dblas=mkl-dynamic-lp64-iomp -Dlapack=mkl-dynamic-lp64-iomp -Ddisable-svml=true
The Meson build system
Version: 1.2.99
Source dir: /work2/noaa/co2/sbasu/packages/sources/numpy
Build dir: /work2/noaa/co2/sbasu/packages/sources/numpy/build
Build type: native build
Project name: NumPy
Project version: 2.1.0.dev0+git20240412.3f82757
C compiler for the host machine: icx (intel-llvm 11.3.1 "Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)")
C linker for the host machine: icx ld.bfd 2.35.2-24
C++ compiler for the host machine: icpx (intel-llvm 11.3.1 "Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)")
C++ linker for the host machine: icpx ld.bfd 2.35.2-24
Cython compiler for the host machine: cython (cython 3.0.10)
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python3 found: YES (/tmp/venv3/bin/python)
Found pkg-config: /usr/bin/pkg-config (1.7.3)
Run-time dependency python found: YES 3.11
Has header "Python.h" with dependency python-3.11: YES
Compiler for C supports arguments -fno-strict-aliasing: NO
Test features "SSE SSE2 SSE3" : Parial support, missing(SSE SSE2 SSE3)
Test features "SSE" : Unsupported due to Implied feature "SSE2" is not supported
Test features "SSE2" : Unsupported due to Arguments "-msse, -msse2" are not supported
Test features "SSE3" : Unsupported due to Implied feature "SSE" is not supported
Test features "SSSE3" : Unsupported due to Implied feature "SSE" is not supported
Test features "SSE41" : Unsupported due to Implied feature "SSE" is not supported
Test features "POPCNT" : Unsupported due to Implied feature "SSE" is not supported
Test features "SSE42" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX" : Unsupported due to Implied feature "SSE" is not supported
Test features "F16C" : Unsupported due to Implied feature "SSE" is not supported
Test features "FMA3" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX2" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512F" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512CD" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_KNL" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_KNM" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_SKX" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_CLX" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_CNL" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_ICL" : Unsupported due to Implied feature "SSE" is not supported
Test features "AVX512_SPR" : disabled due to AVX512_SPR is disabled due to "intel-llvm compiler does not support it"
Configuring npy_cpu_dispatch_config.h using configuration
Message:
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   :
  dispatch:
    Requested : max -xop -fma4
    Enabled   :

Library m found: YES
Run-time dependency mkl-dynamic-lp64-iomp found: YES 2023.1
Message: BLAS symbol suffix:
Program _build_utils/process_src_template.py found: YES (/tmp/venv3/bin/python /work2/noaa/co2/sbasu/packages/sources/numpy/numpy/_build_utils/process_src_template.py)
Program _build_utils/tempita.py found: YES (/tmp/venv3/bin/python /work2/noaa/co2/sbasu/packages/sources/numpy/numpy/_build_utils/tempita.py)
Configuring __config__.py using configuration
Checking for size of "short" : 2
Checking for size of "int" : 4
Checking for size of "long" : 8
Checking for size of "long long" : 8
Checking for size of "float" : 4
Checking for size of "double" : 8
Checking for size of "long double" : 16
Checking for size of "size_t" : 8
Checking for size of "size_t" : 8 (cached)
Checking for size of "wchar_t" : 4
Checking for size of "off_t" : 8
Checking for size of "Py_intptr_t" with dependency python-3.11: 8
Checking for size of "PY_LONG_LONG" with dependency python-3.11: 8
Has header "complex.h" : YES
Checking for type "complex float" : NO

numpy/_core/meson.build:176:4: ERROR: Problem encountered: "complex.h" header does not include complex type complex float

A full log can be found at /work2/noaa/co2/sbasu/packages/sources/numpy/build/meson-logs/meson-log.txt
Meson configuration failed; please try `spin build` again with the `--clean` flag.; aborting.

@AgilentGCMS
Author

OK, I now have a version of numpy compiled with intel compilers which does not fail the np.any() test despite AVX512_SKX instructions being enabled. However, it is slower. It seems that I can either

  1. Compile with icx -fPIC -fp-model strict -xHost -O3 -fomit-frame-pointer -fiopenmp and get a version of numpy that is aware of all AVX512 instructions, and does not have the problem in the original bug report. However, performing the timing test above, it is slow and takes ~9 seconds.

  2. Compile with icc -fPIC -fp-model strict -xHost -O3 -fomit-frame-pointer -qopenmp and get a version of numpy that is also aware of AVX512 instructions, but fails the original test. It is, however, faster in the timing test, taking ~2 seconds.

@seiko2plus
Member

seiko2plus commented Apr 15, 2024

I confirm it, it's a compiler-specific issue with the Intel Compiler Classic across all versions,
which does not occur with the Intel LLVM Compiler. The bug manifests specifically when the
scalar result of _cvtmask64_u64 is compared against the constant -1. This comparison
uniquely triggers a bug under conditions of equality (==) or inequality (!=) checks,
which are typically used in reduction operations like np.logical_or.

The underlying issue arises from the compiler's optimizer. When the last vector comparison
instruction operates on zmm, the optimizer erroneously emits a
duplicate of this instruction but on the lower half register ymm.
It then performs a bitwise XOR operation between the mask
produced by this duplicated instruction and the mask from the original comparison instruction.
This erroneous behavior leads to incorrect results as it introduces an unnecessary and
incorrect operation into the assembly code.

The bug should be triggered by the SIMD testing unit, and it can be reproduced by:

>>> from numpy._core import _simd; v = _simd.AVX512_SKX
>>> v.any_u8(v.setall_u8(0))
1

Proof of the bug and a suggested workaround:
https://godbolt.org/z/cTj63bjoa
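
The description above can be modelled with plain integers standing in for the 64-bit compare mask. This is an illustrative sketch of the described behaviour only, not the generated assembly or NumPy's source: all function names are hypothetical, and the assumption is that `any` is computed by comparing each of the 64 byte lanes of a zmm register with zero and testing whether the resulting mask differs from the all-ones constant.

```python
ALL_ONES_64 = (1 << 64) - 1  # the constant -1 viewed as an unsigned 64-bit mask


def zero_mask(lanes):
    """Bit i is set when lane i equals zero (models a byte-wise compare with 0)."""
    mask = 0
    for i, value in enumerate(lanes):
        if value == 0:
            mask |= 1 << i
    return mask


def any_expected(lanes64):
    """Intended reduction: some lane is nonzero iff the zero-mask is not all ones."""
    return zero_mask(lanes64) != ALL_ONES_64


def any_miscompiled(lanes64):
    """Model of the described miscompile: the compare is repeated on the lower
    32 lanes (the ymm half) and its mask is XORed into the 64-lane mask."""
    full = zero_mask(lanes64)
    lower_half = zero_mask(lanes64[:32])
    return (full ^ lower_half) != ALL_ONES_64


zeros = [0] * 64
print(any_expected(zeros))     # False
print(any_miscompiled(zeros))  # True
```

In this model the XOR clears the low 32 bits of an all-ones mask, so the comparison against -1 no longer sees equality and an all-zero vector is reported as containing a nonzero element. A 512-bit register holds 64 byte lanes, which would be consistent with the failure first appearing at 64 elements.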
