
BUG: <input>:1: RuntimeWarning: invalid value encountered in matmul #24067

Closed
hbombDTU opened this issue Jun 28, 2023 · 43 comments · Fixed by #24199

@hbombDTU

Describe the issue:

Getting a RuntimeWarning with numpy version 1.25.0 when doing matrix multiplication with the '@' operator.

Once an array surpasses a certain size limit, multiplying the two matrices produces a RuntimeWarning.

However, the warning disappears when downgrading numpy to 1.24.4. It also goes away when making a copy of the array and multiplying the original matrix with the copy (X.T @ X_copy).

file1.zip
file2.zip
file3.zip

Reproduce the code example:

import numpy as np 

# Loading the first file
new = np.load('file1.zip')['file1']

# concatenating the other files 
for file in ['file2', 'file3']:
    x = np.load(f'{file}.zip')[file]
    new = np.concatenate([new, x])

# performing matrix multiplication
new.T @ new

Error message:

<input>:1: RuntimeWarning: invalid value encountered in matmul

Runtime information:

import sys, numpy; print(numpy.__version__); print(sys.version)
1.25.0
3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]

print(numpy.show_runtime())
[{'numpy_version': '1.25.0',
'python': '3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 '
'bit (AMD64)]',
'uname': uname_result(system='Windows', node='PC04218', release='10', version='10.0.22621', machine='AMD64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\numpy\.libs\libopenblas64__v0.3.23-gcc_10_3_0.dll',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'},
{'architecture': 'Haswell',
'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\scipy.libs\libopenblas_v0.3.20-571-g3dec11c6-gcc_10_3_0-c2315440d6b6cef5037bad648efc8c59.dll',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.21.dev'},
{'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\sklearn\.libs\vcomp140.dll',
'internal_api': 'openmp',
'num_threads': 24,
'prefix': 'vcomp',
'user_api': 'openmp',
'version': None}]
None

Context for the issue:

No response

@charris (Member) commented Jun 28, 2023

We see that occasionally on Windows; it doesn't seem repeatable. Are you using a 32 bit version of NumPy?

@hbombDTU (Author)

Hi Charris,
I believe we are using the 64-bit version of NumPy.

@maciejskorski

I can't reproduce it either; it runs fine on Colab. See/try this notebook.

@StefRe commented Jun 29, 2023

@maciejskorski the Colab notebook gives the same warning (<ipython-input-4-fa27106da462>:10: RuntimeWarning: invalid value encountered in matmul) for new.T @ new when run with numpy version 1.25.0.

@maciejskorski commented Jun 29, 2023

@StefRe thanks for testing the breaking version.
I also tested that the matrix content is irrelevant; the warning is triggered by this:

M = np.random.random((30000, 270)) # warns on Colab under numpy==1.25
M.T @ M

So we have our minimal not-working example (MNWE)! 👍
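For scripted testing of the MNWE, the warning can be captured programmatically instead of letting it print (a minimal sketch; the (30000, 270) shape is the size reported above to cross the threshold on affected setups):

```python
import warnings
import numpy as np

# Reproduce the MNWE while recording any RuntimeWarning instead of printing it.
M = np.random.random((30000, 270))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = M.T @ M

# On an affected NumPy 1.25.0 install `caught` holds the matmul RuntimeWarning;
# on unaffected versions it stays empty. The result itself is well-formed.
print(result.shape)
print([str(w.message) for w in caught if issubclass(w.category, RuntimeWarning)])
```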

@seberg (Member) commented Jun 29, 2023

Thanks, this must come from somewhere in the OpenBLAS internals. Probably while deciding things like how to chunk up the job (in theory it could be uninitialized values).

The annoying thing: it would help to recompile OpenBLAS with debugging info (and maybe not super high optimization). That is unfortunately only the first step; the second is to write a small C program and compile it with trapping math (or enable it), or to run NumPy but then enable the trapping math after importing NumPy (which is possible, by hacking it in).

Is anyone interested in diving into that? I can share some hack for the second part.

(I have done this type of thing once, but without recompiling OpenBLAS, so it was a bit hard to figure out where exactly the issue happens.)

@seberg (Member) commented Jun 29, 2023

Actually, maybe enabling trapping math (while possible) isn't that helpful... We know approximately where this happens, so stepping through until a NaN pops up somewhere may well be good enough.

@maciejskorski commented Jun 29, 2023

Why has this not been detected by the matmul tests? Don't they check for warnings 🤔
The matrix size that triggers the problem probably depends on the architecture, memory and so on? Playing with my notebook, I can see that (29000, 270) works fine but (30000, 270) triggers the warning. @seberg is this what you referred to as "chunking up the job"?

@seberg (Member) commented Jun 29, 2023

We do see it randomly, but only "reliably" (still randomly) on the win32 CI (if this is the same thing). So CI might well hit it, but maybe only on your machine or so.
It is not completely impossible that this comes from uninitialized values even, or the compiler doing unsafe optimizations...

Or, our test suite just doesn't run the magic size that triggers it reliably here... In either case, we need to find out where it happens, ideally in a debugger where you can step through the OpenBLAS code to figure it out precisely. Once that is known, the OpenBLAS fixup is probably easy.

chunking up the job

The BLAS implementation always has a lot of calculations to figure out how to distribute the kernel to its worker threads (and maybe just finding the size of scratch space needed). The most probable issue is that one of those calculations creates a NaN where it just doesn't matter, for example by taking the root of a negative number, or by using uninitialized memory that happens to be an infinity, or...

EDIT: That said, it could of course also be the kernel itself that uses uninitialized values due to vectorization.

@maciejskorski

We do see it randomly, but only "reliably" (still randomly) on the win32 CI (if this is the same thing). So CI might well hit it, but maybe only on your machine or so. It is not completely impossible that this comes from uninitialized values even, or the compiler doing unsafe optimizations...

The example I shared is reproducible under Colab's container, so this happens on Linux too.

Or, our test suite just doesn't run the magic size that triggers it reliably here... In either case, we need to find out where it happens, ideally in a debugger where you can step through the OpenBLAS code to figure it out precisely. Once that is known, the OpenBLAS fixup is probably easy.

chunking up the job

The BLAS implementation always has a lot of calculations to figure out how to distribute the kernel to its worker threads (and maybe just finding the size of scratch space needed). The most probable issue is that one of those calculations creates a NaN where it just doesn't matter, for example by taking the root of a negative number, or by using uninitialized memory that happens to be an infinity, or...

In this specific case, matmul, it is likely chunking the matrix to make it fit into the L2 cache or so?

Would it make sense to test the problematic matmul under the "native" OpenBLAS, following their docs?

EDIT: That said, it could of course also be the kernel itself that uses uninitialized values due to vectorization.

@mattip (Member) commented Jun 29, 2023

Why has this not been detected by the matmul tests? Don't they check for warnings

The easy answer is "it hits a code path we do not hit in tests". The harder question is "what codepath is triggered". I would imagine it is something like #16744, which turned out to be caused by registers not being properly restored. Maybe there is some race condition when chunking the large array.

@mattip (Member) commented Jun 29, 2023

Can you try limiting OpenBLAS to one thread with threadpoolctl?

from threadpoolctl import threadpool_limits
import numpy as np
with threadpool_limits(limits=1, user_api='blas'):
    M = np.random.random((30000, 270))
    M.T @ M
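If threadpoolctl is not installed, an alternative sketch is limiting OpenBLAS via its environment variable; unlike the context manager above, this must happen before NumPy is imported and cannot be toggled per call:

```python
import os

# OPENBLAS_NUM_THREADS is read by OpenBLAS when the library loads, so set it
# before importing NumPy for the first time in the process.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

M = np.random.random((30000, 270))
result = M.T @ M
print(result.shape)  # (270, 270)
```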

@mattip (Member) commented Jun 29, 2023

... answering my own question:

  • this reproduces on Ubuntu22.04 on an AMD64 machine with NumPy 1.25.0
  • it reproduces even when limiting OpenBLAS to one thread

@mattip (Member) commented Jun 29, 2023

>>> np.show_runtime()
[{'numpy_version': '1.25.0',
  'python': '3.11.4 (main, Jun  7 2023, 12:45:48) [GCC 11.3.0]',
  'uname': uname_result(system='Linux', node='matti-ryzen', release='5.15.0-75-generic', version='#82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Zen',
  'filepath': '/tmp/venv3/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-7a851222.3.23.so',
  'internal_api': 'openblas',
  'num_threads': 24,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23'}]
>>> 

@maciejskorski

Can you try limiting OpenBLAS to one thread with threadpoolctl

Not helping :-(

@seberg (Member) commented Jun 29, 2023

Is it specific to a kernel? I once had a similar thing on the Mac actually, and tried to narrow things down, but never got further than "it's in the syrk routines where some sizes are estimated". It would be nice if it's there, because the alternative seems even harder to figure out.

@maciejskorski

Here is the Colab runtime spec:

[{'numpy_version': '1.25.0',
  'python': '3.10.12 (main, Jun  7 2023, 12:45:35) [GCC 9.4.0]',
  'uname': uname_result(system='Linux', node='ea606d5c044b', release='5.15.107+', version='#1 SMP Sat Apr 29 09:15:28 UTC 2023', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-7a851222.3.23.so',
  'internal_api': 'openblas',
  'num_threads': 2,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23'}]

@seberg (Member) commented Jun 29, 2023

Haswell and Zen... seems unlikely that it is the core type/kernel then (although they might use the same one here, I guess).

@mattip (Member) commented Jun 29, 2023

Strangely, the result contains neither a NaN nor an inf:

>>> X = M.T @ M
<stdin>:1: RuntimeWarning: invalid value encountered in matmul
>>> np.any(np.isnan(X))
False
>>> np.any(np.isinf(X))
False

@maciejskorski

Haswell and Zen... seems unlikely that it is the core type/kernel then (although might use the same one here I guess).

I can only make educated guesses, but to me the experiment ("matrix size matters for the warning") plus your earlier remarks on chunking the job suggest it has something to do with doing matmul in cache-sized batches?

@maciejskorski commented Jun 29, 2023

Strangely, the result does not have a Nan nor an inf:

>>> X = M.T @ M
<stdin>:1: RuntimeWarning: invalid value encountered in matmul
>>> np.any(np.isnan(X))
False
>>> np.any(np.isinf(X))
False

It does not. When the same question was asked on SO, people blamed the OP for feeding ML models NaNs 😄

@hbombDTU (Author)

My colleagues have also reproduced the warning, so it is not a kernel issue, and we have also checked for NaNs and infs. Will there be further investigation into this? The calculations seem to work fine, but what potential problems could this warning indicate?
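Since the results were verified above to be NaN- and inf-free, a narrowly scoped interim workaround is to silence only this warning category around the affected call (a sketch using np.errstate; this hides the symptom rather than fixing the cause):

```python
import numpy as np

M = np.random.random((30000, 270))

# Suppress only "invalid value" floating-point warnings for this one
# operation; other categories (overflow, divide) still surface normally.
with np.errstate(invalid="ignore"):
    X = M.T @ M

# Sanity-check that nothing actually went wrong numerically.
print(bool(np.any(np.isnan(X)) or np.any(np.isinf(X))))  # False
```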

@seberg (Member) commented Jun 29, 2023

See if this helps (warning: it doesn't work in ipython for me, because ipython itself causes floating point exceptions -- although, if running in a debugger, you can probably continue past those):

import ctypes
import numpy as np
dll = ctypes.CDLL(np.core._multiarray_umath.__file__)
dll.feclearexcept(61)
dll.feenableexcept(61)

The above 61 is the hardcoded mask value that I have on my linux laptop. If you do that and then run the code, we may be able to narrow down where it's raised first... After that maybe we can go deeper.

(Yes, this is a hack, and will only work on Linux.) EDIT2: of course this is only useful if you run inside a debugger, such as with gdb --args python and then r.
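For reference, the 61 is consistent with the glibc x86-64 <fenv.h> exception flags (an assumption about that particular build; other libcs and architectures define different values): it is the OR of all five standard flags, i.e. FE_ALL_EXCEPT:

```python
# glibc x86-64 <fenv.h> floating-point exception flag values
# (platform-specific; other libcs/architectures differ).
FE_INVALID   = 0x01
FE_DIVBYZERO = 0x04
FE_OVERFLOW  = 0x08
FE_UNDERFLOW = 0x10
FE_INEXACT   = 0x20

FE_ALL_EXCEPT = (FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW
                 | FE_UNDERFLOW | FE_INEXACT)
print(FE_ALL_EXCEPT)  # 61
```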

@mattip (Member) commented Jun 29, 2023

...
>>> dll.feenableexcept(61)
0
>>> M = np.random.random((30000, 270))
>>> M.T @ M

Thread 1 "python" received signal SIGFPE, Arithmetic exception.
0x00007ffff55417e5 in cblas_dsyrk () from /usr/local/lib/libopenblas.so.0
(gdb) 

@mattip (Member) commented Jun 29, 2023

(gdb) bt
#0  0x00007ffff55417e5 in cblas_dsyrk () from /usr/local/lib/libopenblas.so.0
#1  0x00007ffff73a3153 in DOUBLE_matmul_matrixmatrix (ip1=0x7fffb6b3b010, is1_m=8, is1_n=2160, ip2=0x7fffb6b3b010, is2_n=2160, 
    is2_p=8, op=0x7fffb6aac010, os_m=2160, os_p=8, m=270, n=30000, p=270) at ../numpy/core/src/umath/matmul.c.src:174
#2  0x00007ffff73a671e in DOUBLE_matmul (args=0x7ffff7bbb980, dimensions=0x7fffffffbc88, steps=0xf510b8, __NPY_UNUSED_TAGGEDfunc=0x0)
    at ../numpy/core/src/umath/matmul.c.src:478
...

@martin-frbg any thoughts? Quick recap: this happens when calling syrk with a large input matrix. It did not happen with OpenBLAS 0.3.20, but happens with 0.3.23.

@martin-frbg

There are a few recent changes, mostly related to where the switch from single- to multithreading occurs. One such change made chunking-related variables available to "dynamic arch" builds that previously used one set for all cpus; maybe this is where an uninitialized variable could come into play. I'll try to reproduce and bisect later today.

@maciejskorski

To test another OpenBLAS version, one has to compile from source, is that right?
Is this issue something that can be reduced to a few lines of native OpenBLAS and delegated to their dev team?

@mattip (Member) commented Jun 29, 2023

delegated to their dev team?

Martin is the OpenBLAS dev team :)

@maciejskorski

delegated to their dev team?

Martin is the OpenBLAS dev team :)

A faux pas, I am very sorry :-) I hope we will keep this discussion as verbose and educational as possible; the internals of these computations sound fascinating.

@martin-frbg

Silly bug introduced by me three months ago when I reworked the multithreading thresholds for SYMM and SYRK - the calculated workload size could trivially overflow the puny int I had provided for it. (Which also explains why the result of the call was still correct, unless one trapped the exception.) Trivial fix in OpenMathLib/OpenBLAS#4116.
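As an illustration of how such an overflow lines up with the sizes observed earlier in the thread (the exact OpenBLAS workload formula is not reproduced here; rows * cols * cols is a hypothetical stand-in), a 32-bit signed int overflows exactly between the working (29000, 270) and failing (30000, 270) shapes:

```python
# Hypothetical workload estimate for M.T @ M with M of shape (rows, cols).
# The real OpenBLAS expression may differ; this only shows that a product of
# this magnitude no longer fits a 32-bit signed int.
INT32_MAX = 2**31 - 1  # 2147483647

def workload(rows, cols):
    return rows * cols * cols

print(workload(29000, 270) <= INT32_MAX)  # True  -> no warning observed
print(workload(30000, 270) <= INT32_MAX)  # False -> warning observed
```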

@martin-frbg

So not much to learn from it except don't trust my judgement :/

@charris (Member) commented Jun 29, 2023

Darn, I was hoping for a fix for the long-standing occasional warnings on Windows :) But now I wonder if they are related to Windows having 32-bit default integers?

@charris (Member) commented Jun 29, 2023

Just for reference, the Windows problem manifested after the @ operator was directly implemented instead of using einsum.

@maciejskorski commented Jun 29, 2023

Just for reference, the Windows problem manifested after the @ operator was directly implemented instead of using einsum.

Here we were able to locate the problem by reproducing it and narrowing the scope in Colab, which others could access independently and comment on. How do we debug and reproduce on Windows?

@charris (Member) commented Jun 29, 2023

@maciejskorski The problem is that the warning is only occasional, but always in the same test. There is another occasional warning involving complex numbers. Both of those only occur on Windows and don't seem to affect the correctness of the results; they just warn of invalid values.

@maciejskorski commented Jun 29, 2023

@maciejskorski The problem is that the warning is only occasional, but always in the same test. There is another occasional warning involving complex numbers. Both of those only occur on Windows and don't seem to affect the correctness of the results, they just warn of invalid values.

Is this a numpy test on CI/CD? Does it occur occasionally because the data is randomized, or because the algorithm is not fully deterministic (for various technical reasons, like parallelization)? Could we track and eventually (once spotted) share the data triggering the error?

@martin-frbg

Is it reasonable to assume the other warning is related to something in OpenBLAS as well?

@charris (Member) commented Jun 29, 2023

@martin-frbg I don't know. I've been shrugging them off as probably a platform/compiler problem. Going forward, I am going to start tracking them in an issue. They only occur a couple of times a month, and seem less frequent than they used to be, but that is very subjective.

@mattip (Member) commented Jun 30, 2023

Is the OpenBLAS develop branch stable enough to update openblas-libs to fix this issue?

@martin-frbg

I think so. I'm currently trying to coordinate with xianyi after he made one of his ninja commits, but I expect any immediate action will be limited to risc-v.

@hbombDTU (Author)

Could I get the current status of this bug?

@mattip (Member) commented Jul 11, 2023

I tried building the OpenBLAS version in MacPython/openblas-libs#101 but the macos arm64 build failed to recognize the new SVE intrinsics.

@mattip (Member) commented Jul 17, 2023

#24199 updates OpenBLAS to a version with a fix for this.
