BUG: <input>:1: RuntimeWarning: invalid value encountered in matmul #24067
Comments
We see that occasionally on Windows; it doesn't seem to be repeatable. Are you using a 32-bit version of NumPy?
Hi Charris,
I can't reproduce it either; under Colab it is fine. See / try this notebook.
@maciejskorski, the Colab notebook gives the same warning.
@StefRe thanks for testing the breaking version.
M = np.random.random((30000, 270))  # on Colab: warns under numpy==1.25
M.T @ M
So we have our minimal-not-working-example! (MNWE) 👍
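A self-contained version of the reproducer above, as a sketch: it records any "invalid value" RuntimeWarning raised by `M.T @ M` and also reports whether the product is finite. The helper name `matmul_warns` and the seeded `np.random.default_rng` (instead of the report's `np.random.random`) are illustrative choices; whether the warning appears depends on the OpenBLAS build bundled with NumPy.

```python
import warnings

import numpy as np


def matmul_warns(rows=30000, cols=270, seed=0):
    """Compute M.T @ M for a random (rows, cols) matrix.

    Returns (warned, finite): whether an "invalid value" RuntimeWarning was
    emitted, and whether every entry of the product is finite.
    """
    M = np.random.default_rng(seed).random((rows, cols))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let the warning registry hide repeats
        G = M.T @ M
    warned = any(
        issubclass(w.category, RuntimeWarning) and "invalid value" in str(w.message)
        for w in caught
    )
    return warned, bool(np.isfinite(G).all())


if __name__ == "__main__":
    # Reported to warn with NumPy 1.25.0 wheels (OpenBLAS 0.3.23),
    # while the product itself stays finite.
    print(matmul_warns())
```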
Thanks, this must come from somewhere in the OpenBLAS internals, probably while deciding things like how to chunk up the job (in theory it could also be uninitialized values). The annoying thing is that it would help to recompile OpenBLAS with debugging info (and maybe without very high optimization). Unfortunately, that is only the first step; the second is to write a small C program and compile it with trapping math (or enable trapping at runtime), or to run NumPy, but then you have to enable trapping math after importing NumPy (which is possible, by hacking it in). Is anyone interested in diving into that? I can share a hack for the second part. (I have done this kind of thing once, but without recompiling OpenBLAS, so it was hard to figure out exactly where the issue happens.)
Actually, maybe enabling trapping math (while possible) isn't that helpful... We know approximately where this happens, so stepping through until a
Why has this not been detected by the matmul tests? Don't they check for warnings? 🤔
We do see it randomly, but only "reliably" (still randomly) on the win32 CI (if this is the same thing). So CI might well hit it, but maybe it only shows up on your machine or similar ones. Or our test suite just doesn't run the magic size that triggers it reliably here... In either case, we need to find out where it happens, ideally in a debugger where you can step through the OpenBLAS code to pin it down precisely. Once that is known, the OpenBLAS fix is probably easy.
The BLAS implementation always does a lot of calculations to figure out how to distribute the kernel to its worker threads (and maybe just to find the size of the scratch space needed). The most probable issue is that one of those calculations creates a NaN where it just doesn't matter, for example by taking the root of a negative number, or by using uninitialized memory that happens to hold an infinity, or... EDIT: That said, it could of course also be the kernel itself using uninitialized values due to vectorization.
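NumPy reports these warnings by checking the floating-point status flags after an operation, so a NaN that is created and then thrown away still produces a warning. A NumPy-level analogy (not the OpenBLAS code path itself) is `np.where`, which evaluates both branches: the square root of the negative entry sets the "invalid" flag even though that value never reaches the output. The helper name `where_sqrt` is hypothetical.

```python
import warnings

import numpy as np


def where_sqrt(x):
    """Return (result, messages) for np.where(x >= 0, np.sqrt(x), 0.0)."""
    x = np.asarray(x, dtype=np.float64)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # np.sqrt runs on *all* entries before np.where selects values,
        # so the negative entry yields a NaN (setting the invalid flag) first.
        result = np.where(x >= 0, np.sqrt(x), 0.0)
    return result, [str(w.message) for w in caught]


result, messages = where_sqrt([-1.0, 4.0])
print(result)    # [0. 2.]
print(messages)  # ['invalid value encountered in sqrt']
```

The output contains no NaN at all, yet the warning fires, which mirrors a correct `matmul` result that still warns.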
The example I shared is reproducible under Colab's container so this happens on Linux too.
In this specific case (matmul), is it likely chunking the matrix to fit it into the L2 cache or so? Would it make sense to test the problematic matmul with "native" OpenBLAS, following their docs?
The easy answer is "it hits a code path we do not hit in tests". The harder question is "which code path is triggered". I would imagine it is something like #16744, which turned out to be caused by registers not being properly restored. Maybe there is some race condition when chunking the large array.
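One way to probe the race-condition idea is to run the same product with different OpenBLAS thread counts. The sketch below sets the `OPENBLAS_NUM_THREADS` environment variable in a fresh child interpreter, because OpenBLAS reads it when the library is loaded; the function `warns_with_threads` and the child script are hypothetical helpers for illustration.

```python
import os
import subprocess
import sys

# Child script: prints whether M.T @ M emitted an "invalid value" warning.
CHILD = """
import warnings
import numpy as np
M = np.random.default_rng(0).random(({rows}, {cols}))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    M.T @ M
print(any("invalid value" in str(w.message) for w in caught))
"""


def warns_with_threads(n_threads, rows=30000, cols=270):
    """Run the product in a child process limited to n_threads OpenBLAS threads."""
    # The variable must be set before NumPy (and thus OpenBLAS) is imported,
    # which is why a separate interpreter is used.
    env = dict(os.environ, OPENBLAS_NUM_THREADS=str(n_threads))
    proc = subprocess.run(
        [sys.executable, "-c", CHILD.format(rows=rows, cols=cols)],
        env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip() == "True"


if __name__ == "__main__":
    for n in (1, 2):
        print(n, warns_with_threads(n))
```

If the warning disappears with one thread but not with two, that points at the multithreading logic rather than the compute kernel.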
Can you try limiting OpenBLAS to one thread with
... answering my own question:
Is it specific to a kernel? I actually once had a similar thing on the Mac and tried to narrow it down, but never got further than "it's in the
Here is the Colab runtime spec:
[{'numpy_version': '1.25.0',
'python': '3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]',
'uname': uname_result(system='Linux', node='ea606d5c044b', release='5.15.107+', version='#1 SMP Sat Apr 29 09:15:28 UTC 2023', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': '/usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-7a851222.3.23.so',
'internal_api': 'openblas',
'num_threads': 2,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'}]
Haswell and Zen... so it seems unlikely that it is the core type/kernel (although both might use the same kernel here, I guess).
Strangely, the result does not have a
I can only make educated guesses, but to me the observation that matrix size matters for the warning, plus your earlier remarks on chunking the job, suggest it has something to do with doing the matmul in cache-sized batches?
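To test whether size matters, one could sweep the number of rows (keeping 270 columns, as in the reproducer) and report the first size at which the warning appears. This is a sketch: the helper name and the row counts are arbitrary choices, and any threshold found this way is specific to the OpenBLAS build and thread count in use.

```python
import warnings

import numpy as np


def first_warning_size(row_counts, cols=270, seed=0):
    """Return the first row count for which M.T @ M warns, or None."""
    rng = np.random.default_rng(seed)
    for rows in row_counts:
        M = rng.random((rows, cols))
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            M.T @ M
        if any("invalid value" in str(w.message) for w in caught):
            return rows
    return None


if __name__ == "__main__":
    print(first_warning_size(range(20000, 40001, 2500)))
```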
It does not. When the same question was asked on SO, people blamed the OP for feeding NaNs into ML models 😄
My colleagues have reproduced the warning as well, so it is not a kernel issue, and we have also checked the data for NaNs and infs. Will this be investigated further? The calculations seem to work fine, but what problems could this warning potentially indicate?
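When the warning fires but the numbers look fine, a few cheap sanity checks on inputs and result help decide whether it can be ignored. A minimal sketch; the function name `check_gram` and the tolerance are hypothetical choices.

```python
import numpy as np


def check_gram(M, rtol=1e-10):
    """Sanity checks for G = M.T @ M: finite inputs, finite and symmetric output."""
    M = np.asarray(M, dtype=np.float64)
    G = M.T @ M
    return {
        "inputs_finite": bool(np.isfinite(M).all()),
        "result_finite": bool(np.isfinite(G).all()),
        # A Gram matrix is symmetric in exact arithmetic.
        "symmetric": bool(np.allclose(G, G.T, rtol=rtol, atol=0.0)),
    }
```

If all three checks pass, the warning most likely comes from an intermediate value that never reached the result.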
Maybe this helps (warning: it doesn't work in IPython for me, because IPython itself causes floating-point exceptions, although if running in a debugger you can probably continue past those):
The 61 above is the hard-coded mask value on my Linux laptop. If you do that and then run the code, we may be able to narrow down where it is first raised... After that, maybe we can go deeper. (Yes, this is a hack, and it will only work on Linux.) EDIT2: of course this is only useful if you run inside a debugger, such as with
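Before trapping inside C, a portable first step is to have NumPy raise instead of warn, so the offending call shows up with a full Python traceback. This uses NumPy's `np.errstate` context manager; it only identifies the Python-level call, not the location inside OpenBLAS, and the helper `raises_on_invalid` is hypothetical.

```python
import numpy as np


def raises_on_invalid(fn, *args):
    """Call fn(*args) with invalid-value warnings turned into FloatingPointError.

    Returns True if the call raised, False otherwise.
    """
    with np.errstate(invalid="raise"):
        try:
            fn(*args)
        except FloatingPointError:
            return True
    return False


if __name__ == "__main__":
    M = np.random.default_rng(0).random((30000, 270))
    print(raises_on_invalid(lambda a: a.T @ a, M))
```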
@martin-frbg any thoughts? Quick recap: this happens when calling
There were a few recent changes, mostly related to where the switch from single- to multithreading occurs. One such change made chunking-related variables available to "dynamic arch" builds, which previously used one set for all CPUs; maybe this is where an uninitialized variable could come into play. I'll try to reproduce and bisect later today.
To test another OpenBLAS version, one has to compile from source, is that right?
Martin is the OpenBLAS dev team :)
Faux pas, I am very sorry :-) I hope we keep this discussion as verbose and educational as possible; the internals of these computations sound fascinating.
Silly bug introduced by me three months ago when I reworked the multithreading thresholds for SYMM and SYRK: the calculated workload size could trivially overflow the puny int I had provided for it. (This also explains why the result of the call was still correct, unless one trapped the exception.) Trivial fix in OpenMathLib/OpenBLAS#4116.
So there is not much to learn from it, except: don't trust my judgement :/
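As a back-of-the-envelope illustration of how a workload estimate can overflow a 32-bit `int`: the formula below (n * n * k for an n-by-n result built from k rows) is a hypothetical stand-in, not the actual OpenBLAS expression. For the reproducer's `M.T @ M` with `M` of shape (30000, 270), this estimate lands just above INT_MAX, while 29000 rows would still fit.

```python
import ctypes

INT32_MAX = 2**31 - 1


def hypothetical_workload(n, k):
    """Illustrative SYRK-style work estimate n * n * k (NOT OpenBLAS's formula).

    Python integers are arbitrary precision, so this value is exact.
    """
    return n * n * k


def as_c_int32(value):
    """Truncate to 32 bits, as storing into a C int typically wraps.

    ctypes performs no overflow checking, so the high bits are simply dropped.
    """
    return ctypes.c_int32(value).value


for rows in (29000, 30000):
    w = hypothetical_workload(270, rows)
    print(rows, w, w > INT32_MAX, as_c_int32(w))
# 29000 2114100000 False 2114100000
# 30000 2187000000 True -2107967296
```

In C, signed integer overflow and converting an out-of-range double to int are both undefined behavior; on x86 the latter sets the invalid-operation flag, which is what NumPy reports as "invalid value". The thread does not say which of these happened inside OpenBLAS.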
Darn, I was hoping for a fix for the long-standing occasional warnings on Windows :) But now I wonder whether it is related to Windows having 32-bit default integers?
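For reference, the relevant C integer widths can be checked directly. A C `int` is 32 bits on all mainstream 64-bit platforms, including the Linux machine where the Colab reproduction ran; what differs on Windows is `long` (32 bits there, 64 bits on Linux and macOS), which NumPy before 2.0 used as its default integer. A small sketch; the helper name is hypothetical.

```python
import ctypes

import numpy as np


def integer_widths():
    """Bit widths of C int, C long, and NumPy's default integer on this platform."""
    return {
        "c_int": 8 * ctypes.sizeof(ctypes.c_int),
        "c_long": 8 * ctypes.sizeof(ctypes.c_long),
        # np.int_ maps to C long before NumPy 2.0 and to intp from 2.0 on.
        "numpy_default_int": 8 * np.dtype(np.int_).itemsize,
    }


print(integer_widths())
```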
Just for reference, the Windows problem manifested after the
Here we were able to locate the problem by reproducing it and narrowing the scope in Colab, which others could access independently and comment on. How do we debug and reproduce on Windows?
@maciejskorski The problem is that the warning is only occasional, but always in the same test. There is another occasional warning involving complex numbers. Both of those only occur on Windows and don't seem to affect the correctness of the results; they just warn of invalid values.
Is this a NumPy test on CI/CD? Is it occasional because the data is randomized, or because the algorithm is not fully deterministic (for various technical reasons, like parallelization)? Could we track and eventually (once spotted) share the data that triggers the error?
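To track such occasional warnings and share the triggering data, a test could wrap the operation and save its input whenever the warning fires. A sketch; the helper `run_and_capture`, the file naming, and the use of `.npy` files are illustrative assumptions, not an existing NumPy test utility.

```python
import warnings
from pathlib import Path

import numpy as np


def run_and_capture(fn, data, out_dir, tag):
    """Call fn(data); if an "invalid value" RuntimeWarning fires, save data.

    Returns (result, saved_path), where saved_path is None if nothing fired.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fn(data)
    fired = any(
        issubclass(w.category, RuntimeWarning) and "invalid value" in str(w.message)
        for w in caught
    )
    if not fired:
        return result, None
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = out / f"{tag}.npy"
    np.save(saved, data)  # exact float64 values, reloadable with np.load
    return result, saved
```

Saving the exact input, rather than a seed, keeps the case reproducible even when the randomness comes from somewhere other than NumPy's generator.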
Is it reasonable to assume the other warning is related to something in OpenBLAS as well?
@martin-frbg I don't know. I've been shrugging them off as probably a platform/compiler problem. Going forward, I am going to start tracking them in an issue. They only occur a couple of times a month, and seem less frequent than they used to be, but that is very subjective.
Is the OpenBLAS develop branch stable enough to update openblas-libs to fix this issue?
I think so. I'm currently trying to coordinate with xianyi after he made one of his ninja commits, but I expect any immediate action will be limited to RISC-V.
Could I get the current status of this bug?
I tried building the OpenBLAS version in MacPython/openblas-libs#101, but the macOS arm64 build failed to recognize the new SVE intrinsics.
#24199 updates OpenBLAS to a version with a fix for this.
Describe the issue:
I get a warning with NumPy 1.25.0 when doing matrix multiplication with the '@' operator.
Once an array exceeds a certain size, multiplying the two matrices produces a RuntimeWarning.
However, the warning goes away when downgrading NumPy to 1.24.4. It also goes away when making a copy of the array and multiplying the original matrix by the copy (X.T @ X_copy).
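The copy workaround from the report can be wrapped as below. A plausible explanation, stated as an assumption rather than something confirmed in this thread, is that NumPy hands `X.T @ X` (both operands sharing one buffer) to the symmetric rank-k routine SYRK, whose multithreading threshold had the overflow bug discussed in the comments, whereas two distinct buffers go through a general matrix product. The helper name `gram_via_copy` is hypothetical.

```python
import numpy as np


def gram_via_copy(X):
    """Compute X.T @ X with the right operand copied to a separate buffer.

    Mirrors the X.T @ X_copy workaround; it doubles the memory needed for X.
    """
    X = np.asarray(X)
    return X.T @ X.copy()
```

With an OpenBLAS build that includes the fix, the plain `X.T @ X` should no longer need this extra copy.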
file1.zip
file2.zip
file3.zip
Reproduce the code example:
Error message:
Runtime information:
import sys, numpy; print(numpy.__version__); print(sys.version)
1.25.0
3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
print(numpy.show_runtime())
[{'numpy_version': '1.25.0',
'python': '3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 '
'bit (AMD64)]',
'uname': uname_result(system='Windows', node='PC04218', release='10', version='10.0.22621', machine='AMD64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\numpy\.libs\libopenblas64__v0.3.23-gcc_10_3_0.dll',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'},
{'architecture': 'Haswell',
'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\scipy.libs\libopenblas_v0.3.20-571-g3dec11c6-gcc_10_3_0-c2315440d6b6cef5037bad648efc8c59.dll',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.21.dev'},
{'filepath': 'C:\Users\hjk\dcalgo\dcalgo-deidid_q1to16_nurser_vwap\.venv\Lib\site-packages\sklearn\.libs\vcomp140.dll',
'internal_api': 'openmp',
'num_threads': 24,
'prefix': 'vcomp',
'user_api': 'openmp',
'version': None}]
None
Context for the issue:
No response