BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654

larsoner · 2022-06-16T18:56:59Z

On 8c20ca3 using OpenBLAS to build and compile SciPy I get problems when I try to take use DGESDD (or DGESVD) to take the SVD of a matrix of ones of shape (41, 50):

OpenBLAS larsoner$ make clean; git reset --hard; git clean -xdf; MACOSX_DEPLOYMENT_TARGET=11.0 make && make install
scipy larsoner$ git clean -xdf && ln -s ../site.cfg && python setup.py develop && python -c "import numpy as np, scipy.linalg; scipy.linalg.svd(np.ones((41, 50)))"
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/larsoner/python/scipy/scipy/linalg/_decomp_svd.py", line 131, in svd
    raise LinAlgError("SVD did not converge")
numpy.linalg.LinAlgError: SVD did not converge

On the previous commit 8e4c209, things seem to be okay (SVD decomp matches that on x86_64 Linux). Same thing with being on latest develop / 7060ca5 and doing git revert 8c20ca3 to revert just one of the two commits (8c20ca3) from #3399 things are okay.

@martin-frbg looks like #3399 this was your PR, can you replicate?

cross-ref numpy/numpy#21756

The text was updated successfully, but these errors were encountered:

martin-frbg · 2022-06-16T20:28:58Z

Sadly I cannot test on macOS/M1 with the gccfarm machine unavailable. I see no obvious error in the parameters, and all usual BLAS and LAPACK tests pass on the Linux/M1 machine there. (Getting numpy installed there may be a challenge).
Right now my assumption is either a miscompilation by xcode, or some undetected fault in the ThunderX2 kernels.
The "invalid parameter 4" from DLASCL - i.e. something creating NaNs - reminds me of #2709 where the system fmod turned out to be the culprit (though on Windows)

martin-frbg · 2022-06-16T20:52:17Z

Adapted Fortran test code from #348 also runs fine on Linux/M1

larsoner · 2022-06-16T20:53:52Z

@martin-frbg could you paste the file/commands here so a Fortran newbie (me) could compile + run a test locally?

larsoner · 2022-06-16T20:54:19Z

(Also, this might only be a bug with the 32-bit interface, if that matters -- TBD)

martin-frbg · 2022-06-16T21:51:28Z

gesdd.f.txt
gfortran -ffree-form gesdd.f libopenblas.a -lpthread -o gesdd
./gesdd
(my tests were done with the 32bit interface, though if you happened to mix a 64bit-integer numpy with a 32bit-integer OpenBLAS or vice versa that would go a long way towards explaining illegal parameter value errors from LAPACK calls)

larsoner · 2022-06-16T22:16:43Z

@martin-frbg indeed I have no problem running that code snippet on develop. I'm not sure why git revert 8c20ca345aad43c2f74a72b356afdbc1ec368e31 would fix things for NumPy. Feel free to close this if this Fortran code emitting no messages for you or me convinces you that the problem has to be at the NumPy end!

martin-frbg · 2022-06-16T22:53:43Z

Oh I'm easily upset, especially by someone with an actual math degree. Just in this case there is not much to go on and no way to reproduce this right now. (Guess I could try running the Fortran testcase on the neoverse in Travis CI though)

larsoner · 2022-06-17T12:40:23Z

Oh I'm easily upset, especially by someone with an actual math degree.

That was a long time ago, I doubt it's helping here :)

One other interesting point is that there is no problem on M1 when using TARGET=ARMV8. Is it possible that this is a more appropriate target on Apple M1 silicon than VORTEX? At least according to this, VORTEX would not seem to be the correct target:

https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_processors

This idea was brought up by @hoechenberger in numpy/numpy#21756 (comment)

martin-frbg · 2022-06-17T17:05:32Z

VORTEX is the name we picked for the Apple-developed arm64 cpus, but there are no actual M1-specific BLAS kernels in OpenBLAS so far anyway, that TARGET primarily adds the compiler flags for using the 8.3 instruction set.

ARMV8 on the other hand is the most basic arm64 target,its assembly kernels do not require more than the 8.0 instruction set (or what e.g. a Cortex A57 from ten years ago has to offer). And before #3399, the kernel selection for VORTEX was an exact
copy of this. My PR basically replaced a few of its BLAS kernels with the slightly more advanced ones that were contributed for Cavium ThunderX2 (which is still only the 8.1 instruction set). And as it was only a first stab at speeding up M1, it did not even replace something heavyweight like GEMM, only operations like ASUM, DOT, NRM2 or SWAP. (Given the experience from #2709, it might be worthwile to roll back the NRM2KERNEL changes now though...)

brada4 · 2022-06-19T05:03:22Z

@larsoner it would help if you troubleshoot these ways:

change kernel/arm64/KERNEL.VORTEX to include KERNEL.ARMV8 instead, i.e what you said is working but compiled with more functional ISA-s enabled - this checks for miscompilation if any.
If you see picture here at netlib there are other call chains than direct to DLASCL , so it would be worth attaching debugger breakpoint to XERBLA function and seeing which call chain is anomalous generating NaN-s. That would reduce search space below.
copy KERNEL.NEOVERSEN1 to KERNEL.VORTEX and bisect , i.e. drill down commenting half of D????? function overrides until it starts working

Not having hardware or even QEMU equivalent here - all hope on you ;-)

martin-frbg · 2022-06-19T07:00:33Z

@brada4 doing git revert 8c20ca3 was exactly that. And no need to copy entire KERNEL fliles, just adding

SNRM2KERNEL = ../arm/nrm2.c
DNRM2KERNEL = ../arm/nrm2.c

after the include line should be sufficient for a test. (Unfortunately the gccfarm M1 is still unavailable, and even if it was up I am unsure if I could install numpy there as it had limited disk space)

larsoner · 2022-06-19T10:46:02Z

there are other call chains than direct to DLASCL , so it would be worth attaching debugger breakpoint to XERBLA function and seeing which call chain is anomalous generating NaN-s. That would reduce search space below.

If you can point me to a tutorial or step-by-step for how to do this I'm happy to. But I'll start with trying to follow @martin-frbg 's suggestion since I think it covers what you suggest as well ...

change kernel/arm64/KERNEL.VORTEX to include KERNEL.ARMV8 instead...

No need to copy entire KERNEL fliles, just adding...

Okay so digging into kernels/arm64/VORTEX with what is in develop, and just to be complete -- using the following:

FAILS:

include $(KERNELDIR)/KERNEL.NEOVERSEN1

SUCCEEDS (n.b. the same as git revert 8c20ca3):

include $(KERNELDIR)/KERNEL.ARMV8

SUCCEEDS:

include $(KERNELDIR)/KERNEL.NEOVERSEN1
SNRM2KERNEL = ../arm/nrm2.c
DNRM2KERNEL = ../arm/nrm2.c

FAILS (makes sense because this is a double computation)

include $(KERNELDIR)/KERNEL.NEOVERSEN1
SNRM2KERNEL = ../arm/nrm2.c

SUCCEEDS:

include $(KERNELDIR)/KERNEL.NEOVERSEN1
DNRM2KERNEL = ../arm/nrm2.c

So it seems to be the dznrm2_thunderx2t99.c (included by KERNEL.NEOVERSEN1) that is problematic!

I'm happy to debug further if someone can give me some instructions -- I'm barely a C programmer nowadays so I'm hopeless in assembly :(

If you see picture here at netlib there are other call chains than direct to DLASCL

So another way to go here is to make it clear what code path NumPy is using that hits the bug. But before that, @martin-frbg I took another look at the pure-fortran example and noticed what I think is an error -- the second call was to SGESDD rather than DGESDD. When I change it to use DGESDD, I can replicate the error:

$ gfortran -ffree-form gesdd.f /opt/OpenBLAS/lib/libopenblas.a -lpthread -o gesdd
$ ./gesdd 
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_OVERFLOW_FLAG IEEE_UNDERFLOW_FLAG
STOP Error while computing the SVD!

@martin-frbg maybe you can replicate locally now, too? Here is the modified file (but the diff is just the one line changed):

gesdd.f.txt

martin-frbg · 2022-06-19T11:37:07Z

Oh sh.... sorry for that. Reproducible now on M1/Linux. Slightly improved solution is to set the DNRM2 (and SNRM2) kernel to "nrm2.S" (as used by most arm64 targets - and corresponding znrm2.S for CNRM2/ZNRM2 ). ThunderX2 kernel must be taking a shortcut or missing an initilalization somewhere that does not show up in the regular BLAS/LAPACK tests. Maybe even something introduced by its most recent fix to handle Inf in input.

larsoner · 2022-06-19T11:48:50Z

So @martin-frbg for now do you suggest that we put in this little workaround? It looks like you're already working on something in #3657 so happy to let you move forward however you think is best...

martin-frbg · 2022-06-19T12:09:28Z

Right now #3657 is simply to test the various DNRM2 kernels on NeoverseN1 (I do not have access to any ThunderX2 server hardware but from old issue #1854 the exact cpu probably does not matter as long as it is V8.1 or higher).

martin-frbg · 2022-06-19T12:12:09Z

Using the nrm2.S kernels will definitely fix the bug and should be more performant than the plain-C "generic" kernels. (Though a fixed thunderx2t99-style kernel should give best performance - but I am still horribly out of my depth with ARM assembly)

larsoner · 2022-06-19T12:25:33Z

but I am still horribly out of my depth with ARM assembly)

Okay so maybe until someone can fix the thunderx2t99 code to work on M1 / VORTEX we should just switch. +1 for putting this change into #3657, especially if your test case in that PR does turn things red on CIs so we can see it switch to green!

larsoner · 2022-06-19T14:57:37Z

I commented this over in #3657 (comment) , but just to have all the testing in one place, this KERNEL.VORTEX also appears to be okay, as suggested by @martin-frbg :

include $(KERNELDIR)/KERNEL.NEOVERSEN1
DNRM2KERNEL = ../arm64/nrm2.S
SNRM2KERNEL = ../arm64/nrm2.S

This was referenced Jun 16, 2022

MAINT: Cleaner f2py scipy/scipy#16421

Closed

BUG: linalg.svd broken on M1 / arm64 numpy/numpy#21756

Closed

MAINT: Use 64-bit on arm64 conda-forge/openblas-feedstock#139

Closed

martin-frbg mentioned this issue Jun 19, 2022

[WIP] Add test for DNRM2 bug on Neoverse #3657

Closed

larsoner mentioned this issue Jun 29, 2022

BUG: Use ARMV8 rather than VORTEX conda-forge/openblas-feedstock#140

Closed

5 tasks

martin-frbg mentioned this issue Jul 2, 2022

workaround ThunderX2 DNRM2 fault with ssq=inf,scale=0 #3675

Merged

martin-frbg closed this as completed in #3675 Jul 3, 2022

This was referenced Jul 6, 2022

BUG: utest dnrm2_tiny failed #3679

Closed

BUG: Non-normalized eigenvectors in linalg.eigh for complex hermitian matrix on osx-arm64 numpy/numpy#21950

Closed

larsoner mentioned this issue Jul 27, 2022

Timeline for 0.3.21? #3701

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654

BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 16, 2022

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 17, 2022

martin-frbg commented Jun 17, 2022

brada4 commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

larsoner commented Jun 19, 2022

BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654

BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654

Comments

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 16, 2022

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 16, 2022

martin-frbg commented Jun 16, 2022

larsoner commented Jun 17, 2022

martin-frbg commented Jun 17, 2022

brada4 commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

martin-frbg commented Jun 19, 2022

larsoner commented Jun 19, 2022

larsoner commented Jun 19, 2022