Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use generic C for (D/Z)NRM2 on Windows x86_64 #2882

Merged
merged 2 commits into from Oct 11, 2020
Merged

Use generic C for (D/Z)NRM2 on Windows x86_64 #2882

merged 2 commits into from Oct 11, 2020

Conversation

martin-frbg
Copy link
Collaborator

to work around recent fpu exception bug in Win10 since build 19041

tentative fix for #2709

@mattip
Copy link
Contributor

mattip commented Oct 7, 2020

Is there a reason to prefer the *.S version over the *.c version on non-windows? I am a bit worried that separating the code paths by OS might cause confusion in the future if changes are made and they get out of sync.

@martin-frbg
Copy link
Collaborator Author

Performance ? This is nothing but a workaround for the convenience of users of a particularly buggy patchlevel of Windows, I do not see why it should be forced on other uses cases that do not need it. There is no reason to expect the plain (and simple) "generic" C version to suddenly outperform the .S across several generations of hardware and compilers, and I lack the time to benchmark it now. If anything, a "modern" replacement would probably need to be coded with SSE or AVX (the present nrm2_sse.S does not handle double precision) but I am not an assembler wizard who could whip this up overnight.
Having the conditional code in the KERNEL file (i.e. basically at Makefile level) should make it easier to keep track of
than having it somehow buried in the .S file itself

@mattip
Copy link
Contributor

mattip commented Oct 7, 2020

Thanks for the explanation. I agree that this whole issue has been quite a time sink, and appreciate your work in trying to fix what should be fixed elsewhere.

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

I am still getting failures with this and NumPy. I guess we still haven't worked out what is causing these:

 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

@martin-frbg
Copy link
Collaborator Author

My assumption was that numpy would be unlikely to use dsum/zsum as these are BLAS extensions

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

Not sure I follow. In KERNEL.generic DSUM, ZSUM are already the c-variants. Is there some way I can test the hypothesis that these kernels are involved?

@martin-frbg
Copy link
Collaborator Author

KERNEL.generic will only be used when building with TARGET=GENERIC, or when using MSVC. For the mingw builds, KERNEL (plus the respective KERNEL.target) are relevant.

@martin-frbg
Copy link
Collaborator Author

#2887 adds these to kernel/x86_64/KERNEL (the initial defaults to override are given in kernel/Makefile.L1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants