Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many many errors when porting code to pyodide #2727

Open
gabrielfougeron opened this issue Jun 16, 2022 · 17 comments
Open

Many many errors when porting code to pyodide #2727

gabrielfougeron opened this issue Jun 16, 2022 · 17 comments
Labels
bug Something isn't working Fatal Error scipy

Comments

@gabrielfougeron
Copy link
Contributor

馃悰 Bug

Thanks to the amazing help of @ryanking13 and @hoodmane , I was able to start using module sparseqr in a small project of mineusing pyodide.

Of course, I never expected that the transition to pyodide would be easy, but upon running my own code in pyodide, I was surpised to see many different errors pop up.
I had never seen most of those errors before, and they are not very reproducible (most likely because the execution of the core relies on a random number generator).

Here is a list of the main error that my code throws, none of which I had ever gotten using conda's python:

- A weird Lapack/Blas error : "failed in converting 1st argument `x' of _fblas.daxpy to C/Fortran array"
- "RuntimeError: memory access out of bounds" thrown in different places including scipy
- "SystemError: Type does not define the tp_name field."
- "RuntimeError: null function or function signature mismatch"

I must say those errors are so foreign to me I don't really know where to start. Moreover, they are not very reproducible (even considering the RNG), as the very same call might or might not throw an error.

Of course I'm not asking the Pyodide team to debug my code, but merely to give me a few pointers/ideas where to start.

To Reproduce

Go to https://gabrielfougeron.github.io//choreo/ and click "Choreo Execute!". Several print statements should appear in the console.

Expected behavior

No errors.

Environment

I built pyodide myself from pull request #2685

  • Build environment : WSL2
@gabrielfougeron gabrielfougeron added the bug Something isn't working label Jun 16, 2022
@hoodmane
Copy link
Member

hoodmane commented Jun 16, 2022

You seem to be hitting bugs in our scipy port. The current status of the scipy test suite is that... part of it passes. Last I checked the status of the test suite is as follows:

module failed passed %
cluster 0 134 100%
constants 0 10 100%
fft 0 4961 100%
fftpack 0 553 100%
integrate 7 157 96%
interpolate 2 605 99.6%
io 20 301 93.8%
linalg fatal error 1% in
misc 0 8 100%
ndimage 5 3563 99.9%
odr fatal error 11% in
optimize 26 1659 98.5%
signal fatal error 97% in
sparse fatal error 1% in
spatial fatal error 2% in
special 6 1880 99.7
stats fatal error 86% in

The good news is that most errors are fatal, usually it doesn't work but give the wrong answer. The bad news is there are still a lot of fatal errors. Usually these null function or function signature mismatch errors are fairly simple to fix, but it is time consuming.

@hoodmane
Copy link
Member

The root cause of the problem is that we don't have a working fortran compiler. See: scipy/scipy#15290

@gabrielfougeron
Copy link
Contributor Author

Thanks very much for this quick and detailled answer !
Not much I can do in the short term I guess ...
Should I close the issue ?

@hoodmane
Copy link
Member

No, leave it open, it isn't fixed.

@hoodmane
Copy link
Member

See #2728 for an example of the kind of thing that can cause these failures...

@hoodmane
Copy link
Member

hoodmane commented Jun 16, 2022

linalg/_isolve/tests/test_gcrotmk.py::TestGCROTMK::test_cornercase seems to cause memory corruption which may have something to do with the weirder problems you see.

@hoodmane
Copy link
Member

Sometimes I see:

Traceback (most recent call last):
  File "/lib/python3.10/site-packages/_pyodide/_base.py", line 431, in eval_code
    .run(globals, locals)
  File "/lib/python3.10/site-packages/_pyodide/_base.py", line 300, in run
    coroutine = eval(self.code, globals, locals)
  File "<exec>", line 217, in <module>
  File "<exec>", line 49, in main
  File "/lib/python3.10/site-packages/choreo/Choreo_funs.py", line 549, in Make2DChoreoSymManyLoops
    the_lcm = m.lcm(*nbpl)
NameError: name 'm' is not defined

is this a logic error in your code?

@gabrielfougeron
Copy link
Contributor Author

The name m not being defined is very weird.

The source code for this file is available at https://github.com/gabrielfougeron/Choreographies2/blob/main/choreo/Choreo_funs.py

At line 13, I have import math as m, so that the name m should be in scope.

I have no idea why an error is thrown. (It doesn't do this using python 3.10 in conda). More surprisingly, I've not yet seen this error in pyodide ...

@hoodmane
Copy link
Member

It must be due to early memory corruption destroying the Python vm's data structures.

@gabrielfougeron
Copy link
Contributor Author

linalg/_isolve/tests/test_gcrotmk.py::TestGCROTMK::test_cornercase seems to cause memory corruption which may have something to do with the weirder problems you see.

Yes, this sounds very plausible

@hoodmane
Copy link
Member

hoodmane commented Jun 16, 2022

I think the main problem is that gcrotmk corrupts memory. The following usually reproduces the problem:

await pyodide.loadPackage(["scipy"])
pyodide.runPython(`
import numpy as np
from scipy.sparse import eye
from scipy.sparse.linalg._isolve import gcrotmk

n = 100
for i in range(20):
    A = 2*eye(n)
    b = np.random.rand(n)
    gcrotmk(A, b, tol=0, maxiter=10)
`);

The gcrotmk method is defined in Python so we could binary search it to locate which blas/lapack call is breaking.

@gabrielfougeron
Copy link
Contributor Author

I think you hit the nail right on the head.

Indeed if you change the option Solver => Optimizer => Krylov method from LGMRES to CGS, the error disappears.

LGMRES calls gcrotmk under the hood which then leads to the memory corruption.

@hoodmane
Copy link
Member

Well I guess that gives a good short term fix. I will look into fixing gcrotmk.

@hoodmane
Copy link
Member

I suspect the problem is in nrm2 which we have patched as follows:

@@ -379,10 +393,15 @@ end function <prefix2c>dotc
 
 function <prefix3>nrm2(n,x,offx,incx) result(n2)
 
-  <ftypereal3> <prefix3>nrm2, n2
+  <ftypereal3> n2
+  double precision <prefix3>nrm2
 
-  callstatement (*f2py_func)(&<prefix3>nrm2, &n,x+offx,&incx)
-  callprotoargument <ctypereal3>*,F_INT*,<ctype3>*,F_INT*
+  callstatement <prefix3>nrm2_return_value=(*f2py_func)( &n,x+offx,&incx)
+  callprotoargument int*,<ctype3>*,int*
+
+  fortranname F_FUNC(<prefix3>nrm2,<S,SC>NRM2)
+  ! This following line is to avoid Fortran wrappers - fix for CLAPACK
+  intent(c) <prefix3>nrm2
 
   <ftype3> dimension(*),intent(in) :: x
 
@@ -399,10 +418,14 @@ end function <prefix3>nrm2
 
 function <prefix4>nrm2(n,x,offx,incx) result(n2)
 
-  <ftypereal4> <prefix4>nrm2, n2
+  callstatement <prefix4>nrm2_return_value=(*f2py_func)(&n,x+offx,&incx)
+  callprotoargument int*,<ctype4>*,int*
 
-  callstatement (*f2py_func)(&<prefix4>nrm2, &n,x+offx,&incx)
-  callprotoargument <ctypereal4>*,F_INT*,<ctype4>*,F_INT*
+  double precision <prefix4>nrm2
+  <ftypereal4> n2
+  fortranname F_FUNC(<prefix4>nrm2,<S,D>NRM2)
+  ! This following line is to avoid Fortran wrappers - fix for CLAPACK
+  intent(c) <prefix4>nrm2

@rth
Copy link
Member

rth commented Jun 20, 2022

Thanks to Hood for fixing this issue in #2728 So it looks like this fix should improve scipy.sparse support significantly #2728 (comment)

@lesteve
Copy link
Contributor

lesteve commented Apr 17, 2023

FYI it seems the gcrotmk issue has been fixed by switching to OpenBLAS #3331, at least the snippet from
#2727 (comment) runs fine in the latest console and I can still reproduce the issue with the 0.23.1 console (latest Pyodide release at the time of writing)

@gabrielfougeron
Copy link
Contributor Author

Awesome, thanks @lesteve for the heads-up. I'll probably wait for 0.24 to try it out though. I'll keep you posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Fatal Error scipy
Projects
None yet
Development

No branches or pull requests

4 participants