Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cypari2 threadsafe #107

Open
roed314 opened this issue Jun 4, 2021 · 23 comments · May be fixed by #116
Open

Make cypari2 threadsafe #107

roed314 opened this issue Jun 4, 2021 · 23 comments · May be fixed by #116

Comments

@roed314
Copy link

roed314 commented Jun 4, 2021

As discussed here, the update from Sage 9.2 to 9.3 caused new segmentation faults when running with threads, which we believe is related to the cypari upgrade. Here's an illustration of the problem just with cypari.

Python 3.9.5 (default, May  4 2021, 03:33:11) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cypari2
>>> pari = cypari2.Pari()
>>> pari.issquarefree(15)
1
>>> from concurrent.futures import ThreadPoolExecutor
>>> with ThreadPoolExecutor() as e:
...     j = e.submit(pari.issquarefree, 15)
... 
>>> j.result()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "cypari2/auto_instance.pxi", line 14650, in cypari2.pari_instance.Pari_auto.issquarefree
  File "cypari2/gen.pyx", line 4813, in cypari2.gen.objtogen
  File "cypari2/convert.pyx", line 561, in cypari2.convert.PyObject_AsGEN
cysignals.signals.SignalError: Segmentation fault

Traceback with gdb

0x00007ffff708788e in new_chunk (x=<optimized out>) at ../src/kernel/none/level1.h:125
125	../src/kernel/none/level1.h: Aucun fichier ou dossier de ce type.
(gdb) tb
Temporary breakpoint 1 at 0x7ffff708788e: file ../src/kernel/none/level1.h, line 125.
(gdb) bt
#0  0x00007ffff708788e in new_chunk (x=<optimized out>) at ../src/kernel/none/level1.h:125
#1  cgeti (x=<optimized out>) at ../src/kernel/none/level1.h:198
#2  __pyx_f_7cypari2_7convert_PyLong_AS_GEN (__pyx_v_x=__pyx_v_x@entry=0x7ffff7300af0)
    at cypari2/convert.c:3721
#3  0x00007ffff7088858 in __pyx_f_7cypari2_7convert_PyObject_AsGEN (__pyx_v_x=<optimized out>)
    at cypari2/convert.c:4151
#4  0x00007ffff5ff7fc6 in __pyx_f_7cypari2_3gen_objtogen (__pyx_v_s=0x7ffff7300af0, 
    __pyx_skip_dispatch=<optimized out>) at cypari2/gen.c:186911
#5  0x00007ffff6cd1b09 in __pyx_pf_7cypari2_13pari_instance_9Pari_auto_876issquarefree (
    __pyx_v_x=0x7ffff7300af0, __pyx_v_self=<optimized out>) at cypari2/pari_instance.c:100713
#6  0x00007ffff6d2c859 in __Pyx_CyFunction_CallMethod (kw=0x0, arg=0x7ffff71c8f40, 
    self=0x7ffff5e2b540, func=0x7ffff5cc7a00) at cypari2/pari_instance.c:250940
#7  __Pyx_CyFunction_CallAsMethod (kw=0x0, args=<optimized out>, func=0x7ffff5cc7a00)
    at cypari2/pari_instance.c:54373
#8  __Pyx_CyFunction_CallAsMethod (func=0x7ffff5cc7a00, args=<optimized out>, kw=0x0)
    at cypari2/pari_instance.c:54357
@videlec
Copy link
Collaborator

videlec commented Jun 9, 2021

I doubt https://trac.sagemath.org/ticket/31029 to be the source of problems (this was a minor upgrade). Did you test it with other cypari2 versions ?

@videlec
Copy link
Collaborator

videlec commented Jun 9, 2021

Also, what is the paralellization enabled into your PARI/GP (the option --mt of the Configure script) ?

@roed314
Copy link
Author

roed314 commented Jun 9, 2021

I ran cypari using sage -python, and the contents of $SAGE_ROOT/local/lib/pari/pari.cfg include MT_LIBS='-lpthread' in the failing case, and MT_LIBS='' in the working case. So that sounds plausible as an explanation. I don't know where this is getting set though.

As a test to see if it's #31029, I'll build 9.3.beta5 and 9.3.beta5+#31029 tonight and see if that makes a difference. If you can let me know how to change the MT option to pari from the sage build system I'd be happy to try that as well.

Thanks!

@edgarcosta
Copy link
Member

@roed314, you can change it here: https://github.com/sagemath/sage/blob/develop/build/pkgs/pari/spkg-install.in#L130
This change happened on sage 9.3 beta8

@roed314
Copy link
Author

roed314 commented Jun 10, 2021

I just checked and you were right. The problem is due to #30537 rather than #31029: there is no error with 9.3.beta7, but adding #30537 yields the problem described.

What resolution would you suggest? Changing the parallelization option for pari in Sage back to MT_LIBS='' is possible of course. I don't know if there's a way to change cypari to allow the use of lpthread.

@videlec
Copy link
Collaborator

videlec commented Jun 10, 2021

I believe that some of the PARI functions interacting with the stack have to be used with care in a multithreaded PARI. I will ask Bill Allombert.

@videlec
Copy link
Collaborator

videlec commented Jun 10, 2021

Could you copy/paste the gdb traceback (ie launch sage with sage -gdb and after the crash run bt)? On my archlinux version I got

0x00007fffec00413d in ?? ()
   from /usr/lib/python3.9/site-packages/sage/libs/pari/convert_gmp.cpython-39-x86_64-linux-gnu.so
(gdb) bt
#0  0x00007fffec00413d in ?? ()
   from /usr/lib/python3.9/site-packages/sage/libs/pari/convert_gmp.cpython-39-x86_64-linux-gnu.so
#1  0x00007fffee94afd8 in ?? ()
   from /usr/lib/python3.9/site-packages/sage/rings/integer.cpython-39-x86_64-linux-gnu.so
#2  0x00007fffee94b2ce in ?? ()
   from /usr/lib/python3.9/site-packages/sage/rings/integer.cpython-39-x86_64-linux-gnu.so
#3  0x00007fffec1c0877 in ?? ()
   from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#4  0x00007fffec1e0c35 in ?? ()
   from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#5  0x00007fffec5f5d89 in ?? ()
   from /usr/lib/python3.9/site-packages/cypari2/pari_instance.cpython-39-x86_64-linux-gnu.so
#6  0x00007fffefd9fafb in ?? ()
   from /usr/lib/python3.9/site-packages/zmq/backend/cython/error.cpython-39-x86_64-linux-gnu.so

which is weird since that involves sage integers...

@videlec
Copy link
Collaborator

videlec commented Jun 10, 2021

Ha in sage 15 is an integer... should be 15r, traceback is different

#0  0x00007fffec04baea in ?? () from /usr/lib/python3.9/site-packages/cypari2/convert.cpython-39-x86_64-linux-gnu.so
#1  0x00007fffec04c1e0 in ?? () from /usr/lib/python3.9/site-packages/cypari2/convert.cpython-39-x86_64-linux-gnu.so
#2  0x00007fffec1e0dd1 in ?? () from /usr/lib/python3.9/site-packages/cypari2/gen.cpython-39-x86_64-linux-gnu.so
#3  0x00007fffec5f5d89 in ?? () from /usr/lib/python3.9/site-packages/cypari2/pari_instance.cpython-39-x86_64-linux-gnu.so
#4  0x00007fffefd9fafb in ?? () from /usr/lib/python3.9/site-packages/zmq/backend/cython/error.cpython-39-x86_64-linux-gnu.so

@videlec
Copy link
Collaborator

videlec commented Jun 10, 2021

I am unsure about the mechanism underlying ThreadPoolExecutor. On the PARI/GP side special care has to be taken with pari when using threads

  • pari_thread_alloc/pari_thread_free must be used to pass data to pthread_create
  • pari_thread_start/pari_thread_close must be called in each children handling computations

@roed314
Copy link
Author

roed314 commented Jun 10, 2021

Neither @edgarcosta or I has been able to get gdb working correctly, but it sounds like you don't need it anymore?

I think Pari's requirements for using threads is going to make higher level threading interfaces (like Python's ThreadPoolExecutor) very difficult to use. Now that we've identified the issue, I think that from the LMFDB's perspective we're going to just turn off threading in flask, which solves the problem for us.

From Sage's perspective, I'm going to ask on #30537 about why they made that change to MT_LIBS.

I'm not sure what you want to do with cypari. Let us know if there's anything else we can do to help.

@edgarcosta
Copy link
Member

@videlec here is my attempt to do what you asked us: https://gist.github.com/edgarcosta/55870bd5df1ac2f326931a2068a9c775

but I don't seem to manage to catch the traceback this way.

@videlec
Copy link
Collaborator

videlec commented Jun 11, 2021

Neither @edgarcosta or I has been able to get gdb working correctly, but it sounds like you don't need it anymore?

I think I reproduced enough. Though what Edgar sent is very different from mine...

I think Pari's requirements for using threads is going to make higher level threading interfaces (like Python's ThreadPoolExecutor) very difficult to use. Now that we've identified the issue, I think that from the LMFDB's perspective we're going to just turn off threading in flask, which solves the problem for us.

From Sage's perspective, I'm going to ask on #30537 about why they made that change to MT_LIBS.

Disabling --mt=pthreads is only a partial solution. Multithreading ought to be supported. The only way I see for now would be to implement hooks (at C level) for ThreadPoolExecutor. I will find some time to ask the question on the Python developer list. I will report it here.

I'm not sure what you want to do with cypari. Let us know if there's anything else we can do to help.

Thanks for the report! I am glad that you found a way around.

@roed314
Copy link
Author

roed314 commented Jun 11, 2021

Flask looks to use the threading Python package, which is also used by concurrent.futures. These libraries seem to have hooks where you could insert the required pari function calls.

@roed314
Copy link
Author

roed314 commented Dec 17, 2021

We're running into this again. We managed to disable threading most of the time, but threading is required to use the flask debugger, so we're not currently able to use the debugger in the LMFDB.

If anyone's interested in tackling this, we'd be grateful!

@JohnCremona
Copy link
Member

I would like to add to the requests by @roed314 and @edgarcosta . With Sage 9.5 due to be released soon it would be unfortunate for this problem to still be there. Thank you, cypari developers!

@AndrewVSutherland
Copy link
Sponsor

Just to echo @JohnCremona, we would be extremely grateful to anyone who can help with this, it is now impacting LMFDB development in a pretty serious way.

@videlec
Copy link
Collaborator

videlec commented Jan 30, 2022

@JohnCremona @AndrewVSutherland cypari2 very welcome pull requests.

@roed314
Copy link
Author

roed314 commented Jan 31, 2022

We totally understand that you (and other people with more cypari knowledge) may not have time to work on this soon. We've been looking into what might be needed, but are still a bit uncertain about how to proceed. Based on Appendix B of the pari manual, it looks like we need to add Python-accessible mechanisms to cypari that can call the appropriate thread-functions in C (pari_thread_start, pari_thread_close, pari_thread_alloc and pari_thread_free). We'd then need to figure out the appropriate places in the CPython thread classes to call these, and possibly add some convenience classes to cypari for using threads.

Neither @edgarcosta or I have worked much with the PARI stack, and we would appreciate any tips you can give us about where to begin. If the best option is to email the pari developer list and ask there, we'd be happy to do that.

@videlec
Copy link
Collaborator

videlec commented May 19, 2022

Usage examples of multithread in PARI/GP are available in examples/thread.c and examples/openmp.c.

As Bill mentioned to me, one has to be careful : in a multithreaded compiled PARI, some functions are likely to spawn thread. When using ThreadPoolExecutor we typically want to deactivate multithreading inside PARI.

@videlec
Copy link
Collaborator

videlec commented May 19, 2022

Thanks to Bill, I have a working prototype. I hope to push a branch soon.

@videlec videlec linked a pull request May 19, 2022 that will close this issue
@videlec
Copy link
Collaborator

videlec commented May 19, 2022

@JohnCremona @AndrewVSutherland @roed314 @edgarcosta Could you test the branch at #116 to see if you can break it ?

@roed314
Copy link
Author

roed314 commented May 19, 2022

Absolutely! I'll take a look.

@Honzaik
Copy link

Honzaik commented Sep 12, 2022

In my work I have switched from threading (https://docs.python.org/3/library/threading.html) to multiprocessing (https://docs.python.org/3/library/multiprocessing.html) and no PARI segfaults.
Of course this is not the same thing but it could help someone that comes across this due to having segfaults while parallelising in Sage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants