socket failures that take hours to heal #33

stopatz · 2022-04-02T07:39:27Z

I use a Wolfram session to evaluate the integrand for the Vegas algorithm, which I run in Python.

I use MPI to start one session per core on a high-performance cluster.

Before I start a session, I want to kill any stray Mathematica processes, so I use the kernelcontroller module as follows:

controller = kernelcontroller.WolframKernelController(kernel='path', kernel_loglevel=1)

controller._kernel_stop()

Now, if I wait 10 minutes after this clean-up, my actual code

with WolframLanguageSession('path') as session:...

works fine most of the time.
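To make the routine concrete, here is a minimal sketch of the two steps as described above. `run_two_step`, `make_controller`, `make_session`, and `work` are hypothetical names introduced only for this illustration; the two factories stand in for the `WolframKernelController(...)` and `WolframLanguageSession('path')` calls, and the 600-second default is the 10-minute wait. It shows the shape of the routine, not a fix.

```python
import time


def run_two_step(make_controller, make_session, work,
                 settle_seconds=600.0, sleep=time.sleep):
    """Clean up, wait, then run `work` inside a fresh session."""
    # Step 1: cleanup, as in the post. Note that _kernel_stop() is a
    # private method of the controller.
    controller = make_controller()
    controller._kernel_stop()

    # Pause between cleanup and session start (10 minutes in the post).
    sleep(settle_seconds)

    # Step 2: the context manager exits the session when the block ends,
    # even if `work` raises.
    with make_session() as session:
        return work(session)


# Usage with the real classes (needs wolframclient and a kernel path):
#
# from wolframclient.evaluation import WolframLanguageSession
# from wolframclient.evaluation.kernel import kernelcontroller
#
# result = run_two_step(
#     lambda: kernelcontroller.WolframKernelController(
#         kernel='path', kernel_loglevel=1),
#     lambda: WolframLanguageSession('path'),
#     lambda session: session.evaluate('1 + 1'),
# )
```

Passing factories instead of objects keeps the sketch independent of any installed kernel, so the control flow can be checked with stand-in objects.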

But at seemingly random times, I get socket failures when I run the two-step process (cleanup, then run session), with multiple instances of the following error message:

Socket exception: Failed to read any message from socket tcp://127.0.0.1:39237 after 20.0 seconds and 199 retries.
Failed to start.
Traceback (most recent call last):
  File "/home/sjsuh/anaconda3/lib/python3.9/site-packages/wolframclient/evaluation/kernel/kernelcontroller.py", line 435, in _kernel_start
    response = self.kernel_socket_in.recv_abortable(
  File "/home/sjsuh/anaconda3/lib/python3.9/site-packages/wolframclient/evaluation/kernel/zmqsocket.py", line 53, in recv_abortable
    raise SocketOperationTimeout(
wolframclient.evaluation.kernel.zmqsocket.SocketOperationTimeout: Failed to read any message from socket tcp://127.0.0.1:39237 after 20.0 seconds and 199 retries.

To be able to run my code again, I find that I have to wait around 3 hours before rerunning my routine. Otherwise, the socket failure persists.

So my questions are: i) is there a better way to kill stray processes than the one I have used, ii) why am I getting the socket failures, and iii) is there a way to recover from the socket failures faster?
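A possible stopgap for the recovery question is to retry the session start with a growing delay instead of failing on the first timeout. The sketch below is only an illustration of that idea and does not explain or remove the underlying socket failure. It assumes that `start()` raises when the kernel does not come up and that `terminate()` is safe to call on a half-started session; `start_with_retry` and its parameters are hypothetical names, not part of wolframclient.

```python
import time


def start_with_retry(make_session, attempts=3, base_delay=5.0,
                     sleep=time.sleep):
    """Return a started session, retrying with a linearly growing delay."""
    last_error = None
    for attempt in range(1, attempts + 1):
        # A fresh session object per attempt, so no state is reused.
        session = make_session()
        try:
            session.start()
            return session
        except Exception as exc:
            last_error = exc
            # Best-effort cleanup so a half-started kernel is not left behind.
            try:
                session.terminate()
            except Exception:
                pass
            # Wait longer after each failed attempt, but not after the last.
            if attempt < attempts:
                sleep(base_delay * attempt)
    raise RuntimeError(
        f"session failed to start after {attempts} attempts"
    ) from last_error


# Usage with the real class (needs wolframclient and a kernel path):
#
# from wolframclient.evaluation import WolframLanguageSession
#
# session = start_with_retry(lambda: WolframLanguageSession('path'))
# try:
#     print(session.evaluate('1 + 1'))
# finally:
#     session.terminate()
```

With many MPI ranks starting at once, the delay values here are guesses and would need tuning on the actual cluster.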
