
Kernel/Interrupt Kernel does not terminate stuck subprocesses in the notebook #3400

Closed
pfmoore opened this issue Jun 4, 2013 · 54 comments · Fixed by #12137
Comments

@pfmoore
Contributor

pfmoore commented Jun 4, 2013

When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel.

This occurred for me on Windows - I do not know if it also happens on Unix.

To demonstrate, start a notebook and enter !python in a cell. The process will lock as it is waiting for interactive input. As there is no way to provide that input, the kernel must be restarted to continue.
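One way to sidestep this particular hang (a sketch of a workaround, not IPython's behavior) is to run the child with stdin redirected to the null device, so it sees EOF immediately instead of blocking on interactive input. This assumes the command can tolerate an empty stdin:

```python
import subprocess
import sys

# Child tries to read stdin, but with stdin=DEVNULL it gets EOF at once
# instead of hanging forever waiting for keyboard input.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"],
    stdin=subprocess.DEVNULL,
    capture_output=True,
    text=True,
    timeout=10,
)
print(result.returncode)   # 0: the child exited instead of hanging
```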

@minrk
Member

minrk commented Jun 4, 2013

duplicate of #514

@minrk minrk closed this as completed Jun 4, 2013
@pfmoore
Contributor Author

pfmoore commented Jun 4, 2013

Thanks, I hadn't spotted the duplicate. Having said that, #514 is discussing a much more complex scenario, involving actually interacting with subprocesses (and it seems to be Unix-based, as it's about pty-style interaction). For my requirements, a simple means of killing a rogue subprocess would do. Consider something as simple as !sleep 50000, where just being able to kill the sleep is all you want. (Maybe Ctrl-C works for this on Unix, but it doesn't on Windows.)

@minrk minrk reopened this Jun 13, 2013
@minrk
Member

minrk commented Jun 13, 2013

Sorry, I see what you mean now. Reopening as a separate issue - interrupt not interrupting subprocesses on Windows.

@arijun

arijun commented Sep 17, 2014

I'm not sure this is limited to subprocesses. Try executing input() or raw_input() and then clicking the interrupt button--the kernel hangs and has to be restarted.

@minrk
Member

minrk commented Sep 17, 2014

@arijun on what OS? Interrupting input and raw_input raises KeyboardInterrupt here (OS X).

@arijun

arijun commented Sep 18, 2014

Sorry, Windows. That's why I thought it was likely the same issue @pfmoore had, since that also happened on Windows.

@minrk
Member

minrk commented Sep 18, 2014

Ah, crap. I know what that bug is. I think it's a libzmq (or pyzmq) bug that prevents it from handling interrupts properly while polling on zmq sockets. It's nothing in IPython. sigh

@wmayner

wmayner commented Mar 31, 2016

I think I just got bitten by this and I'll need to restart the kernel, meaning I've just lost a lot of data…

I was using pdb to debug a function. I re-ran the cell without first quitting pdb, and now I can't interrupt anything.

Here's a minimal example that reproduces this:

def test():
    import pdb; pdb.set_trace()  # XXX BREAKPOINT
    return 0

test()

Run this cell twice in a row.

@lancekrogers

This same issue happens for me on Unix as well, word for word:

"When a subprocess is run from the notebook, if it gets stuck the kernel will get locked waiting for it. Selecting Kernel/Interrupt from the menu does not terminate the subprocess, but rather leaves the kernel in an unstable, "partially locked" state, where other cells do not execute. The only resolution is to restart the kernel."

@nealmcb

nealmcb commented May 8, 2017

Thanks for the nice example of a pdb hang, @wmayner. But since pdb doesn't run in a subprocess, I opened a separate issue for pdb: #10516

@JulesGM

JulesGM commented Mar 14, 2018

Printing too much data, say accidentally printing a gigantic NumPy array, can make the kernel completely unresponsive and impossible to terminate.

@rajulah

rajulah commented May 29, 2018

Has a solution been found for this issue yet? I just ran a machine learning model that took 14 hours to complete and now my kernel is stuck and doesn't execute cells. If I restart, I have to run the model again for 14 hours. So is there any solution?

@JulesGM

JulesGM commented May 29, 2018

haven't tried it, but this seems like it could help: http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/limit_output/readme.html

@takluyver
Member

If a specific subprocess has got stuck, you can probably find it in the task manager and forcibly kill it that way. Hopefully that lets the kernel continue.
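The same "End task" effect can also be had programmatically. A minimal sketch, where the sleeping child is hypothetical and stands in for the stuck process:

```python
import subprocess
import sys

# Hypothetical stuck child: sleeps far longer than we are willing to wait.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])

# Forcibly terminate it, as Task Manager's "End task" would
# (TerminateProcess on Windows, SIGKILL on Unix).
proc.kill()
proc.wait(timeout=5)
print(proc.returncode is not None)   # True: the child is gone
```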

@JulesGM

JulesGM commented May 29, 2018

No, the issue is that the kernel spams the webserver to death or something. Killing the webserver kills the kernel, AFAIK.

@patricktokeeffe
Contributor

I'm dealing with a stuck notebook too: interrupt, restart, reconnect - none of them do anything. The [*] indicators remain next to cells as if they are queued to run but no cells get executed.

The behavior began after running a cell containing:

filedir = "20161214_rooftest"

!ls -RC $filedir

Which is strange because I have analogous cells elsewhere that run successfully. I'm not sure how/if ls could get stuck but otherwise my situation seems to match this issue.

@ashishanand7

Is there any solution to this? The kernel cannot be interrupted.
For me it's happening with GridSearchCV in sklearn.

@ahmedrao

There was a process named conda.exe in Task Manager. I killed that process and I was successfully able to interrupt the kernel.

@IMBurbank

Interrupt is still broken. I have to restart and reload my imports every time.

@metya

metya commented Nov 13, 2018

same problem in jupyter lab on python 3.7 kernel

@CathyQian

same problem in Jupyter Notebook and I can't find the process named conda.exe in Task manager. Any updates on the solution yet?

@esha-sg

esha-sg commented Jan 9, 2019

Not a solution, but sometimes trying to reconnect to the kernel helps in this case.

@ambareeshsrja16

Observing the same on Windows 10.

@itamarst
Contributor

For the process issue specifically, on Windows specifically, here's a theory (still untested):

  1. Process is run via IPython.utils._process_win32.system, which calls _system_body, which calls p.wait() on the subprocess.Popen object.
  2. Windows subprocess.Popen.wait() has a known issue where it is not interruptible: https://bugs.python.org/issue28168

If that's the cause, switching to busy looping every 100ms or so would probably make it interruptible, or if not then taking the approach in the patch.
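That busy-loop idea might look like this. A sketch of the workaround, not the actual IPython patch:

```python
import subprocess
import sys
import time

def interruptible_wait(proc, poll_interval=0.1):
    """Poll instead of blocking in Popen.wait(), so that on Windows a
    KeyboardInterrupt can be delivered between polls."""
    while proc.poll() is None:
        time.sleep(poll_interval)   # short sleeps are interruptible
    return proc.returncode

# Hypothetical quick child, standing in for a shelled-out command.
proc = subprocess.Popen([sys.executable, "-c", "print('done')"])
rc = interruptible_wait(proc)
print(rc)   # 0
```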

@nealmcb

nealmcb commented Feb 27, 2020

Thank you @Carreau!

@ChrisPalmerNZ

Thanks @Carreau! When will this find its way into a general release, and does it mean that we will then be able to use the Interrupt Kernel button successfully?

@Carreau
Member

Carreau commented Feb 27, 2020

I'll likely do a 7.13 tomorrow. It might fix the interrupt button.

@Arpit-Gole

Arpit-Gole commented Apr 26, 2020

Hey @Carreau
I am facing this issue when I am trying to interrupt an ongoing cell execution: the interrupt goes on forever, and in the end I have to restart.

To demonstrate, I used the approach @wmayner suggested to replicate the issue. I have attached a few screenshots.
[screenshot: pyt1]

Jupyter versions in my machine.
[screenshot: pyt2]

@itamarst
Contributor

@Arpit-Gole pdb is its own specific issue; I'm hoping to get that fixed soon too: #10516

@Arpit-Gole

@itamarst I am training a model as follows :

forest_clf = RandomForestClassifier()
cross_val_score(forest_clf, X_train, y_train, cv=3, scoring='accuracy', verbose=10, n_jobs=-1)

Now I know it is bound to take time based on my dataset. But say for whatever reason I choose to stop the processing halfway by pressing Kernel > Interrupt Kernel.
Ideally it should interrupt, but it takes forever to stop.
I don't want to restart because all my progress will be gone.

Please Help!

@Carreau
Member

Carreau commented Apr 26, 2020

If what you are trying to interrupt is implemented in C then there is nothing to do. It's up to the library you use to handle sigint.
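To illustrate the distinction: pure-Python code is interruptible because CPython checks for pending signals between bytecode instructions, whereas a long-running C call only stops if the extension itself checks (e.g. via PyErr_CheckSignals). A sketch using Unix-style signal delivery (the mechanics differ on Windows):

```python
import os
import signal

# Deliver SIGINT to ourselves, simulating Kernel > Interrupt. Pure-Python
# code sees it as KeyboardInterrupt at the next bytecode boundary; a C
# extension that never checks for signals would keep running.
interrupted = False
try:
    os.kill(os.getpid(), signal.SIGINT)
    for _ in range(1000):   # plain Python loop: gives the signal a chance to land
        pass
except KeyboardInterrupt:
    interrupted = True
print(interrupted)   # True
```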

@jvschoen

jvschoen commented Aug 31, 2020

I run into this sometimes too... Here is a reproducible example from JupyterLab:

LOAD DATA

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv'
r = requests.get(url, allow_redirects=True)
with open('data/nyc_taxi.csv', 'wb') as f:
    f.write(r.content)
df_taxi = (
    pd.read_csv('data/nyc_taxi.csv')
    .assign(timestamp=lambda x: pd.to_datetime(x.timestamp))
)

df_train = df_taxi.iloc[:5000]
temp_train = df_train.set_index('timestamp')

Run Grid Search: THIS CANNOT BE INTERRUPTED

import itertools
import statsmodels.api as sm

# set parameter range
p = range(0,3)
q = range(1,3)
d = range(1,2)
s = [24,48]

# list of all parameter combos
pdq = list(itertools.product(p, d, q))
seasonal_pdq = list(itertools.product(p, d, q, s))
# SARIMA model pipeline
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(temp_train[:240],
                                            order=param,
                                            seasonal_order=param_seasonal)
            
            results = mod.fit(max_iter = 50, method = 'powell')
            
            print('SARIMA{},{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception as e:
            print(e)
            continue

Is there any advice?

@dawnset

dawnset commented Oct 13, 2020

Ran into this problem three times this afternoon; it reminds me of the good old days when I was still using urllib. I thought it was urllib's fault, because there was no response to my request. I'm not looking for an answer so much as a workaround, so now I store every variable to a local file. I really don't want to see this happen again and again.

@Crispy13

I am facing the same issue when using TensorFlow and a GPU for training a deep learning model.

@matija2209

Ran into this with time.sleep and requests.

@mbrad092

mbrad092 commented Dec 5, 2020

Also having this issue with time.sleep and requests on Windows, but it runs fine on macOS.

@not-Ian

not-Ian commented Dec 22, 2020

Having this issue with ThreadPoolExecutor... Something like this:

import concurrent.futures

numberOfImageGatherers = 2

with concurrent.futures.ThreadPoolExecutor(max_workers=numberOfImageGatherers + 1) as executor:
    futures = []

    for imageGatherer in range(numberOfImageGatherers):
        imageDataGatherer = ImageDataGatherer(batch_size)
        futures.append(executor.submit(imageDataGatherer.gatherImageData, pipeline))

    modelTrainingConsumer = ModelTrainingConsumer(vae, plot_losses)
    futures.append(executor.submit(modelTrainingConsumer.trainModel, pipeline))

    concurrent.futures.wait(futures)

The only way to interrupt is to restart the kernel... very frustrating.

@TV4Fun

TV4Fun commented Feb 2, 2021

This is still happening. I would suggest re-opening this issue. Seeing it in a NumPy-heavy tight neural network training loop on Windows 10.

jupyter core     : 4.7.1
jupyter-notebook : 6.2.0
qtconsole        : 4.7.7
ipython          : 7.20.0
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.6
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.1.2
traitlets        : 5.0.5

Is there anything I need to upgrade?

@Esesna

Esesna commented Jun 6, 2021

I'm having the same problem, and most interestingly it occurs at random points in the code when using the B0 remote API for CoppeliaSim, specifically when I use Publisher and Subscriber.

@hamitaksln

I found this workaround to stop a cell when I work with time.sleep or requests: catch the KeyboardInterrupt.

import time

for i in range(20):
    try:
        print(i)
        time.sleep(1)
    except KeyboardInterrupt:
        print("Stopping...")
        break

@thistlillo

Is there any news on this issue? I am working remotely in a Kubernetes pod and, when this occurs, I am completely locked out of the machine. I cannot even open a shell.

@MrMino
Member

MrMino commented Mar 21, 2022

@thistlillo your issue is most likely unrelated to this one. In any case, we would need more information than what you've provided, like: version of IPython and Python you're using, whether you are using Jupyter Notebook or Jupyter Lab, versions of those, etc. And, most importantly, a minimalized set of steps required to reproduce the issue.

If you can provide this info, please create a new issue and include it.

@UmarZein

UmarZein commented Jun 7, 2023

I ran into this issue today on my Linux laptop. Although I have not found a way to interrupt the kernel, I was able to save my variables by injecting a pickle.dumps call using pyrasite. Thanks to https://stackoverflow.com/a/59124617/18638036
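The injected snippet boils down to ordinary pickle calls. A sketch with hypothetical variable names:

```python
import pickle

# Hypothetical state worth rescuing before a forced kernel restart.
state = {"weights": [0.1, 0.2, 0.3], "epoch": 14}

# Serialize to bytes you can write to disk (the pyrasite trick injects
# code like this into the live, stuck process).
blob = pickle.dumps(state)

# Later, in a fresh kernel:
restored = pickle.loads(blob)
print(restored == state)   # True
```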

@Shubhang9

Shubhang9 commented Jul 6, 2023

Hey guys! I also had the same issue, but it was a silly mistake on my end.

Avoid using a bare except; add the specific exceptions you expect your code to throw to the except block. A generic try looks like this:

try:
    # code here
except:
    # exception handling

This code also catches your keyboard/kernel interrupts, which register as a KeyboardInterrupt exception (but only if execution is inside the try block at that moment).

I thought it might help some unfortunate souls if I share it here.
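The reason this matters: KeyboardInterrupt derives from BaseException, not Exception, so catching Exception handles ordinary errors while still letting Kernel > Interrupt propagate. A minimal sketch, where risky_work is a hypothetical failing function:

```python
def risky_work():
    raise ValueError("boom")    # hypothetical failure

try:
    risky_work()
except Exception as e:          # does NOT swallow KeyboardInterrupt
    caught = type(e).__name__

print(caught)                                    # ValueError
print(issubclass(KeyboardInterrupt, Exception))  # False
```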
