
variable scope problems in function definition #687

Closed
seegy opened this issue Jun 13, 2019 · 14 comments · Fixed by #688
@seegy

seegy commented Jun 13, 2019

Hi there,

I'm using Enterprise Gateway on our Kubernetes cluster (following these instructions).

Using a Python notebook (in my case the kernel is elyra/kernel-spark-py:dev), I found that I cannot use variables inside a function definition that are defined outside of it.

Example:

a = 123

def fun():
    print(a)
    
fun()

ends in:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in <module>
     4     print(a)
     5
----> 6 fun()

/usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py in fun()
     2
     3 def fun():
----> 4     print(a)
     5
     6 fun()

NameError: name 'a' is not defined

Same with import...

Any ideas?

@kevin-bates
Member

hmm - I think this might be related to running as a Spark application somehow since I see the very same thing running in a YARN cluster:
[screenshot]

Also, I'm not able to reproduce this using the regular (non-spark) python kernel in k8s...
[screenshot]

@kevin-bates
Member

ok - this is an EG issue and was introduced by this commit: 2cdac42

@jcrist - we're going to need your help on this one. It seems the use of namespace is producing side effects in the embedded kernel running on Spark. If I revert the Dask changes, I see the expected results. Note that there was another side effect of the Dask changes that I addressed in PR #540 - so whatever happens for this issue, we'll need to ensure those behaviors don't reappear.

Your help is greatly appreciated!

@jcrist
Contributor

jcrist commented Jun 13, 2019

Hmmm, that's unfortunate. This looks to be due to ipython/ipython#62, and doesn't have a good workaround.

The namespace thing was to avoid polluting the started kernel namespace with all the stuff enterprise gateway uses to setup a kernel. I recommend the following:

def start_ipython(locals, ns, **kwargs):
    from IPython import embed_kernel

    # Drop everything except dunder entries so the kernel's namespace
    # isn't polluted with the launcher's own variables.
    for k in list(locals):
        if not k.startswith('__'):
            del locals[k]

    # Seed the namespace with only the variables we want exposed.
    locals.update(ns)

    embed_kernel(local_ns=locals, **kwargs)


if __name__ == '__main__':
    # Existing code...

    # Replace the `embed_kernel` call with this:
    start_ipython(locals(), namespace, connection_file=connection_file, ip=ip)

I've tested that this works locally, and it should fix things for you. I don't have time to test it on enterprise-gateway myself, but I'm happy to respond to the issue here.
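The scoping behavior jcrist points to (ipython/ipython#62) can be reproduced with plain exec, independent of IPython or Enterprise Gateway: when code runs against separate globals and locals dicts, module-level assignments land in the locals dict, but functions resolve free variables against globals. A minimal sketch (illustrative only, not launcher code):

src = """
a = 123
def fun():
    print(a)
fun()
"""

# Separate globals/locals dicts: 'a' is stored in the locals dict,
# but fun() resolves 'a' against the globals dict -> NameError.
g, l = {}, {}
try:
    exec(src, g, l)
except NameError as e:
    print(e)  # name 'a' is not defined

# A single dict serving as both globals and locals behaves like a
# module top level, so the lookup succeeds and 123 is printed.
exec(src, {})

This is why passing a pre-populated dict as embed_kernel's single local_ns (as in start_ipython above) avoids the NameError.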

@kevin-bates
Member

Thank you for your quick response - it's greatly appreciated!

I have applied the changes above to the version you created (prior to #540). This does appear to address the scenario listed in this issue. However, any attempts to reference sc, etc., result in a recursive call that blows the stack. Here are the last few frames produced when running sc.appName...

/hadoop/yarn/local/usercache/gateway/appcache/application_1559854930815_0039/container_e05_1559854930815_0039_01_000001/launch_ipykernel.py in __getattr__(self, name)
    115             self._init_thread.join(timeout=None)
    116             # now return attribute/function reference from actual Spark object
--> 117             return getattr(globals()[self._spark_session_variable], name)
    118 
    119 

/hadoop/yarn/local/usercache/gateway/appcache/application_1559854930815_0039/container_e05_1559854930815_0039_01_000001/launch_ipykernel.py in __getattr__(self, name)
    113         else:
    114             # wait on thread to initialize the Spark session variables in global variable scope
--> 115             self._init_thread.join(timeout=None)
    116             # now return attribute/function reference from actual Spark object
    117             return getattr(globals()[self._spark_session_variable], name)

/opt/conda/lib/python2.7/threading.pyc in join(self, timeout)
    928         if not self.__started.is_set():
    929             raise RuntimeError("cannot join thread before it is started")
--> 930         if self is current_thread():
    931             raise RuntimeError("cannot join current thread")
    932 

/opt/conda/lib/python2.7/threading.pyc in currentThread()
   1149     """
   1150     try:
-> 1151         return _active[_get_ident()]
   1152     except KeyError:
   1153         ##print "current_thread(): no current thread for", _get_ident()

RuntimeError: maximum recursion depth exceeded

@jcrist
Contributor

jcrist commented Jun 13, 2019

Ahh, hmmm, this has to do with this bit of code, which is accessing raw globals (generally an antipattern):

https://github.com/jupyter/enterprise_gateway/blob/a1b0f679e51d08f4238b4ba9b9efd1afabb27973/etc/kernel-launchers/python/scripts/launch_ipykernel.py#L124

Is there a noticeable lag if you initialize the Spark context variables before starting the kernel? Since the Spark application is already running, I wouldn't expect much of a delay. Otherwise you'll need to propagate the locals dict down through all the Spark lazy-loading machinery so that the proper namespace is being updated.
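jcrist's second suggestion (propagating the namespace dict down into the lazy-loading machinery so the proxy never touches raw globals()) could be sketched roughly as below. LazyInit, its attribute names, and the fake Spark object are hypothetical stand-ins for illustration, not the actual launcher code:

import threading

class LazyInit:
    # Hypothetical proxy: defers attribute access until a background
    # thread has placed the real object into the shared namespace dict.
    def __init__(self, name, namespace, init_thread):
        self._name = name
        self._namespace = namespace          # shared dict, not globals()
        self._init_thread = init_thread

    def __getattr__(self, attr):
        # Wait for the background initialization to finish, then
        # delegate to the real object now stored in the namespace.
        self._init_thread.join()
        real = self._namespace[self._name]
        return getattr(real, attr)

# Usage sketch with a fake "SparkContext" built in the background.
ns = {}

def init_spark():
    ns["sc"] = type("FakeSparkContext", (), {"appName": "demo"})()

thread = threading.Thread(target=init_spark)
ns["sc"] = LazyInit("sc", ns, thread)   # users see the proxy first
thread.start()

print(ns["sc"].appName)  # delegates once init completes: demo

Because the proxy looks the real object up in the same dict the init thread writes to, the lookup can never return the proxy itself, avoiding the __getattr__ recursion shown in the traceback above.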

@kevin-bates
Member

kevin-bates commented Jun 13, 2019

I see. So you're asking whether we could skip the thread and do the init synchronously? Yeah, the Spark context creation takes on the order of 5-10 seconds, and we need that time to be in the background so users can do non-spark things first (and not have to wait the additional time for sc creation).

I'm barely following what's going on but will try the locals dict propagation.

EDIT: @jcrist Actually, I have no idea what to do here. 🙁

kevin-bates added a commit to kevin-bates/enterprise_gateway that referenced this issue Jun 13, 2019
The python kernel launcher was not initializing the namespace used
in the embed_kernel properly.  This change reverts an earlier change
to use the spark context variables from the global scope, instead
moving them to the local namespace.

Fixes jupyter-server#687
@kevin-bates
Member

@jcrist - ok, I think I've battled through things and ferreted out what your comments were saying relative to the locals() dict. Your help on this has been invaluable!

Could you please review the changes and let us know if we can make this better?

kevin-bates added a commit to kevin-bates/enterprise_gateway that referenced this issue Jun 14, 2019
The python kernel launcher was not initializing the namespace used
in the embed_kernel properly.  This change reverts an earlier change
to use the spark context variables from the global scope, instead
moving them to the local namespace.

Fixes jupyter-server#687
Closes jupyter-server#688
@seegy
Author

seegy commented Jun 14, 2019

Wow. I'm impressed by your engagement!

Let me know if I can help somehow.

@kevin-bates
Member

@seegy - if you'd like to take an updated image for a spin, I've pushed an image to elyra/kernel-spark-py:scope. Since this image is built on elyra/kernel-py, it too has been pushed with a scope tag.

To confirm you're running with the correct image, ensure that its /usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py file contains the updated code.

I'd like to wait for @jcrist's opinion before merging.

@seegy
Author

seegy commented Jun 15, 2019

hey @kevin-bates ,

unfortunately, I'm on vacation for the upcoming week, so I won't have access to our k8s environment...
But I'll let you know once I find the time to test the update.

@kevin-bates
Member

@seegy , no worries. Enjoy your vacation! I moved this to WIP since we're still working things out.

kevin-bates added a commit to kevin-bates/enterprise_gateway that referenced this issue Jun 19, 2019
The python kernel launcher was not initializing the namespace used
in the embed_kernel properly.  This change reverts an earlier change
to use the spark context variables from the global scope, instead
moving them to the local namespace.

Fixes jupyter-server#687
Closes jupyter-server#688
kevin-bates added a commit that referenced this issue Jun 19, 2019
* Fix scoping issues relative to python kernels on Spark

The python kernel launcher was not initializing the namespace used
in the embed_kernel properly.  This change reverts an earlier change
to use the spark context variables from the global scope, instead
moving them to the local namespace.

Fixes #687
Closes #688
@kevin-bates
Member

@seegy and @suryag10, Since this issue is now closed and the fix has been merged, I have updated the docker default images elyra/kernel-py:dev and elyra/kernel-spark-py:dev and removed those tagged with scope.

Thank you for reporting this issue and for your timely testing - it's been appreciated!

@seegy
Author

seegy commented Jun 24, 2019

[screenshot]
👌
thanks a lot!

@esevan
Contributor

esevan commented Jun 25, 2019

@kevin-bates Thanks. Saved my day 👍
