ProfilesSpawner stops JupyterHub from recognizing running worker #41

Open · Hoeze opened this issue Nov 5, 2020 · 13 comments

Hoeze commented Nov 5, 2020

Hi, apologies for the double post; I moved this issue over from jupyterhub/batchspawner#194.

We currently cannot spawn any workers with ProfilesSpawner enabled: the worker starts normally, but JupyterHub kills it right away.

Logs of the worker:

+ batchspawner-singleuser jupyterhub-singleuser --ip=0.0.0.0 --NotebookApp.default_url=/lab
[I 2020-11-05 13:22:43.763 SingleUserNotebookApp manager:81] [nb_conda_kernels] enabled, 18 kernels found
[I 2020-11-05 13:22:44.808 SingleUserNotebookApp extension:162] JupyterLab extension loaded from /opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterlab
[I 2020-11-05 13:22:44.808 SingleUserNotebookApp extension:163] JupyterLab application directory is /opt/modules/i12g/anaconda/envs/jupyterhub/share/jupyter/lab
[I 2020-11-05 13:22:44.988 SingleUserNotebookApp __init__:34] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
[I 2020-11-05 13:22:44.989 SingleUserNotebookApp singleuser:561] Starting jupyterhub-singleuser server version 1.1.0
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] Serving notebooks from local directory: /data/nasif12/home_if12/hoelzlwi
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] Jupyter Notebook 6.1.4 is running at:
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] http://[...]:50758/
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2210] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2020-11-05 13:22:45.010 SingleUserNotebookApp singleuser:542] Updating Hub with activity every 300 seconds
slurmstepd: error: *** JOB 377371 ON [...] CANCELLED AT 2020-11-05T13:23:39 ***

Logs of JupyterHub:


[I 2020-11-05 13:27:39.649 JupyterHub log:181] 302 POST /jupyter/hub/spawn/<user> -> /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 1013.71ms
[I 2020-11-05 13:27:39.761 JupyterHub pages:398] <user> is pending spawn
[I 2020-11-05 13:27:39.771 JupyterHub log:181] 200 GET /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 29.93ms
[I 2020-11-05 13:27:41.587 JupyterHub log:181] 200 POST /jupyter/hub/api/batchspawner (<user>@192.168.16.13) 24.47ms
[I 2020-11-05 13:27:43.792 JupyterHub log:181] 200 GET /jupyter/hub/api (@192.168.16.13) 2.84ms
[I 2020-11-05 13:27:43.843 JupyterHub log:181] 200 POST /jupyter/hub/api/users/<user>/activity (<user>@192.168.16.13) 36.08ms
[W 2020-11-05 13:27:48.647 JupyterHub base:995] User <user> is slow to start (timeout=10)
[W 2020-11-05 13:28:38.764 JupyterHub user:684] <user>'s server failed to start in 60 seconds, giving up
[I 2020-11-05 13:28:39.153 JupyterHub batchspawner:408] Stopping server job 377372
[I 2020-11-05 13:28:39.155 JupyterHub batchspawner:293] Cancelling job 377372: sudo -E -u <user> scancel 377372
[W 2020-11-05 13:28:51.948 JupyterHub batchspawner:419] Notebook server job 377372 at node03:0 possibly failed to terminate
[E 2020-11-05 13:28:52.010 JupyterHub gen:624] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/handlers/base.py:884> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/tornado/gen.py", line 618, in error_callback
        future.result()
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 891, in finish_user_spawn
        await spawn_future
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/user.py", line 708, in spawn
        raise e
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/user.py", line 607, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout
    
[I 2020-11-05 13:28:52.019 JupyterHub log:181] 200 GET /jupyter/hub/api/users/<user>/server/progress (<user>@192.168.16.11) 71227.51ms


I am using Python 3.7, JupyterHub 1.2, batchspawner 1.0.1, and the current git version of wrapspawner.
When using batchspawner directly, everything works fine.

My configuration to reproduce:

c.JupyterHub.allow_named_servers = True
c.JupyterHub.named_server_limit_per_user = 5

c.PAMAuthenticator.open_sessions = False

from jupyterhub.auth import PAMAuthenticator
import pamela
from tornado import gen

class KerberosPAMAuthenticator(PAMAuthenticator):
    @gen.coroutine
    def authenticate(self, handler, data):
        """Authenticate with PAM, and return the username if login is successful.
        Return None otherwise.
        Establish credentials when authenticating instead of reinitializing them
        so that a Kerberos cred cache has the proper UID in it.
        """
        username = data['username']
        try:
            pamela.authenticate(username, data['password'], service=self.service, resetcred=pamela.PAM_ESTABLISH_CRED)
        except pamela.PAMError as e:
            if handler is not None:
                self.log.warning("PAM Authentication failed (%s@%s): %s", username, handler.request.remote_ip, e)
            else:
                self.log.warning("PAM Authentication failed: %s", e)
        else:
            return username

c.JupyterHub.authenticator_class = KerberosPAMAuthenticator


c.JupyterHub.bind_url = 'http://:8686/jupyter/'
c.JupyterHub.default_url = 'home'
c.JupyterHub.hub_connect_ip = 'node01'
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_port = 8687

c.Spawner.default_url = '/lab'
c.Spawner.http_timeout = 120

import batchspawner

c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'
c.BatchSpawnerBase.req_nprocs = '2'
c.BatchSpawnerBase.req_runtime = '48:00:00'
c.BatchSpawnerBase.req_memory = '12gb'

c.SlurmSpawner.req_partition = 'slurm-jupyter'

c.SlurmSpawner.start_timeout = 240

c.SlurmSpawner.batch_script = """#!/bin/bash -x
{% if partition  %}#SBATCH --partition={{partition}}
{% endif %}{% if runtime    %}#SBATCH --time={{runtime}}
{% endif %}{% if memory     %}#SBATCH --mem={{memory}}
{% endif %}{% if gres       %}#SBATCH --gres={{gres}}
{% endif %}{% if nprocs     %}#SBATCH --cpus-per-task={{nprocs}}
{% endif %}{% if reservation%}#SBATCH --reservation={{reservation}}
{% endif %}{% if options    %}#SBATCH {{options}}{% endif %}

trap 'echo SIGTERM received' TERM
{{prologue}}
which jupyterhub-singleuser
{% if srun %}{{srun}} {% endif %}{{cmd}}
echo "jupyterhub-singleuser ended gracefully"
{{epilogue}}

"""

c.BatchSpawnerBase.req_prologue = '''


export XDG_RUNTIME_DIR=""
export SHELL=/bin/bash
export BASH=/bin/bash

# activate the correct conda environment
source /opt/modules/i12g/anaconda/envs/jupyterhub/bin/activate

env | sort
'''


c.SlurmSpawner.req_srun = ''


###
# uncomment the following line to test the ProfilesSpawner
# c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'

c.ProfilesSpawner.profiles = [
  (
    'SLURM CPU node - 4 cores, 16 GB, 24 hours',
    'juphub-4cpu-16G',
    'batchspawner.SlurmSpawner',
    dict(
      req_nprocs='4',
      req_partition='slurm-jupyter',
      req_runtime='24:00:00',
      req_memory='16000'
    )
  ),
  (
    'SLURM CPU node - 8 cores, 16 GB, 24 hours',
    'juphub-8cpu-16G',
    'batchspawner.SlurmSpawner',
    dict(
      req_nprocs='8',
      req_partition='slurm-jupyter',
      req_runtime='24:00:00',
      req_memory='16000'
    )
  ),
  (
    "Test server",
    'local-test',
    'jupyterhub.spawner.LocalProcessSpawner',
    {
      'ip':'0.0.0.0'
    }
  )
]

from pprint import pprint
pprint(c.ProfilesSpawner.profiles)

Hoeze added the bug label Nov 5, 2020
rcthomas (Contributor) commented Nov 5, 2020

@Hoeze in your logs I don't see something like

[I 2020-11-05 07:34:13.757 JupyterHub log:181] 200 POST /hub/api/batchspawner (...) 12.12ms

That's when batchspawner-singleuser calls back to the hub to report which port it's using. I see you've got the requisite import batchspawner, so I'm not quite sure why it's not showing up, unless there's some sensitivity to where in the config file that import happens. FWIW, it's dead last in my config.
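
For illustration, this is roughly the layout being described: a minimal sketch of a SlurmSpawner config with the import placed last. The option values are placeholders, not anyone's actual settings.

# jupyterhub_config.py -- sketch only; option values are placeholders
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'
c.SlurmSpawner.req_partition = 'slurm-jupyter'
# ... the rest of the hub configuration ...

# The import is needed so the hub exposes the POST /hub/api/batchspawner
# endpoint that batchspawner-singleuser uses to report its port.
# Placed "dead last" here, as described above.
import batchspawner  # noqa: F401,E402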

Hoeze (Author) commented Nov 5, 2020

Thanks for your answer @rcthomas. I moved the import to the end of the config, but that does not change anything.

Could this be a problem with ConfigProxy?
I noticed that the "Test server" configuration does not work either: it just throws a lot of 404s like 17:18:16.450 - error: [ConfigProxy] 404 GET /custom/custom.css.

Logs:

[I 2020-11-05 17:18:09.181 JupyterHub log:181] 200 GET /jupyter/hub/spawn/<user> (<user>@192.168.16.11) 38.89ms
[I 2020-11-05 17:18:11.569 JupyterHub spawner:1455] Spawning jupyterhub-singleuser --ip=0.0.0.0 --port=43023 --SingleUserNotebookApp.default_url=/lab
[I 2020-11-05 17:18:12.456 JupyterHub log:181] 302 POST /jupyter/hub/spawn/<user> -> /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 1017.41ms
[I 2020-11-05 17:18:12.695 JupyterHub pages:398] <user> is pending spawn
[I 2020-11-05 17:18:12.702 JupyterHub log:181] 200 GET /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 19.61ms
[I 2020-11-05 17:18:14.230 SingleUserNotebookApp manager:81] [nb_conda_kernels] enabled, 18 kernels found
[I 2020-11-05 17:18:15.555 SingleUserNotebookApp extension:162] JupyterLab extension loaded from /opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterlab
[I 2020-11-05 17:18:15.555 SingleUserNotebookApp extension:163] JupyterLab application directory is /opt/modules/i12g/anaconda/envs/jupyterhub/share/jupyter/lab
[I 2020-11-05 17:18:15.784 SingleUserNotebookApp __init__:34] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
[I 2020-11-05 17:18:15.786 SingleUserNotebookApp mixins:558] Starting jupyterhub-singleuser server version 1.2.1
[I 2020-11-05 17:18:15.792 SingleUserNotebookApp log:181] 302 GET /jupyter/user/<user>/ -> /jupyter/user/<user> (@192.168.16.11) 2.45ms
[I 2020-11-05 17:18:15.792 JupyterHub base:894] User <user> took 4.340 seconds to start
[I 2020-11-05 17:18:15.793 JupyterHub proxy:262] Adding user <user> to proxy /jupyter/user/<user>/ => http://<host>:43023
[I 2020-11-05 17:18:15.797 JupyterHub log:181] 200 GET /jupyter/hub/api (@192.168.16.11) 1.27ms
[I 2020-11-05 17:18:15.798 SingleUserNotebookApp notebookapp:2209] Serving notebooks from local directory: /data/nasif12/home_if12/<user>
[I 2020-11-05 17:18:15.798 SingleUserNotebookApp notebookapp:2209] Jupyter Notebook 6.1.4 is running at:
[I 2020-11-05 17:18:15.798 SingleUserNotebookApp notebookapp:2209] http://<host>:43023/
[I 2020-11-05 17:18:15.798 SingleUserNotebookApp notebookapp:2210] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2020-11-05 17:18:15.798 JupyterHub users:609] Server <user> is ready
[I 2020-11-05 17:18:15.799 JupyterHub log:181] 200 GET /jupyter/hub/api/users/<user>/server/progress (<user>@192.168.16.11) 2770.73ms
[I 2020-11-05 17:18:15.814 SingleUserNotebookApp mixins:539] Updating Hub with activity every 300 seconds
[I 2020-11-05 17:18:15.854 JupyterHub log:181] 200 POST /jupyter/hub/api/users/<user>/activity (<user>@192.168.16.11) 37.39ms
[I 2020-11-05 17:18:15.924 JupyterHub log:181] 302 GET /jupyter/hub/spawn-pending/<user> -> /jupyter/user/<user>/ (<user>@192.168.16.11) 11.64ms
[I 2020-11-05 17:18:16.033 SingleUserNotebookApp log:181] 302 GET /jupyter/user/<user>/ -> /jupyter/user/<user> (@192.168.16.11) 1.37ms
[W 2020-11-05 17:18:16.205 SingleUserNotebookApp log:181] 404 GET /jupyter/user/<user> (@192.168.16.11) 69.36ms
17:18:16.429 - error: [ConfigProxy] 404 GET /static/components/jquery-ui/themes/smoothness/jquery-ui.min.css?v=fb45616eef2c454960f91fcd2a04efeda84cfacccf0c5d741ba2793dc1dbd6d3ab01aaae6485222945774c7d7a9a2e9fb87e0d8ef1ea96893aa6906147a371bb
17:18:16.434 - error: [ConfigProxy] 404 GET /static/components/jquery-typeahead/dist/jquery.typeahead.min.css?v=5edf53bf6bb9c3b1ddafd8594825a7e2ed621f19423e569c985162742f63911c09eba2c529f8fb47aebf27fafdfe287d563347f58c1126b278189a18871b6a9a
17:18:16.435 - error: [ConfigProxy] 404 GET /static/style/style.min.css?v=56dfd556850eb17b7998c6828467598a322b41593edc758739c66cb2c3fea98f23d0dd8bf8b9b0f5d67bb976a50e4c34f789fe640cbb440fa089e1bf5ec170bd
17:18:16.450 - error: [ConfigProxy] 404 GET /custom/custom.css
17:18:16.451 - error: [ConfigProxy] 404 GET /static/components/es6-promise/promise.min.js?v=bea335d74136a63ae1b5130f5ac9a50c6256a5f435e6e09fef599491a84d834a8b0f011ca3eaaca3b4ab6a2da2d3e1191567a2f171e60da1d10e5b9d52f84184
17:18:16.452 - error: [ConfigProxy] 404 GET /static/components/react/react.production.min.js?v=9a0aaf84a316c8bedd6c2ff7d5b5e0a13f8f84ec02442346cba0b842c6c81a6bf6176e64f3675c2ebf357cb5bb048e0b527bd39377c95681d22468da3d5de735
[I 2020-11-05 17:18:16.461 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Fjquery-ui%2Fthemes%2Fsmoothness%2Fjquery-ui.min.css%3Fv%3Dfb45616eef2c454960f91fcd2a04efeda84cfacccf0c5d741ba2793dc1dbd6d3ab01aaae6485222945774c7d7a9a2e9fb87e0d8ef1ea96893aa6906147a371bb (@192.168.16.11) 24.05ms
[I 2020-11-05 17:18:16.464 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Fjquery-typeahead%2Fdist%2Fjquery.typeahead.min.css%3Fv%3D5edf53bf6bb9c3b1ddafd8594825a7e2ed621f19423e569c985162742f63911c09eba2c529f8fb47aebf27fafdfe287d563347f58c1126b278189a18871b6a9a (@192.168.16.11) 24.11ms
[I 2020-11-05 17:18:16.465 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fstyle%2Fstyle.min.css%3Fv%3D56dfd556850eb17b7998c6828467598a322b41593edc758739c66cb2c3fea98f23d0dd8bf8b9b0f5d67bb976a50e4c34f789fe640cbb440fa089e1bf5ec170bd (@192.168.16.11) 24.83ms
[I 2020-11-05 17:18:16.471 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fcustom%2Fcustom.css (@192.168.16.11) 4.16ms
[I 2020-11-05 17:18:16.473 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Fes6-promise%2Fpromise.min.js%3Fv%3Dbea335d74136a63ae1b5130f5ac9a50c6256a5f435e6e09fef599491a84d834a8b0f011ca3eaaca3b4ab6a2da2d3e1191567a2f171e60da1d10e5b9d52f84184 (@192.168.16.11) 4.93ms
[I 2020-11-05 17:18:16.480 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Freact%2Freact.production.min.js%3Fv%3D9a0aaf84a316c8bedd6c2ff7d5b5e0a13f8f84ec02442346cba0b842c6c81a6bf6176e64f3675c2ebf357cb5bb048e0b527bd39377c95681d22468da3d5de735 (@192.168.16.11) 11.05ms
17:18:16.484 - error: [ConfigProxy] 404 GET /static/components/react/react-dom.production.min.js?v=6fc58c1c4736868ff84f57bd8b85f2bdb985993a9392718f3b4af4bfa10fb4efba2b4ddd68644bd2a8daf0619a3844944c9c43f8528364a1aa6fc01ec1b8ae84
17:18:16.488 - error: [ConfigProxy] 404 GET /static/components/create-react-class/index.js?v=894ad57246e682b4cfbe7cd5e408dcd6b38d06af4de4f3425991e2676fdc2ef1732cbd19903104198878ae77de12a1996de3e7da3a467fb226bdda8f4618faec
[I 2020-11-05 17:18:16.491 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Freact%2Freact-dom.production.min.js%3Fv%3D6fc58c1c4736868ff84f57bd8b85f2bdb985993a9392718f3b4af4bfa10fb4efba2b4ddd68644bd2a8daf0619a3844944c9c43f8528364a1aa6fc01ec1b8ae84 (@192.168.16.11) 3.26ms
[I 2020-11-05 17:18:16.493 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Fcreate-react-class%2Findex.js%3Fv%3D894ad57246e682b4cfbe7cd5e408dcd6b38d06af4de4f3425991e2676fdc2ef1732cbd19903104198878ae77de12a1996de3e7da3a467fb226bdda8f4618faec (@192.168.16.11) 1.84ms
17:18:16.494 - error: [ConfigProxy] 404 GET /static/components/requirejs/require.js?v=d37b48bb2137faa0ab98157e240c084dd5b1b5e74911723aa1d1f04c928c2a03dedf922d049e4815f7e5a369faa2e6b6a1000aae958b7953b5cc60411154f593
[I 2020-11-05 17:18:16.498 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fcomponents%2Frequirejs%2Frequire.js%3Fv%3Dd37b48bb2137faa0ab98157e240c084dd5b1b5e74911723aa1d1f04c928c2a03dedf922d049e4815f7e5a369faa2e6b6a1000aae958b7953b5cc60411154f593 (@192.168.16.11) 1.95ms
17:18:16.609 - error: [ConfigProxy] 404 GET /static/base/images/favicon.ico?v=50afa725b5de8b00030139d09b38620224d4e7dba47c07ef0e86d4643f30c9bfe6bb7e1a4a1c561aa32834480909a4b6fe7cd1e17f7159330b6b5914bf45a880
[I 2020-11-05 17:18:16.614 JupyterHub log:181] 200 GET /jupyter/hub/error/404?url=%2Fstatic%2Fbase%2Fimages%2Ffavicon.ico%3Fv%3D50afa725b5de8b00030139d09b38620224d4e7dba47c07ef0e86d4643f30c9bfe6bb7e1a4a1c561aa32834480909a4b6fe7cd1e17f7159330b6b5914bf45a880 (@192.168.16.11) 2.14ms
[I 2020-11-05 17:18:23.338 JupyterHub proxy:320] Checking routes

Hoeze (Author) commented Nov 5, 2020

@rcthomas Here is a minimal example configuration that does not work:

import batchspawner

c.JupyterHub.bind_url = 'http://:8686/jupyter/'
c.JupyterHub.default_url = 'home'
c.JupyterHub.hub_connect_ip = 'ouga01'
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_port = 8687
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'

c.ProfilesSpawner.profiles = [
  (
    "Test server",
    'local-test',
    'jupyterhub.spawner.LocalProcessSpawner',
    {
      'ip':'0.0.0.0'
    }
  )
]

This has no user authentication, etc.
I run this with jupyterhub -f config_min.py.

At least this minimal example should work out-of-the-box, right?

rcthomas (Contributor) commented Nov 5, 2020

This goes away if I specify traitlets<5 in the build. Can you try that? See also jupyterhub/jupyterhub#3170; maybe this is the reproducer we're looking for there.

An even more minimal reproducer is just:

c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'

c.ProfilesSpawner.profiles = [
  (
    "Test server",
    'local-test',
    'jupyterhub.spawner.LocalProcessSpawner',
    {
      'ip':'0.0.0.0'
    }
  )
]

rcthomas (Contributor) commented Nov 5, 2020

Full details: I tested by building this Dockerfile. To make it fail, take out the traitlets<5 pin.

FROM ubuntu:focal
LABEL maintainer="Rollin Thomas <rcthomas@lbl.gov>"
WORKDIR /srv

ENV DEBIAN_FRONTEND noninteractive
ENV LANG C.UTF-8

RUN \
    apt-get update          &&  \
    apt-get upgrade --yes   &&  \
    apt-get install --yes       \
        --no-install-recommends \
        git                     \
        npm                     \
        python3-pip             \
        python3-setuptools      \
        tzdata                  \
        vim

ENV TZ=America/Los_Angeles
RUN \
    ln -snf /usr/share/zoneinfo/$TZ /etc/localtime  &&  \
    echo $TZ > /etc/timezone

RUN \
    pip3 install            \
        --no-cache-dir      \
        jupyterhub          \
        jupyterlab          \
        batchspawner        \
        git+https://github.com/jupyterhub/wrapspawner   \
        'traitlets<5'

RUN \
    npm install -g configurable-http-proxy

# Some dummy users

RUN \
    adduser -q --gecos "" --disabled-password user1     && \
    echo user1:user1 | chpasswd

ADD jupyterhub_config.py /srv/jupyterhub_config.py

For the config it's just

c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'

c.ProfilesSpawner.profiles = [
  (
    "Test server",
    'local-test',
    'jupyterhub.spawner.LocalProcessSpawner',
    {
      'ip':'0.0.0.0'
    }
  )
]

Hoeze (Author) commented Nov 5, 2020

Thank you so much @rcthomas!
Yes, installing traitlets<5 is EXACTLY the solution for both problems.

hakasapl commented Mar 7, 2021

Is there somewhere I can see the logs for wrapspawner? I'd like to track down a fix for this, but I can't seem to find them on the client or the hub. If possible I'd like to use traitlets v5, since downgrading to v4 causes issues with other packages.

rcthomas (Contributor) commented Mar 8, 2021

@hakasapl it would normally be in the regular JupyterHub log output, but if you look in wrapspawner you'll see that there are no log messages coming from it. Since traitlets<5 is working just fine for me, I was going to let this slide until I absolutely had to do something about it or it got fixed. My brilliant plan for that day was to add logging messages and trace through the wrapspawner and hub code. Maybe you'll give that a shot? Does the reproducer work for you?
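
If it helps whoever picks this up, here is a minimal sketch of the kind of tracing being suggested. The DebuggingProfilesSpawner subclass and its messages are hypothetical (adjust names if your wrapspawner version differs); spawners inherit self.log, so the output lands in the normal JupyterHub log.

# Sketch only: a wrapper subclass that logs what the child spawner was built from.
from wrapspawner import ProfilesSpawner

class DebuggingProfilesSpawner(ProfilesSpawner):
    def construct_child(self):
        result = super().construct_child()  # forward whatever the parent returns
        self.log.info("wrapspawner: constructed child of class %s", self.child_class)
        self.log.info("wrapspawner: child_config keys: %s", sorted(self.child_config))
        return result

# Then, in jupyterhub_config.py, point the hub at the subclass instead:
# c.JupyterHub.spawner_class = DebuggingProfilesSpawner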

hakasapl commented Mar 8, 2021

Sounds good, I'll try to add some debug messages and see where I get. The main issue I'm running into is that conda doesn't let me install both traitlets<5 and Python 3.9; 3.8 works fine, so it's OK for now, but it may become a problem in the near future. I believe PyPI doesn't have this issue: you can install traitlets<5 with Python 3.9.

rcthomas (Contributor) commented Apr 8, 2021

Finally was able to get enough time to look at this. What is happening is that something changed between traitlets 4 and traitlets 5 in how HasTraits._trait_values behaves. This impacts the computation of common_traits, specifically the second set here (self.child_spawner._trait_values.keys() in wrapspawner):

        common_traits = (
          set(self._trait_values.keys()) &
          set(self.child_spawner._trait_values.keys()) -
          set(self.child_config.keys())
        )

Under traitlets 4.3.3 this is a long list of things, under traitlets 5.0.5 it's much smaller (basically only stuff passed to the constructor, plus ip from the config file and maybe one or two other things).

I traced through the last few years of changes in traitlets. One of the things I noticed is that along the way an MR was accepted which adds a less private-looking trait_values(). When I use this (swapping trait_values() in for _trait_values) on traitlets 5.0.5, I get everything I was getting in 4.3.3 and more, including things I otherwise seemed to need to add back by hand. TBH, once that worked I stopped trying to track down what happened to _trait_values and just went with it; I don't know if we should come back to that.

I've tested this on the reproducer, and I'm going to come up with a PR that at least allows for the old behavior on traitlets 4 but uses this method with traitlets 5. We could try getting that second term of the computation more right for traitlets 4. Stay tuned.
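
For concreteness, a minimal sketch (not the actual upcoming PR) of what branching on the traitlets major version could look like; _spawner_trait_names is a hypothetical helper name.

# Sketch only: pick the trait-value source based on the installed traitlets version.
import traitlets

TRAITLETS_MAJOR = int(traitlets.__version__.split(".")[0])

def _spawner_trait_names(obj):
    # Hypothetical helper. On traitlets >= 5 the private _trait_values dict
    # only holds values that were explicitly set, so use the public
    # trait_values(), which also includes defaults; on traitlets 4 keep the
    # old behavior.
    if TRAITLETS_MAJOR >= 5:
        return set(obj.trait_values().keys())
    return set(obj._trait_values.keys())

# Inside wrapspawner, common_traits would then become something like:
# common_traits = (
#   _spawner_trait_names(self) &
#   _spawner_trait_names(self.child_spawner) -
#   set(self.child_config.keys())
# )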

meeseeksmachine commented

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/slurm-batch-spawner-failing-with-client-process-running-config-issue/11679/2

Hoeze (Author) commented Jul 12, 2022

Hi @rcthomas, is this issue solved as of today?
Can I upgrade to the latest traitlets now?

rcthomas (Contributor) commented

@Hoeze the part about common_traits was fixed in release 1.0.1, I believe:

https://github.com/jupyterhub/wrapspawner/releases/tag/v1.0.1

So I think the traitlets-related part of this is fixed with that release.
