
Verify that environment's dask-gateway matches binder's dask-gateway #144

TomAugspurger opened this issue Apr 22, 2020 · 10 comments

@TomAugspurger
Member

We can get into poor situations when launching an image that's incompatible with the version of dask-gateway-server running on the binder. This happened recently with https://github.com/TomAugspurger/pangeo-dask-gateway where

  1. That was built initially with dask-gateway 0.6.1
  2. We bumped the binder to dask-gateway 0.7.1 (incompatible update)
  3. We tried to launch a binder session with the old image

Two possible solutions

  1. Clear our binder image cache (this doesn't work if the user has pinned dask-gateway).
  2. Insert a hook when spawning a user to check that if Dask Gateway is installed, that it's compatible with the dask-gateway-server running on the binder.

We should be able to hack together a KubeSpawner subclass that does a simple check for the dask-gateway version in the image when we start. We check that version against the minimum supported version (0.7.1 for now; bump it whenever there's an incompatible upgrade to dask-gateway). If there's an issue, we raise an informative error message.
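A minimal sketch of that version-floor check (all names and the error text are illustrative; only the 0.7.1 floor comes from this thread):

```python
# Hypothetical version-floor check. MIN_SUPPORTED would be bumped whenever
# there's an incompatible dask-gateway upgrade on the binder.
MIN_SUPPORTED = (0, 7, 1)


def parse_version(version: str) -> tuple:
    """Turn a version string like '0.6.1' into (0, 6, 1) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def check_dask_gateway_version(image_version: str) -> None:
    """Raise an informative error if the image's dask-gateway is too old."""
    if parse_version(image_version) < MIN_SUPPORTED:
        minimum = ".".join(str(p) for p in MIN_SUPPORTED)
        raise RuntimeError(
            f"This image has dask-gateway {image_version}, but this binder "
            f"requires at least {minimum}. Please rebuild with a newer "
            "dask-gateway pinned in your environment."
        )
```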

@scottyhq
Member

We can get into poor situations when launching an image that's incompatible with the version of dask-gateway-server running on the binder

Thanks for opening @TomAugspurger. Looks like in TomAugspurger/pangeo-dask-gateway#2, the situation leads to ValueError: 404: Not Found or ssl.SSLError: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error.

Insert a hook when spawning a user to check that if Dask Gateway is installed, that it's compatible with the dask-gateway-server running on the binder.

Rather than a KubeSpawner change, maybe this could be built into dask_gateway itself? I'm wondering if gateway.new_cluster() can do a get_versions(check=True) test (https://distributed.dask.org/en/latest/api.html#distributed.Client.get_versions)?

@TomAugspurger
Member Author

Dask does a Client.get_versions check by default now. I think the issue is a bit earlier: the versions of the dask-gateway client in the jupyterhub pod and the dask-gateway-server running on the binder don't match, so they can't even communicate. Going forward dask-gateway's protocol will be less version dependent, but for the best UX I think we'd want something here, since we're the only ones who know the versions of dask-gateway running in the binder (and we can check the dask-gateway in the user's image by importing it).

@scottyhq
Member

Ah, I see. If you're up for adding a hook function, that could certainly be useful. Another simple option might be to just add a highly visible jupyterhub page announcement (https://jupyterhub.readthedocs.io/en/stable/reference/templates.html#page-announcements), such as Dask-gateway support requires version > 0.7.1, be sure to check your version. Implementing these announcements has come up a few times in the past for various temporary issues such as this.

@TomAugspurger
Member Author

@consideRatio do you have a feeling where in the KubeSpawner lifecycle this would be best to insert? It has to be after the image has been downloaded. The most robust way to do this is to execute a small command in the container like dask-gateway-worker --version to get the version of dask-gateway in the image, and then we'd parse the output and compare it against some minimum required version.
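One hedged sketch of the parse-and-compare step described above (the exact output format of `dask-gateway-worker --version` is an assumption here, and the container exec itself is left abstract; the regex just grabs the first X.Y.Z triple it sees):

```python
import re


def extract_version(cli_output: str) -> tuple:
    """Pull an (X, Y, Z) version tuple out of CLI output such as
    'dask-gateway-worker 0.6.1' (assumed shape, not verified)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", cli_output)
    if match is None:
        raise ValueError(f"no version found in output: {cli_output!r}")
    return tuple(int(group) for group in match.groups())


def is_compatible(cli_output: str, minimum: tuple = (0, 7, 1)) -> bool:
    """Compare the parsed version against the minimum required version."""
    return extract_version(cli_output) >= minimum
```

In the spawner, `cli_output` would come from executing the command inside the started container, which is exactly the lifecycle question raised in this thread.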

@consideRatio
Member

@TomAugspurger I'm not sure, I'm raising questions rather than answering :p

Rubber ducking a bit.

  1. I understand the goal to be informing users who aim to start up a repo with the binderhub instance that it is no longer usable with Dask.
  2. I understand this as a criteria check on the already built or freshly built image, which requires starting a container of the image.

Thoughts:

  • Since the criteria check requires a running container, it may come with some penalty of starting one, but that is already done during building and when the user launches the pod.
  • If the user has launched the pod, it could be checked with a kubectl exec equivalent via a k8s pod's lifecycleHooks - these are very messy though, especially if you want to allow sudo and have started the container as root but then transitioned to a jovyan user etc., because whenever you do kubectl exec or start an exec-type lifecycleHook, you start as the user the container was originally started as.
  • It could be verifiable during build probably, and interact perhaps with repo2docker. Hmmm...

My gut feeling is that this may introduce so much complexity that it isn't worth it, but perhaps a nice approach would be to extract some version info as part of the build stage, perhaps combined with a diagnostic on already available images. Such a final build step or standalone diagnostic run could start a container with the intention to extract version info, and based on that you could for example remove the cached image to trigger a rebuild. Hmmm...

@TomAugspurger
Member Author

Thanks.

It could be verifiable during build probably, and interact perhaps with repo2docker. Hmmm...

I don't think this would be an option. Imagine: I have an environment built today that includes Dask Gateway 0.7.1. Things are working fine. Suppose that a year from now we're running dask-gateway 0.8 on the binderhub, which includes a breaking change and is incompatible with dask-gateway 0.7.1. We'd like to inform the user of that pod that dask-gateway won't work and refuse to start the pod.

Basically, I want something like

class DaskGatewayCheckKubeSpawner(KubeSpawner):
    async def start(self):
        # ... download the image / start the pod ...
        self.check_dask_gateway_image()  # raise an informative error if incompatible
        # ... continue with the rest of start ...

Perhaps I can get away with just super().start() and then inserting my check. I'll just try that and see what breaks :) I imagine there are edge cases with environment names, custom start scripts...

@consideRatio
Member

Suppose that a year from now we're running dask-gateway 0.8 on the binderhub, which includes a breaking change and is incompatible with dask-gateway 0.7.1. We'd like to inform the user of that pod that dask-gateway won't work and refuse to start the pod.

I figured that would be a way to extract information, and then how to act on that could be entirely left to other logic. Information of relevance to extract could for example be a mapping of docker image hash -> versions map. Then logic could be made to act based on that, directly, or in the future.
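As a sketch of that data model (all names here are hypothetical), a build step could record the versions found in each image keyed by its digest, and separate logic could act on the map directly or later:

```python
# Hypothetical store: image digest -> versions found in that image.
image_versions: dict = {}


def record(image_digest: str, dask_gateway_version: str) -> None:
    """Record the dask-gateway version extracted from a built image."""
    image_versions[image_digest] = {"dask-gateway": dask_gateway_version}


def flag_incompatible(minimum: tuple) -> list:
    """Return digests whose recorded dask-gateway version is below `minimum`,
    e.g. so the cached image can be removed to trigger a rebuild."""
    def as_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))

    return [
        digest
        for digest, versions in sorted(image_versions.items())
        if as_tuple(versions["dask-gateway"]) < minimum
    ]
```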

@TomAugspurger
Member Author

TomAugspurger commented May 11, 2020 via email

@scottyhq
Member

We'd like to inform the user of that pod that dask-gateway won't work and refuse to start the pod.

@TomAugspurger -thanks for looking into this.

  1. If you don't start the pod, is there a nice way to create a custom BinderHub error page?:
    WARNING: this binderhub is running dask-gateway-server=$DASK_GATEWAY_SERVER_VERSION, but your environment has dask-gateway=X. Upgrade to ensure compatibility

  2. Alternatively, if we do launch the pod (maybe because the non-dask-gateway parts are still worth running), a closable template announcement would be great. It's unclear to me if these are easy to implement on top of the jupyterlab landing page, so perhaps over-writing the workspace with a big markdown message could work (https://github.com/pangeo-data/pangeo-example-notebooks/blob/master/binder/jupyterlab-workspace.json)?

@TomAugspurger
Member Author

FWIW, I looked into this a bit and didn't see an easy way to achieve it in the binder startup process.

I wonder if we're trying to solve this problem at the wrong level. I believe that Dask-Gateway now has versioned API requests, so it may have already solved the problem for us. When the client and server have different but compatible versions, things will be OK. When they have incompatible versions (in our case, the client is too old for the server), the server will presumably return a nice error message saying that the client is out of date.
