Bring into line with the Kubernetes operator design pattern #839

Open

qcaas-nhs-sjt opened this issue May 7, 2024 · 6 comments


@qcaas-nhs-sjt

Proposed change

At present, the kubespawner is self-contained: the hub's work is performed by a single service that individual organisations extend to cover a variety of different solutions. While this does make some things easier, it does not give us a closed, easily extendable solution that can meet organisations' needs without a lot of fudging, and it introduces a number of issues that could be exploited should the service ever be breached. For one, the same service that the user interacts with can effectively make changes to pod definitions and therefore run whatever workloads it wants. A breach of that service could then potentially be escalated into further breaches that would undermine the solution.

Alternative options

We can of course continue with the current solution; however, I feel that this is risky: as people add more and more functionality to the existing hub, it will become much harder to split the solution out as it continues to grow.

The proposed solution gives us a microservice implementation, which we can extend by either:

  • extending the operator
  • creating additional operators

It could also allow organisations to build their own microservices on other code bases and in other languages if needed.

Who would use this feature?

  • Developers and System Administrators would benefit, as it is much easier to develop microservices based on a template and add your own logic than it is to build custom Python files in the current manner. Kubernetes developers are also usually familiar with the operator model. This also helps when debugging, as they would be able to track the assets through the process.
  • Extension Builders would benefit, as in some instances it would be easier to extend the solution using additional microservices and controllers. Say, for example, we wanted to build a multi-cluster implementation: we could add some additional services to handle this securely.
  • IT Security officers would benefit, as they could see appropriate isolation of services, and validation could easily be implemented to enforce that the operator only provisions the images and workloads an organisation has approved, rather than anything at all.

Suggest a solution

I propose that we break the hub down into multiple services within Kubernetes, utilising the Kubernetes operator design pattern, and implement a more refined security policy for these services. As we use the solution in secure data environments and trusted research environments, it is essential that it is secure, and there are other benefits, such as scalability and extensibility, that could follow from the new model.

To avoid this being a breaking change for existing applications, we could split the functionality of the current kubespawner into multiple modules that could be imported into the core kubespawner project. We could then have a feature flag that allows the hub to use either the legacy framework or the new one. Those wanting to use the old framework could continue to do so, with the new components remaining offline until the organisation is ready to migrate. This would also allow us to provide bug fixes and so on to both pathways at the same time.
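As a rough sketch of how such a flag might look in jupyterhub_config.py (the trait name is hypothetical, purely for illustration):

c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Hypothetical trait, not an existing KubeSpawner option:
# False (default) keeps the legacy behaviour, where the spawner creates pods,
# PVCs, secrets, etc. against the Kubernetes API directly; True makes the
# spawner only create/update JupyterNotebookInstance custom resources and
# poll their status, leaving resource creation to the operator.
c.KubeSpawner.use_operator = True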

sequenceDiagram
    participant Hub
    participant API as Kubernetes API
    participant Operator
    
    Hub ->> Hub: User logs into JupyterHub and selects workspace
    Hub ->> API: Create Custom Resource
    Operator ->> API: Fetch Updated Custom Resources
    Operator ->> API: Create Pod and wait for readiness
    Operator ->> API: Update Status of Custom Resource to PodReady
    Hub ->> API: Fetch Status
    Hub ->> Hub: Redirect User Session to Pod

Custom Resource Definition

We would define a custom resource definition (CRD) describing a Jupyter Notebook Instance resource. This CRD would form the basis of everything else we do, so we should include fields for custom properties to aid people in developing their own custom extensions.

I would also suggest leveraging the Events resource within Kubernetes to log the actions that the operator has taken and why it has chosen to make certain decisions. This will improve auditability, make the solution more palatable for organisations wishing to comply with security standards, and provide a simple framework for those extending the solution to add their own events.

Python Module for interoperability models

A Python module could be published, potentially auto-generated from the custom resource definition. By keeping these models in a separate module and making them available, we would allow others to develop their own controllers and the like, increasing extensibility and the ease with which we can code new modules.
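To make this concrete, here is a hand-written sketch of the kind of classes such a module might expose; the names are illustrative, and in practice they could be generated from the CRD schema (for example with datamodel-code-generator):

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TemplateRef:
    # Reference to the JupyterNotebookInstanceTemplate to render.
    name: str
    namespace: str

@dataclass
class JupyterNotebookInstanceSpec:
    template: TemplateRef
    # Free-form variables substituted into the template's string fields.
    variables: Dict[str, str] = field(default_factory=dict)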

Jupyter Notebooks Operator

The operator would read in the definition of the Jupyter Notebook Instance and use this definition to generate the pods and/or other resources needed to support the implementation.

This module could use kopf or another framework to provide the skeleton of the operator, which would then implement the tasks that are needed when the custom resource is created, amended or destroyed by the hub. Many of these functions already exist in the current kubespawner code and will just need to be split off into their own modules and referenced by the services as required.
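A minimal sketch of what the kopf-based handlers might look like; the handler names are arbitrary, and render_pod_from_template is a hypothetical helper standing in for the templating logic discussed later in this thread:

import kopf
from kubernetes import client

def render_pod_from_template(template_ref, variables):
    # Hypothetical helper: fetch the referenced JupyterNotebookInstanceTemplate
    # and apply {variable} substitution to its string fields.
    ...

@kopf.on.create("kubespawner.jupyterhub.org", "v1", "jupyternotebookinstances")
def on_create(spec, name, namespace, body, patch, **kwargs):
    # Render the template referenced by the custom resource and create the
    # singleuser pod from it.
    pod = render_pod_from_template(spec["template"], spec.get("variables", {}))
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)
    # Record what was done, and why, as a Kubernetes Event on the object.
    kopf.info(body, reason="PodCreated", message=f"Created singleuser pod for {name}")
    # Surface progress on the custom resource's status for the hub to poll.
    patch.setdefault("status", {})["phase"] = "PodPending"

@kopf.on.delete("kubespawner.jupyterhub.org", "v1", "jupyternotebookinstances")
def on_delete(name, namespace, **kwargs):
    # Tear down the singleuser pod when the custom resource is removed.
    client.CoreV1Api().delete_namespaced_pod(name=name, namespace=namespace)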


Helm Chart Changes

In addition to the changes above, the Z2JH Helm charts would need to be amended to provide options for this under both the legacy and new frameworks. The operator would need to be created along with supporting roles, bindings, etc.

The hub role would also be amended conditionally so that, when using the new framework, it only gives the hub the ability to read/write the new Jupyter Notebook definitions; all other permissions on the Kubernetes API could be removed.

A new role would be created for the operator, giving it the ability to create pods, persistent volume claims, secrets, services, etc., as the hub does currently.
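To illustrate the intended permission split, here is a sketch using kubernetes Python client objects (in Z2JH these would be Helm-templated Role manifests; the role names and verb lists are illustrative):

from kubernetes import client

# Reduced hub role under the new framework: the hub may only manage the
# JupyterNotebookInstance custom resources.
hub_role = client.V1Role(
    metadata=client.V1ObjectMeta(name="hub", namespace="jupyterhub"),
    rules=[
        client.V1PolicyRule(
            api_groups=["kubespawner.jupyterhub.org"],
            resources=["jupyternotebookinstances"],
            verbs=["get", "list", "watch", "create", "update", "patch", "delete"],
        ),
    ],
)

# Operator role: takes over the resource permissions the hub holds today.
operator_role = client.V1Role(
    metadata=client.V1ObjectMeta(name="notebook-operator", namespace="jupyterhub"),
    rules=[
        client.V1PolicyRule(
            api_groups=[""],
            resources=["pods", "persistentvolumeclaims", "secrets", "services", "events"],
            verbs=["get", "list", "watch", "create", "update", "patch", "delete"],
        ),
    ],
)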

I appreciate that this all seems like a lot of work; however, I believe it will not only improve security but also make the solution easier to extend, eliminate the need to inject customised Python files into the hub service, and increase the capabilities of this product and of other products built in support of it.


welcome bot commented May 7, 2024

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@manics
Member

manics commented May 7, 2024

This all sounds really cool! I think the main question is how we go about developing and exploring this in a minimally disruptive way, given that it's a very fundamental change to KubeSpawner.

Do you think this can be developed in a separate repository, with a new spawner that subclasses KubeSpawner? We can make changes to KubeSpawner to support overriding existing methods, or parts of methods (either by splitting up existing methods or adding hooks)?

The Z2JH hub image only uses released KubeSpawner versions; this also applies to dev versions of Z2JH, so a new container image would need to be built anyway (extending quay.io/jupyterhub/k8s-hub:<TAG>).

Z2JH supports pass-through configuration of traitlets (this isn't yet fully documented: jupyterhub/zero-to-jupyterhub-k8s#3271), which makes it easier to configure new traitlets. We'd need to do some work to allow the hub RBAC permissions to be configurable; alternatively, we could say that initially the roles must be created separately, and the role name is configured in Z2JH.

I think overall this will lead to a faster development cycle since it minimises potentially protracted design conversations about changes to existing resources, leaves room for making multiple breaking changes in the new spawner whilst trying different approaches, and once it's working we can decide whether/how to merge everything back in.

@manics
Member

manics commented May 7, 2024

Also, you're more than welcome to join our next Collaboration Cafe; it's open to everyone: jupyterhub/team-compass#718

@qcaas-nhs-sjt
Author

> This all sounds really cool! I think the main question is how we go about developing and exploring this in a minimally disruptive way, given that it's a very fundamental change to KubeSpawner.
>
> Do you think this can be developed in a separate repository, with a new spawner that subclasses KubeSpawner? We can make changes to KubeSpawner to support overriding existing methods, or parts of methods (either by splitting up existing methods or adding hooks)?

I had thought this might be one way forward; my concern is that we would then end up with two code bases to maintain, so I had wondered whether there was a way we could do it inside the current codebase with minimal disruption. However, I need to familiarise myself better with the existing code base to be sure. I had considered doing a PoC on our fork to see what we can do with it, but if everyone else would prefer, we could start up a new repository, do it there, and accept the cost of maintaining two code bases.

If we don't mind this overhead, then I would recommend that we actually create a number of new repositories for this piece of work, so that the various aspects of the system can be managed as separate microservices and libraries:

  • Shared libraries - this would include the Python models generated from the CRDs. This will act as an enabler, allowing others to easily extend the solution in future or to build their own operators using the model.
  • The kubespawner
  • The Jupyter Notebooks Operator

While this will initially increase complexity, it will ultimately make the solution easier to manage as each of the products develops.

> The Z2JH hub image only uses released KubeSpawner versions; this also applies to dev versions of Z2JH, so a new container image would need to be built anyway (extending quay.io/jupyterhub/k8s-hub:<TAG>).

Yes, you are correct; we will of course need several new container images: one for the new JupyterHub and another for the new operator.

> Z2JH supports pass-through configuration of traitlets (this isn't yet fully documented: jupyterhub/zero-to-jupyterhub-k8s#3271), which makes it easier to configure new traitlets. We'd need to do some work to allow the hub RBAC permissions to be configurable; alternatively, we could say that initially the roles must be created separately, and the role name is configured in Z2JH.

I'm happy with either model, though it should be a relatively minor change for us to add a section to the Z2JH templates for adding additional permissions to the existing roles. I will try to have a look at this and raise an issue on that repository when I've got five minutes.

> I think overall this will lead to a faster development cycle since it minimises potentially protracted design conversations about changes to existing resources, leaves room for making multiple breaking changes in the new spawner whilst trying different approaches, and once it's working we can decide whether/how to merge everything back in.

I agree. Keeping components split off in line with microservice development principles should accelerate development in the long run, though it should be noted that there will be a learning curve with the new model, so things might slow down for a short while as people get familiar with it.

> Also, you're more than welcome to join our next Collaboration Cafe; it's open to everyone: jupyterhub/team-compass#718

Thanks, will try to attend if I can 😄

@qcaas-nhs-sjt
Author

Following the collaboration cafe earlier and the comments above, I'm contemplating trying to put together a PoC of this in my free time. As this will be a rewrite of the kubespawner, I'm also considering trying to make some other improvements along the way.

One such idea would be to mirror the existing V1Pod schema and classes in our own custom resource definition. This would allow the user to change literally any aspect of the pod without having to implement custom code; the vast majority of changes could potentially be made in the template alone. To do this, we could build new classes that inherit from the existing ones and implement a method that applies string replacements to any pertinent fields, allowing us to supply variables into these fields based upon information collected by kubespawner. An example implementation might look like:

apiVersion: kubespawner.jupyterhub.org/v1
kind: JupyterNotebookInstanceTemplate
metadata:
  name: default
  namespace: jupyterhub
spec:
  pods:
    - name: "{username}"
      weight: 100
      annotations:
        hub.jupyter.org/username: "{unescaped_username}"
      labels:
        app: jupyterhub
        component: singleuser-server
        hub.jupyter.org/username: "{username}"
      spec:
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - preference:
                  matchExpressions:
                    - key: hub.jupyter.org/node-purpose
                      operator: In
                      values:
                        - user
        containers:
          - name: notebook
            image: myrepo/myimage:1.2.3
            env:
              - name: JUPYTERHUB_USERNAME
                value: "{unescaped_username}"
              - name: JUPYTERHUB_OAUTH_CALLBACK_URL
                value: "/user/{unescaped_username}/oauth_callback"
...

When the new version of kubespawner creates a resource it will then reference this template as follows:

apiVersion: kubespawner.jupyterhub.org/v1
kind: JupyterNotebookInstance
metadata:
  name: joe-2ebloggs-40some-2eorg
  namespace: jupyterhub
spec:
  template:
    name: default
    namespace: jupyterhub
  variables:
    unescaped_username: "joe.bloggs@some.org"
    username: "joe-2ebloggs-40some-2eorg"

The operator would in turn create a pod as follows:

apiVersion: v1
kind: Pod
metadata:
  name: joe-2ebloggs-40some-2eorg
  namespace: jupyterhub
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: hub.jupyter.org/node-purpose
                operator: In
                values:
                  - user
  containers:
    - name: notebook
      image: myrepo/myimage:1.2.3
      env:
        - name: JUPYTERHUB_USERNAME
          value: "joe.bloggs@some.org"
        - name: JUPYTERHUB_OAUTH_CALLBACK_URL
          value: "/user/joe.bloggs@some.org/oauth_callback"

I think this methodology would give a huge amount of flexibility to customise the service without having to touch the code itself; you could still extend the services, but the need to do so would be limited to edge cases.
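For what it's worth, the substitution step itself could be as simple as the following sketch, assuming plain {name} placeholders (escaping, literal braces and missing-key handling are glossed over):

from typing import Any, Dict

def substitute(node: Any, variables: Dict[str, str]) -> Any:
    # Recursively apply {placeholder} replacement to every string field of
    # the template structure. str.format_map raises KeyError for unknown
    # placeholders, which a real implementation would need to handle.
    if isinstance(node, str):
        return node.format_map(variables)
    if isinstance(node, dict):
        return {key: substitute(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, variables) for item in node]
    return node

# e.g. substitute(template_pod, {"username": "joe-2ebloggs-40some-2eorg",
#                                "unescaped_username": "joe.bloggs@some.org"})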

I was wondering if anyone had any thoughts on this, or ideas for anything else they would like to see from such a rewrite?

@manics
Member

manics commented May 24, 2024

I like this idea; I've previously thought about a Helm-template-spawner, and this is along the same lines. It's a lot easier to understand and extend.

Are you planning to restrict the new spawner to only handling CRDs of type kubespawner.jupyterhub.org, or are you thinking of making the spawner handle any templated k8s manifest, so e.g. an admin provides a directory of templates and the spawner processes them all for each user?

As mentioned in the collab cafe don't feel obliged to subclass KubeSpawner if a fresh start is easier. If we want to make this available in Z2JH as an option we can easily install both spawners in the hub image.
