Change Health API reply when node is SchedulingDisabled #1705

Open
rgaiacs opened this issue May 24, 2023 · 1 comment

Comments

@rgaiacs
Contributor

rgaiacs commented May 24, 2023

Consider a small Kubernetes cluster with two nodes. Node 1 runs the BinderHub API, and Node 2 runs repo2docker and JupyterHub. When container image cleaning starts on Node 2, the node is marked as SchedulingDisabled. With no other node able to run repo2docker and JupyterHub, the health API should report the deployment as unhealthy.

cc @arnim

Steps to reproduce

$ curl https://notebooks.gesis.org/binder/health | python3 -m json.tool
{
    "ok": true,
    "checks": [
        {
            "service": "Docker registry",
            "ok": true
        },
        {
            "service": "JupyterHub API",
            "ok": true
        },
        {
            "service": "Pod quota",
            "total_pods": 32,
            "build_pods": 0,
            "user_pods": 32,
            "quota": 40,
            "ok": true,
            "_ignore_failure": true
        }
    ]
}
$ kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
spko-css-app03   Ready    <none>          34d   v1.26.3
svko-ilcm03      Ready    control-plane   48d   v1.26.3
$ kubectl cordon spko-css-app03
node/spko-css-app03 cordoned
$ kubectl get nodes
NAME             STATUS                     ROLES           AGE   VERSION
spko-css-app03   Ready,SchedulingDisabled   <none>          34d   v1.26.3
svko-ilcm03      Ready                      control-plane   48d   v1.26.3
$ curl https://notebooks.gesis.org/binder/health | python3 -m json.tool

Observed Output

{
    "ok": true,
    "checks": [
        {
            "service": "Docker registry",
            "ok": true
        },
        {
            "service": "JupyterHub API",
            "ok": true
        },
        {
            "service": "Pod quota",
            "total_pods": 33,
            "build_pods": 0,
            "user_pods": 33,
            "quota": 40,
            "ok": true,
            "_ignore_failure": true
        }
    ]
}

Expected Output

{
    "ok": false,
    "checks": [
        {
            "service": "Docker registry",
            "ok": true
        },
        {
            "service": "JupyterHub API",
            "ok": true
        },
        {
            "service": "Pod quota",
            "total_pods": 33,
            "build_pods": 0,
            "user_pods": 33,
            "quota": 40,
            "ok": false,
            "_ignore_failure": true
        }
    ]
}
@minrk
Member

minrk commented May 24, 2023

This might be a little tricky to implement, but I suppose the builder class could have a "builders available" method? The abstractions make it quite tricky, because how to check whether it's true depends on how BinderHub is deployed (i.e. in the helm config, outside the BinderHub config). You'll need to know which nodes, if any, to check for their scheduling status.
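
A minimal sketch of what such a "builders available" check could look like, not BinderHub's actual API. It assumes the node objects have already been fetched from the Kubernetes API (for example with the official Python client's `CoreV1Api().list_node()`) and converted to plain dicts. The names `is_schedulable`, `builders_available`, `builder_nodes_check`, the `node_names` parameter, and the "Builder nodes" service label are all hypothetical. The only Kubernetes facts it relies on are that `kubectl cordon` sets `spec.unschedulable` to true (shown as SchedulingDisabled) and that readiness is reported in `status.conditions`.

```python
from typing import Iterable, Mapping, Optional, Set


def is_schedulable(node: Mapping) -> bool:
    """Return True if the node is Ready and not cordoned."""
    # `kubectl cordon` sets spec.unschedulable=true, which
    # `kubectl get nodes` displays as "SchedulingDisabled".
    if (node.get("spec") or {}).get("unschedulable"):
        return False
    conditions = (node.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def builders_available(
    nodes: Iterable[Mapping], node_names: Optional[Set[str]] = None
) -> bool:
    """Hypothetical helper: is at least one relevant node schedulable?

    `node_names` is an assumed deployment-specific setting naming the
    nodes that may run build and user pods; None means "any node".
    """
    for node in nodes:
        name = (node.get("metadata") or {}).get("name")
        if node_names is not None and name not in node_names:
            continue
        if is_schedulable(node):
            return True
    return False


def builder_nodes_check(
    nodes: Iterable[Mapping], node_names: Optional[Set[str]] = None
) -> dict:
    """Build an entry shaped like the items in the health "checks" list."""
    return {"service": "Builder nodes", "ok": builders_available(nodes, node_names)}


# The two nodes from the report, after `kubectl cordon spko-css-app03`.
ready = {"conditions": [{"type": "Ready", "status": "True"}]}
nodes = [
    {"metadata": {"name": "spko-css-app03"},
     "spec": {"unschedulable": True}, "status": ready},
    {"metadata": {"name": "svko-ilcm03"}, "spec": {}, "status": ready},
]
check = builder_nodes_check(nodes, node_names={"spko-css-app03"})
```

With only `spko-css-app03` listed as a builder node, `check["ok"]` is false while that node is cordoned. Passing `node_names=None` would instead count the control-plane node as available, which is why the set of nodes to inspect has to come from the deployment configuration.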

rgaiacs changed the title from "Change API when node is SchedulingDisabled" to "Change Health API reply when node is SchedulingDisabled" May 24, 2023