
Sync GESIS node Network Policy with mybinder.org federation #2788

Open
rgaiacs opened this issue Oct 25, 2023 · 5 comments

@rgaiacs (Collaborator) commented Oct 25, 2023

The GESIS node configuration is deployed using GitLab CI (similar to GitHub Actions). The core steps are:

  1. Install dependencies using Ansible, see https://github.com/gesiscss/orc2/blob/main/.gitlab-ci.yml#L84
  2. Install the Helm chart, see https://github.com/gesiscss/orc2/blob/main/.gitlab-ci.yml#L211 (a rough sketch of the whole pipeline is below)
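A rough, illustrative sketch of such a pipeline (job names, images, and the chart path are guesses, not copied from gesiscss/orc2):

stages:
  - provision
  - deploy

provision-node:
  stage: provision
  image: python:3.11
  script:
    - pip install ansible
    - ansible-playbook -i inventory site.yml   # install node dependencies

deploy-binderhub:
  stage: deploy
  image: alpine/helm:3.13.2
  script:
    - >
      helm upgrade --install binderhub helm/gesis
      --namespace gesis
      -f helm/copy-of-mybinder.org-deploy/secrets/config/common/common.yaml
      -f helm/copy-of-mybinder.org-deploy/secrets/config/common/bans.yaml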

The GESIS node is running Kubernetes with Calico as the Container Network Interface (CNI) plugin.
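For context, the operator-based Calico install can be sanity-checked with commands like these (resource names are the tigera-operator defaults):

kubectl get pods -n calico-system          # calico-node and calico-kube-controllers should be Running
kubectl get tigerastatus                   # every component should report AVAILABLE=True
kubectl get installation default -o yaml   # operator.tigera.io/v1 Installation, including the IP pool CIDR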

The Helm chart loads:

  1. https://github.com/gesiscss/orc2/blob/main/helm/copy-of-mybinder.org-deploy/secrets/config/common/common.yaml, which is a copy of https://github.com/jupyterhub/mybinder.org-deploy/blob/main/secrets/config/common/common.yaml
  2. https://github.com/gesiscss/orc2/blob/main/helm/copy-of-mybinder.org-deploy/secrets/config/common/bans.yaml, which is generated using https://github.com/jupyterhub/mybinder.org-deploy/blob/main/secrets/ban.py

I think that I'm missing an important step here. Any help?

@rgaiacs (Collaborator, Author) commented Oct 26, 2023

https://github.com/jupyterhub/mybinder.org-deploy/blob/main/mybinder/templates/netpol.yaml is deployed on the GESIS cluster as

Name:         binder-users
Namespace:    gesis
Created on:   2023-10-26 10:51:01 +0200 CEST
Labels:       app=binderhub
              app.kubernetes.io/managed-by=Helm
              chart=gesis-3.0.0
              component=user-netpol
              release=binderhub
Annotations:  meta.helm.sh/release-name: binderhub
              meta.helm.sh/release-namespace: gesis
Spec:
  PodSelector:     component in (dind,image-builder,singleuser-server),release=binderhub
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Allowing egress traffic:
    To Port: 53/TCP
    To Port: 53/UDP
...

(part of the spec omitted) and the Docker-in-Docker pod is

Name:             binderhub-dind-brbmp
Namespace:        gesis
Priority:         0
Service Account:  default
Node:             spko-css-app03/194.95.75.12
Start Time:       Wed, 04 Oct 2023 13:44:30 +0200
Labels:           app=binder
                  component=image-builder
                  controller-revision-hash=75bb485d7f
                  heritage=Helm
                  name=binderhub-dind
                  pod-template-generation=2
                  release=binderhub
Annotations:      <none>
Status:           Running
IP:               10.244.3.163
IPs:
  IP:           10.244.3.163
Controlled By:  DaemonSet/binderhub-dind
Containers:
  dind:
    Container ID:  containerd://65f15f3de0865306f100afae7e5d4fdbb9d9c8fdfe0283825667e911129dac6b
    Image:         docker.io/library/docker:24.0.6-dind
    Image ID:      docker.io/library/docker@sha256:f28ffd78641197871fea8fd679f2bf8a1cdafa4dc3f1ce3e700ad964aac2879a
    Port:          <none>
    Host Port:     <none>
    Args:
      dockerd
      --storage-driver=overlay2
      -H unix:///var/run/dind/docker.sock
      --mtu=1000
    State:          Running
      Started:      Tue, 24 Oct 2023 09:15:19 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 22 Oct 2023 02:41:27 +0200
      Finished:     Tue, 24 Oct 2023 09:15:18 +0200
    Ready:          True
    Restart Count:  3
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:        500m
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /var/lib/docker from dockerlib-dind (rw)
      /var/run/dind from run-dind (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dblrz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  dockerlib-dind:
    Type:          HostPath (bare host directory volume)
    Path:          /orc2_data/repo2docker
    HostPathType:  DirectoryOrCreate
  run-dind:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/dind
    HostPathType:  DirectoryOrCreate
  kube-api-access-dblrz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              binderhub=true
Tolerations:                 hub.jupyter.org/dedicated=user:NoSchedule
                             hub.jupyter.org_dedicated=user:NoSchedule
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

The PodSelector looks good to me. From the Docker-in-Docker pod, I can still run

wget http://139.162.202.16

successfully.
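For the record, the same check can be driven from a workstation with kubectl exec (pod name taken from the output above); if the policy were enforced as intended, this should presumably fail rather than return a page:

kubectl exec -n gesis binderhub-dind-brbmp -- wget -qO- --timeout=5 http://139.162.202.16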

The Calico/Tigera Operator is running:

Name:             tigera-operator-f6bb878c4-p4ghb
Namespace:        tigera-operator
Priority:         0
Service Account:  tigera-operator
Node:             svko-css-app01/194.95.75.9
Start Time:       Wed, 25 Oct 2023 16:49:26 +0200
Labels:           k8s-app=tigera-operator
                  name=tigera-operator
                  pod-template-hash=f6bb878c4
Annotations:      <none>
Status:           Running
IP:               194.95.75.9
IPs:
  IP:           194.95.75.9
Controlled By:  ReplicaSet/tigera-operator-f6bb878c4
Containers:
  tigera-operator:
    Container ID:  containerd://f1440a31e51de0a0ad30b367318a0c972382c287652bc29eb497049b296a899b
    Image:         quay.io/tigera/operator:v1.30.7
    Image ID:      quay.io/tigera/operator@sha256:76715143082b0c45aa6fae57b8a2eac0213bef6ffb5c686e456a31b9a35069b3
    Port:          <none>
    Host Port:     <none>
    Command:
      operator
    State:          Running
      Started:      Wed, 25 Oct 2023 16:49:30 +0200
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      WATCH_NAMESPACE:
      POD_NAME:                            tigera-operator-f6bb878c4-p4ghb (v1:metadata.name)
      OPERATOR_NAME:                       tigera-operator
      TIGERA_OPERATOR_INIT_IMAGE_VERSION:  v1.30.7
    Mounts:
      /var/lib/calico from var-lib-calico (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-24jnn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  kube-api-access-24jnn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 :NoExecute op=Exists
                             :NoSchedule op=Exists
Events:                      <none>

Does anyone see what I am missing? Thanks!

@rgaiacs (Collaborator, Author) commented Oct 26, 2023

I might have discovered the missing piece:

apiVersion: v1
items:
- apiVersion: operator.tigera.io/v1
  kind: TigeraStatus
  metadata:
    creationTimestamp: "2023-10-26T08:49:01Z"
    generation: 1
    name: apiserver
    resourceVersion: "47435557"
    uid: f8a49c99-888e-4d96-ae58-a96fab6cbb94
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2023-10-26T08:49:06Z"
      message: 'Waiting for Installation to be ready: '
      observedGeneration: 1
      reason: ResourceNotReady
      status: "True"
      type: Degraded
- apiVersion: operator.tigera.io/v1
  kind: TigeraStatus
  metadata:
    creationTimestamp: "2023-10-26T08:49:01Z"
    generation: 1
    name: calico
    resourceVersion: "47435556"
    uid: 05bf3d62-33f9-43c9-897c-f65083b742e7
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2023-10-26T08:49:06Z"
      message: 'Error querying installation: Could not resolve CalicoNetwork IPPool
        and kubeadm configuration: IPPool 192.168.0.0/16 is not within the platform''s
        configured pod network CIDR(s) [10.244.0.0/16]'
      observedGeneration: 1
      reason: ResourceReadError
      status: "True"
      type: Degraded
kind: List
metadata:
  resourceVersion: ""


@rgaiacs (Collaborator, Author) commented Oct 27, 2023

I fixed the CalicoNetwork IPPool on the GESIS node. I tested with a deny-all configuration and Network Policy enforcement is working. The problem now is that the hub pod can't connect to an existing single-user pod, so the user redirect fails. Can I have some help debugging the Network Policies?
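As a first manual check, the failing hub → single-user path can be exercised directly; the deployment name and port below are the z2jh defaults and may need adjusting:

# IP of a running single-user pod (labels taken from the singleuser policy below)
kubectl get pod -n gesis -l component=singleuser-server -o wide

# From the hub pod, hit the notebook port directly; a quick HTTP error (403/404)
# means the network path is open, while a hang ending in "timed out" points at the policy.
kubectl exec -n gesis deploy/hub -- python3 -c "import urllib.request; urllib.request.urlopen('http://<POD_IP>:8888', timeout=5)"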

The BinderHub namespace has 3 Network Policies:

NAME         POD-SELECTOR                                                   AGE
hub          app=jupyterhub,component=hub,release=binderhub                 126m
proxy        app=jupyterhub,component=proxy,release=binderhub               126m
singleuser   app=jupyterhub,component=singleuser-server,release=binderhub   126m

hub Network Policy

Name:         hub
Namespace:    gesis
Created on:   2023-10-27 12:01:12 +0200 CEST
Labels:       app=jupyterhub
              app.kubernetes.io/managed-by=Helm
              chart=jupyterhub-3.1.0
              component=hub
              heritage=Helm
              release=binderhub
Annotations:  meta.helm.sh/release-name: binderhub
              meta.helm.sh/release-namespace: gesis
Spec:
  PodSelector:     app=jupyterhub,component=hub,release=binderhub
  Allowing ingress traffic:
    To Port: http/TCP
    From:
      PodSelector: hub.jupyter.org/network-access-hub=true
  Allowing egress traffic:
    To Port: 8001/TCP
    To:
      PodSelector: app=jupyterhub,component=proxy,release=binderhub
    ----------
    To Port: 8888/TCP
    To:
      PodSelector: app=jupyterhub,component=singleuser-server,release=binderhub
    ----------
    To Port: 53/UDP
    To Port: 53/TCP
    To:
      IPBlock:
        CIDR: 169.254.169.254/32
        Except:
    To:
      NamespaceSelector: kubernetes.io/metadata.name=kube-system
    To:
      IPBlock:
        CIDR: 10.0.0.0/8
        Except:
    To:
      IPBlock:
        CIDR: 172.16.0.0/12
        Except:
    To:
      IPBlock:
        CIDR: 192.168.0.0/16
        Except:
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 0.0.0.0/0
        Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 10.0.0.0/8
        Except:
    To:
      IPBlock:
        CIDR: 172.16.0.0/12
        Except:
    To:
      IPBlock:
        CIDR: 192.168.0.0/16
        Except:
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 169.254.169.254/32
        Except:
  Policy Types: Ingress, Egress

proxy Network Policy

Name:         proxy
Namespace:    gesis
Created on:   2023-10-27 12:01:12 +0200 CEST
Labels:       app=jupyterhub
              app.kubernetes.io/managed-by=Helm
              chart=jupyterhub-3.1.0
              component=proxy
              heritage=Helm
              release=binderhub
Annotations:  meta.helm.sh/release-name: binderhub
              meta.helm.sh/release-namespace: gesis
Spec:
  PodSelector:     app=jupyterhub,component=proxy,release=binderhub
  Allowing ingress traffic:
    To Port: http/TCP
    To Port: https/TCP
    From: <any> (traffic not restricted by source)
    ----------
    To Port: http/TCP
    From:
      PodSelector: hub.jupyter.org/network-access-proxy-http=true
    ----------
    To Port: api/TCP
    From:
      PodSelector: hub.jupyter.org/network-access-proxy-api=true
  Allowing egress traffic:
    To Port: 8081/TCP
    To:
      PodSelector: app=jupyterhub,component=hub,release=binderhub
    ----------
    To Port: 8888/TCP
    To:
      PodSelector: app=jupyterhub,component=singleuser-server,release=binderhub
    ----------
    To Port: 53/UDP
    To Port: 53/TCP
    To:
      IPBlock:
        CIDR: 169.254.169.254/32
        Except:
    To:
      NamespaceSelector: kubernetes.io/metadata.name=kube-system
    To:
      IPBlock:
        CIDR: 10.0.0.0/8
        Except:
    To:
      IPBlock:
        CIDR: 172.16.0.0/12
        Except:
    To:
      IPBlock:
        CIDR: 192.168.0.0/16
        Except:
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 0.0.0.0/0
        Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 10.0.0.0/8
        Except:
    To:
      IPBlock:
        CIDR: 172.16.0.0/12
        Except:
    To:
      IPBlock:
        CIDR: 192.168.0.0/16
        Except:
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 169.254.169.254/32
        Except:
  Policy Types: Ingress, Egress

singleuser Network Policy

Name:         singleuser
Namespace:    gesis
Created on:   2023-10-27 12:01:12 +0200 CEST
Labels:       app=jupyterhub
              app.kubernetes.io/managed-by=Helm
              chart=jupyterhub-3.1.0
              component=singleuser
              heritage=Helm
              release=binderhub
Annotations:  meta.helm.sh/release-name: binderhub
              meta.helm.sh/release-namespace: gesis
Spec:
  PodSelector:     app=jupyterhub,component=singleuser-server,release=binderhub
  Allowing ingress traffic:
    To Port: notebook-port/TCP
    From:
      PodSelector: hub.jupyter.org/network-access-singleuser=true
  Allowing egress traffic:
    To Port: 8081/TCP
    To:
      PodSelector: app=jupyterhub,component=hub,release=binderhub
    ----------
    To Port: 8000/TCP
    To:
      PodSelector: app=jupyterhub,component=proxy,release=binderhub
    ----------
    To Port: 8080/TCP
    To Port: 8443/TCP
    To:
      PodSelector: app=jupyterhub,component=autohttps,release=binderhub
    ----------
    To Port: 53/UDP
    To Port: 53/TCP
    To:
      IPBlock:
        CIDR: 169.254.169.254/32
        Except:
    To:
      NamespaceSelector: kubernetes.io/metadata.name=kube-system
    To:
      IPBlock:
        CIDR: 10.0.0.0/8
        Except:
    To:
      IPBlock:
        CIDR: 172.16.0.0/12
        Except:
    To:
      IPBlock:
        CIDR: 192.168.0.0/16
        Except:
    ----------
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 0.0.0.0/0
        Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
  Policy Types: Ingress, Egress

@manics (Member) commented Oct 27, 2023

Is it possible the NetworkPolicy controller doesn't quite implement policies in the way it's meant to? In #2698 I had a lot of problems with the AWS network policy controller, so I ended up overriding the policies after a lot of trial and error. See the networkPolicy.ingress sections in
https://github.com/jupyterhub/mybinder.org-deploy/pull/2698/files#diff-a545d6fc3dead92078cac561cb659146ca961dbc81b295dbec0e2762232cb06d

One method I found useful for debugging was to create a pod.yaml for an image like netshoot, copying the annotations and labels from one of the Jupyter pods. If you deploy this pod, the annotations/labels mean it should be subject to the same Network Policy restrictions as the Jupyter pod in question, so you can then kubectl exec .... into it to interactively poke around the network, and kubectl edit the pod labels/annotations and the network policies to figure out where the block is occurring.

E.g.

#   kubectl apply -f pod.yaml 
#   kubectl exec -it host-shell -- bash
---
apiVersion: v1
kind: Pod
metadata:
  name: host-shell
  labels:
    app: jupyterhub
    component: hub
    hub.jupyter.org/network-access-proxy-api: "true"
    hub.jupyter.org/network-access-proxy-http: "true"
    hub.jupyter.org/network-access-singleuser: "true"
    release: curvenote
spec:
  # Uncomment if you need to connect to a specific node
  # nodeSelector:
  #   kubernetes.io/hostname: nodename.k8s.example.org
  containers:
  - name: host-shell
    command:
    - sleep
    args:
    - 1h
    image: docker.io/nicolaka/netshoot:v0.11
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
  restartPolicy: Never
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
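Once inside the pod, quick probes such as these (service names and ports are the z2jh defaults, assuming the pod runs in the same namespace as the hub) help narrow down which hop is blocked:

curl -m 5 http://hub:8081/hub/api            # hub API; returns a small JSON version blob if reachable
curl -m 5 http://proxy-api:8001/api/routes   # CHP API; a 403 still proves connectivity, a timeout means blocked
curl -m 5 http://<singleuser-pod-ip>:8888    # a running single-user server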

@rgaiacs (Collaborator, Author) commented Oct 27, 2023

Thanks @manics. I'll follow your suggestion for debugging.
