Optimizing nodegroup instance types and resource requests across Cloud providers #142

scottyhq opened this issue Apr 20, 2020

As discussed in #141, there are a number of things to consider for the nodegroup instance types that run the various binderhub components. For starters: cost/hour, vCPU, RAM, and maximum number of pods.

I don't think there is a 1:1 mapping of instance specs across cloud providers, but it would be good to compare them so we can simplify the common config and resource requests.

Here is a simplified table of the current AWS config (https://github.com/pangeo-data/pangeo-binder/tree/staging/k8s-aws):

| nodegroup | min size | max size | desired capacity | instance type | vCPU | RAM (GiB) | max pods |
|-----------|----------|----------|------------------|---------------|------|-----------|----------|
| core      | 1        | 1        | 1                | t3.xlarge     | 4    | 16        | 58       |
| scheduler | 0        | 20       | 0                | t3.medium     | 2    | 4         | 17       |
| user      | 0        | 10       | 0                | m5.2xlarge    | 8    | 32        | 58       |
| worker    | 0        | 10       | 0                | r5.2xlarge    | 8    | 64        | 58       |
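
As a reference point, here is a rough sketch of how those nodegroups could be declared with eksctl. This is only illustrative (the cluster name, region, and exact fields are assumptions); the actual config lives in the k8s-aws directory linked above.

```yaml
# Rough eksctl ClusterConfig fragment mirroring the table above.
# Cluster name and region are placeholders; the real config is in k8s-aws.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: pangeo-binder   # hypothetical
  region: us-west-2     # hypothetical
nodeGroups:
  - name: core
    instanceType: t3.xlarge
    minSize: 1
    maxSize: 1
    desiredCapacity: 1
  - name: scheduler
    instanceType: t3.medium
    minSize: 0
    maxSize: 20
    desiredCapacity: 0
  - name: user
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
  - name: worker
    instanceType: r5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
```

Note that on EKS the max-pods value isn't set here: it comes from the instance type's ENI/IP limits, which is why it differs between t3.medium and the larger types in the table.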

@TomAugspurger @jhamman - what does this table look like for GCP?

For starters, it seems we can converge on the core-node requests and limits for binderhub. Here is the current `kubectl describe node` output for the AWS core node:

```
Non-terminated Pods:         (30 in total)
  Namespace                  Name                                                      CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------                  ----                                                      ------------  ----------   ---------------  -------------  ---
  cert-manager               cert-manager-6b78b7c997-r5lpd                             10m (0%)      0 (0%)       32Mi (0%)        0 (0%)         23m
  cert-manager               cert-manager-cainjector-54c4796c5d-zrpkk                  0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  cert-manager               cert-manager-webhook-77ccf5c8b4-n4g9c                     0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  kube-system                aws-node-cml2p                                            10m (0%)      0 (0%)       0 (0%)           0 (0%)         24m
  kube-system                aws-node-termination-handler-bp582                        50m (2%)      100m (5%)    64Mi (0%)        128Mi (1%)     24m
  kube-system                cluster-autoscaler-78fb96cfd5-6cghx                       100m (5%)     100m (5%)    300Mi (3%)       300Mi (3%)     23m
  kube-system                coredns-86d5cbb4bd-5pzb7                                  100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     23m
  kube-system                coredns-86d5cbb4bd-f4cb5                                  100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     23m
  kube-system                kube-proxy-zdcpj                                          100m (5%)     0 (0%)       0 (0%)           0 (0%)         24m
  prod                       api-prod-dask-gateway-8f6b4896d-nhjwh                     0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  prod                       binder-6f55f6c6c6-s9lwd                                   200m (10%)    0 (0%)       2Gi (26%)        0 (0%)         20m
  prod                       controller-prod-dask-gateway-64ff4648c-9tln7              0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       hub-545cfd67c8-r5hm7                                      200m (10%)    1250m (62%)  1Gi (13%)        3Gi (39%)      20m
  prod                       prod-kube-lego-5dffd7fdc-mqjqh                            0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       prod-nginx-ingress-controller-8d4c768fd-zqq9l             250m (12%)    500m (25%)   240Mi (3%)       240Mi (3%)     20m
  prod                       prod-nginx-ingress-default-backend-fc4bb9c47-dx5z7        0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       proxy-745d9f9645-lclss                                    250m (12%)    500m (25%)   256Mi (3%)       512Mi (6%)     20m
  prod                       traefik-prod-dask-gateway-7b8ddc7cf-2z7cp                 0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       user-scheduler-d8dbb46c8-mmm2z                            50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         20m
  staging                    api-staging-dask-gateway-76c6bd8848-w2q9k                 0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    binder-6b757d85d8-jw2kv                                   50m (2%)      0 (0%)       100Mi (1%)       0 (0%)         2m2s
  staging                    controller-staging-dask-gateway-76b4d4dbc8-ndshx          0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    hub-6c5c76dc9f-448fs                                      50m (2%)      1250m (62%)  100Mi (1%)       1Gi (13%)      2m2s
  staging                    proxy-6fb5f57d5c-4fqzk                                    250m (12%)    500m (25%)   256Mi (3%)       512Mi (6%)     2m2s
  staging                    staging-dind-c7tvn                                        0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-image-cleaner-27kjv                               0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-kube-lego-5f847b9787-dfqbs                        0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-nginx-ingress-default-backend-78c77d877f-gsntx    0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    traefik-staging-dask-gateway-6c7dd5645c-7fbql             0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    user-scheduler-66fc777965-mrb9r                           50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         2m2s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1820m (91%)   4200m (210%)
  memory                      5072Mi (65%)  6128Mi (78%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
```
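
If we do converge, the core-pod requests and limits above would presumably be pinned in the shared helm values rather than per cloud. A hedged sketch, assuming the upstream BinderHub / Zero-to-JupyterHub chart key layout (pangeo-binder's wrapper chart may nest these differently), with the numbers copied from the prod pods above:

```yaml
# Hedged sketch of shared helm values for the core pods, assuming the
# upstream BinderHub chart layout; numbers copied from the prod pods above.
resources:              # binder pod
  requests:
    cpu: 200m
    memory: 2Gi
jupyterhub:
  hub:
    resources:
      requests:
        cpu: 200m
        memory: 1Gi
      limits:
        cpu: 1250m
        memory: 3Gi
  proxy:
    chp:
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
```

That would keep the GCP and AWS deployments comparable, as long as the core nodegroup on each side has at least roughly 4 vCPU / 16 GiB to absorb the ~1.8 CPU / ~5 GiB of requests shown above.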