Optimizing nodegroup instance types and resource requests across Cloud providers #142

scottyhq opened this issue Apr 20, 2020

As discussed in #141, there are a number of things to consider for the nodegroup instance types that run the various binderhub components. For starters: cost/hour, vCPU, RAM, and maximum number of pods.

I don't think there is a 1:1 mapping of instance specs across cloud providers, but it would be good to compare them so we can simplify the common config and resource requests.

Here is a simplified table of the current AWS config (https://github.com/pangeo-data/pangeo-binder/tree/staging/k8s-aws):

| nodegroup | min size | max size | desired capacity | instance type | vCPU | RAM (GiB) | max pods |
|-----------|----------|----------|------------------|---------------|------|-----------|----------|
| core      | 1        | 1        | 1                | t3.xlarge     | 4    | 16        | 58       |
| scheduler | 0        | 20       | 0                | t3.medium     | 2    | 4         | 17       |
| user      | 0        | 10       | 0                | m5.2xlarge    | 8    | 32        | 58       |
| worker    | 0        | 10       | 0                | r5.2xlarge    | 8    | 64        | 58       |
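
As a reference point, here is a rough sketch of how those nodegroups could be declared with eksctl. This is only illustrative (the cluster name, region, and exact fields are assumptions); the actual config lives in the k8s-aws directory linked above.

```yaml
# Rough eksctl ClusterConfig fragment mirroring the table above.
# Cluster name and region are placeholders; the real config is in k8s-aws.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: pangeo-binder   # hypothetical
  region: us-west-2     # hypothetical
nodeGroups:
  - name: core
    instanceType: t3.xlarge
    minSize: 1
    maxSize: 1
    desiredCapacity: 1
  - name: scheduler
    instanceType: t3.medium
    minSize: 0
    maxSize: 20
    desiredCapacity: 0
  - name: user
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
  - name: worker
    instanceType: r5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
```

Note that on EKS the max-pods value isn't set here: it comes from the instance type's ENI/IP limits, which is why it differs between t3.medium and the larger types in the table.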

@TomAugspurger @jhamman - what does this table look like for GCP?

For starters, it seems we can converge on the core-node requests and limits for binderhub. Here is the current `kubectl describe node` output for the AWS core node:

```
Non-terminated Pods:         (30 in total)
  Namespace                  Name                                                      CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------                  ----                                                      ------------  ----------   ---------------  -------------  ---
  cert-manager               cert-manager-6b78b7c997-r5lpd                             10m (0%)      0 (0%)       32Mi (0%)        0 (0%)         23m
  cert-manager               cert-manager-cainjector-54c4796c5d-zrpkk                  0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  cert-manager               cert-manager-webhook-77ccf5c8b4-n4g9c                     0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  kube-system                aws-node-cml2p                                            10m (0%)      0 (0%)       0 (0%)           0 (0%)         24m
  kube-system                aws-node-termination-handler-bp582                        50m (2%)      100m (5%)    64Mi (0%)        128Mi (1%)     24m
  kube-system                cluster-autoscaler-78fb96cfd5-6cghx                       100m (5%)     100m (5%)    300Mi (3%)       300Mi (3%)     23m
  kube-system                coredns-86d5cbb4bd-5pzb7                                  100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     23m
  kube-system                coredns-86d5cbb4bd-f4cb5                                  100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     23m
  kube-system                kube-proxy-zdcpj                                          100m (5%)     0 (0%)       0 (0%)           0 (0%)         24m
  prod                       api-prod-dask-gateway-8f6b4896d-nhjwh                     0 (0%)        0 (0%)       0 (0%)           0 (0%)         23m
  prod                       binder-6f55f6c6c6-s9lwd                                   200m (10%)    0 (0%)       2Gi (26%)        0 (0%)         20m
  prod                       controller-prod-dask-gateway-64ff4648c-9tln7              0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       hub-545cfd67c8-r5hm7                                      200m (10%)    1250m (62%)  1Gi (13%)        3Gi (39%)      20m
  prod                       prod-kube-lego-5dffd7fdc-mqjqh                            0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       prod-nginx-ingress-controller-8d4c768fd-zqq9l             250m (12%)    500m (25%)   240Mi (3%)       240Mi (3%)     20m
  prod                       prod-nginx-ingress-default-backend-fc4bb9c47-dx5z7        0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       proxy-745d9f9645-lclss                                    250m (12%)    500m (25%)   256Mi (3%)       512Mi (6%)     20m
  prod                       traefik-prod-dask-gateway-7b8ddc7cf-2z7cp                 0 (0%)        0 (0%)       0 (0%)           0 (0%)         20m
  prod                       user-scheduler-d8dbb46c8-mmm2z                            50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         20m
  staging                    api-staging-dask-gateway-76c6bd8848-w2q9k                 0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    binder-6b757d85d8-jw2kv                                   50m (2%)      0 (0%)       100Mi (1%)       0 (0%)         2m2s
  staging                    controller-staging-dask-gateway-76b4d4dbc8-ndshx          0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    hub-6c5c76dc9f-448fs                                      50m (2%)      1250m (62%)  100Mi (1%)       1Gi (13%)      2m2s
  staging                    proxy-6fb5f57d5c-4fqzk                                    250m (12%)    500m (25%)   256Mi (3%)       512Mi (6%)     2m2s
  staging                    staging-dind-c7tvn                                        0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-image-cleaner-27kjv                               0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-kube-lego-5f847b9787-dfqbs                        0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    staging-nginx-ingress-default-backend-78c77d877f-gsntx    0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    traefik-staging-dask-gateway-6c7dd5645c-7fbql             0 (0%)        0 (0%)       0 (0%)           0 (0%)         2m2s
  staging                    user-scheduler-66fc777965-mrb9r                           50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         2m2s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1820m (91%)   4200m (210%)
  memory                      5072Mi (65%)  6128Mi (78%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
```
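
If we do converge, the core-pod requests and limits above would presumably be pinned in the shared helm values rather than per cloud. A hedged sketch, assuming the upstream BinderHub / Zero-to-JupyterHub chart key layout (pangeo-binder's wrapper chart may nest these differently), with the numbers copied from the prod pods above:

```yaml
# Hedged sketch of shared helm values for the core pods, assuming the
# upstream BinderHub chart layout; numbers copied from the prod pods above.
resources:              # binder pod
  requests:
    cpu: 200m
    memory: 2Gi
jupyterhub:
  hub:
    resources:
      requests:
        cpu: 200m
        memory: 1Gi
      limits:
        cpu: 1250m
        memory: 3Gi
  proxy:
    chp:
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
```

That would keep the GCP and AWS deployments comparable, as long as the core nodegroup on each side has at least roughly 4 vCPU / 16 GiB to absorb the ~1.8 CPU / ~5 GiB of requests shown above.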