
[ENH] - Add kubernetes horizontal autoscaler for conda-store workers based on queue depth #2284

Open

dcmcand opened this issue Feb 29, 2024 · 13 comments · May be fixed by #2384

Comments

@dcmcand (Contributor) commented Feb 29, 2024

Feature description

Currently conda-store is configured to allow 4 simultaneous builds. This becomes a bottleneck once multiple environments are being built at the same time and presents a scaling challenge. If we set the number of simultaneous builds per worker to 1 and autoscale the workers based on queue depth, we should be able to handle scaling far more gracefully.

Value and/or benefit

Having the conda-store workers autoscale based on queue depth will allow larger orgs to take advantage of Nebari without hitting scale bottlenecks.

Anything else?

https://learnk8s.io/scaling-celery-rabbitmq-kubernetes

@pt247 (Contributor) commented Mar 9, 2024

Options

We have two options to achieve this:

  1. Horizontal Pod Autoscaler
  2. KEDA (Kubernetes-based Event-driven Autoscaling)

Option#1 Horizontal Pod Autoscaler based on external metrics and a load monitor/watcher.

Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
The sequence of events:

  1. A build watcher queries the conda-store database every 5 seconds and publishes the total number of queued builds.
  2. Horizontal autoscaler takes this value as an external metric to scale on:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue ## This needs to change accordingly. 
      averageValue: 0
  3. Horizontal Pod Autoscaler (HPA) creates new worker pods according to the number of queued builds (a fuller manifest sketch follows below).
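For concreteness, a minimal sketch of what the complete HPA manifest could look like, assuming the build watcher exposes the queue depth as an external metric named queue_messages_ready and the worker Deployment is called nebari-conda-store-worker. The HPA name, maxReplicas, and target value below are illustrative, not decided:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: conda-store-worker-hpa      # hypothetical name
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nebari-conda-store-worker
  minReplicas: 1
  maxReplicas: 10                   # illustrative upper bound
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "worker_tasks"
      target:
        type: AverageValue
        averageValue: "1"           # roughly one queued build per worker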

Option#2 KEDA (Kubernetes-based Event-driven Autoscaling)

Ref:
https://blogs.halodoc.io/autoscaling-k8s-deployments-with-external-metrics/
https://keda.sh/docs/2.13/scalers/
https://keda.sh/docs/2.13/concepts/external-scalers/
https://keda.sh/docs/2.13/scalers/rabbitmq-queue/
https://keda.sh/docs/2.13/scalers/redis-cluster-lists/
https://keda.sh/docs/2.13/scalers/redis-lists/
https://keda.sh/docs/2.13/scalers/postgresql/

The PostgreSQL scaler allows us to run a query against a database, which means we can simply point it at the existing conda-store database to get the queue depth of pending jobs.
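Concretely, the query could be as simple as counting pending builds in conda-store's build table (this is essentially the query used in the PoC further down; the table and status names come from there):

-- Queue depth of pending conda-store builds (as used later in the PoC)
SELECT COUNT(*) FROM build WHERE status = 'QUEUED' OR status = 'BUILDING';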

Pros and Cons

Option#1

  • Components added
    1. Metrics Server
    2. Horizontal Pod Autoscaler
    3. A Queue like RabbitMQ
    4. Custom service to manage the Queue
  • Pros
    1. Lightweight
    2. Components like RabbitMQ and Metrics Server can be re-used if needed.
  • Cons
    1. Requires managing the queue and queue-manager service.
    2. It's a bit of a hacky solution.

Option#2

  • Components added (ref)
    1. Metrics adapter
    2. Controller
    3. Scaler
    4. Admission webhooks
  • Pros
    1. Single purpose but elegant solution based on HPA.
    2. No customization needed; it is full-featured and provides extendable options for scheduling scalers.
    3. The same machinery can be reused to scale other services in the future.
  • Cons
    1. Only the metrics adapter can be re-used by other services.

@pt247 (Contributor) commented Mar 9, 2024

Should this be part of conda-store?

Regardless of the option we take, this can be moved upstream to conda-store.

  • Points in favour of moving this to conda-store
    1. It solves a conda-store problem and touches only conda-store components.
    2. Moving it to conda-store would make it available to all other conda-store deployments.
  • Points against moving this to conda-store
    1. Since conda-store is a core component of Nebari, we can rely on it being there. Therefore we can reuse KEDA to scale other pods as and when needed. This would only become an issue in the highly unlikely event that we decide to move away from conda-store.
    2. We will need to figure out if this is in line with the long-term roadmap of conda-store.

@pt247 (Contributor) commented Mar 9, 2024

We should agree on these before we start. Please suggest. Thanks.

@dcmcand (Contributor, Author) commented Mar 12, 2024

@pt247 Conda store already has a queue, it is using redis and celery. I expect we can pull queue depth from that, so we shouldn't need to deploy extra infra there. The nebari-conda-store-redis-master stateful set is what you are looking for.
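For example, a rough way to check that queue depth directly against the existing Redis instance (the secret name/key, the pod name, and the default Celery queue name "celery" are assumptions; adjust to the actual deployment):

# Assumed secret name/key and pod name for the conda-store Redis instance
REDIS_PASSWORD=$(kubectl get secret nebari-conda-store-redis -n dev \
  -o jsonpath='{.data.redis-password}' | base64 -d)
kubectl exec -n dev nebari-conda-store-redis-master-0 -- \
  redis-cli -a "$REDIS_PASSWORD" LLEN celery   # length of the default Celery queue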

I am unfamiliar with KEDA, but it does look promising and has a redis scaler too. In general I prefer to use built-in solutions as my default, so the horizontal autoscaler was my first thought, but if KEDA allows for better results with less complexity then I can see going with that. KEDA is a CNCF project that seems to be actively maintained, so that is good.
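For comparison with the Postgres route explored further down, a sketch of what the KEDA Redis scaler could look like here (the service address, queue name, and password wiring are assumptions):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker-redis          # hypothetical name
  namespace: dev
spec:
  scaleTargetRef:
    kind: Deployment
    name: nebari-conda-store-worker
  triggers:
  - type: redis
    metadata:
      address: nebari-conda-store-redis-master.dev.svc.cluster.local:6379  # assumed service
      listName: celery                     # assumed default Celery queue name
      listLength: "1"
      passwordFromEnv: REDIS_PASSWORD      # password must be available in the target's env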

As to whether this solution belongs in conda-store, I will simply say, it does not. Conda-store allows for horizontal scaling by having a queue with a worker pool. That is where conda-store's responsibility ends. Building specific implementation details for scaling on Nebari into conda-store would cross software boundaries and greatly increase coupling between the projects. That would be moving in the wrong direction. We want to decrease coupling between conda-store and Nebari. conda-store has a method for scaling horizontally, it is on Nebari to implement autoscaling that fits its particular environment.

@Adam-D-Lewis (Member) commented Mar 12, 2024

I bet the conda-store devs would have comments on this, and that it would be implemented in conda-store. It seems like this issue should be transferred to the conda-store repo to improve visibility with the conda-store devs.

@viniciusdc (Contributor) commented Mar 21, 2024

We want to decrease coupling between conda-store and Nebari. conda-store has a method for scaling horizontally, it is on Nebari to implement autoscaling that fits its particular environment.

I also agree that the conda-store already has a sound scaling system; however, we are not using this on our own deployment. Having multiple celery workers is already supported (as both Redis and Celery handle the task load balancing by themselves); we need to discuss how to handle the worker scaling on our Kubernetes infrastructure.

It's a manual process that depends on creating more workers. We need a way to automate this process. I initially suggested using the queue depth on Redis to manage this, which would trigger a CRD to change the number of replicas the worker deployment should have.
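For reference, the manual step that currently has to be repeated, and that this issue aims to automate, is essentially:

# Manually bump the worker replica count; the replica count here is illustrative
kubectl scale deployment nebari-conda-store-worker -n dev --replicas=3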

@dcmcand (Contributor, Author) commented Mar 21, 2024

Either KEDA or the horizontal autoscaler would work here and both can be used to scale automatically using the queue depth. I think that KEDA seems a bit more elegant with its implementation so would suggest using that to start to see if it works and if for some reason it doesn't, then falling back to the horizontal autoscaler.

@pt247 (Contributor) commented Apr 5, 2024

Notes on POC

Installing KEDA:

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace dev
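After the install, the operator pods and CRDs can be checked with something like:

kubectl get pods -n dev | grep keda    # KEDA operator / metrics apiserver pods
kubectl get crd | grep keda.sh         # scaledobjects.keda.sh etc. should be present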

ScaledObject spec:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker
  namespace: dev
spec:
  scaleTargetRef:
    kind:          Deployment                               # Optional. Default: Deployment
    name:          nebari-conda-store-worker  # Mandatory. Must be in the same namespace as the ScaledObject
  triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM build WHERE status='BUILDING' OR status='QUEUED';"
      targetQueryValue: "0"
      activationTargetQueryValue: "1"
      host: "nebari-conda-store-postgresql"
      userName: "postgres"
      password: "{nebari-conda-store-postgresql}"
      port: "5432"
      dbName: "conda-store"
      sslmode: disable

I have also tried this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker
  namespace: dev
spec:
  scaleTargetRef:
    kind:          Deployment                               # Optional. Default: Deployment
    name:          nebari-conda-store-worker  # Mandatory. Must be in the same namespace as the ScaledObject
  triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM build WHERE status='BUILDING' OR status='QUEUED';"
      targetQueryValue: "0"
      activationTargetQueryValue: "1"
      host: "nebari-conda-store-postgresql.dev.svc.cluster.local"
      passwordFromEnv: PG_PASSWORD
      userName: "postgres"
      port: "5432"
      dbName: "conda-store"
      sslmode: disable

@pt247 (Contributor) commented Apr 5, 2024

I am getting the following error:

2024-04-05T18:44:42Z    ERROR    Reconciler error    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"scaled-conda-worker","namespace":"dev"}, "namespace": "dev", "name": "scaled-conda-worker", "reconcileID": "17f8e76e-7f9d-4e9e-90e4-77dde8a455d4", "error": "error establishing postgreSQL connection: failed to connect to `host=nebari-conda-store-postgresql.dev.svc.cluster.local user=postgres database=conda-store`: server error (FATAL: password authentication failed for user \"postgres\" (SQLSTATE 28P01))"}
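One way to rule out a wrong password is to read the value KEDA should be using back from the Postgres secret (the secret name and key below match the TriggerAuthentication that eventually worked further down):

# Decode the conda-store Postgres password from its Kubernetes secret
kubectl get secret nebari-conda-store-postgresql -n dev \
  -o jsonpath='{.data.postgresql-password}' | base64 -d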

@viniciusdc (Contributor) commented Apr 5, 2024

Uhm, this is strange behavior; I think something might be missing... I will try to reproduce this on my side as well.

@pt247 (Contributor) commented Apr 7, 2024

I have also tried TriggerAuthentication:

apiVersion: v1
kind: Secret
metadata:
  name: conda-pg-credentials
  namespace: dev
type: Opaque
data:
  PG_PASSWORD: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-conda-secret
  namespace: dev
spec:
  secretTargetRef:
  - parameter: password
    name: conda-pg-credentials
    key: PG_PASSWORD
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker
  namespace: dev
spec:
  scaleTargetRef:
    kind:          Deployment                 # Optional. Default: Deployment
    name:          nebari-conda-store-worker  # Mandatory. Must be in the same namespace as the ScaledObject
  triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM build WHERE status='BUILDING' OR status='QUEUED';"
      targetQueryValue: "0"
      activationTargetQueryValue: "1"
      host: "nebari-conda-store-postgresql"
      userName: "postgres"
      port: "5432"
      dbName: "conda-store"
      sslmode: disable
    authenticationRef:
      name: keda-trigger-auth-conda-secret

@pt247 (Contributor) commented Apr 7, 2024

This worked:

It turns out that the secret values need to be base64 encoded.

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-postgres
  namespace: dev
spec:
  secretTargetRef:
  - parameter: password
    name: nebari-conda-store-postgresql
    key: postgresql-password
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker
  namespace: dev
spec:
  scaleTargetRef:
    kind: Deployment
    name: nebari-conda-store-worker
  triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM build WHERE status='BUILDING' OR status='QUEUED';"
      targetQueryValue: "1"
      host: "nebari-conda-store-postgresql"
      userName: "postgres"
      port: "5432"
      dbName: "conda-store"
      sslmode: disable
    authenticationRef:
      name: trigger-auth-postgres
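With this applied, KEDA creates a backing HPA for the deployment, which can be verified with:

kubectl get scaledobject scaled-conda-worker -n dev   # READY/ACTIVE status of the scaler
kubectl get hpa -n dev                                # KEDA-managed HPA (keda-hpa-scaled-conda-worker)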

pt247 linked a pull request Apr 8, 2024 that will close this issue
@pt247 (Contributor) commented Apr 9, 2024

Performance improvements

We try to create 5 conda environments; to the fifth environment we add scikit-learn.

Current develop branch

Time: 5 minutes 11 seconds
Number of conda-store workers: 1

Default KEDA

Time: 4 minutes 29 seconds
Number of conda-store workers scaled to: 2

With min replica count set to 1 (default is 0)

Time: 2 minutes 35 seconds
Number of conda-store workers scaled to: 2

With min replica count set to 1 (default is 0) + polling interval of 15 seconds (default is 30 seconds)

Time: 4 minutes 14 seconds

With pollingInterval: 5 and min replica count: 1; the trigger query tracks the BUILDING state as well

  minReplicaCount: 1   # Default: 0
  pollingInterval: 5   # Default: 30 seconds
  cooldownPeriod: 60   # Default: 300 seconds

Time: 3 minutes 40 seconds
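For clarity, those tuning options sit at the top level of the ScaledObject spec alongside the trigger; a sketch of the configuration used for the last measurement above (trigger and authentication exactly as in the working manifest):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-conda-worker
  namespace: dev
spec:
  scaleTargetRef:
    kind: Deployment
    name: nebari-conda-store-worker
  minReplicaCount: 1    # default: 0
  pollingInterval: 5    # seconds; default: 30
  cooldownPeriod: 60    # seconds; default: 300
  triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM build WHERE status='BUILDING' OR status='QUEUED';"
      targetQueryValue: "1"
      host: "nebari-conda-store-postgresql"
      userName: "postgres"
      port: "5432"
      dbName: "conda-store"
      sslmode: disable
    authenticationRef:
      name: trigger-auth-postgres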
