Feature Request: tolerations and nodeAffinity #221
Comments
Would the scaler configuration (https://cybercentrecanada.github.io/assemblyline4_docs/odm/models/config/#scaler) cover this?
From what I can tell, Field and Label selectors cannot be used to directly dictate scheduling behaviors such as tolerations or node affinity.
Looks like the configuration for the linux_node_selector already covers node affinity, so we would just need to add a configuration for tolerations.
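A sketch of the kind of addition being discussed: a tolerations list sitting next to the existing node selector under the scaler config. The placement and field name here are assumptions, not the final design, which is worked out later in this thread.

# Hypothetical shape of the tolerations addition under the scaler config
# (placement and field name are assumptions, not the final design):
configuration:
  core:
    scaler:
      tolerations:
        - key: dedicated
          operator: Equal
          value: assemblyline
          effect: NoSchedule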
Add configuration for setting tolerations on service pods
You can test with this development release to make sure you can configure the system as necessary when we merge this into stable (along with the helm-chart changes).
@cccs-rs I pulled the latest dev images and updated my helm charts, but wasn't able to get the nodeAffinity and tolerations on Services. They did show up on all the core (non-service) pods. Here is the relevant part of my values.yaml config:

# An affinity to be applied to all core (non-service) pods not provided by imported charts.
# https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#NodeAffinity
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: dedicated
            operator: In
            values:
              - assemblyline

# The tolerations to be applied to all core (non-service) pods not provided by imported charts.
# https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#tolerations
tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: assemblyline

configuration:
  core:
    scaler:
      linux_node_selector:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: dedicated
                  operator: In
                  values:
                    - assemblyline
This would have to be updated to (based on docs):

configuration:
  core:
    scaler:
      linux_node_selector:
        label:
          - key: dedicated
            operator: In
            values:
              - assemblyline
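Assuming Scaler turns each label entry into a matchExpressions requirement on the node selector (a mapping inferred from the docs, not shown in this thread), a recreated service pod should end up with an affinity along these lines:

# Sketch of the nodeAffinity expected on a service pod, assuming each `label`
# entry becomes a matchExpressions requirement (illustrative, not actual output):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: dedicated
              operator: In
              values:
                - assemblyline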
Hmm... did Scaler read in the new configuration? You can confirm with:
assemblyline@scaler-665f754877-p2nlr:~$ python -c "from assemblyline.common import forge; print(forge.get_config().core.scaler.linux_node_selector);"

I tested by deleting the scaler pod so that it was recreated with the latest. Then I deleted a service pod and inspected the toleration values after it was recreated.
To set the tolerations for services, you'll need to use the new tolerations configuration. The configuration mentioned earlier should only set the affinity in the podSpec.
Sorry, not seeing the tolerations or the nodeAffinity in the service pods.

scaler:
  linux_node_selector:
    label:
      - key: dedicated
        operator: In
        values:
          - assemblyline
  cluster_pod_list: true
  cpu_overallocation: 2
  service_defaults:
    backlog: 10
    min_instances: 0
    growth: 30
    shrink: 10
    environment:
      - name: "SERVICE_API_HOST"
        value: "http://service-server:5003"
    tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: assemblyline
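For reference, once the changes linked below apply tolerations to service pods, a recreated service pod's spec should carry something like the following (a sketch based on the values above, not actual output):

# Expected tolerations fragment in a service pod's spec once the fix lands
# (illustrative sketch, not captured from a real pod):
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: assemblyline
      effect: NoSchedule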
Add configuration for setting tolerations on service pods
Apply configured tolerations to service pods
Add tolerations configuration for core components
This should be featured in the 4.5.0.28 release.
Is your feature request related to a problem? Please describe.
I've been given a dedicated node cluster with sufficient resources for Assemblyline, but I need to target it using tolerations and nodeAffinity. Currently, core components allow you to define the nodeAffinity, but not tolerations. Services don't allow you to define either nodeAffinity or tolerations, as they are managed by the Scaler.
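For context, a dedicated node pool is typically reserved with a taint that only pods carrying a matching toleration can get past; the sketch below uses the same dedicated=assemblyline pair that appears in the comments above (the node name is hypothetical):

# Sketch of a node taint reserving nodes for Assemblyline; core and service pods
# need a matching toleration (plus node affinity to prefer these nodes) to schedule here.
apiVersion: v1
kind: Node
metadata:
  name: al-worker-1   # hypothetical node name
spec:
  taints:
    - key: dedicated
      value: assemblyline
      effect: NoSchedule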
Describe the solution you'd like
I'd like to be able to define the tolerations and nodeAffinity for both core components and services in the Helm values file.
Describe alternatives you've considered
I've made the changes to allow core components to use the tolerations, but they don't apply to services. When services spin up, they run out of resources. It seems the change for services is currently beyond the helm charts and would need to happen in the Scaler itself.