Feature Request: tolerations and nodeAffinity #221
Comments
Would the scaler configuration (https://cybercentrecanada.github.io/assemblyline4_docs/odm/models/config/#scaler) cover this?
From what I can tell, Field and Label selectors cannot be used to directly dictate scheduling behaviors such as tolerations or node affinity.
Looks like the configuration for the linux_node_selector already covers node affinity, so we would just need to add a configuration for tolerations.
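A sketch of the kind of addition being discussed: a tolerations list sitting next to the existing node selector under the scaler config. The placement and field name here are assumptions, not the final design, which is worked out later in this thread.

# Hypothetical shape of the tolerations addition under the scaler config
# (placement and field name are assumptions, not the final design):
configuration:
  core:
    scaler:
      tolerations:
        - key: dedicated
          operator: Equal
          value: assemblyline
          effect: NoSchedule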
Add configuration for setting tolerations on service pods
You can test with this development release to make sure you can configure the system as necessary when we merge this into stable (along with the helm-chart changes).
@cccs-rs I pulled the latest dev images and updated my helm charts, but wasn't able to get the nodeAffinity and tolerations on Services. They did show up on all the core (non-service) pods. Here is the relevant part of my values.yaml config:

# An affinity to be applied to all core (non-service) pods not provided by imported charts.
# https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#NodeAffinity
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: dedicated
            operator: In
            values:
              - assemblyline

# The tolerations to be applied to all core (non-service) pods not provided by imported charts.
# https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#tolerations
tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: assemblyline

configuration:
  core:
    scaler:
      linux_node_selector:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: dedicated
                  operator: In
                  values:
                    - assemblyline
This would have to be updated to (based on docs):

configuration:
  core:
    scaler:
      linux_node_selector:
        label:
          - key: dedicated
            operator: In
            values:
              - assemblyline
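Assuming Scaler turns each label entry into a matchExpressions requirement on the node selector (a mapping inferred from the docs, not shown in this thread), a recreated service pod should end up with an affinity along these lines:

# Sketch of the nodeAffinity expected on a service pod, assuming each `label`
# entry becomes a matchExpressions requirement (illustrative, not actual output):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: dedicated
              operator: In
              values:
                - assemblyline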
Hmm... did Scaler read in the new configuration? You can confirm with:
assemblyline@scaler-665f754877-p2nlr:~$ python -c "from assemblyline.common import forge; print(forge.get_config().core.scaler.linux_node_selector);"

I tested by deleting the scaler pod so that it was recreated with the latest. Then I deleted a service pod and inspected the toleration values after it was recreated.
To set the tolerations for services, you'll need to use the new tolerations configuration. The configuration mentioned earlier should only set the affinity in the podSpec.
Sorry, not seeing the tolerations or the nodeAffinity in the service pods.

scaler:
  linux_node_selector:
    label:
      - key: dedicated
        operator: In
        values:
          - assemblyline
  cluster_pod_list: true
  cpu_overallocation: 2
  service_defaults:
    backlog: 10
    min_instances: 0
    growth: 30
    shrink: 10
    environment:
      - name: "SERVICE_API_HOST"
        value: "http://service-server:5003"
    tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: assemblyline
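For reference, once the changes linked below apply tolerations to service pods, a recreated service pod's spec should carry something like the following (a sketch based on the values above, not actual output):

# Expected tolerations fragment in a service pod's spec once the fix lands
# (illustrative sketch, not captured from a real pod):
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: assemblyline
      effect: NoSchedule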
Add configuration for setting tolerations on service pods
Apply configured tolerations to service pods
Add tolerations configuration for core components
This should be featured in the 4.5.0.28 release.
Is your feature request related to a problem? Please describe.
I've been given a dedicated node cluster with sufficient resources for Assemblyline, but I need to target it using tolerations and nodeAffinity. Currently, core components allow you to define the nodeAffinity, but not tolerations. Services don't allow you to define either nodeAffinity or tolerations, as they are managed by the Scaler.
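For context, a dedicated node pool is typically reserved with a taint that only pods carrying a matching toleration can get past; the sketch below uses the same dedicated=assemblyline pair that appears in the comments above (the node name is hypothetical):

# Sketch of a node taint reserving nodes for Assemblyline; core and service pods
# need a matching toleration (plus node affinity to prefer these nodes) to schedule here.
apiVersion: v1
kind: Node
metadata:
  name: al-worker-1   # hypothetical node name
spec:
  taints:
    - key: dedicated
      value: assemblyline
      effect: NoSchedule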
Describe the solution you'd like
I'd like to be able to define the tolerations and nodeAffinity for both core components and services in the Helm values file.
Describe alternatives you've considered
I've made the changes to allow core components to use the tolerations, but they don't apply to services. When services spin up, they run out of resources. It seems the change for services is currently beyond the helm charts and would need to happen in the Scaler itself.