We have issues with pods being killed and rescheduled in busier environments. Unfortunately, postgres is just as likely to be killed as any other worker pod. After a discussion with @Fryguy and @jrafanie, we think the design should be as follows:
- Add RBAC permissions for the operator to read, list, and write PriorityClasses
- Add 3 items to the CRD for high, medium, and low priorityClassName values
- Assign class name values as follows:
  - If all values are specified in the CR, use them
  - If no values are set, detect the cluster default. Set low to the cluster default, medium = low + 100, high = medium + 100
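The defaulting step above can be sketched as a pure function. This is a minimal illustration, assuming the cluster default has already been detected (e.g. by listing PriorityClasses and finding the one with `globalDefault: true`); the function name is hypothetical, not the operator's actual API.

```go
package main

import "fmt"

// deriveDefaults computes the three priority values when none are set in the
// CR. clusterDefault is the value of the PriorityClass marked
// globalDefault: true (0 if the cluster has none). The +100 spacing follows
// the proposal above.
func deriveDefaults(clusterDefault int32) (low, medium, high int32) {
	low = clusterDefault
	medium = low + 100
	high = medium + 100
	return low, medium, high
}

func main() {
	low, medium, high := deriveDefaults(0)
	fmt.Println(low, medium, high) // 0 100 200
}
```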
- Validate that the values are reasonable:
  - High should not be more than 1,000,000,000 (use CRD JSON schema validation)
  - Error if high, medium, and low are out of order (code validation)
  - Warn if low is less than the cluster default? Warn if low is less than 0? (code validation)
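The code-level checks above could look something like this. This is a sketch: the function and message wording are illustrative, and the 1,000,000,000 ceiling on high is assumed to be enforced by the CRD schema rather than here.

```go
package main

import (
	"errors"
	"fmt"
)

// validatePriorities applies the proposed code validations: a hard error when
// the three values are out of order, and warnings when low is below the
// cluster default or negative.
func validatePriorities(low, medium, high, clusterDefault int32) ([]string, error) {
	if !(low < medium && medium < high) {
		return nil, errors.New("priority values must satisfy low < medium < high")
	}
	var warnings []string
	if low < clusterDefault {
		warnings = append(warnings, "low priority is below the cluster default")
	}
	if low < 0 {
		warnings = append(warnings, "low priority is negative")
	}
	return warnings, nil
}

func main() {
	warnings, err := validatePriorities(0, 100, 200, 1000)
	fmt.Println(warnings, err)
}
```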
- Assign pod priorities:
  - High: postgres, memcached, kafka, httpd
  - Medium: UI & API, orchestrator, maybe the operators if possible (may not work if the class names don't exist yet)
  - Low: all other workers
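The tier assignment above maps naturally onto a lookup with a low-tier fallback. A minimal sketch; the map keys are illustrative, and the operator's real deployment names may differ.

```go
package main

import "fmt"

// priorityTier buckets deployments into the tiers proposed above. Any worker
// not listed falls through to the low tier.
var priorityTier = map[string]string{
	"postgresql":   "high",
	"memcached":    "high",
	"kafka":        "high",
	"httpd":        "high",
	"ui":           "medium",
	"api":          "medium",
	"orchestrator": "medium",
}

// tierFor returns the tier for a deployment, defaulting to "low" for all
// other workers.
func tierFor(deployment string) string {
	if tier, ok := priorityTier[deployment]; ok {
		return tier
	}
	return "low"
}

func main() {
	fmt.Println(tierFor("postgresql"), tierFor("orchestrator"), tierFor("generic-worker"))
}
```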
> High should not be more than 1,000,000,000 (use CRD JSON schema validation)

Good call. This keeps us under the OpenShift defaults for the critical values:
```console
$ oc get priorityclasses
NAME                      VALUE        GLOBAL-DEFAULT   AGE
openshift-user-critical   1000000000   false            89d
system-cluster-critical   2000000000   false            89d
system-node-critical      2000001000   false            89d
```
This issue has been automatically marked as stale because it has not been updated for at least 3 months.
If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.