[RFE] Implement pod prioritization #977

bdunne · 2023-06-26T16:14:07Z

We have issues with pods being killed and rescheduled in busier environments. Unfortunately, postgres is just as likely to be killed as any worker pod. After a discussion with @Fryguy and @jrafanie, we think the design should be as follows:

- Add RBAC permissions for the operator to read, list and write priorityClassNames
- Add 3 items to the CRD for high, medium and low priorityClassName values
- Assign class name values as follows:
  - If all values are specified in the CR, use them
  - If no values are set, detect the cluster default. Set low to cluster default, medium = low + 100, high = medium + 100
- Validate that values are reasonable:
  - High should not be more than 1,000,000,000 (use CRD JSON schema validation)
  - Error if high, medium & low are out of order (code validation)
  - Warn if low is less than cluster default? Warn if low is less than 0? (code validation)
- Assign pod priorities:
  - High: postgres, memcached, kafka, httpd
  - Medium: UI & API, orchestrator, maybe operators if possible (may not work if the class names don't exist yet)
  - Low: all other workers
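
A rough sketch of the value assignment, validation, and tier mapping described above, in Go with only the standard library. It is not the operator's actual code. The type and function names (`Priorities`, `ResolvePriorities`, `Validate`, `TierFor`) and the component name strings are hypothetical. Two behaviors are assumptions the proposal leaves open: a partially specified CR is rejected, and "out of order" is read as requiring strictly increasing values. The cluster default is passed in as a plain number; in a cluster it would come from the PriorityClass marked as the global default, or 0 when none exists.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxUserPriority is the cap on "high" from the proposal (1,000,000,000).
const MaxUserPriority int32 = 1_000_000_000

// tierStep is the spacing between tiers when values are derived from the
// cluster default (medium = low + 100, high = medium + 100).
const tierStep int32 = 100

// Priorities holds the numeric value for each tier (hypothetical type).
type Priorities struct {
	Low, Medium, High int32
}

// ResolvePriorities uses the CR values when all three are set and derives
// them from the cluster default when none are set. Rejecting a partial set
// is an assumption; the proposal does not cover that case.
func ResolvePriorities(low, medium, high *int32, clusterDefault int32) (Priorities, error) {
	switch {
	case low != nil && medium != nil && high != nil:
		return Priorities{Low: *low, Medium: *medium, High: *high}, nil
	case low == nil && medium == nil && high == nil:
		return Priorities{
			Low:    clusterDefault,
			Medium: clusterDefault + tierStep,
			High:   clusterDefault + 2*tierStep,
		}, nil
	default:
		return Priorities{}, errors.New("set all three priority values or none")
	}
}

// Validate returns an error for values that must be rejected and a list of
// warnings for values that are allowed but suspicious.
func Validate(p Priorities, clusterDefault int32) ([]string, error) {
	if p.High > MaxUserPriority {
		return nil, fmt.Errorf("high priority %d exceeds %d", p.High, MaxUserPriority)
	}
	// Assumption: tiers must be strictly increasing.
	if !(p.Low < p.Medium && p.Medium < p.High) {
		return nil, fmt.Errorf("priorities out of order: low=%d medium=%d high=%d",
			p.Low, p.Medium, p.High)
	}
	var warnings []string
	if p.Low < clusterDefault {
		warnings = append(warnings, "low priority is below the cluster default")
	}
	if p.Low < 0 {
		warnings = append(warnings, "low priority is negative")
	}
	return warnings, nil
}

// TierFor maps a component name to its tier. The name strings are
// placeholders for whatever identifiers the operator really uses.
func TierFor(component string) string {
	switch component {
	case "postgres", "memcached", "kafka", "httpd":
		return "high"
	case "ui", "api", "orchestrator":
		return "medium"
	default:
		return "low"
	}
}
```

With no CR values and a cluster default of 0, `ResolvePriorities(nil, nil, nil, 0)` yields low = 0, medium = 100, high = 200, and `Validate` accepts that result without warnings.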


bdunne · 2023-06-26T16:14:34Z

@Fryguy @jrafanie throw 🍅 🍅

Fryguy · 2023-06-26T17:06:52Z

> High should not be more than 1,000,000,000 (use CRD JSON schema validation)

Good call. This keeps us under the OpenShift defaults for critical values:

$ oc get priorityclasses
NAME                      VALUE        GLOBAL-DEFAULT   AGE
openshift-user-critical   1000000000   false            89d
system-cluster-critical   2000000000   false            89d
system-node-critical      2000001000   false            89d

miq-bot · 2023-10-02T00:00:05Z

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot · 2024-01-08T00:00:20Z

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

bdunne added the enhancement label Jun 26, 2023

bdunne self-assigned this Jun 26, 2023

Fryguy added this to the Quinteros milestone Jun 26, 2023

Fryguy added this to In progress in Roadmap Jun 26, 2023

bdunne linked a pull request Jun 26, 2023 that will close this issue

[WIP] Pod Prioritization #978

Draft

miq-bot added the stale label Oct 2, 2023

Fryguy removed this from the Quinteros milestone Mar 8, 2024

Fryguy moved this from In progress to To do in Roadmap Mar 8, 2024
