Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update securityContext configs to support Openshift cluster. #1658

Closed
wants to merge 1 commit into from

Conversation

linhlam-kc
Copy link
Contributor

@linhlam-kc linhlam-kc commented Aug 24, 2022

What does this PR change?

This PR is to update cost-analyzer helmchart to support Openshift. These are required changes:

  • Update Security context in Values.yaml of following sub charts to allow Openshift assigns random UIDs:
  • Update cost-analyzer-deployment-template to use random UID when Kubecost is installed on Openshift
  • Add additional specific Values file for Openshift name values-openshift.yaml

Does this PR rely on any other PRs?

N/A

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Links to Issues or ZD tickets this PR addresses or fixes

N/A

How was this PR tested?

Have you made an update to documentation?

  • Yes and it need to update Helm chart location once this PR is merged.

┆Issue is synchronized with this Jira Task by Unito

@mbolt35
Copy link
Contributor

mbolt35 commented Aug 24, 2022

@linhlam-kc This looks good... Since we're removing the defaults from the prometheus/values.yaml and thanos/values.yaml, can we add those defaults back in cost-analyzer/values.yaml and cost-analyzer/values-thanos.yaml? I think we want to make sure that any existing customers that run upgrades don't run into conflicts.

@linhlam-kc
Copy link
Contributor Author

linhlam-kc commented Aug 24, 2022

@linhlam-kc This looks good... Since we're removing the defaults from the prometheus/values.yaml and thanos/values.yaml, can we add those defaults back in cost-analyzer/values.yaml and cost-analyzer/values-thanos.yaml? I think we want to make sure that any existing customers that run upgrades don't run into conflicts.

@mbolt35 Thank you so much for helping to review. I have tested to add those defaults back to values.yaml and values-thanos.yaml . However, doing that prevents values-openshift.yaml overwrite securitycontext with null value. It is similar to when we have those defaults in prometheus/values.yaml and thanos/values.yaml. From my testing, the configuration for Openshift is also working for Amazon EKS. How about we add those defaults back to values.yaml and values-thanos.yaml but commend them out, so customer can also enable them easily if they want to. For example:

  alertmanager:
    enabled: false
    persistentVolume:
      enabled: true
    # securityContext:
    #   runAsUser: 1001
    #   runAsNonRoot: true
    #   runAsGroup: 1001
    #   fsGroup: 1001
  nodeExporter:
    enabled: true
  pushgateway:
    enabled: false
    persistentVolume:
      enabled: true
    # securityContext:
    #   runAsUser: 1001
    #   runAsNonRoot: true

@dwbrown2
Copy link
Contributor

@mbolt35 @linhlam-kc what are suggested next steps on this one? do we want this as part of 1.97?

serviceAccounts:
nodeExporter:
create: false
kube-state-metrics:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bolt suggested doing both on the other PR:
kubeStateMetrics:
enabled: false
kube-state-metrics:
disabled: true

@@ -0,0 +1,36 @@
kubecostProductConfigs:
clusterName: CLUSTER_NAME
projectID: CLUSTER_NAME
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove projectID, it is specific to AWS.

@mbolt35
Copy link
Contributor

mbolt35 commented Aug 31, 2022

@mbolt35 Thank you so much for helping to review. I have tested to add those defaults back to values.yaml and values-thanos.yaml . However, doing that prevents values-openshift.yaml overwrite securitycontext with null value. It is similar to when we have those defaults in prometheus/values.yaml and thanos/values.yaml. From my testing, the configuration for Openshift is also working for Amazon EKS. How about we add those defaults back to values.yaml and values-thanos.yaml but commend them out, so customer can also enable them easily if they want to. For example:

  alertmanager:
    enabled: false
    persistentVolume:
      enabled: true
    # securityContext:
    #   runAsUser: 1001
    #   runAsNonRoot: true
    #   runAsGroup: 1001
    #   fsGroup: 1001
  nodeExporter:
    enabled: true
  pushgateway:
    enabled: false
    persistentVolume:
      enabled: true
    # securityContext:
    #   runAsUser: 1001
    #   runAsNonRoot: true

@linhlam-kc We have to expect our users won't change anything, which will be a concern. I'm not sure why the overwrites aren't working. Just to be clear, I'm not worried for new installs, just upgrades which have those securityContexts already set, then the upgrade reverting them to null. We need to ensure that we don't break existing users upgrades

If I run my helm command with -f <values1.yaml> -f <values2.yaml> then each values file is applied in the order they're written in the command. ie: If values2.yaml in the above example overwrites values from values1.yaml then it will have the values2.yaml values.

So if you were to install with -f values.yaml -f values-thanos.yaml -f values-eks-cost-monitoring.yaml then it would apply the values in order (where values-eks-cost-monitoring.yaml values would be applied last, overwriting the specific defaults in values.yaml and values-thanos.yaml).

Does that make sense? Am I missing something?

@mbolt35
Copy link
Contributor

mbolt35 commented Aug 31, 2022

@mbolt35 @linhlam-kc what are suggested next steps on this one? do we want this as part of 1.97?

@dwbrown2 I'll keep an eye on this ticket - I assume this is something that we can get in the release as long as we can button up any of the remaining discrepancies. I just want to ensure that we won't break existing users attempting to helm upgrade by swapping out default values.

@linhlam-kc
Copy link
Contributor Author

@mbolt35 Sorry for the late reply here, thank you for clarifying about your concern. I agree that it could have un-expected side effect given that there is a change that customer does not allow 'null' securityContext. Regarding to the issue, it is related to this Helm client bug: helm/helm#5184

The issue is that if the default values (value file 1) are null, it could be overwritten successfully but it doesn't work if the default values (value file 1) arenot null that need to be replaced by null values. To understand the bug easier, please see the following examples below

For example:

Scenario 1:

  • The default value in values.yaml and values-thanos.yaml is null: securityContext: {}
  • The value in values-openshift.yaml is:
    securityContext:
       fsGroup: 1001
       runAsNonRoot: true
       runAsUser: 1001
  • I run the following Helm command to generate flat manifests:
helm template kubecost4 . \
--namespace kubecost4 \
--create-namespace \
-f ./values-thanos.yaml \
-f ./values-openshift.yaml
  • Result: securityContext is overwritten successfully with the values from values-openshift.yaml

Scenario 2:

  • The default value in values.yaml and values-thanos.yaml is not null:
    securityContext:
       fsGroup: 1001
       runAsNonRoot: true
       runAsUser: 1001
  • The value in values-openshift.yaml is null: securityContext: {}

  • I run the following Helm command to generate flat manifests:

helm template kubecost4 . \
--namespace kubecost4 \
--create-namespace \
-f ./values-thanos.yaml \
-f ./values-openshift.yaml
  • Result: securityContext is not overwritten with the default values from values.yaml and values-thanos.yaml

@mbolt35
Copy link
Contributor

mbolt35 commented Sep 7, 2022

@linhlam-kc Looking through this PR, and based on the helm bug description, it's still not adding up: you're changing all non-null values to null value defaults. If you're overwriting in the openshift values with a non-null security context, then why are you changing the existing defaults?

From our discussions, you're not going to be setting openshift's security context to null, in which case, we would never run into Scenario 2, so we shouldn't have to change the existing defaults.

@linhlam-kc
Copy link
Contributor Author

@linhlam-kc Looking through this PR, and based on the helm bug description, it's still not adding up: you're changing all non-null values to null value defaults. If you're overwriting in the openshift values with a non-null security context, then why are you changing the existing defaults?

From our discussions, you're not going to be setting openshift's security context to null, in which case, we would never run into Scenario 2, so we shouldn't have to change the existing defaults.

@mbolt35 My previous comment here is actually only for clarifying the Helm client bug which describe why adding non-null_securityContext back in values.yaml and values-thanos.yaml is not feasible after setting them to null_SecurityContext in Prometheus's chart values.yaml and Thanos's chart values.yaml. The real changes for OpenShift is to have all required SecurityContext to be null as in my commit, so OpenShift can assign random userid and groupid when Kubecost is installed.

@mbolt35
Copy link
Contributor

mbolt35 commented Sep 9, 2022

The real changes for OpenShift is to have all required SecurityContext to be null as in my commit

Ok, this is what I missed. Thanks for the explanation.

Given that this is a helm issue we can't avoid, I need to think about a better way to do this, or look into a different approach. It doesn't give me much hope that helm hasn't addressed this either.

@chipzoller
Copy link
Collaborator

This will be resolved shortly by introducing a platforms key with a switch for OpenShift (and others) which will result in the appropriate securityContext to be set.

@chipzoller chipzoller closed this Sep 24, 2023
@chipzoller chipzoller deleted the linh/support-openshift branch October 4, 2023 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants