Add liveness probe to kube-mgmt container #211

Open
eshepelyuk opened this issue May 31, 2023 · 5 comments
Contributor

eshepelyuk commented May 31, 2023

  1. On startup, kube-mgmt should add a sample policy to the OPA container using the OPA REST API. The policy is a marker that communication between the containers is established and that kube-mgmt has started reconciliation.
  2. The sample policy should be implemented as a Custom Health Check.
  3. A liveness probe should then be added to the kube-mgmt container that periodically checks for that policy in the OPA container. If the policy is missing, the OPA container was most probably restarted, so the kube-mgmt pod can be killed, and on restart the policy will be synchronized again.
  4. Thresholds and periods should be set to values that force the kube-mgmt container to restart as soon as possible.
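
For illustration, steps 1 and 3 can be sketched against OPA's Policy API (`PUT` and `GET` on `/v1/policies/<id>`). This is only a sketch in Python, not kube-mgmt's actual implementation: the OPA address, the marker policy ID, and the Rego package name are all assumptions, and it checks the marker through the Policy API rather than through a custom health check endpoint as step 2 proposes.

```python
import urllib.error
import urllib.request

# Assumptions for illustration only: OPA answers plain HTTP on this address,
# and the marker policy ID and package name are invented for this sketch.
OPA_URL = "http://127.0.0.1:8181"
MARKER_ID = "kube-mgmt/marker"
MARKER_REGO = "package kube_mgmt.marker\n\nalive := true\n"


def marker_put_request(opa_url=OPA_URL):
    """Build the PUT /v1/policies/<id> request that uploads the marker policy."""
    return urllib.request.Request(
        f"{opa_url}/v1/policies/{MARKER_ID}",
        data=MARKER_REGO.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


def marker_present(opa_url=OPA_URL, timeout=2.0):
    """Return True if OPA still holds the marker policy (GET answers 200)."""
    try:
        with urllib.request.urlopen(
            f"{opa_url}/v1/policies/{MARKER_ID}", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # A 404 (policy gone) and a connection error both mean "not present".
        return False
```

On startup the request from `marker_put_request()` would be sent once with `urllib.request.urlopen`; the liveness check would then call `marker_present()` and fail whenever it returns `False`.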

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes

Relates #189
Relates #206

Contributor

saranyareddy24 commented May 31, 2023

We faced this issue: the OPA container restarts and the kube-mgmt container is not aware of it, so it doesn't reload the policies.

Solution that worked for us:

When policies are not properly loaded into OPA, an HTTP POST request sent to the OPA pod returns the response below:
Request URL: https://127.0.0.1:8443
Method: POST
Response code: 404
Response:

{
  "code": "undefined_document",
  "message": "document missing: data.system.main"
}

But when the OPA policies are loaded properly, the same POST request succeeds with the response below:
Request URL: https://127.0.0.1:8443
Method: POST
Response code: 200
Response:
{"apiVersion":"admission.k8s.io/v1beta1","kind":"AdmissionReview","response":{"allowed":true}}

The liveness probe configuration below works fine: it keeps checking whether the response code for the HTTPS request is 200, and if it is not, it restarts the container, thereby loading the policies again.

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - rc=`wget --server-response https://127.0.0.1:8443 --post-data {} --no-check-certificate
        2>&1 | awk '/^  HTTP/{print $2}'`;[ $rc -eq 200 ]
  failureThreshold: 1
  initialDelaySeconds: 60
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 30
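
The decision rule this probe applies can be written out as a small Python sketch. The function name `check_response` is hypothetical; the status codes and the `undefined_document` error code are taken from the two responses quoted above.

```python
import json


def check_response(status, body):
    """Return (healthy, reason) for OPA's answer to the probe's empty POST."""
    if status == 200:
        # A decision came back, so the main policy is loaded.
        return True, "policies loaded"
    try:
        code = json.loads(body).get("code", "")
    except (ValueError, TypeError, AttributeError):
        # Body was not a JSON object; fall through to the generic failure.
        code = ""
    if status == 404 and code == "undefined_document":
        return False, "policies missing, restart kube-mgmt to reload them"
    return False, f"unexpected status {status}"
```

Any non-200 answer fails the check, as in the shell version; the reason string only makes the failure easier to read in logs.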

Let me know if this is a good approach. If the solution is fine, I can contribute and check in this change.

Contributor Author

eshepelyuk commented May 31, 2023

Hello @saranyareddy24, the approach is described at the top of the issue. Your approach is a special case that depends on your current Helm chart setup; it does not cover all possible setup options.

Contributor

saranyareddy24 commented Jun 14, 2023

Let me know if this is fine.

ConfigMap that creates start.rego:

apiVersion: v1
kind: ConfigMap
metadata:
  name: policy-start
  labels:
    openpolicyagent.org/policy: rego
data:
  start.rego: |
    # If kube-mgmt is not able to access this policy it will consider
    # that OPA has restarted and it will try to reload the policies by restarting.
    package test
    description := "Policy that loads on start of OPA"

Liveness check that fetches start.rego:

  livenessProbe:
    failureThreshold: 5
    httpGet:
      path: /v1/policies/default/policy-start/start.rego
      port: 8181
      scheme: HTTPS
    initialDelaySeconds: 60
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 10

Tested locally; the configuration works.
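
As a rough illustration of the probe above, the Python sketch below builds the probed path and estimates how quickly the probe reacts. It assumes the policy ID has the form `<namespace>/<configmap>/<key>`, which is what the path in the probe implies; both function names are hypothetical.

```python
def policy_probe_path(namespace, configmap, key):
    """Policy API path for a policy loaded from a ConfigMap key.

    Assumes the policy ID is "<namespace>/<configmap>/<key>".
    """
    return f"/v1/policies/{namespace}/{configmap}/{key}"


def detection_window_seconds(period, failure_threshold):
    """Upper bound on the time between the policy disappearing and the last
    consecutive failed probe, assuming each failed probe returns at once
    (for example with a 404) instead of running into the timeout."""
    return period * failure_threshold


print(policy_probe_path("default", "policy-start", "start.rego"))
print(detection_window_seconds(period=5, failure_threshold=5))  # 25
```

Under the same assumption, the earlier exec probe with `failureThreshold: 1` and `periodSeconds: 5` would react within about 5 seconds.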

Contributor Author

eshepelyuk commented Jun 14, 2023

Hello @saranyareddy24

I do not understand the purpose of the presented ConfigMap.
Please describe how it is going to work.

Hello @saranyareddy24

I've also updated the issue description. I hope the intention is clearer now.
