
ReplicaSet explosion caused by conflicting mutations #2963

Open · rtheis opened this issue Aug 18, 2023 · 9 comments
Labels: docs (Pure prose), triaged

rtheis commented Aug 18, 2023

What steps did you take and what happened:

Using OPA Gatekeeper to mutate a ReplicaSet owned by a Deployment may cause significant cluster stability problems due to a ReplicaSet explosion triggered by conflicting mutations: the mutation changes the ReplicaSet's pod template so that it no longer matches the Deployment's pod template, the Deployment controller responds by creating a new ReplicaSet, that ReplicaSet is mutated in turn, and the loop repeats. See the recreate scenario in the first comment below.

What did you expect to happen:

We recommend that the OPA Gatekeeper documentation and/or code warn against mutating ReplicaSets owned by a Deployment.

Anything else you would like to add:
kubernetes/kubernetes#57167
https://docs.google.com/document/d/10LFy30JTfTD3qgCsBZ2S8ZpuWao9mqT_xqkcbvPzVf4/

Environment:
IBM Cloud Kubernetes Service

  • Gatekeeper version: 3.12
  • Kubernetes version: 1.28
rtheis added the bug label Aug 18, 2023

rtheis (Author) commented Aug 18, 2023

Recreate Scenario

Install Open Policy Agent (OPA) Gatekeeper

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.12.0/deploy/gatekeeper.yaml
kubectl rollout status deployment -n gatekeeper-system  gatekeeper-audit
kubectl rollout status deployment -n gatekeeper-system  gatekeeper-controller-manager

Create Test Deployments

for i in $(seq 10); do
    kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: restricted-$i
  name: restricted-$i
spec:
  selector:
    matchLabels:
      run: restricted-$i
  template:
    metadata:
      labels:
        run: restricted-$i
    spec:
      containers:
      - name: restricted-$i
        image: us.icr.io/armada-master/pause:3.9
        securityContext:
          privileged: false
          runAsUser: 1000
          runAsGroup: 1000
EOF
done
for i in $(seq 10); do
    kubectl rollout status deployment restricted-$i
done

Verify Test Deployments BEFORE OPA Gatekeeper Mutation

Restart Deployments

kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
    kubectl rollout restart deployment restricted-$i
done
for i in $(seq 10); do
    kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l

Scale Deployments

kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
    kubectl scale deployment restricted-$i --replicas=2
done
for i in $(seq 10); do
    kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l

Create OPA Gatekeeper Mutator

kubectl apply -f - <<EOF
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
spec:
  applyTo:
  - groups:
    - apps
    kinds:
    - ReplicaSet
    versions:
    - v1
  - groups:
    - apps
    kinds:
    - Deployment
    versions:
    - v1
  location: spec.template.spec.containers[name:*].securityContext.allowPrivilegeEscalation
  match:
    kinds:
    - apiGroups:
      - apps
      kinds:
      - ReplicaSet
    - apiGroups:
      - apps
      kinds:
      - Deployment
    scope: Namespaced
  parameters:
    assign:
      value: false
EOF

Verify Test Deployments AFTER OPA Gatekeeper Mutation

Restart a Deployment - No Problems

kubectl get rs --no-headers | wc -l
kubectl rollout restart deployment restricted-1
kubectl rollout status deployment restricted-1
sleep 10
kubectl get rs --no-headers | wc -l

Delete ReplicaSet for a Deployment - Some Problems

kubectl get rs --no-headers | wc -l
kubectl delete replicaset -l run=restricted-2
sleep 10
kubectl get rs --no-headers | wc -l

Delete Pods for a Deployment - No Problems

kubectl get rs --no-headers | wc -l
kubectl delete pod -l run=restricted-3
sleep 10
kubectl get rs --no-headers | wc -l

Scale a Deployment - Big Problems

# Get some popcorn, find a comfortable chair, and watch the fireworks.
kubectl get rs --no-headers | wc -l
kubectl scale deployment restricted-10 --replicas=3
kubectl get rs --no-headers | wc -l
for i in $(seq 20); do
    kubectl get rs --no-headers | wc -l
    sleep 6
done

Fix Test Deployments

kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
    kubectl rollout restart deployment restricted-$i
done
for i in $(seq 10); do
    kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l
for i in $(seq 20); do
    kubectl get rs --no-headers | wc -l
    sleep 6
done

Clean Up

kubectl delete -f - <<EOF
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
EOF
for i in $(seq 10); do
    kubectl delete deployment restricted-$i
done


jiahuif commented Aug 24, 2023

I can think of a workaround: do not match both ReplicaSets and Deployments. Check that a ReplicaSet's metadata.ownerReferences is empty (i.e., it is unmanaged) and skip every other ReplicaSet.
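
For illustration only, and not necessarily the exact match being suggested here: Gatekeeper's Assign match supports a labelSelector, and the Deployment controller adds a pod-template-hash label to the ReplicaSets it creates, so one sketch that limits the ReplicaSet mutation to unmanaged ReplicaSets looks like the following (the Assign name is illustrative; Deployments would keep a separate Assign that matches only Deployments).

# Sketch only: the labelSelector below stands in for an ownerReferences check
# by skipping ReplicaSets that carry the pod-template-hash label added by the
# Deployment controller.
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutate-unmanaged-replicasets
spec:
  applyTo:
  - groups: ["apps"]
    kinds: ["ReplicaSet"]
    versions: ["v1"]
  location: spec.template.spec.containers[name:*].securityContext.allowPrivilegeEscalation
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["apps"]
      kinds: ["ReplicaSet"]
    labelSelector:
      matchExpressions:
      - key: pod-template-hash
        operator: DoesNotExist
  parameters:
    assign:
      value: false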

ritazh (Member) commented Aug 24, 2023

Thanks for raising this @rtheis! Is there a reason you cannot match and apply to Pod and change the location?

e.g.:

apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
spec:
  applyTo:
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["*"]
      kinds: ["Pod"]
  location: "spec.containers[name:*].securityContext.allowPrivilegeEscalation"
...

Another example similar to this:
https://open-policy-agent.github.io/gatekeeper/website/docs/mutation#adding-dnspolicy-and-dnsconfig-to-a-pod
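
If the Pod-scoped approach above is used (assuming the elided parameters block assigns value: false, as in the original mutator), a quick sketch for confirming the field on a freshly created Pod from the recreate scenario's test Deployments:

# Sketch: restart one test Deployment and read the mutated field from its new
# Pod; the expected output is "false" if the Assign applied.
kubectl rollout restart deployment restricted-1
kubectl rollout status deployment restricted-1
kubectl get pod -l run=restricted-1 \
  -o jsonpath='{.items[0].spec.containers[0].securityContext.allowPrivilegeEscalation}{"\n"}'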

rtheis (Author) commented Aug 25, 2023

@jiahuif @ritazh thank you. We certainly can, and we did work with folks to modify the match to ignore ReplicaSets owned by a Deployment.


ctml91 commented Aug 29, 2023

You could use expansion templates if you want to target both Deployments and Pods (and other workload kinds that embed a pod template). It's not required, but it's an option in case you weren't aware of the feature.

https://open-policy-agent.github.io/gatekeeper/website/docs/expansion
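
A minimal sketch of that feature, following the linked docs (the resource name is illustrative, and the expansion API group/version may differ across Gatekeeper releases): an ExpansionTemplate tells Gatekeeper to expand a Deployment into the Pod it would generate, so Pod-scoped mutators and constraints are evaluated against the workload at admission time.

apiVersion: expansion.gatekeeper.sh/v1alpha1
kind: ExpansionTemplate
metadata:
  name: expand-deployments
spec:
  applyTo:
  - groups: ["apps"]
    kinds: ["Deployment"]
    versions: ["v1"]
  # Path within the Deployment that holds the pod template to expand.
  templateSource: "spec.template"
  # The generated resource kind that Pod-scoped policies are evaluated against.
  generatedGVK:
    kind: "Pod"
    group: ""
    version: "v1"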


stale bot commented Oct 28, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Oct 28, 2023

rtheis (Author) commented Oct 29, 2023

Ping

stale bot removed the stale label Oct 29, 2023

stale bot commented Dec 29, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Dec 29, 2023

rtheis (Author) commented Dec 29, 2023

Ping

stale bot removed the stale label Dec 29, 2023
ritazh added the docs (Pure prose) and triaged labels and removed the bug label Dec 29, 2023
ritazh self-assigned this Dec 29, 2023