A lot of reconcile events happen in a short time without any clear reason #944
Comments
Hey @dorshay6, can you share the manifest of your RabbitmqCluster? The reconcile loop triggers when there's an event (create, update, delete) on the RabbitmqCluster or on any of the child resources the operator watches. Did RabbitMQ pods restart as a consequence of those reconcile events?
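For context, the operator is built on controller-runtime, so a reconcile request is enqueued whenever a watched object changes. Below is a minimal, illustrative sketch of how such watches are typically wired; it is not the operator's actual source, and the import path and the exact set of owned resources are assumptions for the example.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	// Assumed import path for the RabbitmqCluster API types.
	rabbitmqv1beta1 "github.com/rabbitmq/cluster-operator/api/v1beta1"
)

// RabbitmqClusterReconciler stands in for the operator's reconciler type.
type RabbitmqClusterReconciler struct{}

// Reconcile runs once per enqueued request; the real logic is omitted here.
func (r *RabbitmqClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// SetupWithManager registers the watches: any create/update/delete event on a
// RabbitmqCluster, or on a child object it owns (StatefulSet, Service, ...),
// enqueues a reconcile request for the owning RabbitmqCluster.
func (r *RabbitmqClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&rabbitmqv1beta1.RabbitmqCluster{}).
		Owns(&appsv1.StatefulSet{}).
		Owns(&corev1.Service{}).
		Complete(r)
}
```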
From looking at the reconcile logs here, it looks like the manifest was an adaptation of the production-ready example:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: production-ready
spec:
  replicas: 4
  image: rabbitmq:3.8.21-management
  service:
    type: LoadBalancer
    annotations:
      "cloud.google.com/load-balancer-type": Internal
  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 2
      memory: 4Gi
  rabbitmq:
    additionalConfig: |
      cluster_partition_handling = pause_minority
      vm_memory_high_watermark_paging_ratio = 0.99
      disk_free_limit.relative = 1.0
      collect_statistics_interval = 10000
  persistence:
    storage: "500Gi"
  terminationGracePeriodSeconds: 604800
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - production-ready
        topologyKey: kubernetes.io/hostname
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers: []
            topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: "topology.kubernetes.io/zone"
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: production-ready
I've just attempted reproducing on GKE with the above manifest and triggering a rolling upgrade of the Kubernetes nodes, and can't reproduce. I do see that when a node where a RabbitmqCluster Pod is scheduled goes down for upgrade, this triggers several Reconcile loops, as the operator receives the delete event for the Pod it owns and then recreates the Pod. It looks like there are about 4 Reconcile loops triggered per Pod in my tests. I agree with @Zerpet; I think it's important to know whether the Reconciles happened at the exact same time as the node restarts or not, as we would expect a handful of Reconciles per restart. Also, did the Reconciles continue to occur long after the node restarts? On my system, I'm seeing the logs completely stop after the node upgrades are completed.
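If it's unclear which events are driving the Reconcile runs, one option for anyone building or forking a controller-runtime based operator (this is illustrative only, not an existing feature of the cluster operator) is to attach a pass-through predicate that logs every event it sees, so that bursts of Reconciles can be matched against Pod deletions during node upgrades. A minimal sketch:

```go
package controllers

import (
	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// logEvents returns a predicate that lets every event through unchanged but
// logs the object that triggered it. Attached via .WithEventFilter(logEvents(log))
// in SetupWithManager, it helps correlate Reconcile bursts with Pod deletions
// during node upgrades versus some other, unexplained source of events.
func logEvents(log logr.Logger) predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(e event.CreateEvent) bool {
			log.Info("create event", "namespace", e.Object.GetNamespace(), "name", e.Object.GetName())
			return true
		},
		UpdateFunc: func(e event.UpdateEvent) bool {
			log.Info("update event", "namespace", e.ObjectNew.GetNamespace(), "name", e.ObjectNew.GetName())
			return true
		},
		DeleteFunc: func(e event.DeleteEvent) bool {
			log.Info("delete event", "namespace", e.Object.GetNamespace(), "name", e.Object.GetName())
			return true
		},
	}
}
```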
This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.
Closing stale issue due to further inactivity.
Describe the bug
Neither the cluster nor any configuration related to it was changed in the last week. Suddenly we saw reconcile logs every few seconds, which caused some downtime for one of the pods and its queues.
Help is needed, as we don't feel comfortable running the operator in production right now.
Expected behavior
Reconciling should only happen for a reason.
Version and environment information
Additional context
A node version upgrade happened around the same time.
Might be related to #826
Maintainer edit: log as a code block