
Scale down #223

Open
ChunyiLyu opened this issue Jul 27, 2020 · 36 comments
Labels
  • design-doc-needed: Design doc required before implementation
  • enhancement: New feature or request
  • never-stale: Issue or PR marked to never go stale

Comments

@ChunyiLyu
Contributor

ChunyiLyu commented Jul 27, 2020

Is your feature request related to a problem? Please describe.

Scaling down a rabbitmq cluster does not work at the moment. We should look into why it fails, and what we need to do to support scaling down.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context

  • rabbitmqctl forget_cluster_node might be needed before we delete pods (see the sketch below)
  • we might need to skip preStop hooks
  • what do we do with non-HA queues that only exist on nodes that will be deleted?
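
A minimal sketch of what the forget step might look like; <cluster> and the node name are placeholders:

# check which nodes are still listed as cluster members (run from any remaining node)
$ kubectl exec <cluster>-server-0 -- rabbitmqctl cluster_status

# permanently remove the departed node from the cluster metadata
$ kubectl exec <cluster>-server-0 -- rabbitmqctl forget_cluster_node rabbit@<departed-node-name>
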
@ChunyiLyu ChunyiLyu added this to To do in RabbitMQ Cluster Kubernetes Operator via automation Jul 27, 2020
@ChunyiLyu ChunyiLyu added the enhancement New feature or request label Jul 28, 2020
@coro
Contributor

coro commented Aug 4, 2020

From today's sync-up:

rabbitmq-server might not support scaling down - in which case, should we support it?

I didn't know what 'scaling down doesn't work' meant in this issue, so I dug into it; there are three or four failure modes when scaling down:

  • If we lose the majority of nodes in a cluster during scaling down (e.g. 5->1), we go into pause minority and the whole cluster goes down
  • If any of the nodes are quorum critical or mirror sync critical (as judged by the rabbitmq-queues check commands; see the sketch below), and this does not self-resolve (e.g. under heavy load / insufficient consumers), the pod will fail to terminate as the preStop hook never completes
  • Quorum queues require the terminating node to be removed from the RAFT algorithm through rabbitmq-queues shrink
  • Durable, classic queues (i.e. not mirrored/quorum) whose primary node is the node going down will be lost irrecoverably
    • This one is tenuous in my opinion - if you want replication, classic queues aren't going to provide that
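
The checks mentioned above can be run directly against a node; a small sketch, with the pod name as a placeholder:

# non-zero exit code if removing this node would leave some quorum queue without its quorum
$ kubectl exec <cluster>-server-2 -- rabbitmq-queues check_if_node_is_quorum_critical

# same idea for classic mirrored queues with no synchronised mirror on another node
$ kubectl exec <cluster>-server-2 -- rabbitmq-queues check_if_node_is_mirror_sync_critical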

@coro
Contributor

coro commented Aug 4, 2020

For reference, the bitnami helm chart does allow for scaling down, but from my testing it does this by recreating the cluster, rather than removing single nodes.

@coro
Contributor

coro commented Aug 4, 2020

@harshac and I played about with the helm chart today. It seems it has a persistent disk so that even when you scale the cluster & it recreates all of the nodes, it still persists any durable queues or persistent messages between iterations of the cluster.

@michaelklishin
Member

michaelklishin commented Aug 4, 2020

Removing individual nodes would be much less disruptive with the introduction of maintenance mode in 3.8.x. RabbitMQ nodes are never implicitly removed from the cluster: rabbitmqctl forget_cluster_node would indeed be required to permanently remove a node.
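
For reference, a sketch of the maintenance mode commands mentioned above (available since 3.8.x); the pod name is a placeholder:

# drain the node: it stops accepting client connections and transfers queue leadership to other nodes
$ kubectl exec <cluster>-server-2 -- rabbitmq-upgrade drain

# take the node out of maintenance mode again if it is not being removed after all
$ kubectl exec <cluster>-server-2 -- rabbitmq-upgrade revive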

@michaelklishin
Member

RabbitMQ server does support permanent node removal from an existing cluster. Raft-based features such as quorum queues and the MQTT client ID tracker require such a node to also be explicitly removed from the Raft member list; see

rabbitmq-queues help delete_member
rabbitmqctl help decommission_mqtt_node

The problematic part with downsizing is that the reduced number of nodes may or may not handle the same load on the system gracefully. A five-node cluster where all nodes run with the default open file handle limit of 1024 can sustain about 5000 connections but a three-node cluster would not be able to do that.
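
A hedged sketch of what those removals might look like for a departing node; the vhost, queue and node names are placeholders:

# remove the departing node from a quorum queue's Raft member list
$ rabbitmq-queues delete_member --vhost <vhost> <queue-name> rabbit@<departing-node>

# remove the departing node from the MQTT client ID tracking Raft cluster
$ rabbitmqctl decommission_mqtt_node rabbit@<departing-node>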

@coro coro self-assigned this Aug 13, 2020
@coro coro moved this from To do to In progress in RabbitMQ Cluster Kubernetes Operator Aug 13, 2020
@coro
Contributor

coro commented Aug 13, 2020

It doesn't look like there is any native StatefulSet hook for specifically scaling down. The closest I've found is this issue, which covers our case exactly. It points out that there is no way to gracefully decommission a node in Kubernetes StatefulSets.

We will likely have to implement custom logic in the controller for this. Perhaps we can detect in the controller when the spec.replicas field in the CR is decreased, and use this to set flags on all of the pods in the stateful set, which leads to different preStop hook logic that does the maintenance mode work mentioned above? There would be no way for us to tell which pods would be brought down in this case, but any that do would have this behaviour. I have a suspicion that that would lead to all of the pods running this anyway when they are eventually brought down.

@mkuratczyk
Collaborator

Indeed, it'd be great to see that as a native feature, as described in that issue. For the time being, we could implement what you said, which is more or less what we already have when deleting the cluster: https://github.com/rabbitmq/cluster-operator/blob/main/controllers/rabbitmqcluster_controller.go#L471. We can keep the replicas count untouched until the pod/node is successfully deleted/forgotten.

I'm not sure about this part: There would be no way for us to tell which pods would be brought down in this case. I'm pretty sure StatefulSet guarantees that if you scale down, the pods with the highest index numbers will be deleted, and they will be deleted one by one. So the process could look something like this (see the sketch after the list):

  1. Enable RabbitMQ maintenance mode for the node with the highest index
  2. Wait for the node to be drained
  3. Label (or in some other way mark) the node with the highest index for deletion
  4. Reduce the number of replicas in the StatefulSet
  5. The node with the highest index should be terminated quickly (it's empty anyway)
  6. We can run the forget/delete node command (unless done as part of node's termination)
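
A hedged sketch of doing these steps by hand today, going from 3 to 2 replicas; names are placeholders, and the Operator would normally reconcile the StatefulSet back unless reconciliation is paused or the change comes from the Operator itself:

# 1-2: drain the node with the highest ordinal and let it hand off its work
$ kubectl exec <cluster>-server-2 -- rabbitmq-upgrade drain

# 4-5: shrink the StatefulSet so that exactly that node is terminated
$ kubectl scale statefulset <cluster>-server --replicas=2

# 6: permanently remove the terminated node from the cluster metadata
$ kubectl exec <cluster>-server-0 -- rabbitmqctl forget_cluster_node rabbit@<server-2-node-name>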

@coro coro removed their assignment Aug 14, 2020
@coro coro moved this from In progress to To do in RabbitMQ Cluster Kubernetes Operator Aug 14, 2020
@github-actions

github-actions bot commented Jan 2, 2021

This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.

@github-actions github-actions bot added the stale Issue or PR with long period of inactivity label Jan 2, 2021
@mkuratczyk mkuratczyk added never-stale Issue or PR marked to never go stale and removed stale Issue or PR with long period of inactivity labels Jan 11, 2021
@adthonb

adthonb commented Jun 15, 2021

Do you have a solution for scaling down the pods at the moment? I edited the configuration on the RabbitMQ instance to change replicas from 3 to 1, but nothing happened.

@ChunyiLyu
Contributor Author

Do you have a solution for scaling down the pods at the moment? I edited the configuration on the RabbitMQ instance to change replicas from 3 to 1, but nothing happened.

Scale down is not supported by the cluster operator and it's not a planned feature for us atm. Reducing the number of replicas is ignored by the cluster operator; if you check the operator logs and the published events for your RabbitmqCluster, there should be a line saying "Cluster Scale down not supported".

@NArnott

NArnott commented Aug 27, 2021

Just ran across this. I have a development, single-node cluster set up that has 20ish devs, each with their own username and vhost. While doing some testing, I accidentally scaled it up to a 3 node cluster, and now I have no way to bring it back down to one except to destroy and re-create it. Not a big deal, except I'll have to reconfigure all the users, vhosts and their permissions all over again.

Given how easy it is to change "replicas: 1 -> 3", it would have been nice to have a similar experience going from 3 -> 1 (at least in a non-production setting).

@mkuratczyk
Collaborator

Unfortunately, making scale down easy is exactly what we don't want to do until it's well supported (because it would make it easy to lose data). I understand that in your case it's a non-production system, but I don't see how we could make it easy only in that case. Having said that, if you are only concerned with the definitions (vhosts, users, permissions, etc.), you can easily take care of them by exporting the definitions and importing them into a new cluster. Something like this:

# deploy a 3-node cluster (for completeness, you already have one)
$ kubectl rabbitmq create foo --replicas 3

# export the definitions
$ kubectl exec foo-server-0 -- rabbitmqctl export_definitions - > definitions.json

# delete the 3-node cluster
$ kubectl rabbitmq delete foo

# deploy a single-node replacement
$ kubectl rabbitmq create foo

# copy the definitions to the new cluster's pod
$ kubectl cp definitions.json foo-server-0:/tmp/definitions.json

# import the definitions
$ kubectl exec foo-server-0 -- rabbitmqctl import_definitions /tmp/definitions.json

It's a bit tedious but will do the trick. You can replace the last few steps with an import on node startup (see the example).

Keep in mind two caveats:

  • you will lose the messages in the queues, so make sure you only follow these steps where that's not a problem
  • the default user created when a cluster is deployed is somewhat special. If you import the definitions after the new cluster is up (as in the steps above), you will still have the default user of the new cluster (so the new cluster will have 1 more user than the old one). If you import the definitions on boot (as in the linked example), the new cluster will have exactly the same users as the old one, but unfortunately your Kubernetes Secret (foo-default-user in this example) will contain credentials that won't work (they are generated for the new cluster, but RabbitMQ doesn't create the default user when definitions import is configured). We hope to find a way to keep the secret in sync in such situations, but for now, you can just manually update it with the old cluster's credentials so that your secret matches the actual state (see the sketch below).
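
A hedged sketch of manually updating that Secret with the old cluster's credentials; the secret name matches the example above and the values are placeholders (depending on the Operator version, the Secret may hold additional keys that reference the same credentials):

# overwrite the generated credentials with the old cluster's username and password
$ kubectl patch secret foo-default-user \
    -p '{"stringData": {"username": "<old-username>", "password": "<old-password>"}}'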

@mkuratczyk mkuratczyk added the design-doc-needed Design doc required before implementation label Oct 1, 2021
@yuzsun

yuzsun commented Nov 9, 2021

Hi @mkuratczyk, just wanted to know if there is any update on this feature.
Is scale down fully supported now, or is there any plan/ETA? Thanks :D

@mkuratczyk
Collaborator

It's not. Please provide the details of your situation and what you'd expect to happen, and we can try to provide a workaround or take such a use case into account when working on this.

@yuzsun

yuzsun commented Nov 10, 2021

We were using a RabbitMQ StatefulSet in Kubernetes, and we found it was unable to scale down: whenever we tried to scale the StatefulSet down to 0, 3 replicas remained.
The issue only occurs with RabbitMQ; other StatefulSets, such as an nginx StatefulSet, scale down to 0 successfully.

@mkuratczyk
Collaborator

  1. The Operator reconciles the StatefulSet so if you configured replicas: 3 in the RabbitmqCluster, when you change the StatefulSet, the Operator will "fix it" because it no longer matches the definition. This is how reconciliation is supposed to work in Kubernetes (the same mechanism is what restarts pods when they disappear - their number no longer matches the defined number of replicas).
  2. You can't decrease the replicas in the RabbitmqCluster; this is to prevent you from losing data. Scaling down this way can lead to data loss in a production environment (there can be data that is only present on some of the nodes, and those nodes could get deleted).

Why do you want to scale down to zero? RabbitMQ is a stateful, clustered application. It can't start in a fraction of a second when an application tries to connect. Is this some test environment that you want to turn off for the weekend or something like that? This is what I meant by asking about your use case.

If it is a test env, there are a few things you can try:

  1. Perhaps you don't need to maintain the state in the first place? We now support storage: 0, which doesn't even create the PVCs. If you just want to spin up a RabbitMQ cluster quickly (e.g. to run CI tests) and delete it afterwards, that can be a good option.
  2. If you want to maintain the state but stop/delete the pods for some time, you can try kubectl rabbitmq pause-reconciliation NAME and then perform operations on the StatefulSet that the Operator will not overwrite (reconcile); see the sketch below. But please be careful - by manually changing the StatefulSet you can easily lose data or get into a state that the Operator will not be able to reconcile.

But again - please let us know what you are trying to achieve. RabbitMQ is more like a database. I'm not sure why you would run nginx in a StatefulSet, but they are fundamentally different applications, so you can't just expect to manage them the same way.
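
A hedged sketch of option 2 for a weekend shutdown; the cluster name is a placeholder and this assumes the kubectl-rabbitmq plugin version in use also provides resume-reconciliation:

# stop the Operator from reconciling this cluster
$ kubectl rabbitmq pause-reconciliation <cluster>

# scale the underlying StatefulSet to zero for the weekend (the PVCs are kept)
$ kubectl scale statefulset <cluster>-server --replicas=0

# afterwards: bring the pods back and let the Operator reconcile again
$ kubectl scale statefulset <cluster>-server --replicas=3
$ kubectl rabbitmq resume-reconciliation <cluster>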

@AminSojoudi

For a situation where queue messages are not important, can we do this?
For example, we have a 3-node cluster and we want to reduce it to 2: can we change the CRD replicas to 2, then edit the StatefulSet replicas to 2?

@mkuratczyk
Collaborator

Please read through this thread and try it if you want. I'd expect you to still need to run forget_cluster_node at least. Scale down is unsupported, so we don't have all the answers and don't test this. You can contribute by experimenting and sharing with others.

You didn't share your use case either. Why do you want to go from 3 nodes to 2? 2-node RabbitMQ clusters are hardly supported in the first place. Quorum queues and streams require 3 nodes (or just 1 for test envs). If you don't care about your messages, there are other options (running a single node in the first place, or running a cluster with no persistent storage).

@ksooner

ksooner commented Nov 11, 2021

  1. The Operator reconciles the StatefulSet so if you configured replicas: 3 in the RabbitmqCluster, when you change the StatefulSet, the Operator will "fix it" because it no longer matches the definition. This is how reconciliation is supposed to work in Kubernetes (the same mechanism is what restarts pods when they disappear - their number no longer matches the defined number of replicas).
  2. You can't decrease the replicas in the RabbitmqCluster to prevent you from losing data. Scaling down this way can lead to data loss in a production environment (there can be data that is only present on some of the nodes and those nodes could get deleted).

Why do you want to scale down to zero? RabbitMQ is a stateful, clustered, application. It can't start in a fraction of a second when an application tries to connect. Is this some test environment that you want to turn off for the weekend or something like that? This question is what I meant by your use case.

If it is a test env, there are a few things you can try:

  1. Perhaps you don't need to maintain the state in the first place? We now support storage: 0 which doesn't even create the PVCs. If you just want to spin up a RabbitMQ cluster quickly (eg. to run CI tests) and delete it afterwards - that can be a good option.
  2. If you want to maintain the state but stop/delete the pods for some time, you can try kubectl rabbitmq pause-reconciliation NAME and then you can perform operations on the statefulset, that the Operator will not overwrite (reconcile). But please be careful - by manually changing the statefulset you can easily lose data or get into a state that the Operator will not be able to reconcile.

But again - please let us know what you are trying to achieve. RabbitMQ is more like a database. I'm not sure why you would run nginx in a StatefulSet, but they are fundamentally different applications, so you can't just expect to manage them the same way.

I am working with @yuzsun. Our use case: this is a test environment and we want to stop the Kubernetes cluster running RabbitMQ during the weekend. I will look into the "storage: 0" option.

@AminSojoudi

AminSojoudi commented Nov 12, 2021

Please read through this thread and try if you want. I'd expect you to still need to run forget_cluster_node at least. Scale down is unsupported, so we don't have all the answers and don't test this. You can contribute by experimenting and sharing with others.

You didn't share your use case either. Why do you want to go from 3 nodes to 2? 2-nodes RabbitMQ clusters are hardly supported in the first place. Quorum queues and streams require 3 nodes (or just 1 for test envs). If you don't care about your messages - there are other options (running single node in the first place, running a cluster with no persistent storage).

I tested that and it worked well without even calling forget_cluster_node. The situation I described was hypothetical, just for testing purposes, but our real scenario is this:
We have a 3-node k8s cluster with a RabbitMQ cluster of 3 replicas and podAntiAffinity. Someone in our team accidentally changed the replicas to 4 because of high load. Unfortunately, there is no way back: the new RabbitMQ node is stuck in a Pending state, and we cannot do anything even if we remove the podAntiAffinity. We cannot even delete the whole RabbitMQ cluster and create a new one, because that way our data would be lost, since the RabbitMQ k8s operator deletes the PVCs.

@Zerpet
Collaborator

Zerpet commented Nov 12, 2021

@AminSojoudi There's a workaround to scale down, assuming you are ok with potential data loss. If the 4th instance never moved from Pending state, it should be fine.

  1. Set the replicas to the desired number in the RabbitmqCluster spec, e.g. 3
  2. Run kubectl scale sts --replicas=<desired-number> <rabbit-server-statefulset>
  3. Pod exec into one of the remaining nodes
  4. Run rabbitmqctl cluster_status to obtain the names of clustered Disk Nodes
  5. Run rabbitmqctl forget_cluster_node 'rabbit@...', where the name of the node should be the one you deleted.
  6. You may have a leftover PVC for the 4th Pod. You may want to consider cleaning up this PVC.

For your use case @ksooner, you can skip steps 3-5, since you simply want to scale to 0 over the weekend. The PVC will remain if you scale down to 0, so previous durable exchanges/queues will still be there when you scale back to 3. Also, you may get Pods stuck in Terminating state for a long time. This will likely be due to our pre-stop hook, which ensures data safety and availability; to get around that, set .spec.terminationGracePeriodSeconds in RabbitmqCluster to something low, like 10.

I'd like to re-state that scale down is not supported by the Operator; this workaround risks data loss and effectively bypasses most, if not all, of the data safety measures we've incorporated in the Operator.
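
The steps above, expressed as a hedged shell sketch for a cluster named foo going from 4 back to 3 replicas; the removed node's name and the PVC name are assumptions, so check them with rabbitmqctl cluster_status and kubectl get pvc first:

# 1-2: align the desired size in the CR and the StatefulSet
$ kubectl patch rabbitmqcluster foo --type merge -p '{"spec": {"replicas": 3}}'
$ kubectl scale statefulset foo-server --replicas=3

# 3-5: from a surviving node, check membership and forget the removed node
$ kubectl exec foo-server-0 -- rabbitmqctl cluster_status
$ kubectl exec foo-server-0 -- rabbitmqctl forget_cluster_node rabbit@<removed-node-name>

# 6: clean up the leftover PVC of the removed pod (name assumed, verify before deleting)
$ kubectl delete pvc persistence-foo-server-3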

@malmyros

malmyros commented Jan 7, 2022

This was really helpful, thank you for sharing.

@oneumyvakin

@mkuratczyk I'm using K8s on bare-metal and I'm not using shared storage operators.

I have two cases where scaling down looks like it could help:

  1. When I need to replace a node's server with a new server.
  2. When I need to move the cluster between datacenters.

As I can't just move PVs between K8s nodes, I have to scale the cluster up and then down for each node, one by one.

For case 1, I can scale the cluster up by one new node first, then scale it down by one old node.
For case 2, I can scale up and down in steps of one node, for each node in the cluster.

Are these valid cases for you? Or could I do something different?

@mkuratczyk
Collaborator

  1. How is this different from a typical Kubernetes operation like changing the CPUs for a pod? In that case, the disk is detached, a new pod (a new "server") is created, and the disk is reattached. It feels like you can do pretty much the same (probably by copying data, not physically moving the disk).
  2. For migrating between data centres, blue-green is generally the best option.

Just to be clear, I know there are cases where scale-down may be helpful. It's just that it's a hard problem to solve in a generic way (that will work for different cases, with different queue types and so on).


@RicardsRikmanis

A good case for why scale down is useful is cost saving. For example, scaling all pods in a development namespace down to 0 during inactive hours, thus reducing the required node count. If the operator does not support it, then we have dangling pods left in the namespace.

@mkuratczyk
Collaborator

We realize there are use cases, they just don't make the problem any simpler. :) Development environments are the easiest case unless you have some special requirements. What's the benefit of scaling down to zero compared to just deleting and redeploying the cluster the following day? You can keep your definitions in a JSON file or also as Kubernetes resources, so that the cluster has all the necessary users, queues and so on when it starts.

@xixiangzouyibian

xixiangzouyibian commented Jun 6, 2023

Another case for cluster scale down:
We have a quorum queue cluster with 5 RabbitMQ nodes (one RabbitMQ node per physical k8s node). If 3 k8s nodes go down, the cluster becomes unavailable because it has lost the majority. Scaling the replicas down to 3 would make the cluster accessible again.

Looping in experts @mkuratczyk @Zerpet for awareness.

@xixiangzouyibian

xixiangzouyibian commented Jun 6, 2023

@AminSojoudi There's a workaround to scale down, assuming you are ok with potential data loss. If the 4th instance never moved from Pending state, it should be fine.

  1. Set the replicas to the desired number in the RabbitmqCluster spec, e.g. 3
  2. Run kubectl scale sts --replicas=<desired-number> <rabbit-server-statefulset>
  3. Pod exec into one of the remaining nodes
  4. Run rabbitmqctl cluster_status to obtain the names of clustered Disk Nodes
  5. Run rabbitmqctl forget_cluster_node 'rabbit@...', where the name of the node should be the one you deleted.
  6. You may have a leftover PVC for the 4th Pod. You may want to consider cleanup of this PVC.

For your use case @ksooner, you can skip steps 3-5, since you simply want to scale to 0 over the weekend. The PVC will remain if you scale down to 0, so previous durable exchanges/queues will still be there when you scale back to 3. Also, you may get Pods stuck in Terminating state for a long time. This will likely be due to our pre-stop hook, which ensures data safety and availability; to get around that, set .spec.terminationGracePeriodSeconds in RabbitmqCluster to something low, like 10.

I'd like to re-state that scale down is not supported by the Operator, and this workaround risks data loss, and effectively bypasses most, if not all, data safety measures we've incorporated in the Operator.

Should we first label the RabbitmqCluster with pauseReconcile=true so that the operator stops watching (reconciling) the cluster?

@mkuratczyk
Collaborator

@xixiangzouyibian RabbitMQ cluster membership and quorum queue membership are separate concerns. Scaling down the RabbitMQ cluster cleanly would require scaling down quorum queues as well. However, for quorum queue membership changes, a quorum of nodes needs to be available. For situations like the one you described, an API was recently added to force a single quorum queue (a Ra/Raft cluster) member to assume it's the only member now: rabbitmq/ra#306. This is a very dangerous operation and will likely lead to data loss.

You should not lose 3 out of 5 members in the first place.

@xixiangzouyibian

xixiangzouyibian commented Jun 6, 2023

Yes, I'm aware that I can successfully declare a quorum queue on a single RabbitMQ node now. Thanks for this PR.
Back to the scale down topic: do you mean that if I lose 3 out of 5 members, the cluster should fail, and all we can do is back up the data and restore it into a new cluster deployment, or try to restore the lost 3 nodes, instead of scaling down the cluster?

@mkuratczyk
Collaborator

What I mean is that forgetting a RabbitMQ cluster node doesn't automatically remove it from the Raft cluster of the quorum queue(s). That would need to be handled separately, and the PR I linked to allows doing that (it basically tells one quorum queue / Raft node "you are the only survivor, forget all other nodes and become the leader", which means that if that node wasn't up to date with the old leader, the delta will be lost).

Declaring quorum queues on a single-node RabbitMQ has always been possible; that's unrelated to this PR.

@xixiangzouyibian

Let me tidy it up:
In the case where 4 out of 5 members are lost and only 1 RabbitMQ node survives, the PR will make the cluster available again, but the cost is losing the delta, right?
In the case of losing 3 out of 5 members, the cluster loses the majority but 2 nodes survive rather than one. Does this PR still work, and what happens to the cluster, is it still available?

What I mean is that forgetting a RabbitMQ cluster node doesn't automatically remove it from the RAFT cluster of the quorum queue(s). That would need to be handled separately and the PR I linked to allows to do that (it basically tells one quorum queue / RAFT node "you are the only survivor, forget all other nodes and become the leader", which means if that node wasn't up to date with the old leader, the delta will be lost).

Declaring quorum queues in a single node RabbitMQ has always been possible, that's unrelated to this PR.

@mkuratczyk
Collaborator

If you lose quorum, the queue is not available without human intervention (since RabbitMQ doesn't know that the nodes were lost forever). The PR provides a way to intervene, by selecting a single queue member as the new leader and only member. The only difference when you have two survivors is that you can pick one of them to be the only one / new leader, so if their state differs, you can pick the more up to date one. Normally this is handled by leader election but the normal election process can't take place since we lost quorum.

@xixiangzouyibian

xixiangzouyibian commented Jun 6, 2023

If you lose quorum, the queue is not available without human intervention (since RabbitMQ doesn't know that the nodes were lost forever). The PR provides a way to intervene, by selecting a single queue member as the new leader and only member. The only difference when you have two survivors is that you can pick one of them to be the only one / new leader, so if their state differs, you can pick the more up to date one. Normally this is handled by leader election but the normal election process can't take place since we lost quorum.

Got it, thanks a lot!
Can I also ask which version of RabbitMQ supports this feature? I checked that v3.12 has the PR checked in as of Feb 2023.

@mkuratczyk
Collaborator

Seems like 3.12 should have it. I haven't tried.
rabbitmq/rabbitmq-server#8322

And once again: this is a last-resort kind of solution. Do not use it unless you are sure you want to.

marshtompsxd pushed a commit to vmware-research/verifiable-controllers that referenced this issue Aug 13, 2023
The current version of the RabbitMQ controller doesn't support scaling down;
directly reducing the replicas of the stateful set can lead to quorum
loss and even data loss. More details in [this
issue](rabbitmq/cluster-operator#223). Thus,
we decided to disallow downscale for the stateful set in our
implementation.

Using the [validation
rule](https://github.com/vmware-research/verifiable-controllers/blob/f5236647bf4fb26daa1359fde3c61a282a886735/src/controller_examples/rabbitmq_controller/spec/rabbitmqcluster.rs#L108)
can guarantee that updating the deployment won't decrease the replicas.
But here is a corner case: a workaround for the downscale operation is to
delete the current deployment and create a new one with fewer replicas,
which doesn't violate the validation rule. If the execution happens in
this way, chances are that the stateful set created from the old cr may
not yet have been deleted by the garbage collector when the controller tries
to update the stateful set with the new cr, which has a smaller
`replicas` field. Thus, the controller implementation still needs to
compare the old and new replicas before updating the stateful set to make
sure scaling down doesn't happen. This makes the proof a lot more
difficult, because we have to show that the replicas of the old stateful set
are no larger than those of the current cr, whose proof requires us to show
that if a stateful set has an owner reference pointing to some cr, its
replicas are no larger.

Therefore, we decided to let the controller wait for the garbage
collector to delete the old stateful set, which avoids the corner case
and does not introduce too much complexity to the proof, since there is
no need to compare the stateful sets now. In this case, if the old
stateful set doesn't have an owner reference pointing to the current cr,
the reconcile will simply go to the error state and wait for the next
round of reconcile.

To make the liveness proof work, I changed
`the_object_in_reconcile_has_spec_as` into
`the_object_in_reconcile_has_spec_and_uid_as` so that the
owner_references can also be derived from the desired custom resource.
The remaining work is as follows:
1. Add an eventual safety property showing that `spec /\
[]desired_state_is(cr) |= true ~> (sts_key_exists =>
sts.owner_references.contains(cr.owner_ref()))`.
2. Add `[](sts_key_exists =>
sts.owner_references.contains(cr.owner_ref()))` to the assumption of the
spec.
3. Add reasoning about the steps after creating/updating the server config map;
the stateful set part should be similar to zookeeper as long as 1
and 2 are done.

---------

Signed-off-by: Wenjie Ma <rebekah1368@gmail.com>