
Updates to cassandraImage are ignored #355

Open
linki opened this issue Mar 11, 2020 · 6 comments

Comments

@linki

linki commented Mar 11, 2020

When the cassandraImage in a CDC spec is updated, the operator does not update the corresponding StatefulSets.

The operator correctly receives an event and starts reconciling the CDC, but somewhere along the way it fails to update the StatefulSet spec. There is no error in the logs.

I also tested this with another field, flipping optimizeKernelParams, which should likewise change the StatefulSet spec.

I think there is currently a bug in detecting whether a StatefulSet has changed and needs to be updated.

Tested with:

  • operator v5.0.0
  • sidecar v3.1.1 and v5.0.0
  • topology-aware setup
  • Kubernetes v1.16.7
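
For illustration, the change in question is just editing the image field on the CDC object. A minimal sketch of such a manifest is below; the apiVersion, kind, name and image value are assumptions for illustration, only the cassandraImage field name comes from this issue:

```yaml
# Hypothetical CassandraDataCenter (CDC) manifest. apiVersion, kind, name and
# image value are illustrative assumptions; only the cassandraImage field name
# is taken from this issue.
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraDataCenter
metadata:
  name: test-dc
spec:
  nodes: 3
  # Editing this field should be propagated to the operator-managed
  # StatefulSets, but in the reported behaviour the StatefulSet spec
  # is left untouched.
  cassandraImage: gcr.io/cassandra-operator/cassandra-3.11.6:latest
```
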
@smiklosovic
Collaborator

Interesting, I'll debug this one. Thanks for reaching out.

@smiklosovic
Collaborator

@linki, actually, what would you like to see as a result here? Suppose the StatefulSet is changed; then what? If you want to effectively run on a different image, the container itself has to be restarted.

Could you check whether the image is changed in the StatefulSet after your changes? In other words, do you see your new image in the StatefulSet but no action being taken?

As a workaround, you might restart that pod / container. Restarting a container should pick up the latest StatefulSet spec with your new image. Restarting just the Cassandra container effectively means you kill PID 1; this can be done from the Sidecar by calling its restart endpoint. Please read the section in the auth doc (1) related to restarting a pod (I might move this documentation somewhere else in the future).

(1) https://github.com/instaclustr/cassandra-operator/blob/master/doc/auth.md#switching-between-allowallauthenticator-to-passwordauthenticator
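
A quick way to check whether the operator actually patched the StatefulSet, and to restart a pod by hand, might look like the following; the StatefulSet and pod names here are hypothetical placeholders, not names produced by the operator:

```sh
# Show which image the operator-managed StatefulSet currently references
# ("cassandra-test-dc" is a placeholder name).
kubectl get statefulset cassandra-test-dc \
  -o jsonpath='{.spec.template.spec.containers[*].image}'

# Workaround: delete a pod so the StatefulSet controller recreates it from
# the (possibly updated) pod template.
kubectl delete pod cassandra-test-dc-0
```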

@linki
Author

linki commented Mar 12, 2020

> Could you check whether the image is changed in the StatefulSet after your changes? In other words, do you see your new image in the StatefulSet but no action being taken?

The operator doesn't even update the StatefulSet itself, so Kubernetes isn't doing anything. When the CDC changes, the operator should propagate those changes down to the StatefulSets.

After thinking about it more, it's probably not a good idea to blindly apply the new spec to the StatefulSet, since some updates might break the cluster.

It's just that currently there's no way of updating the Cassandra version or other fields without directly editing the operator-managed StatefulSets.
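
For reference, the direct-edit workaround mentioned above would look roughly like this; the StatefulSet name, the container name "cassandra", and the image are assumptions about this setup:

```sh
# Bypass the operator and set the image directly on the operator-managed
# StatefulSet; pods are then replaced according to its update strategy.
kubectl set image statefulset/cassandra-test-dc \
  cassandra=gcr.io/cassandra-operator/cassandra-3.11.6:latest
```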

@smiklosovic
Collaborator

Yeah, maybe I can do something about that one.

The only time it "reacts" is if the number of replicas does not match: based on what you entered into the new spec (a bigger or smaller number), it will scale up or down. So that "functionality" already exists; we just have to extend it to the image name and somehow trigger a restart.

@jgoeres

jgoeres commented May 13, 2020

We just started using Cassandra with the help of the Instaclustr operator, so I am quite a newbie to both, but I would like to add my thoughts here, based on what little info I have found about this topic so far. Sorry if some things seem a bit naive. ;-)

I am currently looking into ways to update a Cassandra cluster, too, ideally while keeping the cluster operational during the update process, albeit with reduced capacity and redundancy. According to some documentation, for a non-containerized Cassandra it is possible to do a rolling update, as described, e.g., here:
https://stackoverflow.com/questions/44024170/upgrading-cassandra-without-losing-the-current-data
Alas, it seems not to be as simple as just patching the version in the StatefulSet and letting the StatefulSet do a rolling update: in the example above, after a node is shut down and the binaries are updated, nodetool upgradesstables additionally has to be run on that node.
I am not sure whether that is always required for every update (major, minor or patch version change); maybe there are version jumps (patches, perhaps?) where it is safe to just replace the binary with the new one and start the node again?
Another option would be to have the start script check whether a version update has happened and run upgradesstables on demand. The pod would not become ready until this completes, and only then would the StatefulSet proceed with stopping and updating the next pod.

Naive, maybe, but that is what I would hope an operator could do for me. :-)
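
For concreteness, the per-node sequence described above, done by hand against the StatefulSet pods, might look like the sketch below; the pod names and the container name "cassandra" are illustrative, and whether upgradesstables is needed depends on the version jump:

```sh
# Rough manual rolling-upgrade loop over the pods of one StatefulSet,
# assuming the pod template has already been updated to the new image.
for pod in cassandra-test-dc-0 cassandra-test-dc-1 cassandra-test-dc-2; do
  # Flush memtables and stop accepting traffic on this node before restarting.
  kubectl exec "$pod" -c cassandra -- nodetool drain
  # Delete the pod; the StatefulSet controller recreates it with the new image.
  kubectl delete pod "$pod"
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=15m
  # Rewrite SSTables to the new on-disk format after the version change.
  kubectl exec "$pod" -c cassandra -- nodetool upgradesstables
done
```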

@smiklosovic
Collaborator

smiklosovic commented May 15, 2020

Hello @jgoeres,

yeah, this one is very complex. I would personally investigate updating a node by decommissioning it, so the cluster would shrink down, and then scaling it back up, presumably with the new image. This could maybe be done by restarting a node with the new bits ... However, this approach is rather ... strange.

Anyway, the whole approach should be driven by the operator, not by custom scripts; that gets complicated very quickly.

There is a Sidecar running alongside each node, and the best approach would be for the operator to send an "upgrade request" to the Sidecar (via its REST API), and the Sidecar would then restart each node one by one. Restarting a node is already implemented. A rolling restart of the cluster and its orchestration is not hard here; what needs to be done is that the Cassandra image in the spec is changed so the nodes are restarted with the new values ... If you can do that manually, triggering rolling restarts with orchestrated sstable upgrades should be easy.
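
If spec changes were propagated, the manual trigger mentioned above could be as simple as patching the CDC object. The sketch below assumes the CRD resource name and the object name; only the cassandraImage field comes from this issue, and the image value is illustrative:

```sh
# Change the desired Cassandra image on the CDC object; the operator would
# then be expected to roll the change out (resource name partly assumed).
kubectl patch cassandradatacenter test-dc --type merge \
  -p '{"spec":{"cassandraImage":"gcr.io/cassandra-operator/cassandra-3.11.6:latest"}}'
```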
