Ability to 'snapshot' kind cluster and restore it to snapshotted state via kind cli #3508

Open
SleepyBrett opened this issue Feb 6, 2024 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@SleepyBrett

What would you like to be added:
The ability to use the kind CLI to snapshot a cluster's state, including all objects, plus an additional command that restores the cluster to that state. I think this could be done with an etcd backup/restore, plus maybe a backup of the images that were 'load'ed.

Why is this needed:
We use kind pretty extensively for testing, and our kind cluster can take a significant amount of time to set up. When working on tests, we usually only have to reset the cluster once, at the beginning. However, if a test run fails with a panic, our tests cannot clean up all the temporary objects they created. We then have to either manually find and delete all the random test objects, or tear down the cluster and redo the whole setup. This can be enormously frustrating.

It would be so much better if we could 'snapshot' the cluster after setup and, on panic, use the kind CLI to restore it to that state.

We've thought about writing our own tooling to fix this, but other people may be suffering from similar issues.

@SleepyBrett SleepyBrett added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 6, 2024
@BenTheElder
Member

This would be really complicated and is not in scope for the project.

https://kind.sigs.k8s.io/docs/contributing/project-scope/

There are third-party backup tools available; however, we strongly recommend treating clusters as disposable, which is the priority for this project. It is best practice to test from clean, repeatable environments.

[...] our kind cluster can take a significant amount of time to set up [...]

That part is worth digging into.

I think this could be done with an etcd backup/restore + maybe backup of the images that were 'load'ed

Not enough, because of in-cluster state (status etc., not the YAML specs) that won't be valid across new clusters, e.g. node IPs. Backup/restore of etcd is more suitable for the exact same, actual non-disposable cluster.
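To make the spec-versus-status distinction concrete, here is a minimal Python sketch (not part of kind or any Kubernetes client library) that strips cluster-specific state from an object in its dict/JSON form. `to_portable_spec` is a hypothetical helper, and its field lists are illustrative assumptions rather than an exhaustive set: it drops `status`, server-populated metadata such as `uid` and `resourceVersion`, and a few spec fields the cluster assigns itself (`spec.nodeName` on Pods, and `spec.clusterIP`/`spec.clusterIPs` on Services, assuming they were not set explicitly).

```python
import copy
from typing import Any

# Server-populated metadata tied to one particular live object
# (illustrative list, not exhaustive).
VOLATILE_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                     "generation", "managedFields", "selfLink")

# Spec fields that the cluster fills in itself, per kind
# (illustrative, not exhaustive).
CLUSTER_ASSIGNED_SPEC = {
    "Pod": ("nodeName",),
    "Service": ("clusterIP", "clusterIPs"),
}


def to_portable_spec(obj: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a Kubernetes object without cluster-specific state."""
    clean = copy.deepcopy(obj)          # never mutate the caller's object
    clean.pop("status", None)           # live state: IPs, conditions, etc.
    meta = clean.get("metadata", {})
    for name in VOLATILE_METADATA:
        meta.pop(name, None)
    spec = clean.get("spec", {})
    for name in CLUSTER_ASSIGNED_SPEC.get(clean.get("kind", ""), ()):
        spec.pop(name, None)
    return clean
```

What remains is roughly the source YAML a user would re-apply to a fresh cluster; what was removed is the part that would not carry over.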

@SleepyBrett
Author

I'm familiar with some of the backup tools for Kubernetes and use Velero/Ark extensively. Velero doesn't work in this case because, as far as I'm aware, it has no 'restore to this state' mode, i.e. one that deletes anything not in the backup. So if I have a cluster with a single ConfigMap, back it up, add another ConfigMap, and restore, I still have a cluster with two ConfigMaps. No bueno.
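The missing 'restore to this state' semantics boil down to a set difference over object identities. A minimal sketch, assuming objects are available as dicts (for example, parsed from `kubectl get ... -o json`); `object_key` and `plan_exact_restore` are hypothetical helpers, and identifying an object by (apiVersion, kind, namespace, name) is a simplifying assumption.

```python
from typing import Any, Iterable

# Identity of an object: (apiVersion, kind, namespace, name).
# Cluster-scoped objects use "" as their namespace.
Key = tuple[str, str, str, str]


def object_key(obj: dict[str, Any]) -> Key:
    meta = obj["metadata"]
    return (obj["apiVersion"], obj["kind"], meta.get("namespace", ""), meta["name"])


def plan_exact_restore(snapshot: Iterable[dict[str, Any]],
                       live: Iterable[dict[str, Any]]) -> tuple[list[Key], list[Key]]:
    """Return (to_delete, to_create) for an exact, non-merging restore.

    to_delete: objects in the cluster now that are absent from the snapshot
               (a merge-style restore leaves these in place).
    to_create: objects in the snapshot that no longer exist in the cluster.
    Objects present in both would still need to be re-applied from the
    snapshot to undo in-place edits; that step is not shown here.
    """
    snap_keys = {object_key(o) for o in snapshot}
    live_keys = {object_key(o) for o in live}
    return sorted(live_keys - snap_keys), sorted(snap_keys - live_keys)
```

For the ConfigMap example (the snapshot holds only `a`; the cluster now holds `a` and `b`), `plan_exact_restore` returns `([("v1", "ConfigMap", "default", "b")], [])`, so the extra ConfigMap is scheduled for deletion instead of being left behind.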

Not enough, because of in-cluster state (status etc., not the YAML specs) that won't be valid across new clusters, e.g. node IPs. Backup/restore of etcd is more suitable for the exact same, actual non-disposable cluster.

Oh, I think 'a backup is only good for the kind cluster it was created for' is acceptable to me. Snapshots that are tied to one cluster and can't be restored to another would be workable. Also, in the past when doing etcd restores, sure, the nodes the pods were scheduled on were gone, but the pods just got rescheduled, no problem.
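If snapshots are only valid for the cluster they were taken from, a restore tool would need a way to detect a mismatch before touching anything. The sketch below is one hypothetical approach: it fingerprints the cluster by name and by each node's IP (node IPs being the example of cluster-specific state raised above). `ClusterIdentity`, `Snapshot`, and `check_restorable` are illustrative names, not kind APIs; the `Snapshot` fields simply mirror the original proposal of an etcd backup plus the list of 'load'ed images, and how those would actually be captured is not shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterIdentity:
    """Hypothetical fingerprint of the kind cluster a snapshot came from."""
    name: str
    node_ips: tuple[tuple[str, str], ...]  # sorted (node name, IP) pairs

    @classmethod
    def from_nodes(cls, name: str, nodes: dict[str, str]) -> "ClusterIdentity":
        # Sort so the fingerprint does not depend on node discovery order.
        return cls(name, tuple(sorted(nodes.items())))


@dataclass
class Snapshot:
    identity: ClusterIdentity
    etcd_snapshot_path: str                 # hypothetical etcd backup file
    loaded_images: list[str] = field(default_factory=list)


def check_restorable(snapshot: Snapshot, current: ClusterIdentity) -> None:
    """Raise ValueError if the snapshot was taken from a different cluster."""
    if snapshot.identity != current:
        raise ValueError(
            f"snapshot belongs to cluster {snapshot.identity.name!r} "
            f"with nodes {snapshot.identity.node_ips}, "
            f"not {current.name!r} with nodes {current.node_ips}")
```

Refusing early is the conservative choice here: restoring state recorded against different node IPs is exactly the case described as unreliable.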

That way I can run kind create ..., add some objects, run kind snapshot ..., and then later, after some messing around, run kind restore ....

I will probably attempt to build this tooling on my own; I'm just saying it would be a nice-to-have feature.

@rinormaloku

rinormaloku commented Apr 4, 2024

I am looking for a similar solution. My use case: I want to test a multi-cluster operator on different Kubernetes versions. Even after multiple optimizations, reproducing the test environment for a single version takes about 10 minutes because there are many dependencies.

@BenTheElder
Member

This is not happening without a detailed design outline and technical solution. If one is proposed here, we can review and consider it, but this is not something I personally have time to work on, and it is not well aligned with the scope of the project.

I am doubtful that a feature like this would work reliably or be very maintainable: clusters have a lot of live running state. The backup is something like Velero, or your original source YAML and images.

A 10-minute startup is pretty unfortunate, but that's almost certainly a performance issue in the additional components being brought up; typical application bring-up on kind should take at most a few minutes unless the host is severely resource-starved.
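Before weighing snapshots against a slow setup, it may help to measure which setup phase actually dominates. A minimal standard-library sketch; `timed` and `report` are hypothetical helpers, and the phase names you pass in are placeholders for your own setup steps.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(phase: str, results: dict[str, float]) -> Iterator[None]:
    """Record wall-clock seconds spent inside the block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the phase raises, so a failed setup still reports.
        results[phase] = time.perf_counter() - start


def report(results: dict[str, float]) -> list[str]:
    """Format phases slowest-first, so the biggest setup cost is listed first."""
    ordered = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{secs:8.1f}s  {phase}" for phase, secs in ordered]
```

For example, wrap `subprocess.run(["kind", "create", "cluster"], check=True)` in `with timed("kind create cluster", timings):`, give each dependency install its own `timed` block, then print the lines from `report(timings)` to see where the minutes go.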

https://kind.sigs.k8s.io/docs/contributing/getting-started/
https://kind.sigs.k8s.io/docs/contributing/project-scope/
