
Does chaoskube really kill the pods? #103

Open
ljanatka opened this issue Sep 5, 2018 · 11 comments · Fixed by #104

Comments

@ljanatka

ljanatka commented Sep 5, 2018

Hi Martin,

I am currently working on a project where we are trying to improve the reliability of our software using chaos engineering (but, unfortunately, we have very little experience with it). Our software currently runs on Kubernetes on Azure.

We found chaoskube to be a promising tool, but it behaves differently than we expected. The chaoskube description says that it kills pods, so I formed a hypothesis about what would happen if one of our pods were killed while it was handling a request: that request should get an error response, and subsequent requests should be processed by the other pod. When I ran the experiment, the pods were killed, but no errors occurred.

Then one of my colleagues looked at the source code of chaoskube and found that the pod is not killed (i.e., force-killed instantly) but rather terminated. If I understand correctly, with this approach the pod finishes its current task and then "dies" peacefully.

Is this really how chaoskube works?

We are learning more about chaos every day, but there is a lot of knowledge that we need to gain.

Since my hypothesis was probably wrong, I would be really grateful for any advice on what other chaos experiments chaoskube is suitable for.

Thank you,

Ladislav

@palmerabollo
Contributor

palmerabollo commented Sep 26, 2018

This is a very good question. I also assumed that chaoskube was killing the pods. I think killing a pod instead of terminating it would be the best option, because "graceful shutdowns" rarely happen in production environments :)

Would it be possible to at least add a flag to choose the behaviour you want (kill vs. terminate)? I'm thinking about adding a configurable gracePeriod to the call that deletes the pod. Does that sound good?

@linki
Owner

linki commented Sep 27, 2018

@ljanatka @palmerabollo I agree. There's already a pull request for it by @jakewins: #104. It would help me a lot if you could also have a look and leave some feedback.

@ljanatka
Author

@linki the #104 pull request seems to be marked as failing in the CI build...

@linki
Copy link
Owner

linki commented Oct 22, 2018

@ljanatka I just fixed it in case you want to give it a try again.

@ljanatka
Author

@linki Hi, we finally got to give it a try.
As far as I know, it works quite well. The pod gets killed from the inside, and the cluster detects this and restarts it (the pod's restart counter increases, and no new instance of the pod is created).

@linki
Owner

linki commented Nov 19, 2018

@ljanatka Thanks for checking it out!

@ljanatka
Author

@linki Hi, was my test enough to merge this "hardkill" feature into a new version of chaoskube? When do you expect the new version to be released?
Thanks!

@linki
Owner

linki commented Jan 14, 2019

@ljanatka I'm not sure. I want to refactor it a bit before merging and I have a work-in-progress branch for it.

@jakewins has a fork of chaoskube where this is merged. You could try using it in the meantime.

@ljanatka
Author

Hi @linki

From the release notes, it seems that chaoskube can now "hardkill" pods. However, I did not find any switch that would activate this feature. Or is hardkill now the default kill method?

Thanks!

@linki
Owner

linki commented May 28, 2019

Hi @ljanatka,

https://github.com/linki/chaoskube/releases/tag/v0.12.1 extracted the current strategy into a separate object behind an interface in order to make it easier to add more ways to terminate pods.

The actual "termination-by-kill" termination strategy from the original PR hasn't been ported over yet.

@dbsanfte

It's been quite a while now since this feature was requested, and I see some refactoring was done. Is there any chance this could be looked at again soon?
