
Pod DNS error and Pod DNS spoof litmus test validation and TOTAL_CHAOS_DURATION issue #564

Open
pawanphalak opened this issue Aug 4, 2022 · 14 comments


@pawanphalak

For the Pod DNS error litmus experiment, we followed the documented steps (https://litmuschaos.github.io/litmus/experiments/categories/pods/pod-dns-error/#ramp-time) to inject chaos for the target hostname (nginx), and we ran a shell script to validate whether the chaos was injected. However, we could not observe the chaos injection in the application pods: DNS resolution for the hostname kept working for the entire chaos duration.

We also wanted to debug this further by increasing TOTAL_CHAOS_DURATION to a higher value (e.g. 300 seconds), but even then the chaos experiment completes within 30-40 seconds. Can you please confirm whether there is any other configuration we can use to increase the chaos duration, and how we can validate the chaos experiment? We noticed the same behavior for the Pod DNS spoof experiment.

@gdsoumya
Member

gdsoumya commented Aug 4, 2022

Some applications cache DNS results; if the results are cached before the chaos is injected, you will not see the experiment's effects. How did you validate whether the DNS error injection was successful? It would be good if you could share the shell script you used and your chaos engine spec for the experiment.
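
A minimal sketch of a DNS-level probe (assuming nslookup is available in the container image; the log path is illustrative) that sidesteps HTTP-level caching and connection reuse entirely:

#!/bin/bash
# Resolve the service name directly, so an injected DNS error shows up
# as a lookup failure instead of a cached/reused HTTP connection succeeding.
while :
do
    nslookup nginx >> /tmp/dnsprobe.log 2>&1
    sleep 1
done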

@pawanphalak
Author

Thanks @gdsoumya for the response.
I checked for DNS caching; it is a simple nginx deployment with a ClusterIP service named nginx. I tested it with the following script:

#!/bin/bash
# Continuously hit the nginx service, logging response bodies and HTTP status codes.
while :
do
    curl nginx >> /tmp/outputtrace.log && sleep 0.01;
    curl -LI nginx -o /dev/null -w '%{http_code}\n' -s >> /tmp/outputstatus.log && sleep 0.01;
done

Following is the chaos engine spec for the experiment:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: dns-error
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-dns-error-sa
  jobCleanUpPolicy: retain
  experiments:
  - name: pod-dns-error
    spec:
      components:
        env:
        - name: CONTAINER_RUNTIME
          value: containerd
        ## comma separated list of host names
        ## if not provided, all hostnames/domains will be targeted
        - name: TARGET_HOSTNAMES
          value: '["nginx"]'
        - name: TOTAL_CHAOS_DURATION
          value: '500'

@gdsoumya
Member

gdsoumya commented Aug 4, 2022

Are you running the DNS chaos on the nginx pod itself or on some other pod that is accessing nginx? Also, try using the fully qualified hostname for the service, like <svc-name>.<namespace>.svc.cluster.local.
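
For the spec shared above (service nginx in the default namespace), that would look something like:

        - name: TARGET_HOSTNAMES
          value: '["nginx.default.svc.cluster.local"]'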

@pawanphalak
Author

We are running the DNS chaos on the nginx pod itself.
I tried using the complete hostname as well, but didn't get any DNS errors. I also tried an external hostname like google.com as the target hostname, but all requests returned 200 responses.

@gdsoumya
Member

gdsoumya commented Aug 5, 2022

If you run the experiment on the nginx pod itself, it might not show the effect properly. You need to run it on the pod whose DNS requests you want to fail: for example, start a new pod, run the chaos on that pod, and use dig/curl from inside it to access nginx. No other domain/host will be affected because you set the target to just nginx, so google.com will not be affected. A sketch of that setup follows below.
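
A minimal sketch of that setup (the pod name, label, and the nicolaka/netshoot utility image are illustrative choices, not requirements):

# Start a throwaway client pod with a known label to target with the chaos.
kubectl run dns-client --image=nicolaka/netshoot --labels="app=dns-client" -- sleep infinity

# Point the ChaosEngine's appinfo at this pod (e.g. applabel: "app=dns-client",
# with appkind adjusted accordingly), run the experiment, then probe DNS
# from inside the client pod while the chaos is active:
kubectl exec -it dns-client -- dig nginx.default.svc.cluster.local
kubectl exec -it dns-client -- curl -sI nginx.default.svc.cluster.local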

Also, just to confirm: did you update the chaos engine with the full service hostname?

@pawanphalak
Author

We specify the app on which the chaos should run as follows in the chaos engine spec, right?

appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"

So in this case, the expected behavior is that we should see DNS errors for the hostnames specified in TARGET_HOSTNAMES when we run curl from the targeted pods themselves?

Also, just to confirm: did you update the chaos engine with the full service hostname?

Yes. I also tried removing the TARGET_HOSTNAMES variable completely (which should target all hostnames), but I was still not able to validate the chaos.

@gdsoumya
Member

gdsoumya commented Aug 5, 2022

Which container runtime are you using? Is it containerd?

@pawanphalak
Author

Yes, containerd.

@gdsoumya
Member

gdsoumya commented Aug 5, 2022

Yes, containerd.

containerd has had some issues with DNS chaos. Can you check the logs of the helper pod and confirm whether any errors are being reported there?
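
One way to catch those logs before the helper pod disappears (names here are illustrative; the helper pod's name typically contains "helper"):

# Watch for the helper pod as the experiment starts, then stream its logs.
kubectl get pods -n default -w | grep helper

# In another terminal, as soon as the helper pod appears:
kubectl logs -f <pod-dns-error-helper-pod-name> -n default

# If the helper container crashed but the pod still exists:
kubectl logs <pod-dns-error-helper-pod-name> -n default --previous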

@pawanphalak
Author

The helper pod is getting deleted immediately. I tried this experiment multiple times; sometimes the helper pods also went into an error state, but the final chaos result still showed as passed.
Is there any configuration to keep the helper pods running?

@gdsoumya
Member

gdsoumya commented Aug 5, 2022

Is it getting deleted as soon as the experiment starts?

@pawanphalak
Author

Yes.

@pawanphalak
Author

@gdsoumya, I just created a new cluster with the Docker runtime, and the DNS chaos worked as expected. It looks like the experiment has some issues with containerd.
Also, just to confirm: if the target application has multiple replicas, does the chaos get injected into only one of the pods?
I observed DNS errors in only one of the two pods present.

@gdsoumya
Member

gdsoumya commented Aug 5, 2022

It should affect all pods as far as I know. Can you set the pods-affected percentage to 100% and check? Tagging @ispeakc0de for further support on pods affected.
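
Assuming the standard PODS_AFFECTED_PERC env supported by Litmus pod-level experiments, that is one extra entry under the experiment's env list in the ChaosEngine:

        - name: PODS_AFFECTED_PERC
          value: '100'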
