
checkPodCount ends preemptively when 0 pods remain after pod killing #331

Open
paigerube14 opened this issue Jun 25, 2021 · 3 comments · May be fixed by #332

@paigerube14
Contributor

After killing the only pod running in a given namespace, checkPodCount ends incorrectly before a replacement pod comes back and is running again.
I would expect checkPodCount to keep retrying for the entire duration of the timeout before passing or failing.

Sample scenario YAML:

config:
  runStrategy:
    runs: 1
    maxSecondsBetweenRuns: 30
    minSecondsBetweenRuns: 1
scenarios:
  - name: "delete etcd pods"
    steps:
    - podAction:
        matches:
          - labels:
              namespace: "etcd"
              selector: "k8s-app=etcd"
        filters:
          - randomSample:
              size: 1
        actions:
          - kill:
              probability: 1
              force: true
    - podAction:
        matches:
          - labels:
              namespace: "etcd"
              selector: "k8s-app=etcd"
        retries:
          retriesTimeout:
            timeout: 180
        actions:
          - checkPodCount:
              count: 1

Output:

2021-06-25 19:16:09 INFO __main__ No cloud driver - some functionality disabled
2021-06-25 19:16:09 INFO __main__ Using stdout metrics collector
2021-06-25 19:16:09 INFO __main__ NOT starting the UI server
2021-06-25 19:16:09 INFO __main__ STARTING AUTONOMOUS MODE
2021-06-25 19:16:12 INFO scenario.delete etcd pod Starting scenario 'delete etcd pods' (2 steps)
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'etcd', 'selector': 'k8s-app=etcd'}}
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Matched 1 pods for selector k8s-app=etcd in namespace etcd
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Initial set length: 1
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Filtered set length: 1
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Pod killed: [pod #0 name=etcd-master-00.qe-pr-sno2.qe.devcluster.openshift.com namespace=etcd containers=4 state=Running labels:app=etcd,etcd=true,k8s-app=etcd,revision=2 annotations:kubernetes.io/config.hash=*,kubernetes.io/config.seen=2021-06-25T14:30:12.819685290Z,kubernetes.io/config.source=file,target.workload.openshift.io/management={"effect": "PreferredDuringScheduling"}]
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'etcd', 'selector': 'k8s-app=etcd'}}
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Matched 0 pods for selector k8s-app=etcd in namespace etcd
2021-06-25 19:16:12 INFO action_nodes_pods.delete etcd pod Initial set length: 0
2021-06-25 19:16:12 INFO scenario.delete etcd pod Scenario finished
2021-06-25 19:16:12 INFO policy_runner All done here!
@chaitanyaenr
Contributor

@seeker89 PTAL when you get time. Thanks.

@jcstanaway
Contributor

Per the documentation, retries specifies "An object of retry criteria to rerun set actions". Because the actions are only performed on matched pods that pass the filter criteria, and there were zero such pods at the moment matches was evaluated, the actions never run.

I'd suggest inserting a waitAction prior to the second podAction.
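
For illustration, a minimal sketch of that suggestion: a waitAction step between the two podActions. The seconds field name and the 60-second value are assumptions made for the sketch, not taken from this thread.

    - waitAction:
        seconds: 60  # assumed field name and value: give the replacement pod time to be scheduled
    - podAction:     # the existing second podAction is unchanged
        matches:
          - labels:
              namespace: "etcd"
              selector: "k8s-app=etcd"
        retries:
          retriesTimeout:
            timeout: 180
        actions:
          - checkPodCount:
              count: 1

The wait gives the replacement pod a head start before matches is re-evaluated, though it requires guessing how long recovery takes, as noted in the reply below.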

@paigerube14 linked a pull request Jun 30, 2021 that will close this issue
@paigerube14
Contributor Author

In this case the waitAction is not very helpful, because I would have to guess when the pod comes back, which is exactly what the retries in the podAction are meant to handle. The retries in the podAction should be used to verify the number of pods that exist; if 0 pods exist at the current time, it should still keep retrying until the time limit or retry count is reached before failing.
