Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run_all --rerun-failed #4523

Closed
kodebach opened this issue Oct 8, 2022 · 4 comments
Closed

run_all --rerun-failed #4523

kodebach opened this issue Oct 8, 2022 · 4 comments
Labels

Comments

@kodebach
Copy link
Member

kodebach commented Oct 8, 2022

Our run_all script for make run_all (scripts/dev/run_all.in) includes an additional ctest $ARGS --rerun-failed at the end.

This was introduced as a workaround for #2439. However, this also introduces some issues. For example, the end of the output of make run_all will look something like this:

0% tests passed, 4 tests failed out of 4

Label Time Summary:
bindings    =   1.63 sec*proc (6 tests)
kdbtests    =  31.84 sec*proc (22 tests)
memleak     =  31.11 sec*proc (21 tests)

Total Test time (real) =  30.37 sec

The following tests FAILED:
          2 - testtool_backend (Failed)
         20 - testlib_notification (Failed)
         34 - testjna_gradle (Failed)
         38 - test_kdb.py (Failed)

The "4 tests faile out of 4" can be quite confusing, if you don't know about the --rerun-failed. To find the actual ratio of failed tests, you need to scroll up all the away beyond the output of the second run, which might be longer than your terminal's scroll buffer.

Another problem is that this can actually mask the real cause of a problem, if you only look at the output of the second run. That's because sometimes fail, because I previous (also failed test) didn't clean up properly. If you then do a rerun you might not see the true errors, because even the test that originally didn't clean up properly, may now fail differently.

Finally, the most annoying problem IMO is that the rerun means it takes quite a bit longer for make run_all to finish if more than one or two tests fail.

I propose we do one of 2 things:

  1. Remove ctest $ARGS --rerun-failed entirely. In Build Jobs: Parts of Test Suite Fail Regularly #2439 the speculation was that the actual cause of the flaky tests was server overload. This may no be better and/or we might have a way to limit the load.
  2. Add an AUTO_RERUN env-var flag to run_all that is set in the CI. Only when the env-var is set, will we do ctest $ARGS --rerun-failed.
@markus2330
Copy link
Contributor

First: I don't think that FLOSS participants can do anything useful here, we need someone working on this as bachelor thesis.

Second: I don't see how this AUTO_RERUN should help? Is it about that you want to be able to trigger jobs without the --rerun-failed manually? Please always say what you expect/want first. There is a good reason why we have the issue template.

@kodebach
Copy link
Member Author

There is a good reason why we have the issue template.

It is clearly aimed at bug reports. If you want more structure in other issues, we should create more templates. GitHub then lets you choose the template. See https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository

Please always say what you expect/want first.

I thought, it was clear that I don't want the second output (which I called confusing) to happend. And I expect the tests to just run once.

I don't think that FLOSS participants can do anything useful here, we need someone working on this as bachelor thesis.

This was not necessarily intended for FLOSS. Also not everything must be handled as part of a thesis or some course. This is simply something I noticed and wanted to bring up (that's what issues on FLOSS projects are for after all). During the last days on #4187, this was so annoying that I temporarily modified the run_all script to remove the second run.

However, I think the issue is simple enough for FLOSS. You just need to modify the run_all script to look for AUTO_RERUN and the Jenkinsfile and other CI config files to set AUTO_RERUN.

I don't see how this AUTO_RERUN should help? Is it about that you want to be able to trigger jobs without the --rerun-failed manually?

The purpose of AUTO_RERUN is that when you run make run_all locally, all tests are just executed once. You don't waste time waiting for the second run, and you see the actual ratio of tests that failed, without scrolling up past all the failed tests.

In the CI nothing would change. AUTO_RERUN would be set and we would still rerun the failed tests to mask flaky tests.

@github-actions
Copy link

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot added the stale label Oct 16, 2023
Copy link

github-actions bot commented Nov 7, 2023

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants