CI : default_cert test fails way too often / randomly on GitHub Actions #786

Open
buchdag opened this issue Apr 26, 2021 · 4 comments
Labels
status/help-wanted Help on this issue / PR would be welcome type/ci PR that change the CI configuration files and scripts

Comments

@buchdag
Member

buchdag commented Apr 26, 2021

The default_cert test fails so often and so randomly on GitHub Actions that I had to remove it. GHA does not really have an "allow failure" option like Travis or other CI systems, nor a mechanism to restart single tests, which means I sometimes end up restarting the whole test run 6 or 8 times in a row.

The test appears to be fine locally, but I haven't run the tests locally very often lately, so I'm not 100% sure.

I tried doubling the test's timeout (from 60 to 120 seconds) to no avail, and I doubt this was a timeout issue to begin with. It might very well be an issue with the feature itself.

If anyone is willing to investigate this, help would be appreciated.

@buchdag buchdag added status/help-wanted Help on this issue / PR would be welcome type/ci PR that change the CI configuration files and scripts labels Apr 26, 2021
@polarathene
Contributor

> GHA does not really have an "allow failure" like Travis or other CI systems

Yes, it does; see how I've done so here. Use continue-on-error: true. Another approach, which I use in a separate workflow, is to ensure a step always runs with if: ${{ always() }}.
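
For reference, a minimal sketch of both options in a single workflow file. The workflow name, step names, and the ./test/run.sh script path are hypothetical and not taken from this repository; only continue-on-error and if: ${{ always() }} are the GitHub Actions keys mentioned above.

```yaml
# Hypothetical workflow: names and the test script path are placeholders.
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # A failure of this step is recorded but does not fail the job.
      - name: Flaky default_cert test (allowed to fail)
        continue-on-error: true
        run: ./test/run.sh default_cert

      # Runs even when an earlier step (without continue-on-error) failed.
      - name: List containers for debugging
        if: ${{ always() }}
        run: docker ps --all
```

Note that continue-on-error can also be set at the job level, which is probably closer to Travis' allow_failures when each test runs as its own job.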

> nor a mechanism to restart single tests, which means I end up restarting the whole tests run sometimes 6 or 8 times in a row.

I believe that's possible, but it's not something I've tried myself. If the run can output which tests failed, I think that output could be used to trigger a re-run that covers only those failures.
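
An untested sketch of how that could look, using job outputs to hand the list of failed tests to a second job. Everything specific here is an assumption: the ./test/run.sh script, the test names other than default_cert, and the job names are placeholders, and the suite would need a way to run one test by name.

```yaml
# Hypothetical workflow: script path, test names, and job names are placeholders.
name: Tests with retry
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    outputs:
      # Space-separated names of the tests that failed on the first attempt.
      failed: ${{ steps.run.outputs.failed }}
    steps:
      - uses: actions/checkout@v4
      - id: run
        run: |
          failed=""
          for t in default_cert some_other_test; do
            # Record the failure instead of aborting the whole step.
            ./test/run.sh "$t" || failed="$failed $t"
          done
          echo "failed=$failed" >> "$GITHUB_OUTPUT"

  retry:
    needs: test
    # Only runs when the first job reported at least one failed test.
    if: ${{ needs.test.outputs.failed != '' }}
    runs-on: ubuntu-latest
    env:
      FAILED: ${{ needs.test.outputs.failed }}
    steps:
      - uses: actions/checkout@v4
      - run: |
          # A test that fails again here fails the job, and with it the run.
          for t in $FAILED; do
            ./test/run.sh "$t"
          done
```

The trade-off is that the first job always succeeds, so the overall result depends entirely on the retry job.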

Probably similar to how I've got a workflow split into two workflows (build and deploy for PR doc previews), the 2nd part is only triggered when the 1st part has completed successfully. Note the job condition: if: ${{ github.event.workflow_run.event == 'pull_request' && github.event.workflow_run.conclusion == 'success' }}, while the 1st part of the split workflow also ensures stale runs are canceled (new commit pushed for a PR running a preview docs build).
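
A rough sketch of that split, assuming two hypothetical workflow files named build.yml and deploy.yml. The job bodies are placeholders, and the concurrency block is just one way to cancel stale runs, not necessarily the mechanism used in the workflow described above. Only the job condition is copied from the comment.

```yaml
# .github/workflows/build.yml (hypothetical file name)
name: Build
on: pull_request

# Assumption: cancel an in-progress run when a new commit is pushed to the same PR.
concurrency:
  group: build-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "build step goes here"
```

```yaml
# .github/workflows/deploy.yml (hypothetical file name)
name: Deploy
on:
  workflow_run:
    # Must match the `name:` of the first workflow.
    workflows: ["Build"]
    types: [completed]

jobs:
  deploy:
    # Job condition quoted above: only continue after a successful PR build.
    if: ${{ github.event.workflow_run.event == 'pull_request' && github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy step goes here"
```

A workflow triggered by workflow_run only fires if its file exists on the default branch, so this cannot be tested from a PR branch alone.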


> If anyone is willing to investigate this, help would be appreciated.

I don't have time myself to contribute towards that, but I can say that I've found using bats to be pretty great for running shell script based tests. I'm slowly refactoring our test-suite, but a good example that I recently covered was our test for DH params.

We have a variety of helper functions that you're welcome to use :)

@buchdag
Member Author

buchdag commented Sep 30, 2021

Thanks for the tips @polarathene, I'll look into all of this 😃

@polarathene
Contributor

While working on a PR for nginx-proxy, I noticed they migrated away from a bats test suite to one written in Python. Since you're maintaining both, from what I understand, perhaps it would make sense to adopt that here (if you or any other maintainer ever finds the time to rewrite/port the tests).

@buchdag
Member Author

buchdag commented Oct 1, 2021

That's pretty much what I had in mind long term (migrating to pytest instead of the jury-rigged bash test suite I wrote), but yeah, that will be some heavy work.
