

Investigate new ci-artifacts flakiness #81

Open
dscho opened this issue Feb 20, 2024 · 8 comments

Comments

@dscho
Member

dscho commented Feb 20, 2024

As of https://github.com/git-for-windows/git-sdk-64/actions/runs/7895871817/job/21548925651, the build seems to have become flaky. The symptom looks like this:

    [...]
    LINK scalar.exe
strip  headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe git.exe
D:\a\git-sdk-64\git-sdk-64\minimal-sdk\mingw64\bin\strip.exe: unable to copy file 'git.exe'; reason: Permission denied
make: *** [Makefile:2376: strip] Error 1
make: *** Waiting for unfinished jobs....
make: Leaving directory '/d/a/git-sdk-64/git'
Error: Process completed with exit code 2.

This problem usually goes away after re-running a couple of times (once I had to re-run 3 times to make it succeed).

Luckily, the strip Makefile rule is apparently not used in git/git's own CI, so things don't fail there (which would be disastrous). So we do not need to drop everything and fix this Right Now, but it does need to be fixed.

Now, the commit corresponding to the first build that exhibited the problem is 863c871. Contrary to what I first thought, that commit did not update the MSYS2 runtime. That update came in the next commit.

Comparing the first failing job with the corresponding job of the previous build, I see in the Set up job step that the runner version changed, from v2.312.0 to v2.313.0. But I don't see any obvious culprit in that version's release notes.

The Set up job step also shows a different runner image (though the same Windows version), but the corresponding diff does not shed any light on the issue either.

It is possible, of course, that the previous build succeeded on the first attempt by luck rather than because the problem was absent. More investigation is needed here.

@dscho
Member Author

dscho commented Feb 22, 2024

@dscho
Member Author

dscho commented Feb 25, 2024

The latest three runs worked without any need for re-runs. The failures may have been caused by an overzealous Defender... I'll give it another week, and if there are no other instances of this flake, I'll close this ticket.

@dscho
Member Author

dscho commented Feb 29, 2024

The problem is back.

@dscho
Member Author

dscho commented Mar 4, 2024

The error happened today, too. Here are the latest ci-artifacts runs (starting with the first one where I did not try to re-run to turn the build green):

[Image: screenshot of the latest ci-artifacts runs]

@dscho
Member Author

dscho commented Mar 22, 2024

After 7 consecutive successful runs, it happened again.

@dscho
Member Author

dscho commented Apr 11, 2024

After 8 consecutive successful runs, it happened again.

@dscho
Member Author

dscho commented Apr 14, 2024

After only one successful run, there was another failing one.

@dscho
Member Author

dscho commented Apr 21, 2024
