Further troubleshoot why the first unittest to run on a new VM can take longer than it should #1316

Open
paulcallen opened this issue May 2, 2022 · 0 comments
Labels
severity/moderate Severity: Moderate status/triaged Status: Triaged

RE: #1313
While investigating a flaky unit test in PR #1313, we found that the very first unit test to run on a new CI image in the PR pipeline often takes significantly longer than it should. The test/fork tests have three build dependencies for running tests. The first of these contains three tests that do more or less the same thing, with only minor variations. Each test calls a syscall that is disabled, so it should get an error. The first test, however, can sometimes take many minutes to run, whereas the other two complete very quickly. All three should take about the same amount of time.
Further investigation showed that the hang occurs before the main() function of the myst tool itself is executed.
The script/runtest script does several things before finally executing myst, which it launches through the timeout command in case it hangs. Sometimes the pause is long enough that the timeout command expires and the test fails as a result.
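To compare the first run against later ones, the timeout-guarded launch can be approximated with a small timing harness. This is a minimal sketch, not the actual script/runtest logic: `run_with_timeout` is a hypothetical helper, and the command shown is a stand-in that would need to be replaced with the real test invocation.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, limit_s):
    """Run cmd and return (outcome, elapsed_seconds).

    outcome is "timed_out" if the limit was hit, otherwise "exit:<code>".
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            timeout=limit_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        outcome = f"exit:{proc.returncode}"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        outcome = "timed_out"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical stand-in command; substitute the real test command here.
    cmd = [sys.executable, "-c", "pass"]
    for attempt in range(3):
        outcome, elapsed = run_with_timeout(cmd, limit_s=60)
        print(f"run {attempt + 1}: {outcome} after {elapsed:.3f}s")
```

Recording the elapsed time for every run, not just whether the limit was exceeded, would make a slow-but-passing first run visible as well.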
Since the script is running but main() has not yet been reached, the only conclusion so far is that something is hanging during CRT initialization. This initialization can include initializing shared libraries and running global C++ constructors. The myst tool itself performs no such initialization, so the working assumption is that the hang is in either a shared library that is being pulled in or a static library that is being linked.
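One possible way to narrow this down, assuming the binary is dynamically linked against glibc, is to ask the dynamic loader to trace library loading and initialization (the `LD_DEBUG=libs` environment variable) and timestamp each trace line as it arrives. The sketch below is only an illustration: the helper names are hypothetical, the command is a stand-in, and a hang inside a statically linked library would not show up this way.

```python
import os
import subprocess
import sys
import time


def stamp_loader_output(cmd):
    """Run cmd with LD_DEBUG=libs; return [(seconds_since_start, line), ...]."""
    env = dict(os.environ, LD_DEBUG="libs")
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # the loader writes its trace to stderr
        text=True,
        errors="replace",
    )
    stamped = []
    for line in proc.stderr:
        # Timestamp reflects when this process read the line.
        stamped.append((time.monotonic() - start, line.rstrip("\n")))
    proc.wait()
    return stamped


def largest_gap(stamped):
    """Return (gap_seconds, line_before, line_after) for the longest pause.

    Returns None when there are fewer than two lines to compare.
    """
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        if best is None or t1 - t0 > best[0]:
            best = (t1 - t0, before, after)
    return best


if __name__ == "__main__":
    # Hypothetical stand-in command; substitute the real myst invocation.
    trace = stamp_loader_output([sys.executable, "-c", "pass"])
    print(largest_gap(trace))
```

If the pause reproduces, the two lines on either side of the largest gap would indicate which library was being loaded or initialized at the time.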
The PR itself only increases the timeout of the very first test to allow more time for the unexplained pause at startup.
This issue tracks further investigation into what is actually causing the pause, so that the underlying problem can be fixed.

@paulcallen paulcallen added severity/moderate Severity: Moderate status/triaged Status: Triaged labels May 2, 2022