Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834) #736

dministro · 2024-04-02T20:20:47Z

After Lombiq/UI-Testing-Toolbox#228, remove the build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows.

Jira issue

The text was updated successfully, but these errors were encountered:

Piedone · 2024-04-03T00:06:13Z

As per Lombiq/UI-Testing-Toolbox#228 (comment):

Lets keep the current configuration for a whilw (1-2 months).

Then we can also remove it from Lombiq Tenants Core.

Piedone · 2024-05-07T00:28:29Z

I again see random, but frequent Windows hangs. E.g. the two runs here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/8976025812?pr=746. These hang here, only telling me that a PS command is hanging, which is the dotnet test one:

Then this same code worked here and here, while getting stuck here and here (I disabled hand timeouts deliberately for these two to see what happens).

Do you perhaps have any suggestions?

dministro · 2024-05-07T18:20:18Z

Twice in a line, it was really rare. I can say.

As I see in the dump, the AsyncStreamReader.ReadBufferAsync is waiting to read something from stdout or stderr of the dotnet process. The dotnet process was not dumped so possibly it was finished before the timeout expired.

The dotnet process was exited or the ::warning::The process $($process.Id) didn't exit in $Timeout seconds. was not logged for an unknown reason.

I don't know if it is possible, but maybe this can be caused by a special character or control character coming out earlier on the given stream.

I'll look into the logs and dumps if it happens again. Please report similar failures under this issue.

Piedone · 2024-05-07T20:48:27Z

Thank you for looking into this! How do you get to see what you show in your first screenshot? This is all I see when opening the dump (Debug with Mixed):

I've done three test runs, with hang timeouts: the first one passed, the other two hang, see here and here. The hang dumb has the same.

These are for some new code, see #746, that might cause new issues too (I don't see why). And there's no output from the hanging test run.

But! If I skip TimeoutTests, then I get 4 passes in a row (two runs here, then here and here). @sarahelsaig does this ring a bell? Something in the timeout config seems to get tests to stuck.

dministro · 2024-05-07T21:04:23Z

You can turn on the Tasks window by selecting Debug/Windows/Tasks item. See:

Once you have the Tasks window double click on the line selected below:

Also Threads, Watch N, and Call Stack windows come in handy in similar situations.

I'll look at the logs latest tomorrow.

Piedone · 2024-05-07T22:04:52Z

Ah, thanks!

Looking at these, there's a bunch of Applications Insights references, what seems quite strange (however, no stack trace beyond a depth of 1, so I don't see where those are actually coming from). In any case, I removed the Application Insights module of ours being enabled during tests. It resulted in two passes (here and here, and two hangs (here and here). These still include AI references in the same way. So, it's not our AI, but something by GitHub.

I also did some test runs with reruns of what we have in dev (here and here), and they passed (though the second one failed with an "ENOENT: no such file or directory..." error, that's a problem with the artifact upload).

Piedone · 2024-05-09T20:27:31Z

BTW it's always the Lombiq.Tests.UI.Samples that hangs. And there's no output, though I thought we'd have that now with streaming test outputs. Locally I can see the same behavior when running Invoke-SolutionTests: no output from a test project until the next one starts.

Having the test run's output show up in the console output would help immensely to pinpoint where this goes wrong.

Piedone · 2024-05-09T23:05:09Z

That may be an xUnit/dotnet test issue; I've opened xunit/xunit#2925.

Piedone · 2024-05-10T12:21:25Z

That's something xUnit needs to add: xunit/visualstudio.xunit#408.

Piedone · 2024-05-10T17:55:47Z

I added a debug log file that gets updated immediately on every ITestOutputHelper write in the FailuredDumps, and observed a hang here: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9033609153. That log and the workflow output both show that the tests in Lombiq.OSOCE.Tests.UI finished (49 completed, as it should be), but still we get test failed:

2024-05-10T14:42:10.5686978Z Results File: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions\test\Lombiq.OSOCE.Tests.UI\TestResults\test-results.trx
2024-05-10T14:42:10.5687128Z 
2024-05-10T14:42:10.5688680Z Test Run Successful.
2024-05-10T14:42:10.5689829Z      Passed: 49
2024-05-10T14:42:10.5691093Z Total tests: 49
2024-05-10T14:42:10.5692141Z  Total time: 3.5414 Minutes
2024-05-10T14:42:10.5713126Z Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj
2024-05-10T14:42:10.6026845Z ##[error]Process completed with exit code 1.
2024-05-10T14:42:10.6151953Z ##[group]Run Merge-FailureDumps -Directory "."
2024-05-10T14:42:10.6152088Z �[36;1mMerge-FailureDumps -Directory "."�[0m
2024-05-10T14:42:10.6183772Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2024-05-10T14:42:10.6183861Z env:
2024-05-10T14:42:10.6184153Z   CHECKOUT_TOKEN: ***
2024-05-10T14:42:10.6184278Z   DOTNET_SKIP_FIRST_TIME_EXPERIENCE: true
2024-05-10T14:42:10.6184386Z   DOTNET_CLI_TELEMETRY_OPTOUT: true
2024-05-10T14:42:10.6184494Z   DOTNET_ROOT: C:\Program Files\dotnet
2024-05-10T14:42:10.6184592Z   DOTNET_SDK_TO_USE: 8.0.204
2024-05-10T14:42:10.6184869Z   Lombiq_Tests_UI__OrchardCoreUITestExecutorConfiguration__MaxParallelTests: 0
2024-05-10T14:42:10.6184946Z ##[endgroup]
2024-05-10T14:42:10.8788979Z 
2024-05-10T14:42:10.8795785Z     Directory: C:\a\Open-Source-Orchard-Core-Extensions\Open-Source-Orchard-Core-Extensions
2024-05-10T14:42:10.8796453Z 
2024-05-10T14:42:10.8797334Z �[32;1mMode   �[0m�[32;1m              LastWriteTime�[0m �[32;1;3m        Length�[0m�[32;1m Name�[0m
2024-05-10T14:42:10.8797934Z �[32;1m----   �[0m �[32;1m             -------------�[0m �[32;1m        ------�[0m �[32;1m----�[0m
2024-05-10T14:42:10.8944776Z d----           5/10/2024  2:42 PM                �[44;1mFailureDumps�[0m

This is with dotnet-test-process-timeout: 600000 and blame-hang-timeout being empty. It looks like some kind of premature task kill to me. Also, Write-Output in StartProcessAndWaitForExit won't work, since in PS output isn't surfaced from functions for some reason. I've fixed that with | Out-Host. Then something strange happend, as you can see under this build:

Lombiq.OSOCE.Tests.UI ran to completion with "Test Run Successful.", Lombiq.Tests.UI.Samples didn't start.
But still, "Test failed: test\Lombiq.OSOCE.Tests.UI\Lombiq.OSOCE.Tests.UI.csproj ##[error]Process completed with exit code 1."
So, it seems that this is not an actual hang but somehow the dotnest test process didn't exit, or didn't exit with the exit code 0, or somehow this line stopped working.

Worth noting that previously, I've seen different behaviors with actual hangs, where a test project was started but didn't finish. Not this time though.

Piedone · 2024-05-13T01:26:10Z

It was a PS syntax issue: https://github.com/Lombiq/GitHub-Actions/pull/346/files#diff-c9181b664fb498c4bf7694492ba4c67f5c9fe31a8b86be446ca2a80b0fdde3c6L206 After that and adjusting the timeouts, it seems OK, e.g.: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055050994/attempts/1. And randomly the whole build can be horribly slow, but still succeed despite the test process being killed: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/9055173471/attempts/1. So, this looks good now!

Piedone · 2024-05-13T20:51:05Z

I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.

dministro · 2024-05-13T20:56:59Z

I think why hangs became more frequent, also generally, is because we switched from GitRunners to GitHub Windows runners after the former shut down. And the performance of these varies wildly, with even the build taking between 5 and 10 minutes. So, if we by chance get a worse runner, then tests will be slower too.

This is what I tried to point here and here.

Piedone · 2024-05-13T21:10:18Z

OK then!

dministro self-assigned this Apr 2, 2024

github-actions bot changed the title ~~Remove build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows~~ Remove build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows (OSOE-834) Apr 2, 2024

dministro mentioned this issue Apr 2, 2024

OSOCE NuGetTest UI testing randomly getting stuck under Windows (OSOE-464) Lombiq/UI-Testing-Toolbox#228

Open

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834) #736

Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834) #736

dministro commented Apr 2, 2024 •

edited by github-actions bot

Piedone commented Apr 3, 2024

Piedone commented May 7, 2024

dministro commented May 7, 2024

Piedone commented May 7, 2024

dministro commented May 7, 2024

Piedone commented May 7, 2024

Piedone commented May 9, 2024

Piedone commented May 9, 2024

Piedone commented May 10, 2024

Piedone commented May 10, 2024

Piedone commented May 13, 2024

Piedone commented May 13, 2024

dministro commented May 13, 2024

Piedone commented May 13, 2024

Remove build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows (OSOE-834) #736

Remove build-create-binary-log and dotnet-test-process-timeout configurations from OSOCE workflows (OSOE-834) #736

Comments

dministro commented Apr 2, 2024 • edited by github-actions bot

Piedone commented Apr 3, 2024

Piedone commented May 7, 2024

dministro commented May 7, 2024

Piedone commented May 7, 2024

dministro commented May 7, 2024

Piedone commented May 7, 2024

Piedone commented May 9, 2024

Piedone commented May 9, 2024

Piedone commented May 10, 2024

Piedone commented May 10, 2024

Piedone commented May 13, 2024

Piedone commented May 13, 2024

dministro commented May 13, 2024

Piedone commented May 13, 2024

Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834) #736

Remove `build-create-binary-log` and `dotnet-test-process-timeout` configurations from OSOCE workflows (OSOE-834) #736

dministro commented Apr 2, 2024 •

edited by github-actions bot