Intermittent Cmdlet failures inside a container with error "An error occurred when creating the WebSocket with the factory of type 'CoreClrClientWebSocketFactory'" #3437

Open
hemisphera opened this issue Mar 21, 2024 · 15 comments

Comments

@hemisphera
Contributor

Describe the issue
We're getting intermittent failures during cmdlet execution with the new 'Microsoft.BusinessCentral.xxx.dll' modules for PS7. There does not seem to be a pattern; it looks like a race condition, or something that only happens under load. It is not reliably reproducible: the same execution might fail and then work a few minutes later. The error we get is this:

An error occurred when creating the WebSocket with the factory of type 'CoreClrClientWebSocketFactory'. See the inner exception for details.

Might be related to #3435.

As I said, the container spins up fine and everything works as expected. After a while, the cmdlets executed from inside the container start failing with the message above. Once the error has occurred, it keeps repeating, no matter which cmdlet we invoke. After a while the failures stop and things work again.

We are using the new cmdlets from Admin\Microsoft.BusinessCentral.xxx.dll with PWSH 7.4.1 inside the container. This only seems to happen on our pipelines when building apps with a rather large dependency chain. Falling back to the PS5 version of the cmdlets is not an option either, because things then time out and the build fails for other reasons.

Scripts used to create container and cause the issue

New-BcContainer `
    -containerName <name> `
    -accept_eula `
    -artifactUrl https://bcpublicpreview.azureedge.net/sandbox/24.0.16410.16790/w1 `
    -auth 'UserPassword' `
    -Credential ... `
    -useBestContainerOS `
    -alwaysPull `
    -updateHosts `
    -shortcut None `
    -licenseFile ... `
    -accept_outdated `
    -runSandboxAsOnPrem `
    -EnableTaskScheduler:$false `
    -dns "8.8.8.8" `
    -multitenant:$false `
    -Isolation HyperV `
    -restart "always"

Full output of scripts

Creating new container 'BC24-PVW'
BcContainerHelper is version 6.0.11
BcContainerHelper is not running as administrator
UsePsSession is True
Host is Microsoft Windows 10 Enterprise - 10.0.19045.4170
Docker Client Version is 24.0.7
Docker Server Version is 24.0.7
Removing Desktop shortcuts
Fetching all docker images
Fetching all docker volumes
Using image mcr.microsoft.com/businesscentral:ltsc2019
Creating Container BC24-PVW
Style: onprem
Multitenant: No
Version: 24.0.16410.16790
Platform: 24.0.16743.0
Generic Tag: 1.0.2.17
Container OS Version: 10.0.17763.5576 (ltsc2019)
Host OS Version: 10.0.19045.4170 (22H2)
Using HyperV isolation
Using locale en-US
Disabling the standard eventlog dump to container log every 2 seconds (use -dumpEventLog to enable)
Using license file C:\xxxxx.bclicense
Additional Parameters:
--network=nat
-e USEFOR="BuildPipes"
--env customNavSettings=EnableTaskScheduler=False
Files in C:\ProgramData\BcContainerHelper\Extensions\BC24-PVW\my:
- AdditionalOutput.ps1
- HelperFunctions.ps1
- license.bclicense
- MainLoop.ps1
- SetupVariables.ps1
- updatehosts.ps1
Creating container BC24-PVW from image mcr.microsoft.com/businesscentral:ltsc2019
e9301a355ef8dccda41b5c5079222460f9f1643108c6d3bfdd6090b4d8f699f1
Waiting for container BC24-PVW to be ready
Using artifactUrl https://bcpublicpreview.azureedge.net/sandbox/24.0.16410.16790/w1
Using installer from C:\Run\240
Installing Business Central: multitenant=False, installOnly=False, filesOnly=False, includeTestToolkit=False, includeTestLibrariesOnly=False, includeTestFrameworkOnly=False, includePerformanceToolkit=False, appArtifactPath=c:\dl\sandbox\24.0.16410.16790\w1, platformArtifactPath=c:\dl\sandbox\24.0.16410.16790\platform, databasePath=c:\dl\sandbox\24.0.16410.16790\w1\BusinessCentral-W1.bak, licenseFilePath=c:\dl\sandbox\24.0.16410.16790\w1\Cronus.bclicense, rebootContainer=True
Installing from artifacts
Starting Local SQL Server
Starting Internet Information Server
Copying Service Tier Files
c:\dl\sandbox\24.0.16410.16790\platform\ServiceTier\Program Files
c:\dl\sandbox\24.0.16410.16790\platform\ServiceTier\System64Folder
Copying Web Client Files
c:\dl\sandbox\24.0.16410.16790\platform\WebClient\Microsoft Dynamics NAV
Copying ModernDev Files
c:\dl\sandbox\24.0.16410.16790\platform
c:\dl\sandbox\24.0.16410.16790\platform\ModernDev\program files\Microsoft Dynamics NAV
Copying additional files
Copying ConfigurationPackages
C:\dl\sandbox\24.0.16410.16790\platform\ConfigurationPackages
Copying Test Assemblies
C:\dl\sandbox\24.0.16410.16790\platform\Test Assemblies
Copying Extensions
C:\dl\sandbox\24.0.16410.16790\w1\Extensions
Copying Applications
C:\dl\sandbox\24.0.16410.16790\platform\Applications
Copying dependencies
Importing PowerShell Modules
Restoring CRONUS Demo Database
Setting CompatibilityLevel for CRONUS on localhost\SQLEXPRESS
Modifying Business Central Service Tier Config File for Docker
Creating Business Central Service Tier
Installing SIP crypto provider: 'C:\Windows\System32\NavSip.dll'
Starting Business Central Service Tier
Importing license file
Stopping Business Central Service Tier
Installation took 135 seconds
Installation complete
Initializing...
Setting host.containerhelper.internal to 172.20.16.1 in container hosts file
Starting Container
Hostname is BC24-PVW
PublicDnsName is BC24-PVW
Using NavUserPassword Authentication
Creating Self Signed Certificate
Self Signed Certificate Thumbprint 473828FC5080153845A054283BD2D4D1A22A6E7E
DNS identity BC24-PVW
Modifying Service Tier Config File with Instance Specific Settings
Modifying Service Tier Config File with settings from environment variable
Setting EnableTaskScheduler to False
Starting Service Tier
Registering event sources
Creating DotNetCore Web Server Instance
Using application pool name: BC
Using default container name: NavWebApplicationContainer
Copy files to WWW root C:\inetpub\wwwroot\BC
Create the application pool BC
Create website: NavWebApplicationContainer without SSL
Update configuration: navsettings.json
Done Configuring Web Client
Using license file 'c:\run\my\license.bclicense'
Import License
Creating http download site
Setting SA Password and enabling SA
Creating admin as SQL User and add to sysadmin
WARNING: This license is not compatible with this version of Business Central.
Creating SUPER user
WARNING: This license is not compatible with this version of Business Central.
WARNING: This license is not compatible with this version of Business Central.
Container IP Address: 172.20.26.156
Container Hostname  : BC24-PVW
Container Dns Name  : BC24-PVW
Web Client          : http://BC24-PVW/BC/
Dev. Server         : http://BC24-PVW
Dev. ServerInstance : BC
Setting BC24-PVW to 172.20.26.156 in host hosts file

Files:
http://BC24-PVW:8080/ALLanguage.vsix

Container Total Physical Memory is 8.5Gb
Container Free Physical Memory is 5.6Gb

Initialization took 48 seconds
Ready for connections!

Creating Container user winrm
Reading CustomSettings.config from BC24-PVW
Cleanup old dotnet core assemblies
Container BC24-PVW successfully created

Use:
Get-BcContainerEventLog -containerName BC24-PVW to retrieve a snapshot of the event log from the container
Get-BcContainerDebugInfo -containerName BC24-PVW to get debug information about the container
Enter-BcContainer -containerName BC24-PVW to open a PowerShell prompt inside the container
Remove-BcContainer -containerName BC24-PVW to remove the container again
docker logs BC24-PVW to retrieve information about URL's again

Additional context

- Does it happen all the time? No
- Did it use to work? No

@freddydk
Contributor

Are you using BcContainerHelper functions or are you using Invoke-ScriptInBcContainer with your own script?
Or are you using docker exec -it xxx yourself?
Could you try to give it 12G of RAM and see whether this is a memory issue? use -memoryLimit

@freddydk
Contributor

In issue #3435 he is also using latest build of Windows 10 (22H2) - 10.0.19045.4170
Have you seen this on other OS' - could you try running with agents on Server 2022 or on Windows 11 machines?

@hemisphera
Contributor Author

Are you using BcContainerHelper functions or are you using Invoke-ScriptInBcContainer with your own script? Or are you using docker exec -it xxx yourself?

Neither. We are actually running our own service (.NET8) inside the container that executes the cmdlets. It creates a PS session and loads the module, then executes scripts.

Could you try to give it 12G of RAM and see whether this is a memory issue? use -memoryLimit

Will try and let you know.

Have you seen this on other OS' - could you try running with agents on Server 2022 or on Windows 11 machines?

Yes. At the moment we're having somewhat similar issues on 3 machines, all running the same version. We've also seen this on Windows Server 2016 (10.0.14393) and Windows Server 2019 (10.0.17763.5576). I currently don't have access to any Server 2022 or Windows 11 machines myself, but many of my colleagues use Windows 11, and none of them have reported anything yet.

@hemisphera
Contributor Author

Some more details:
It looks like this might be a load issue. While I am still not really able to reproduce it on my machine, I have enabled some debug logging. The results coming in are a bit confusing, but they all point in the same direction:

I've received some instances of the error "The remote party closed the WebSocket connection without completing the close handshake." It looks like the service closes connections, which would align with the original message about failing to set up the WebSocket.

On a different machine I've received reports of operations timing out because the service crashes. I have looked for details in the container's event log but found none (no usual .NET stacktrace), just the kernel fault.

Faulting application name: Microsoft.Dynamics.Nav.Server.exe, version: 24.0.16743.0, time stamp: 0x65a80000
Faulting module name: KERNELBASE.dll, version: 10.0.17763.5576, time stamp: 0x35679fc4
Exception code: 0xe0434352
Fault offset: 0x00000000000349b9
Faulting process id: 0x7bc
Faulting application start time: 0x01da7b9168810999
Faulting application path: C:\Program Files\Microsoft Dynamics NAV\240\Service\Microsoft.Dynamics.Nav.Server.exe
Faulting module path: C:\Windows\System32\KERNELBASE.dll
Report Id: e6f84387-ea3f-4e14-83ec-9df85102e8eb
Faulting package full name:
Faulting package-relative application ID:

Afterwards the event log is full of messages that the health check failed.

@freddydk
Contributor

Yeah - this looks like the Service Tier crashes and then subsequent admin cmdlets will fail with coreclrclient... - that makes sense.
Now we just need to figure out why the service tier crashes:-(

@hemisphera
Contributor Author

Just an idea, because I have dealt with a similar issue in the past: could too many sockets be consumed because too many HttpClients are being created? If I'm not mistaken, you have replaced WCF (.NET Fx) with a REST API (.NET8). I don't know the code, but could it be that each time a cmdlet is invoked, an HttpClient is spun up and disposed? That is something WCF used to take care of but now obviously can't anymore.

We frequently (very, very frequently) query the apps on the service using Get-NAVAppInfo. For each app that this cmdlet returns, we then call it again to get that app's deployment details (DataVersion, IsInstalled and so on). So for a normal BC service with 50 apps this amounts to 1+50 cmdlet invocations. I do not have exact numbers, but say we're running this 20 times in 1 minute: that amounts to 20 * 51 = 1020 cmdlet invocations per minute. Not so great, but nothing that should make BC crumble.

Maybe it's not even the service that explodes, but having the client (Cmdlet) and server (BC) on the same machine (docker) and the client consuming all the sockets makes the server go boom?
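
To put rough numbers on this idea, here is a back-of-the-envelope sketch. It assumes that every cmdlet invocation leaves exactly one client socket (and with it one ephemeral port) occupied, and that the container uses the Windows default dynamic TCP port range of 49152-65535. Neither assumption is confirmed for the BC24 cmdlets; `netsh int ipv4 show dynamicport tcp` shows the actual range on a given machine. The variable names are made up for illustration.

```powershell
# Back-of-the-envelope estimate; all inputs are assumptions based on the numbers above.
$appCount      = 50                    # apps returned by Get-NAVAppInfo
$runsPerMinute = 20                    # how often the whole query runs per minute
$dynamicPorts  = 65535 - 49152 + 1     # Windows default dynamic TCP port range (assumed to be in effect)

$invocationsPerRun    = 1 + $appCount                      # one list call plus one call per app
$invocationsPerMinute = $runsPerMinute * $invocationsPerRun

# Assumes each invocation keeps one ephemeral port busy and none are released in time.
$minutesToExhaustion = [math]::Round($dynamicPorts / $invocationsPerMinute, 1)

$summary = '{0} invocations per minute; about {1} minutes until {2} ports are used up' -f $invocationsPerMinute, $minutesToExhaustion, $dynamicPorts
$summary
```

With these inputs the estimate is 1020 invocations per minute and roughly a quarter of an hour until the port range is used up, which would be consistent with failures that only show up on long, busy pipeline runs.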

@freddydk
Contributor

Tried this:

Invoke-ScriptInBcContainer -containerName bcserver -scriptblock {
   1..100 | % {
     Get-NavAppInfo -serverinstance $serverinstance | % {
       Get-NavAppinfo -serverInstance $serverinstance -id "$($_.AppId)"
     }
   }
}

Doesn't give any problems here.

@hemisphera
Contributor Author

I just reproduced this on my machine. Here's the output of the event log:

Type: System.InvalidOperationException
Message: An error occurred when creating the WebSocket with the factory of type 'CoreClrClientWebSocketFactory'. See the inner exception for details.
Source: System.Private.ServiceModel
HResult: -2146233079
StackTrace:
     at System.ServiceModel.Channels.ClientWebSocketTransportDuplexSessionChannel.CreateWebSocketWithFactoryAsync(X509Certificate2 certificate, TimeoutHelper timeoutHelper)
     at System.ServiceModel.Channels.ClientWebSocketTransportDuplexSessionChannel.OnOpenAsync(TimeSpan timeout)
     at System.ServiceModel.Channels.CommunicationObject.OnOpenAsyncInternal(TimeSpan timeout)
     at System.ServiceModel.Channels.CommunicationObject.System.ServiceModel.IAsyncCommunicationObject.OpenAsync(TimeSpan timeout)
     at System.ServiceModel.Channels.ServiceChannel.OnOpenAsync(TimeSpan timeout)
     at System.ServiceModel.Channels.CommunicationObject.OnOpenAsyncInternal(TimeSpan timeout)
     at System.ServiceModel.Channels.CommunicationObject.System.ServiceModel.IAsyncCommunicationObject.OpenAsync(TimeSpan timeout)
     at System.ServiceModel.Channels.CommunicationObject.OpenAsyncInternal(TimeSpan timeout)
     at System.ServiceModel.Channels.ServiceChannel.CallOnceManager.CallOnce(TimeSpan timeout, CallOnceManager cascade)
     at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
     at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(MethodInfo targetMethod, Object[] args)
     at generatedProxy_1.OpenAdminConnection(ConnectionRequest)
     at InvokeStub_IAdminService.OpenAdminConnection(Object, Span`1)
     at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
     at generatedProxy_2.OpenAdminConnection(ConnectionRequest)
     at Microsoft.Dynamics.Nav.Management.ServerInstanceConnection.DoOpenAdminConnection(IAdminService adminService) in s:\repo\src\Platform\Management\Prod.Management\ServerInstanceConnection.cs:line 322
     at Microsoft.Dynamics.Nav.Management.ServerInstanceConnection.InternalValidate() in s:\repo\src\Platform\Management\Prod.Management\ServerInstanceConnection.cs:line 212
     at Microsoft.Dynamics.Nav.Management.NavCommand.ProcessRecord() in s:\repo\src\Platform\Management\Prod.Management\NavCommand.cs:line 493
----------------------------------
Type: System.Net.WebSockets.WebSocketException
ErrorCode: 0
WebSocketErrorCode: Faulted
NativeErrorCode: 0
Message: Unable to connect to the remote server
Source: System.Net.WebSockets.Client
HResult: -2147467259
StackTrace:
     at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, HttpMessageInvoker invoker, CancellationToken cancellationToken, ClientWebSocketOptions options)
     at System.Net.WebSockets.ClientWebSocket.ConnectAsyncCore(Uri uri, HttpMessageInvoker invoker, CancellationToken cancellationToken)
     at System.ServiceModel.Channels.CoreClrClientWebSocketFactory.CreateWebSocketAsync(Uri address, WebHeaderCollection headers, ICredentials credentials, WebSocketTransportSettings settings, TimeoutHelper timeoutHelper)
     at System.ServiceModel.Channels.ClientWebSocketTransportDuplexSessionChannel.CreateWebSocketWithFactoryAsync(X509Certificate2 certificate, TimeoutHelper timeoutHelper)
----------------------------------
Type: System.Net.Http.HttpRequestException
HttpRequestError: ConnectionError
Message: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. (localhost:7086)
Source: System.Net.Http
HResult: -2147467259
StackTrace:
     at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
     at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
     at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
     at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
     at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
     at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
     at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
     at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
     at System.Net.WebSockets.WebSocketHandle.ConnectAsync(Uri uri, HttpMessageInvoker invoker, CancellationToken cancellationToken, ClientWebSocketOptions options)
----------------------------------
Type: System.Net.Sockets.SocketException
Message: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
SocketErrorCode: NoBufferSpaceAvailable
ErrorCode: 10055
NativeErrorCode: 10055
Source: System.Net.Sockets
HResult: -2147467259
StackTrace:
     at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
     at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
     at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
     at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)

@freddydk
Contributor

What did you do?
Ran my script?

@hemisphera
Contributor Author

No. I have set memoryLimit to 12GB on a fresh new container and then I ran a part of our build pipeline that builds packages and deploys dependencies.

This time however, the service did not crash. It's still up and running, but I can no longer issue any management cmdlets as they all fail with the same message.

The strange thing is: we have been using the identical setup with the identical scripts for about 3 years now and never had an issue with them. It only started misbehaving with BC24.

I'll try to replicate and log the actual cmdlet calls that are being made and update you.

@freddydk
Contributor

A lot of things changed in BC24 in this layer, but this obviously shouldn't fail.
Try

netstat -ab

inside the container - what does that return?

@freddydk
Contributor

How are you creating sessions, running commands and removing sessions again?
Using New-PsSession or Start-Job or???

@hemisphera
Contributor Author

Ran my script?

Now I did. You were almost there. Just iterating up to 100 is not enough, it needs to go up a little more. See below. Mine broke at 145.

netstat -ab inside the container - what does that return?

It fills my screen with tons and tons of lines like the following

[pwsh.exe]
  TCP    [::1]:61233            BC24-PVW:7086          ESTABLISHED

A clean container will have about 12 entries. One that runs the below script will easily reach 32k.
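
Rather than scrolling through the raw output, the entries can be counted per connection state. The following is a minimal sketch, not part of BcContainerHelper: `Get-ConnectionSummary` is a hypothetical helper name, and port 7086 is simply the port seen in the netstat line above. It expects the output of `netstat -ano`, which prints numeric addresses and keeps each connection on a single line.

```powershell
# Hypothetical helper: summarizes TCP rows from `netstat -ano` whose remote port matches.
function Get-ConnectionSummary {
    param(
        [string[]] $NetstatLines,
        [int] $RemotePort = 7086   # assumed port, taken from the netstat output above
    )

    $portPattern = ':{0}$' -f $RemotePort

    $NetstatLines |
        ForEach-Object {
            # TCP rows look like: TCP <local address> <remote address> <state> <pid>
            $parts = $_.Trim() -split '\s+'
            if ($parts.Count -ge 4 -and $parts[0] -eq 'TCP' -and $parts[2] -match $portPattern) {
                $parts[3]   # emit only the connection state
            }
        } |
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object Name, Count
}

# Usage inside the container:
# Get-ConnectionSummary -NetstatLines (netstat -ano)
```

Running this once on a clean container and again while the load script is running should make the growth in client connections to the service visible as two small tables instead of thousands of lines.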

Here's a clean repro:

Enter a container using Open-BcContainer -pwsh $true -containerName <name> and then run the following script:

$ServerInstance = "BC"
$WarningPreference = "SilentlyContinue"
1..1000 | % {
	$sw = [System.Diagnostics.StopWatch]::StartNew()
	Get-NavAppInfo -serverinstance $serverinstance -TenantSpecificProperties -Tenant "default" | % {
		Get-NavAppinfo -serverInstance $serverinstance -Name "$($_.Name)" -Publisher "$($_.Publisher)" -Version "$($_.Version)" -TenantSpecificProperties -Tenant "default" | out-null
	}
	$sw.Stop()
	Write-Host "$_ : $($sw.Elapsed)"
}

Once the socket port numbers reach 65534, the OS breaks. Mine gave in at iteration 145:

Get-NAVAppInfo: C:\Run\my\loadtest.ps1:6
Line |
   6 |          Get-NavAppinfo -serverInstance $serverinstance -Name "$($_.Na …
     |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | An error occurred when creating the WebSocket with the factory of type 'CoreClrClientWebSocketFactory'. See the inner exception for details.

It will eventually release some sockets back to the OS, but far too slowly. A development server or container that is heavily deployed against will run out of sockets fast.

@freddydk
Contributor

I can definitely repro that on Windows 11 / Windows Server 2022 as well.

Create a v24 container and run:

Invoke-ScriptInBcContainer -containerName bcserver -scriptblock {
  $WarningPreference = "SilentlyContinue"
  1..1000 | % {
    $sw = [System.Diagnostics.StopWatch]::StartNew()
    Get-NavAppInfo -serverinstance $serverinstance -TenantSpecificProperties -Tenant "default" | % {
      Get-NavAppinfo -serverInstance $serverinstance -Name "$($_.Name)" -Publisher "$($_.Publisher)" -Version "$($_.Version)" -TenantSpecificProperties -Tenant "default" | out-null
    }
    $sw.Stop()
    Write-Host "$_ : $($sw.Elapsed)"
  }
}

@freddydk
Contributor

The error is in the CmdLets of BC24 - will check whether a fix will make it into BC24
