[Bug]: Operation timeout with creation of 6+ shards #2743

nikita-petko · 2023-07-29T03:11:00Z

Check The Docs

I double checked the docs and couldn't find any useful information.

Verify Issue Source

I verified the issue was caused by Discord.Net.

Check your intents

I double checked that I have the required intents.

Description

Note: Please view edit history to see the original purpose of this issue.

If you create a sharded client with 6 or more shards, at around the 6th shard, all shards are prevented from running. The bot will continue to receive dispatches but will be unable to actually process the events (debug logging notes the dispatches but handlers are not being invoked).

This may relate to one of my old issues: #2126

Version

3.11.0

Working Version

No response

Logs

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

Sample

Instantiation

using Discord;
using Discord.WebSocket;

// Sharded client with a fixed shard count and debug-level logging.
var client = new DiscordShardedClient(
	new DiscordSocketConfig
	{
		GatewayIntents =
			GatewayIntents.GuildMessages
			| GatewayIntents.DirectMessages
			| GatewayIntents.Guilds
			| GatewayIntents.MessageContent,
		LogGatewayIntentWarnings = false,
		TotalShards = 10,
		LogLevel = LogSeverity.Debug,
	}
);

Packages

N/A


DeclanFrampton · 2023-07-31T11:50:36Z

Assuming you are in a Linux environment, could you attempt to reproduce this in a Windows environment and let me know your results? @nikita-petko

nikita-petko · 2023-07-31T12:00:54Z

@DeclanFrampton these were all performed on Windows machines.

DeclanFrampton · 2023-07-31T12:28:36Z

> @DeclanFrampton these were all performed on Windows machines.

That's that idea out the window, then. What are the system specs? (Probably not the issue, but it's always good to have more info.)

When you did a test with a different token for a single server, did you use the same project? Also, have you made any changes to the bot around the time the issue began?

nikita-petko · 2023-07-31T13:54:20Z

24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE

Yes
The version was running fine and then stopped working completely. I shrugged it off, thinking I might need to update Discord.Net, but that didn't fix it.

DeclanFrampton · 2023-07-31T15:06:14Z

> 24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE
>
> Yes
>
> The version was running fine and then stopped working completely. I shrugged it off, thinking I might need to update Discord.Net, but that didn't fix it.

Okay, since I can't debug this myself (I don't have 15k guilds to reproduce it), could you set up a new solution and use the same token? Try it both with and without sharding and see if you still get the same issues. The bot doesn't need to have any features, just a fresh standalone build to test with.

If you still get the issue, I'll bring it up in the Discord to see if we can get its priority raised.

nikita-petko · 2023-08-01T03:10:57Z

@DeclanFrampton this is now the error that is encountered despite sharding being enabled.

04:10:02 Shard #0    System.Exception: WebSocket connection was closed ---> Discord.Net.WebSocketClosedException: The server sent close 4011: "Sharding required."
   at Discord.Net.WebSockets.DefaultWebSocketClient.<RunAsync>d__34.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
04:10:02 Shard #0    Disconnecting

nikita-petko · 2023-08-01T03:12:03Z

I have discovered the final issue: it failed to connect because there were not enough shards. I have now set it up to fetch the shard count automatically.

Thank you for your help.
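
For context on the 4011 "Sharding required" close above, here is a minimal sketch of the lower bound on the shard count. It assumes Discord's documented limit of 2,500 guilds per shard; `MinimumShardCount` is a hypothetical helper, not part of Discord.Net, and the sketch uses top-level statements (C# 9 or later). Guilds are not spread perfectly evenly across shards, so this is only a floor and the count Discord recommends is normally higher.

```csharp
using System;

// Hypothetical helper (not a Discord.Net API): the smallest shard count that
// keeps the average shard at or below the per-shard guild limit.
static int MinimumShardCount(int guildCount)
{
    // Assumption: Discord's documented cap of 2,500 guilds per shard.
    const int maxGuildsPerShard = 2500;

    if (guildCount < 0)
        throw new ArgumentOutOfRangeException(nameof(guildCount));

    // Integer ceiling division, with a floor of one shard.
    return Math.Max(1, (guildCount + maxGuildsPerShard - 1) / maxGuildsPerShard);
}

Console.WriteLine(MinimumShardCount(15000)); // prints 6
Console.WriteLine(MinimumShardCount(16000)); // prints 7
```

In practice it is safer to use the shard count recommended by Discord than to hard-code a value derived this way; Discord.Net exposes it through `GetRecommendedShardCountAsync`.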

nikita-petko · 2023-08-01T03:47:01Z

Reopening as a new error has been encountered. After it first occurs, it is thrown continuously. It may be occurring in the GuildDownloader, but it also happens on my other test. After the 6th or 7th shard it will always throw this error.

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

DeclanFrampton · 2023-08-01T11:19:24Z

I can confirm that this issue persists when the shard count is greater than or equal to 6. Once I get some free time later today I will take a deeper dive into this, unless someone else gets there first.

12:11:22 Shard #6    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 186
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 153
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 77

DeclanFrampton · 2023-08-01T13:11:41Z

I conducted some additional testing; it seems the shard that times out usually reconnects directly afterwards. I always get the exception with 7 or more shards. Can you confirm this is the case for you?

14:08:28 Shard #0    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 187
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 154
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 78
14:08:28 Shard #0    Disconnecting
14:08:34 Shard #0    Disconnected
14:08:36 Shard #0    Connecting
14:08:36 Shard #0    Resumed previous session
14:08:36 Shard #0    Connected

nikita-petko · 2023-08-01T13:39:37Z

Same but with 6 shards and over:

DeclanFrampton · 2023-08-02T01:09:46Z

> Same but with 6 shards and over:

At least we can reproduce it now. Does the shard reconnect? If so, does it continue to cause issues as time goes by, or does it seem stable?

nikita-petko · 2023-08-02T05:18:00Z

The shard will just continue to time out, but will try to reconnect. However, it causes all the other shards to fail.

DeclanFrampton · 2023-08-02T11:06:11Z

> The shard will just continue to time out, but will try to reconnect. However, it causes all the other shards to fail.

OK, I've asked a maintainer to take a look as well. Meanwhile, could you provide a full stack trace of the exception from a debug build? Thanks.

nikita-petko · 2023-08-02T20:28:26Z

> Same but with 6 shards and over:

@DeclanFrampton this is the full exception on a debug build.


nikita-petko added the bug label Jul 29, 2023

nikita-petko changed the title ~~[Bug]: Verified bot in 15k guilds continueosly reconnecting~~ [Bug]: Verified bot in 15k guilds continously reconnecting Jul 29, 2023

nikita-petko closed this as completed Aug 1, 2023

nikita-petko reopened this Aug 1, 2023

nikita-petko changed the title ~~[Bug]: Verified bot in 15k guilds continously reconnecting~~ [Bug]: Verified bot in 15k guilds operation timeout on creation of 6+ shards Aug 1, 2023

nikita-petko changed the title ~~[Bug]: Verified bot in 15k guilds operation timeout on creation of 6+ shards~~ [Bug]: Verified bot in 16k guilds operation timeout on creation of 6+ shards Aug 1, 2023

nikita-petko changed the title ~~[Bug]: Verified bot in 16k guilds operation timeout on creation of 6+ shards~~ [Bug]: Operation timeout with creation of 6+ shards Aug 1, 2023

nikita-petko added a commit to mfdlabs/grid-bot that referenced this issue Aug 7, 2023

Temp fix for discord-net/Discord.Net#2743

b8a2d0b

nikita-petko mentioned this issue Aug 7, 2023

Surplus global fixes mfdlabs/grid-bot#232

Merged

Misha-133 added the "project: websocket" and "Needs investigation" (needs to be looked at by a maintainer) labels Dec 4, 2023
