[Bug]: Operation timeout with creation of 6+ shards #2743

Open
3 tasks done
nikita-petko opened this issue Jul 29, 2023 · 15 comments
Labels
bug Needs investigation Needs to be looked at by a maintainer project: websocket

Comments

@nikita-petko
Contributor

nikita-petko commented Jul 29, 2023

Check The Docs

  • I double checked the docs and couldn't find any useful information.

Verify Issue Source

  • I verified the issue was caused by Discord.Net.

Check your intents

  • I double checked that I have the required intents.

Description

Note: Please view edit history to see the original purpose of this issue.

If you create a sharded client with 6 or more shards, then at around the 6th shard all shards stop running. The bot continues to receive dispatches but cannot process the events (debug logging shows the dispatches, but the handlers are not invoked).

This may relate to one of my old issues: #2126

Version

3.11.0

Working Version

No response

Logs

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

Sample

Instantiation

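using Discord;
using Discord.WebSocket;

// Sharded client with a fixed shard count of 10; the timeout shows up
// once the 6th shard (or later) starts connecting.
var client = new DiscordShardedClient(
	new DiscordSocketConfig
	{
		GatewayIntents =
			GatewayIntents.GuildMessages
			| GatewayIntents.DirectMessages
			| GatewayIntents.Guilds
			| GatewayIntents.MessageContent,
		LogGatewayIntentWarnings = false,
		TotalShards = 10,
		LogLevel = LogSeverity.Debug, // debug logging shows the dispatches
	}
);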

Packages

N/A

@nikita-petko nikita-petko changed the title [Bug]: Verified bot in 15k guilds continueosly reconnecting [Bug]: Verified bot in 15k guilds continously reconnecting Jul 29, 2023
@DeclanFrampton
Contributor

Assuming you are in a Linux environment, could you try to reproduce this in a Windows environment and let me know your results? @nikita-petko

@nikita-petko
Contributor Author

@DeclanFrampton these were all performed on Windows machines.

@DeclanFrampton
Contributor

> @DeclanFrampton these were all performed on Windows machines.

That's that idea out the window, then. What are the system specs? (Probably not the issue, but it's always good to have more info.)

When you tested with a different token on a single server, did you use the same project? Also, did you make any changes to the bot around the time the issue began?

@nikita-petko
Contributor Author

24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE

  1. Yes
  2. The version was running fine and then stopped working completely. I assumed I just needed to update Discord.Net, but that didn't fix it.

@DeclanFrampton
Contributor

> 24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE
>
>   1. Yes
>   2. The version was running fine and then stopped working completely. I assumed I just needed to update Discord.Net, but that didn't fix it.

Okay. Since I can't debug this myself (I don't have 15k guilds to reproduce it with), could you set up a new solution and use the same token? Try it both with and without sharding and see if you still get the same issues. The bot doesn't need any features, just a fresh standalone build to test with.

If you still get the issue, I'll bring it up in the Discord to see if we can get its priority raised.

@nikita-petko
Contributor Author

@DeclanFrampton this is now the error that is encountered despite sharding being enabled.

04:10:02 Shard #0    System.Exception: WebSocket connection was closed ---> Discord.Net.WebSocketClosedException: The server sent close 4011: "Sharding required."
   at Discord.Net.WebSockets.DefaultWebSocketClient.<RunAsync>d__34.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
04:10:02 Shard #0    Disconnecting

@nikita-petko
Contributor Author

I have discovered the final issue: it failed to connect because there were not enough shards. I have now set it up to fetch the shard count automatically.

Thank you for your help.
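
For readers hitting the same 4011 "Sharding required." close code, here is a minimal configuration sketch of one way to let the shard count be fetched automatically. It is an assumption about the approach, not the exact change made for this bot: it relies on the Discord.Net behaviour where leaving `TotalShards` unset (null) on a `DiscordShardedClient` makes the client use the shard count recommended by Discord at login. The intents are copied from the sample in the issue description.

```csharp
using Discord;
using Discord.WebSocket;

// Assumption: TotalShards is left null so that DiscordShardedClient
// asks Discord for the recommended shard count when it logs in,
// instead of using a hard-coded value that may be too small.
var config = new DiscordSocketConfig
{
    GatewayIntents =
        GatewayIntents.GuildMessages
        | GatewayIntents.DirectMessages
        | GatewayIntents.Guilds
        | GatewayIntents.MessageContent,
    TotalShards = null,
    LogLevel = LogSeverity.Debug,
};

var client = new DiscordShardedClient(config);
```

With a configuration like this, the number of shards actually created can be checked after login through `client.Shards.Count`.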

@nikita-petko
Contributor Author

Reopening as a new error has been encountered; once it occurs, it is thrown continuously. It may be occurring in the GuildDownloader, but it also happens in my other test. After the 6th or 7th shard it will always throw this error.

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

@nikita-petko nikita-petko reopened this Aug 1, 2023
@nikita-petko nikita-petko changed the title [Bug]: Verified bot in 15k guilds continously reconnecting [Bug]: Verified bot in 15k guilds operation timeout on creation of 6+ shards Aug 1, 2023
@nikita-petko nikita-petko changed the title [Bug]: Verified bot in 15k guilds operation timeout on creation of 6+ shards [Bug]: Verified bot in 16k guilds operation timeout on creation of 6+ shards Aug 1, 2023
@DeclanFrampton
Contributor

I can confirm that this issue persists when the number of shards is greater than or equal to 6. Once I get some free time later today I will take a deeper dive into this, unless someone else gets there first.

12:11:22 Shard #6    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 186
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 153
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 77

@nikita-petko nikita-petko changed the title [Bug]: Verified bot in 16k guilds operation timeout on creation of 6+ shards [Bug]: Operation timeout with creation of 6+ shards Aug 1, 2023
@DeclanFrampton
Contributor

I conducted some additional testing. The shard that times out usually reconnects directly afterwards. I always get the exception with 7 or more shards. Can you confirm this is the case for you?

14:08:28 Shard #0    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 187
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 154
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 78
14:08:28 Shard #0    Disconnecting
14:08:34 Shard #0    Disconnected
14:08:36 Shard #0    Connecting
14:08:36 Shard #0    Resumed previous session
14:08:36 Shard #0    Connected

@nikita-petko
Contributor Author

Same but with 6 shards and over:

@DeclanFrampton
Contributor

> Same but with 6 shards and over:

At least we can reproduce it now. Does the shard reconnect? If so, does it continue to cause issues as time goes by, or does it seem stable?

@nikita-petko
Contributor Author

The shard will just continue to time out while trying to reconnect, and this causes all the other shards to fail.

@DeclanFrampton
Contributor

> The shard will just continue to timeout, but will try to reconnect. But it causes all the other shards to fail

OK, I've asked a maintainer to take a look as well. Meanwhile, could you provide a full stack trace of the exception from a debug build yourself? Thanks.

@nikita-petko
Contributor Author

> Same but with 6 shards and over:

@DeclanFrampton this is the full exception on a debug build.

nikita-petko added a commit to mfdlabs/grid-bot that referenced this issue Aug 7, 2023
nikita-petko added a commit to mfdlabs/grid-bot that referenced this issue Aug 17, 2023
* Upgrade direct deps

* Enable debug logging

* Implement sharding fix.

* Fix intents error

* Add username filitering to Renders

* Fix infinite respond error on slash command received

* Temp fix for discord-net/Discord.Net#2743

* Deprecate Google Analytics

* Use native Backtrace client instead

* Remove in favour of Threading.Tasks

* Migrate to autogenerated HTTP Client instead

* Remove HTTP in favour of native HTTP clients (most of the contents is not used)

* Use VaultSharp instead

* Use Consul instead

* Deprecate RbxUsersClient

* Use VaultSharp and Consul in Configuration and Discord.Configuration

* Not used anymore

* Migrate healtcheck client to native HTTP, may need future tests.

* Remove GAM references and move naming to BotRegistry

* Remove non needed event handlers

* Move to support of native AsyncWorkQueue

* Update BotRegistry references and remove GAM references

* Removal of RBXUSERS

* Apply checks for IsReady

* Fixtures on GSI try open, remove unneeded settings

* REPL not used

* Update to support newer VaultSharp

* Assembly binding redirect nightmares

* Possible fix: Change to 6.0.0

* Fix prod error: NRE in config

* Removal of WCF

* Migration of ADP to not use WCF or EventLog

* Remove uneeded targets and always log full exception!

* Move Logging in support of #101 and #217

* Reference new assembly, in support of #101 and #217

* Move to new Logging, supports #101 and #217

* Reference new logging: #101 and #217

* Update all references to Logging in support of #101 and #217

* Change from MFDLabs.Logging to Logging

* Move targets to ./targets (#217)

* Move targets within sln (#217)

* Move scripts around (in support of #217)

* Fix Backtrace Newtonsoft error (BINDINGS!)

* Move to respect #217

* Remove Pipeline (#220)

* Move to lib/shared (#110 and #217)

* Remove pipeline: #220

* Fix production exec issue

* Remove Sentinels in favour of Polly

* Move Assemblies in support of #101 and #217

* Remove these component tests (not used)

* MFDLabs.Grid.Bot -> Grid.Bot (#101)

* MFDLabs.Grid.AutoDeployer -> Grid.AutoDeployer (#101)

* FIX: Old settngs references

* Update run-service scripts

* Rename build configurations (#217)

* Update assembly names in configs (#217, #101)

* Config sanity change

* Sanity change to unpackers!

* Final closure of #101

* Remove assembly info and auto generate it

* Fix simple case where webserver is not getting killed

* Small sanity changes with naming conventions (#217)

* Introduction into ServiceDiscovery and Redis core libraries

* Add floodchecking!

* Implement a refresh interval so it doesn't DDoS consul

* Change from task.delay to thread.sleep

* Log refresh interval

* Remove check for lastIndex

* Fix error in prod

* Use IPv4

* Fix for localscript

* Change counter registry provider

* Fix argument deciding not to work

* Add support for checking if slash commands exist or not

* Support only deploying RC releases

* Support process watchdog, support uploading to backtrace

* Expose Logging settings publically

* Add new sandbox code

* Fix format string!

* Remove type casting

* Fix variadic type

* Fix table pairs

* Fix httpservice method names

* Fix pcall

* Fix assertion

* Sanity changes to make things work with old RCC

* Add fflag blacklist & config

* replace with fflag get/set

* Revert "replace with fflag get/set"

This reverts commit 871c585.

* Delete web server deployment utils

The web server is now dedicated.

* Rollout LuaVM v2.0

* Fix issue where commands not getting deregistered

* Minor changes to shared libraries.

Move floodcheckers out to shared registry to lower resource allocation.
Remove some commands and update permission schemes of others.

Implement Luau checks for ScriptExecution slash command!

* Surplus Changes

* Better handling for timeouts

* Add Client Settings command!

* Fix handling

* Fox response handlers

* DO NOT USE STRING!

---------

Co-authored-by: nosyliam <liammeshor@gmail.com>
@Misha-133 Misha-133 added project: websocket Needs investigation Needs to be looked at by a maintainer labels Dec 4, 2023