Describe the bug
After bug #792 occurs, some (but not always all) clients get disconnected from the server that just switched WADs. After this disconnect occurs, an affected client goes straight to the console. Immediately running reconnect in the console to rejoin the game can cause the client to crash without an error message (i.e., the Odamex window just disappears).
Build that the bug occurred in
10.4.0 G46E0E1-8728
To Reproduce
Play an online game until you experience bug #792, "[BUG] WAD changing-related crash" (being 'disconnected-to-console' from a netgame after a server switches WADs)
Immediately in the console, run reconnect (as a client)
The crash might not reproduce consistently, or at all (see Additional context below)
Expected behavior
To rejoin the server you just disconnected from.
For context, I have my Odamex client set up to automatically start recording a netdemo as soon as a netgame starts. The short netdemo is the result of this super-brief game before the crash.
The netdemos require DOOM2.WAD, ODAHORDE_230421.WAD, and HORDAMEX.WAD.
Additional context
https://www.twitch.tv/videos/2111767444?t=113m30s
Here is a video of the second crash from the POV of another player (Hekksy), who also got disconnected. At the 1:53:40 mark, you can see me (in the upper-left corner) briefly reconnect, then immediately crash, using the same command shown at 1:53:30. As you can see, the crash does not reproduce consistently: under the same sequence of events, you might or might not crash when using the reconnect command after a disconnect-to-console event upon WAD switching.
I'll attempt an explanation for the crash that sometimes happens in Horde mode, specifically when switching from a Hordamex map to an Odahorde map.
It happened on my koholint Horde servers and I've spent some time tracking it down.
Stack trace captured in gdb:
Breakpoint 1, 0x00007ffff7cd1968 in std::__throw_out_of_range_fmt(char const*, ...) () from /lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0 0x00007ffff7cd1968 in std::__throw_out_of_range_fmt(char const*, ...) ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00005555557be823 in std::vector<hordeDefine_t, std::allocator<hordeDefine_t> >::_M_range_check (this=<optimized out>, __n=<optimized out>)
at /usr/include/c++/12/bits/stl_vector.h:1153
#2 std::vector<hordeDefine_t, std::allocator<hordeDefine_t> >::at (
__n=<optimized out>, this=<optimized out>)
at /usr/include/c++/12/bits/stl_vector.h:1175
#3 G_HordeDefine (id=<optimized out>)
at /home/doom/dev/odamex/common/g_horde.cpp:515
#4 HordeState::serialize (this=<optimized out>)
at /home/doom/dev/odamex/common/p_horde.cpp:404
#5 P_HordeInfo () at /home/doom/dev/odamex/common/p_horde.cpp:728
#6 SV_UpdateGametype (pl=...)
at /home/doom/dev/odamex/server/src/sv_main.cpp:2887
#7 0x00005555557bef30 in SV_WriteCommands ()
at /home/doom/dev/odamex/server/src/sv_main.cpp:3189
#8 0x00005555557c096f in SV_StepTics (count=0)
at /home/doom/dev/odamex/server/src/sv_main.cpp:4200
#9 0x00005555557ca258 in SV_RunTics ()
at /home/doom/dev/odamex/server/src/sv_main.cpp:4259
#10 0x00005555556ce208 in CappedTaskScheduler::run (this=0x5555560614f0)
at /home/doom/dev/odamex/common/d_main.cpp:1012
What's happening: an out-of-bounds exception when serializing the Horde state to update clients.
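SV_UpdateGametype -> P_HordeInfo -> serialize -> and the critical line is the G_HordeDefine lookup at p_horde.cpp:404 (frames #3 and #4 in the trace above), which ends in a bounds-checked vector::at() call.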
Here, m_defineID can be too large, causing the OOB when accessing the vector WAVE_DEFINES.
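To illustrate the failure mode in isolation, here is a minimal sketch of a bounds-checked lookup being handed a stale index. It is not Odamex code: `WaveDefine`, `lookupDefine` and `lookupThrows` are hypothetical names, and the only thing taken from the trace is that the lookup goes through `std::vector::at()`, which throws `std::out_of_range` for an index at or past `size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for hordeDefine_t; the real struct is not shown here.
struct WaveDefine
{
	std::string name;
};

// Same shape as a bounds-checked lookup: vector::at() throws
// std::out_of_range when id >= defines.size().
const WaveDefine& lookupDefine(const std::vector<WaveDefine>& defines, std::size_t id)
{
	return defines.at(id);
}

// Returns true if the lookup throws, i.e. the ID is out of range for this table.
bool lookupThrows(const std::vector<WaveDefine>& defines, std::size_t id)
{
	try
	{
		lookupDefine(defines, id);
		return false;
	}
	catch (const std::out_of_range&)
	{
		return true;
	}
}
```

With a table of 20 entries, ID 12 resolves normally; with a table of 5 entries the same ID throws. If nothing catches the exception, the process terminates, which would match a server going down at this point.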
At first I suspected a bug in the code that initializes and handles m_defineID, but that turns out not to be the case: the issue is a race condition.
While the server is still sending these gametype updates to clients, the intermission ends and the new WAD gets loaded.
We go through G_LoadWad -> D_DoomWadReboot -> D_Init -> G_ParseHordeDefs
This clears WAVE_DEFINES, but m_defineID stays the same for the time being.
In the case of a switch from Hordamex to Odahorde, we go from many wave defines to far fewer. So if we had reached a high wave number before the WAD switch (and m_defineID was therefore near the top of the range), chances are high that it is now OOB given the much lower define count of Odahorde.
m_defineID only gets reassigned in P_RunHordeTics() on the first tic. So there's a small window for the crash to happen.
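The ordering problem can be modelled with a toy sketch. Everything here is hypothetical (`ToyHordeServer` and its members are illustration names, not Odamex types), and resetting the ID to 0 on the first tic is an assumption made only so the example has a concrete value to reset to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the state involved in the race; not Odamex code.
struct ToyHordeServer
{
	std::vector<int> waveDefines; // stand-in for WAVE_DEFINES
	std::size_t defineID = 0;     // stand-in for m_defineID

	// Stand-in for G_ParseHordeDefs: rebuilds the table, leaves defineID alone.
	void loadWad(std::size_t defineCount)
	{
		waveDefines.assign(defineCount, 0);
	}

	// Stand-in for the first P_RunHordeTics() call after the new map starts.
	// Assumption: the ID is reset to a valid value (0 here).
	void firstTic()
	{
		defineID = 0;
	}

	// True if a bounds-checked lookup with the current ID would succeed.
	bool idInRange() const
	{
		return defineID < waveDefines.size();
	}
};
```

Loading a 30-define table and reaching ID 25, then loading an 8-define table, leaves `idInRange()` false until `firstTic()` runs. Any serialization that happens inside that gap hits the out-of-range lookup.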
To fix it, I'd suggest either synchronizing both events and calling HordeDirector.reset() earlier, and/or shielding the serialize code from the crash itself: skip the code in SV_UpdateGametype entirely if gamestate == GS_INTERMISSION, and optionally catch the OOB exception around the serialize call specifically.
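A rough sketch of the defensive half of that suggestion follows. It is only an illustration under stated assumptions: `ToyGameState`, `ToyDefine`, `ToyHordeInfo` and `safeHordeInfo` are hypothetical names, and it uses an explicit range check rather than a try/catch, which is a swapped-in variant of the "catch the OOB" idea and not what the actual code does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real gamestate enum and horde types differ.
enum ToyGameState { TOY_GS_LEVEL, TOY_GS_INTERMISSION };

struct ToyDefine
{
	int id;
};

struct ToyHordeInfo
{
	int defineId;
};

// Fills `out` and returns true only when it is safe to serialize.
// Returns false (caller skips the update) during intermission or when
// the define ID no longer fits the currently loaded table.
bool safeHordeInfo(ToyGameState state, const std::vector<ToyDefine>& defines,
                   std::size_t defineID, ToyHordeInfo& out)
{
	if (state == TOY_GS_INTERMISSION)
		return false; // table may be about to change; skip this update

	if (defineID >= defines.size())
		return false; // stale ID; skip instead of throwing

	out.defineId = defines[defineID].id;
	return true;
}
```

Skipping one update is harmless in this model because the next tic after the reset produces a valid one. Whether dropping a gametype update is acceptable for real clients would need checking against the actual protocol.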
Screenshots, NetDemos, & Crash Dumps
The first crash from Thursday (crash dump with an associated netdemo):
odamex_g46e0e1_29968_20240405T005957.dmp
Odamex_HORDE_20240404_195957_ODAHORDE_230421.WAD_MAP09.zip
The second crash from Friday:
odamex_g46e0e1_5196_20240405T222342.dmp
Odamex_HORDE_20240405_172341_HORDAMEX.WAD_MAP13.zip