dhewm3 crashes on cpu arch PPC64LE #508

Open
stallmanshiteater opened this issue Feb 5, 2023 · 23 comments

@stallmanshiteater

When a map loads (or when using the cachemegs console command), the game crashes. It looks like some type of memory access error.

Looks like dhewm3 1.5.2 crashed with signal SIGSEGV (11) - sorry!

Backtrace:
dhewm3(+0x2b968c) [0x10a95968c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0) [0x7fff8f2a0444]
/usr/lib64/dhewm3/base.so(+0x1e8214) [0x7ffec75e8214]
[0x7fffdcf16390]
/usr/lib64/dhewm3/base.so(+0x1690bc) [0x7ffec75690bc]
/usr/lib64/dhewm3/base.so(+0x1b6d44) [0x7ffec75b6d44]
/usr/lib64/dhewm3/base.so(+0x201324) [0x7ffec7601324]
/usr/lib64/dhewm3/base.so(+0x2030e4) [0x7ffec76030e4]
/usr/lib64/dhewm3/base.so(+0x213940) [0x7ffec7613940]
/usr/lib64/dhewm3/base.so(+0x16a6cc) [0x7ffec756a6cc]
/usr/lib64/dhewm3/base.so(+0x172108) [0x7ffec7572108]
/usr/lib64/dhewm3/base.so(+0x121ba8) [0x7ffec7521ba8]
/usr/lib64/dhewm3/base.so(+0x13a7b8) [0x7ffec753a7b8]
/usr/lib64/dhewm3/base.so(+0x133754) [0x7ffec7533754]
/usr/lib64/dhewm3/base.so(+0x13398c) [0x7ffec753398c]
/usr/lib64/dhewm3/base.so(+0x134d7c) [0x7ffec7534d7c]
/usr/lib64/dhewm3/base.so(+0x1a8d3c) [0x7ffec75a8d3c]
/usr/lib64/dhewm3/base.so(+0xb5664) [0x7ffec74b5664]
/usr/lib64/dhewm3/base.so(+0xb5ab0) [0x7ffec74b5ab0]
dhewm3(+0x17c410) [0x10a81c410]
dhewm3(+0x17c62c) [0x10a81c62c]
dhewm3(+0x16e888) [0x10a80e888]
dhewm3(+0x1712c0) [0x10a8112c0]
dhewm3(+0x17d864) [0x10a81d864]
dhewm3(+0x111384) [0x10a7b1384]
dhewm3(+0x344bc) [0x10a6d44bc]
/lib64/libc.so.6(+0x27c00) [0x7fff8e627c00]
/lib64/libc.so.6(__libc_start_main+0x94) [0x7fff8e627df4]

(Sorry it's not overly useful, build with libbacktrace support to get function names)
Segmentation fault (core dumped)

@DanielGibson
Member

> (Sorry it's not overly useful, **build with libbacktrace support to get function names**)

@stallmanshiteater
Author

> (Sorry it's not overly useful, build with libbacktrace support to get function names)

It's not available in the repo for my current distro and CPU arch.
I compiled dhewm3 myself and had more stability than with the dhewm3 package from my repo, but it's still the same crash.

@DanielGibson
Member

What distro are you using?

Try running it in gdb, and after the crash enter backtrace in there

@stallmanshiteater
Author

stallmanshiteater commented Feb 5, 2023

openSUSE Leap 15.4 ppc64le.

I'm able to complete some maps; mars_city2, for example, is the only one I've tested so far that doesn't crash. It runs beautifully and is way more responsive than on x86 when it's not crashing.

@stallmanshiteater
Author

How do I run it in gdb?

@DanielGibson
Member

DanielGibson commented Feb 5, 2023

In a terminal, in the directory the dhewm3 executable is in, run gdb ./dhewm3, then enter run at the gdb prompt.
If you pass arguments to dhewm3 when starting it, use gdb --args ./dhewm3 your arguments

@stallmanshiteater
Author

Thread 1 "dhewm3" received signal SIGSEGV, Segmentation fault.
0x00007fff6828afbc in idEntity::SetShaderParm (value=75.4080048, parmnum=<optimized out>, this=0x1879bf3c) at /home/bug/dhewm3/neo/game/Entity.cpp:1018
1018 renderEntity.shaderParms[ parmnum ] = value;
Missing separate debuginfos, use: zypper install Mesa-dri-debuginfo-21.2.4-150400.68.9.1.ppc64le Mesa-libGL1-debuginfo-21.2.4-150400.68.9.1.ppc64le Mesa-libglapi0-debuginfo-21.2.4-150400.68.9.1.ppc64le glibc-locale-base-debuginfo-2.31-150300.41.1.ppc64le krb5-debuginfo-1.19.2-150400.3.3.1.ppc64le libFLAC8-debuginfo-1.3.2-150000.3.11.1.ppc64le libLLVM11-debuginfo-11.0.1-150300.3.6.1.ppc64le libSDL2-2_0-0-debuginfo-2.0.8-150200.11.9.1.ppc64le libX11-6-debuginfo-1.6.5-150000.3.24.1.ppc64le libX11-xcb1-debuginfo-1.6.5-150000.3.24.1.ppc64le libXau6-debuginfo-1.0.8-1.26.ppc64le libXcursor1-debuginfo-1.1.15-1.18.ppc64le libXext6-debuginfo-1.3.3-1.30.ppc64le libXfixes3-debuginfo-6.0.0-150400.1.4.ppc64le libXi6-debuginfo-1.7.9-3.2.1.ppc64le libXinerama1-debuginfo-1.1.3-1.22.ppc64le libXrandr2-debuginfo-1.5.1-2.17.ppc64le libXrender1-debuginfo-0.9.10-1.30.ppc64le libXss1-debuginfo-1.2.2-3.4.ppc64le libXxf86vm1-debuginfo-1.1.4-1.23.ppc64le libasound2-debuginfo-1.2.6.1-150400.1.4.ppc64le libbrotlicommon1-debuginfo-1.0.7-3.3.1.ppc64le libbrotlidec1-debuginfo-1.0.7-3.3.1.ppc64le libcap2-debuginfo-2.63-150400.1.7.ppc64le libcom_err2-debuginfo-1.46.4-150400.3.3.1.ppc64le libcurl4-debuginfo-7.79.1-150400.5.12.1.ppc64le libdbus-1-3-debuginfo-1.12.2-150400.18.5.1.ppc64le libdrm2-debuginfo-2.4.107-150400.1.8.ppc64le libdrm_amdgpu1-debuginfo-2.4.107-150400.1.8.ppc64le libdrm_nouveau2-debuginfo-2.4.107-150400.1.8.ppc64le libdrm_radeon1-debuginfo-2.4.107-150400.1.8.ppc64le libedit0-debuginfo-3.1.snap20150325-2.12.ppc64le libelf1-debuginfo-0.185-150400.5.3.1.ppc64le libexpat1-debuginfo-2.4.4-150400.3.12.1.ppc64le libgcc_s1-debuginfo-12.2.1+git416-150000.1.5.1.ppc64le libgcrypt20-debuginfo-1.9.4-150400.6.5.1.ppc64le libglvnd-debuginfo-1.3.3-150400.3.4.ppc64le libgpg-error0-debuginfo-1.42-150400.1.101.ppc64le libidn2-0-debuginfo-2.2.0-3.6.1.ppc64le libjack0-debuginfo-1.9.12-150000.3.3.1.ppc64le libkeyutils1-debuginfo-1.6.3-5.6.1.ppc64le libldap-2_4-2-debuginfo-2.4.46-150200.14.11.2.ppc64le 
liblz4-1-debuginfo-1.9.3-150400.1.7.ppc64le liblzma5-debuginfo-5.2.3-150000.4.7.1.ppc64le libncurses6-debuginfo-6.1-150000.5.12.1.ppc64le libnghttp2-14-debuginfo-1.40.0-6.1.ppc64le libogg0-debuginfo-1.3.2-150000.3.4.1.ppc64le libopenal1-debuginfo-1.17.2-3.7.41.ppc64le libopenssl1_1-debuginfo-1.1.1l-150400.7.19.1.ppc64le libpcre1-debuginfo-8.45-150000.20.13.1.ppc64le libpsl5-debuginfo-0.20.1-150000.3.3.1.ppc64le libpulse0-debuginfo-15.0-150400.2.10.ppc64le libsasl2-3-debuginfo-2.1.27-150300.4.6.1.ppc64le libselinux1-debuginfo-3.1-150400.1.69.ppc64le libsndfile1-debuginfo-1.0.28-150000.5.17.1.ppc64le libspeex1-debuginfo-1.2-150000.3.5.2.ppc64le libssh4-debuginfo-0.9.6-150400.1.5.ppc64le libstdc++6-debuginfo-12.2.1+git416-150000.1.5.1.ppc64le libsystemd0-debuginfo-249.14-150400.8.19.1.ppc64le libudev1-debuginfo-249.14-150400.8.19.1.ppc64le libunistring2-debuginfo-0.9.10-1.1.ppc64le libvorbis0-debuginfo-1.3.6-150000.4.5.2.ppc64le libvorbisenc2-debuginfo-1.3.6-150000.4.5.2.ppc64le libxcb-dri2-0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-dri3-0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-glx0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-present0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-shm0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-sync1-debuginfo-1.13-150000.3.9.1.ppc64le libxcb-xfixes0-debuginfo-1.13-150000.3.9.1.ppc64le libxcb1-debuginfo-1.13-150000.3.9.1.ppc64le libxshmfence1-debuginfo-1.2-1.23.ppc64le libz1-debuginfo-1.2.11-150000.3.39.1.ppc64le libzstd1-debuginfo-1.5.0-150400.1.71.ppc64le
(gdb) backtrace
#0 0x00007fff6828afbc in idEntity::SetShaderParm (value=75.4080048,
parmnum=<optimized out>, this=0x1879bf3c)
at /home/bug/dhewm3/neo/game/Entity.cpp:1018
#1 idEntity::Event_SetShaderParm (this=0x1879bf3c, parmnum=<optimized out>,
value=75.4080048) at /home/bug/dhewm3/neo/game/Entity.cpp:3985
#2 0x00007fff683b6df0 in idClass::ProcessEventArgPtr (this=0x1879bf3c,
ev=0x7fff685f7bf8 <EV_SetShaderParm>, data=0x7fffffffd610)
at /home/bug/dhewm3/neo/game/gamesys/Callbacks.cpp:64
#3 0x00007fff68408944 in idInterpreter::CallEvent (this=0x17e09cec,
func=0x7fff686982d8 <gameLocal+652472>, argsize=<optimized out>)
at /home/bug/dhewm3/neo/game/script/Script_Interpreter.cpp:834
#4 0x00007fff6840b0a4 in idInterpreter::Execute (this=0x17e09cec)
at /home/bug/dhewm3/neo/game/script/Script_Interpreter.cpp:1060
#5 0x00007fff6841bac0 in idThread::Execute (this=0x17e09cd4)
at /home/bug/dhewm3/neo/game/script/Script_Thread.cpp:662
#6 0x00007fff6827b8b8 in idActor::UpdateScript (this=0x1879bf3c)
at /home/bug/dhewm3/neo/game/Actor.cpp:1353
#7 0x00007fff68388138 in idAI::UpdateAIScript (this=<optimized out>)
at /home/bug/dhewm3/neo/game/ai/AI.cpp:1196
#8 0x00007fff6839a87c in idAI::Think (this=0x1879bf3c)
at /home/bug/dhewm3/neo/game/ai/AI.cpp:1111
#9 0x00007fff682b8364 in idGameLocal::RunFrame (
this=0x7fff685f8e20 , clientCmds=0x7fffffffe904)
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) quit
A debugging session is active.

Inferior 1 [process 4529] will be killed.

Quit anyway? (y or n) y

@DanielGibson
Member

Weird, that really shouldn't crash, unless "this" isn't really an entity, however that would happen.

Sorry, but I can't help you figure this out, as I don't have any PPC64LE hardware.
Someone who knows C++ and how to use a debugger and has such obscure hardware will have to do it.

@stallmanshiteater
Author

Well, I'm pretty sure you can use ppc64le at integricloud.com, if you care enough to bother with it (I don't blame you if you don't want to).

@DanielGibson
Member

> Our POWER9 / OpenPOWER dedicated hosting services start at just $325/mo.

lol

Also, I'm not sure how some cloud server would help me debug a game that needs a GPU and a display.

@adecrepitcabbage

I try not to wade into these, because I am not generally good with technology, but @stallmanshiteater, strongly consider checking your RAM. If you're short sticks and can grab a stick from work or something, that'll do for a test. I had a similar non-dhewm3 crash on a friend's machine, and it turned out it was something to do with one of the sticks. He had like 64GB ram, so he just removed them one by one, found the faulty one, and could kill time waiting to replace it so he could have his mildly excessive but still impressive collection.

It is a super basic test, but it's one that's so easy to forget it's worth testing.

@turol
Contributor

turol commented Feb 6, 2023

The type punning in Script_Interpreter.cpp and Callbacks.cpp looks really suspicious. Wouldn't be surprised if there's some endianness problem there.

I don't think anyone here has a big endian machine anymore. You'll have to learn to debug this yourself. As a first step learn how to inspect variables and check if parmnum and value look reasonable. Valgrind also advertises PowerPC support so you could try that.

Also formatting your backtrace as code (with backticks) prevents github from turning the stack frame identifiers #1 into useless links.
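
To make the type-punning concern concrete, here is a minimal C++ sketch. It assumes script arguments travel through pointer-sized integer slots; `ArgSlot`, `packFloat`, `unpackFloat`, `packInt`, `unpackInt` and `demoRoundTrip` are hypothetical names for illustration and are not taken from dhewm3's Callbacks.cpp or Script_Interpreter.cpp. It only shows a pairing of writer and reader that round-trips on either byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical argument slot: one pointer-sized integer per script argument.
using ArgSlot = std::intptr_t;

static_assert(sizeof(float) <= sizeof(ArgSlot), "a float must fit in one slot");

// Copy the float's bytes into the leading bytes of a zeroed slot.
inline ArgSlot packFloat(float value) {
    ArgSlot slot = 0;
    std::memcpy(&slot, &value, sizeof(value));
    return slot;
}

// Copy the same leading bytes back out, so pack and unpack agree
// regardless of the host's byte order.
inline float unpackFloat(ArgSlot slot) {
    float value = 0.0f;
    std::memcpy(&value, &slot, sizeof(value));
    return value;
}

// Integers are stored by value conversion, which does not depend on byte order.
inline ArgSlot packInt(int value) { return static_cast<ArgSlot>(value); }
inline int unpackInt(ArgSlot slot) { return static_cast<int>(slot); }

// Small self-check: each value survives a pack/unpack round trip.
inline void demoRoundTrip() {
    assert(unpackFloat(packFloat(75.408f)) == 75.408f);
    assert(unpackInt(packInt(-3)) == -3);
}
```

Trouble would start if the writer and reader disagree, for example if one side stores an int by value and the other reads the slot through an `int*` or `float*` cast. Inspecting parmnum and value in gdb, as suggested above, is one way to see whether something like that is going on.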

@DanielGibson
Member

DanielGibson commented Feb 6, 2023

> I don't think anyone here has a big endian machine anymore.

I think PPC64LE is little endian ;)
(Also, dhewm3 is known to work on old PPC Macs that really are big endian)

Trying valgrind is a good idea. It'll slow the game down considerably, especially loading levels takes very long; hopefully it's still playable enough to be able to trigger the bug.

To use valgrind, just install it and run valgrind ./dhewm3 your arguments

@turol
Contributor

turol commented Feb 6, 2023

I thought all PowerPC archs were big endian. Apparently not. Does it support unaligned loads? If not then that type punning looks like a really good candidate for Stuff Going Wrong.
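
On the unaligned-load question: whatever a given CPU tolerates, dereferencing a misaligned pointer obtained by a cast is undefined behaviour in C++, so the portable pattern is to copy the bytes. A minimal sketch, assuming a plain byte buffer (`loadU32` and `storeU32` are hypothetical helper names, not dhewm3 functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a 32-bit value that starts at an arbitrary byte offset.
// memcpy has no alignment requirement, so any offset inside the buffer is fine.
inline std::uint32_t loadU32(const unsigned char* bytes, std::size_t size,
                             std::size_t offset) {
    assert(offset <= size && size - offset >= sizeof(std::uint32_t));
    std::uint32_t value = 0;
    std::memcpy(&value, bytes + offset, sizeof(value));
    return value;
}

// Write a 32-bit value at an arbitrary byte offset, again via memcpy.
inline void storeU32(unsigned char* bytes, std::size_t size,
                     std::size_t offset, std::uint32_t value) {
    assert(offset <= size && size - offset >= sizeof(std::uint32_t));
    std::memcpy(bytes + offset, &value, sizeof(value));
}
```

Compilers typically turn these small fixed-size memcpy calls into a single load or store where the target allows it, so the safe form usually costs nothing.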

@stallmanshiteater
Author

> I try not to wade into these, because I am not generally good with technology, but @stallmanshiteater, strongly consider checking your RAM. If you're short sticks and can grab a stick from work or something, that'll do for a test. I had a similar non-dhewm3 crash on a friend's machine, and it turned out it was something to do with one of the sticks. He had like 64GB ram, so he just removed them one by one, found the faulty one, and could kill time waiting to replace it so he could have his mildly excessive but still impressive collection.
>
> It is a super basic test, but it's one that's so easy to forget it's worth testing.

I suppose I could, but this system only accepts DDR4 ECC registered ram.

I'll try to get valgrind and test that.

@stallmanshiteater
Author

==9418== Invalid read of size 4
==9418== at 0x3AB4EFC4: GetWeight (Anim_Blend.cpp:1188)
==9418== by 0x3AB4EFC4: idAnimator::PushAnims(int, int, int) (Anim_Blend.cpp:3252)
==9418== by 0x3AB4F95B: idAnimator::CycleAnim(int, int, int, int) (Anim_Blend.cpp:3514)
==9418== by 0x3AACD8CB: idWeapon::Event_PlayCycle(int, char const*) (Weapon.cpp:2649)
==9418== by 0x3AB1708F: idClass::ProcessEventArgPtr(idEventDef const*, long*) (Callbacks.cpp:54)
==9418== by 0x3AB68943: idInterpreter::CallEvent(function_t const*, int) (Script_Interpreter.cpp:834)
==9418== by 0x3AB6B0A3: idInterpreter::Execute() (Script_Interpreter.cpp:1060)
==9418== by 0x3AB7BABF: idThread::Execute() [clone .part.31] (Script_Thread.cpp:662)
==9418== by 0x3AAD5113: idWeapon::UpdateScript() [clone .part.35] (Weapon.cpp:1828)
==9418== by 0x3AAD6A97: UpdateScript (Weapon.cpp:2011)
==9418== by 0x3AAD6A97: idWeapon::PresentWeapon(bool) (Weapon.cpp:1924)
==9418== by 0x3AA85487: idPlayer::UpdateWeapon() (Player.cpp:4085)
==9418== by 0x3AA9E257: idPlayer::Think() (Player.cpp:6346)
==9418== by 0x3AA18363: idGameLocal::RunFrame(usercmd_t const*) (Game_local.cpp:2344)
==9418== Address 0x1740462e6adc is not stack'd, malloc'd or (recently) free'd
==9418==

Looks like dhewm3 1.5.3pre crashed with signal SIGSEGV (11) - sorry!

Backtrace:
./dhewm3() [0x102a26f8]
[0x5825d978]

(Sorry it's not overly useful, build with libbacktrace support to get function names)
==9418==
==9418== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==9418== at 0x44CB000: raise (in /lib64/libpthread-2.31.so)
==9418== by 0x102A27C7: signalhandlerCrash(int) (posix_main.cpp:563)
==9418== by 0x5825D977: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==9418==
==9418== HEAP SUMMARY:
==9418== in use at exit: 303,835,089 bytes in 416,775 blocks
==9418== total heap usage: 2,750,100 allocs, 2,333,325 frees, 2,884,590,999 bytes allocated
==9418==
==9418== LEAK SUMMARY:
==9418== definitely lost: 2,395,016 bytes in 7,662 blocks
==9418== indirectly lost: 1,451,412 bytes in 13,002 blocks
==9418== possibly lost: 74,506,062 bytes in 47,836 blocks
==9418== still reachable: 225,482,599 bytes in 348,275 blocks
==9418== of which reachable via heuristic:
==9418== newarray : 4,872,880 bytes in 1,543 blocks
==9418== multipleinheritance: 644,248 bytes in 882 blocks
==9418== suppressed: 0 bytes in 0 blocks
==9418== Rerun with --leak-check=full to see details of leaked memory
==9418==
==9418== Use --track-origins=yes to see where uninitialised values come from
==9418== For lists of detected and suppressed errors, rerun with: -s
==9418== ERROR SUMMARY: 18639 errors from 616 contexts (suppressed: 8424 from 58)
Segmentation fault (core dumped)

@stallmanshiteater
Author

> I don't think anyone here has a big endian machine anymore.
>
> I think PPC64LE is little endian ;) (Also, dhewm3 is known to work on old PPC Macs that really are big endian)

I suppose I could try testing this on BE as well sometime...
POWER9 is bi-endian, so it can run either. Most Linux distros are LE, but there are some BE distros and BSDs.
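
Since the same CPU can run in either byte order, a quick runtime check shows which one a given build actually sees. A minimal C++ sketch (`hostIsLittleEndian`, `byteSwap32` and `littleToHost32` are hypothetical helper names, not dhewm3's own endianness routines):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Inspect how the value 1 is laid out in memory.
// Little endian stores the low byte first, big endian stores it last.
inline bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char bytes[sizeof(probe)];
    std::memcpy(bytes, &probe, sizeof(probe));
    // Mixed-endian hosts are not handled by this sketch.
    assert(bytes[0] == 1 || bytes[sizeof(probe) - 1] == 1);
    return bytes[0] == 1;
}

// Reverse the order of the four bytes in a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Convert a value that was stored as little endian into host byte order.
inline std::uint32_t littleToHost32(std::uint32_t raw) {
    return hostIsLittleEndian() ? raw : byteSwap32(raw);
}
```

On a ppc64le install `hostIsLittleEndian()` should return true, and on a BE install false, which is an easy sanity check before comparing behaviour between the two.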

@stallmanshiteater
Author

Another note: when changing image_cachemegs from 2048 to 4096, I was able to get the map "game/commoutside" to load. It crashed later on, but previously, with image_cachemegs at 2048, it crashed upon loading.

@DanielGibson
Member

DanielGibson commented Feb 7, 2023

if you set image_showBackgroundLoads 1, does it show purging <texturename> messages in the console (or the terminal you start dhewm3 from) with image_cachemegs 2048?

and note that by default image_cacheMegs is 20, not 2048

what kind of hardware (incl. GPU) are you using exactly?

@stallmanshiteater
Author

stallmanshiteater commented Feb 8, 2023

No, I didn't see that.
I know the default is 20, but when I played with the value I noticed that certain moments where the game used to crash every time no longer did; it crashed later on instead.

hardware: https://i.imgur.com/9EsDqk0.png

I'm on a Raptor Blackbird motherboard with an 8-core Sforza POWER9 CPU and 32 GB of ECC registered DDR4 @ 2666 from Black Diamond Memory. The GPU is an OEM-type AMD Vega 64.

@DanielGibson
Member

If you don't see those purging ... messages when loading a level with image_showBackgroundLoads 1, it means that image_cachemegs wasn't used, so the game crashing (or not) depending on image_cachemegs was just random.

BTW, it's totally possible that the crash is in the GPU driver and not in dhewm3 itself.

Try creating a debug build (pass -DCMAKE_BUILD_TYPE=Debug to cmake), then get a backtrace with gdb again and run it in valgrind again. Maybe the actual crash happens at a place that makes more sense than the original backtrace suggested (by default dhewm3 is built as an optimized build with debug symbols, and the optimizations can make the backtrace incorrect).

@stallmanshiteater
Author

driver is amdgpu, the open source linux one. ran fine on same card and driver on x86_64.

I'll update after I've tried the aforementioned.

@DanielGibson
Member

> ran fine on same card and driver on x86_64.

yeah, but maybe it has a PPC-specific bug
