Godot segfault at exit with threads #90909

Open
novalis opened this issue Apr 19, 2024 · 0 comments

Contributor

novalis commented Apr 19, 2024

Tested versions

System information

Godot v4.2.1.stable - Debian GNU/Linux trixie/sid trixie - X11 - Vulkan (Forward+) - dedicated AMD Radeon RX 5600 XT (RADV NAVI10) () - AMD Ryzen 5 2600X Six-Core Processor (12 Threads)

Issue description

Godot's shutdown isn't thread-safe. Before deleting script instances, it needs to stop the threads that are running them. This isn't always going to work: Godot uses std::thread, which doesn't support forcibly killing threads. And we can't use std::jthread, since its destructor joins the thread, and the thread might be stuck in an infinite loop.

But we could add a stop flag (a variable shared with the thread) and, around here, check the variable and exit the thread if it's set. That wouldn't help when the thread is stuck in an infinite loop in native code, but that case is very rare compared to an infinite loop in GDScript. We would also have to wait a few ticks for the thread to reach the top of the opcode loop, but we could detect this by having the thread set an "I got it, I'm done" variable, which would let us stop waiting as soon as possible.
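To make the proposal concrete, here is a minimal standalone sketch of the cooperative-stop idea described above: a variable checked at the top of the loop, plus an acknowledgement so the shutdown path can stop waiting early. It is not Godot code. `StopRequest`, `interpreter_loop`, `request_stop_and_wait` and `demo_cooperative_stop` are hypothetical names, and the loop body is only a stand-in for executing one opcode.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical names throughout; none of this is Godot's actual API.
struct StopRequest {
    std::atomic<bool> stop_requested{false}; // set by the shutdown path
    std::mutex mutex;
    std::condition_variable acknowledged_cv;
    bool acknowledged = false;               // the "I got it, I'm done" variable
};

// Stand-in for the interpreter's opcode loop.
void interpreter_loop(StopRequest &req) {
    for (;;) {
        // Top of the loop: check the flag before dispatching the next opcode.
        if (req.stop_requested.load(std::memory_order_acquire)) {
            break;
        }
        std::this_thread::yield(); // stand-in for executing one opcode
    }
    {
        std::lock_guard<std::mutex> lock(req.mutex);
        req.acknowledged = true;
    }
    req.acknowledged_cv.notify_one();
}

// Asks the worker to stop and waits up to `timeout` for its acknowledgement.
// Returns true if the worker acknowledged in time.
bool request_stop_and_wait(StopRequest &req, std::chrono::milliseconds timeout) {
    req.stop_requested.store(true, std::memory_order_release);
    std::unique_lock<std::mutex> lock(req.mutex);
    return req.acknowledged_cv.wait_for(lock, timeout, [&] { return req.acknowledged; });
}

bool demo_cooperative_stop() {
    StopRequest req;
    std::thread worker(interpreter_loop, std::ref(req));
    bool stopped = request_stop_and_wait(req, std::chrono::milliseconds(2000));
    // In this sketch the worker always reaches the check, so joining is safe.
    // A real shutdown path would have to decide what to do on timeout
    // (for example, a thread stuck in native code).
    worker.join();
    assert(req.acknowledged);
    return stopped;
}
```

The timeout matters because the worker may never reach the check, which is the native-code case mentioned above. The acknowledgement lets shutdown proceed as soon as the worker has actually left the loop rather than after a fixed delay.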

Here's the symptom:

WARNING: A Thread object is being destroyed without its completion having been realized.
Please call wait_to_finish() on it to ensure correct cleanup.
     at: ~Thread (core/os/thread.cpp:105)
...
Thread 27 "busy1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc94006c0 (LWP 1180509)]
StringName::StringName (this=this@entry=0x7fffc93ff7f0, p_name=...)
    at core/string/string_name.cpp:197
197		if (p_name._data && p_name._data->refcount.ref()) {
(gdb) bt
#0  StringName::StringName (this=this@entry=0x7fffc93ff7f0, p_name=...)
    at core/string/string_name.cpp:197
#1  0x0000555559f915cd in MethodBind::get_name (this=this@entry=0x30)
    at core/object/method_bind.cpp:92
#2  0x0000555555f73cd6 in GDScriptFunction::call (this=<optimized out>, 
    p_instance=<optimized out>, p_instance@entry=0x55555f46aad0, 
    p_args=p_args@entry=0x0, p_argcount=<optimized out>, r_err=..., 
    p_state=<optimized out>) at modules/gdscript/gdscript_vm.cpp:1810
#3  0x0000555555e03ccf in GDScriptInstance::callp (this=0x55555f46aad0, 
    p_method=..., p_args=<optimized out>, p_argcount=<optimized out>, 
    r_error=...) at modules/gdscript/gdscript.cpp:1970
#4  0x0000555559f93a4c in Object::callp (this=0x55555f4812d0, p_method=..., 
    p_args=0x0, p_argcount=0, r_error=...) at core/object/object.cpp:815
#5  0x0000555559cffb04 in Callable::callp (this=0x7fffc93ffb50, 
    p_arguments=0x0, p_argcount=0, r_return_value=..., r_call_error=...)
    at core/variant/callable.cpp:69
#6  0x000055555a074db5 in core_bind::Thread::_start_func (ud=<optimized out>)
    at core/core_bind.cpp:1257
#7  0x0000555559a8384d in Thread::callback (p_caller_id=<optimized out>, 
    p_settings=..., 
    p_callback=0x55555a074b00 <core_bind::Thread::_start_func(void*)>, 
    p_userdata=0x7fffa8009800) at core/os/thread.cpp:64
#8  0x000055555a35ab73 in execute_native_thread_routine ()
#9  0x00007ffff7d5d45c in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:444
#10 0x00007ffff7dddbbc in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Note: When I run outside GDB, Godot says

Engine version: Godot Engine v4.3.dev.custom_build (9bc49a66bae5e9e506f12df3b3e141c8da13f983)
Dumping the backtrace. Please include this when reporting the bug to the project developer.

But actually it doesn't seem to dump the backtrace.

Steps to reproduce

I found this when trying (and failing) to debug #75308.

The repro works like this: a thread calls a native function in a loop. To reproduce, start the project and then close it by closing the window. About 1 time in 10 (a very rough estimate), the error will happen. Sorry that the repro is nondeterministic; it depends on a race condition.
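As a rough model of why the crash depends on timing, the following standalone sketch has a worker that keeps calling through a pointer while the "shutdown" side clears that pointer without stopping the worker first. It is an illustration only, written so that it never dereferences an invalid pointer. `BindingModel` and `worker_outlives_binding` are hypothetical names, not Godot's `MethodBind` or any engine API, and the assumption that teardown invalidates something the thread still calls is inferred from the backtrace, not confirmed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a native method binding; not Godot's MethodBind.
struct BindingModel {
    int call() const { return 1; }
};

// Returns true if the worker was still running when the binding was torn down,
// i.e. the point at which a real engine could touch invalid memory.
bool worker_outlives_binding() {
    BindingModel binding; // outlives the worker, so the model itself is safe
    std::atomic<const BindingModel *> slot{&binding};
    std::atomic<bool> worker_started{false};
    std::atomic<bool> saw_torn_down{false};

    std::thread worker([&] {
        for (;;) {
            const BindingModel *b = slot.load(std::memory_order_acquire);
            if (b == nullptr) {
                // The real program has no such check; it would crash here.
                saw_torn_down.store(true, std::memory_order_release);
                return;
            }
            assert(b->call() == 1); // the "native function" called in a loop
            worker_started.store(true, std::memory_order_release);
        }
    });

    // Wait until the worker is inside its loop, then tear down the binding
    // without stopping the worker first (the ordering bug being modelled).
    while (!worker_started.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    slot.store(nullptr, std::memory_order_release);

    worker.join();
    return saw_torn_down.load(std::memory_order_acquire);
}
```

In this model the worker always notices the teardown because it checks the pointer on every iteration. In the real repro there is no such check, so whether the thread happens to be mid-call at teardown decides whether it crashes, which would be consistent with the roughly 1-in-10 failure rate.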

Minimal reproduction project (MRP)

xcb.zip
