zenoh-c DLL panics in libc::atexit handler on Windows #973

Open
fuzzypixelz opened this issue Apr 25, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@fuzzypixelz
Member

Describe the bug

See this workflow run failure for context.

The z_api_double_drop_test fails when syncing with zenoh, starting from commit 0283aaa.

I've observed this crash only when zenoh-c is linked dynamically to an application, not when it is linked statically. Oddly enough, the crash still happens when one (or both) of the z_drop calls are removed.

Backtrace of the crash:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
   0:     0x7ffd8557aee3 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffd8557aee3 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffd8557aee3 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffd8557aee3 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffd852eca2b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffd852eca2b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffd85568c80 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffd8557d1db - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffd8557d1db - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffd8557cdce - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffd8557dda4 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffd8557dda4 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffd8557d803 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
  17:     0x7ffd85581ce9 - std::alloc::_::__rg_oom
  18:     0x7ffd8558e68e - std::alloc::_::__rg_oom
  19:     0x7ffd858a17ca - std::alloc::_::__rg_oom
  20:     0x7ffd858a2963 - std::alloc::_::__rg_oom
  21:     0x7ffd8557a0b2 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  22:     0x7ffd8557a0b2 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  23:     0x7ffd8557a0b2 - core::mem::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
  24:     0x7ffd8557a0b2 - std::sys::windows::thread::Thread::new
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
  25:     0x7ffd858a265a - std::alloc::_::__rg_oom
  26:     0x7ffd858a195a - std::alloc::_::__rg_oom
  27:     0x7ffd858a163a - std::alloc::_::__rg_oom
  28:     0x7ffdd52742d6 - execute_onexit_table
  29:     0x7ffdd52741fb - execute_onexit_table
  30:     0x7ffdd52741b4 - execute_onexit_table
  31:     0x7ffd859d88fd - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  32:     0x7ffd859d8a22 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  33:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  34:     0x7ffdd77edcda - LdrShutdownProcess
  35:     0x7ffdd77eda8d - RtlExitUserProcess
  36:     0x7ffdd611e3bb - FatalExit
  37:     0x7ffdd52805bc - exit
  38:     0x7ffdd528045f - exit
  39:     0x7ff7db3112c7 - <unknown>
  40:     0x7ffdd6117344 - BaseThreadInitThunk

The Drop implementation of ZRuntimePool calls .shutdown_timeout() on each runtime in parallel by spawning a thread for each shutdown operation. This Drop implementation is in turn called in a libc::atexit handler.
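The registration described above can be illustrated with a minimal, dependency-free sketch. It is not zenoh's actual code: `Pool` is a hypothetical stand-in for ZRuntimePool, `install` and `DROPS` are hypothetical helpers, and `atexit` is declared directly against the C runtime (it is the same function that `libc::atexit` binds to). The `cleanup` name mirrors the `zenoh_runtime::cleanup` frame that appears in the backtraces of this issue.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Declared against the C runtime so the sketch needs no crates;
// `libc::atexit` exposes this same function.
extern "C" {
    fn atexit(cb: extern "C" fn()) -> c_int;
}

// Counts how many times the pool's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for ZRuntimePool.
struct Pool;

impl Drop for Pool {
    fn drop(&mut self) {
        // In zenoh, this is where runtimes are shut down (and threads spawned).
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Global pool, torn down by the exit handler.
static POOL: Mutex<Option<Pool>> = Mutex::new(None);

// Exit handler: it must not unwind across the C boundary,
// so a poisoned lock is treated as "nothing to clean up".
extern "C" fn cleanup() {
    let pool = match POOL.lock() {
        Ok(mut guard) => guard.take(),
        Err(_) => None,
    };
    // Dropping here runs Pool::drop inside the atexit callback.
    drop(pool);
}

// Hypothetical setup: store the pool and register the exit handler.
// A real library would guard the registration with std::sync::Once.
fn install() {
    *POOL.lock().unwrap() = Some(Pool);
    let rc = unsafe { atexit(cleanup) };
    assert_eq!(rc, 0, "atexit registration failed");
}

fn main() {
    install();
    // Pool::drop runs from the C runtime's exit handlers after main returns.
}
```

When such code is compiled into a DLL with the MSVC CRT, the handler goes into the DLL's own onexit table, which the CRT runs from DLL_PROCESS_DETACH; this matches the `execute_onexit_table` and `dllmain_crt_process_detach` frames in the backtrace. Anything the handler does, such as spawning threads, therefore happens while the process is already shutting down.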

I'm still not sure why this causes the crash. You can see from the backtrace that a new thread is created (I think this is the thread spawned in the Drop implementation) after zenoh receives a DLL_PROCESS_DETACH notification. This happens because the DLL's atexit handlers are called after the application calls exit() (the DLL actually has its own atexit handler stack, separate from the application's). All application threads are signaled at process exit, so calls to WaitForSingleObject, which Rust uses to implement .join(), return immediately. The following is the origin of the panic:

// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        // The first `.unwrap()` is what panics. 
        // The standard library assumes that:
        // "the caller will never read this packet until the thread has exited",
        // but that invariant is somehow broken here,
        // probably because the thread doesn't exit normally.
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}

So my theory is that threads spawned after process exit, as part of a DLL's atexit handler, are somehow signaled at creation, so that WaitForSingleObject on them returns immediately even though they never ran to completion and dropped their "packet" handles. I don't have any proof, though. I think the underlying issue is much more subtle and needs more digging (but we have a release to push out!). But back to the Drop implementation:

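use std::time::Duration;

impl Drop for ZRuntimePool {
    fn drop(&mut self) {
        let t = std::time::Instant::now();
        // Spawn one thread per runtime so that all shutdowns run in parallel.
        let handles: Vec<_> = self
            .0
            .drain()
            .filter_map(|(_name, mut rt)| {
                rt.take()
                    .map(|r| std::thread::spawn(move || r.shutdown_timeout(Duration::from_secs(1))))
            })
            .collect();

        // Wait for every shutdown thread; join errors are ignored.
        for hd in handles {
            let _ = hd.join();
        }
    }
}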

To reproduce

  1. Run zenoh-c tests using zenoh commit 0283aaa on Windows.

System info

  • Platform: Windows
  • Zenoh commit: 0283aaa
fuzzypixelz added the bug (Something isn't working) label on Apr 25, 2024
fuzzypixelz changed the title from "zenoh-c DLL crahes in libc::atexit handler on Windows" to "zenoh-c DLL panics in libc::atexit handler on Windows" on Apr 25, 2024
@fuzzypixelz
Member Author

After more digging, I realized that the error is non-deterministic. Sometimes the std::thread::spawn calls in the atexit handler fail with "Access is denied." (which is what happens, in general, when one tries to spawn a thread on Windows in the atexit handler of a DLL). But sometimes the error occurs much further down, in Tokio.

I also realized I wasn't enabling debug symbols in my build, so my backtraces were not helpful at all. The following are backtraces for each scenario. Note that the code from which I got these backtraces is slightly modified, but functionally the same.

Backtrace when `Runtime::drop` panics
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
   0:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffda9beed0b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffda9beed0b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffda9e7ea3b - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffda9e7ea3b - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffda9e7e63e - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffda9e7f594 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffda9e7eff3 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
  13:     0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
  14:     0x7ffda9e7ef64 - std::panicking::begin_panic_handler
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
  15:     0x7ffdaa2dea85 - core::panicking::panic_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
  16:     0x7ffdaa2dec52 - core::panicking::panic
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
  17:     0x7ffda9e83239 - std::thread::JoinInner<tuple$<> >::join<tuple$<> >
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439
  18:     0x7ffda9e8f8fe - tokio::runtime::blocking::pool::BlockingPool::shutdown
                               at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:270
  19:     0x7ffdaa1a181a - tokio::runtime::blocking::pool::impl$4::drop
                               at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:278
  20:     0x7ffdaa1a181a - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  21:     0x7ffdaa1a181a - core::ptr::drop_in_place<tokio::runtime::runtime::Runtime>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  22:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  23:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  24:     0x7ffdaa1a2b89 - core::mem::maybe_uninit::MaybeUninit<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0>::assume_init_drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\mem\maybe_uninit.rs:728
  25:     0x7ffdaa1a2b89 - std::thread::impl$0::spawn_unchecked_::impl$1::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:510
  26:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  27:     0x7ffdaa1a2b89 - core::ptr::drop_in_place<std::thread::impl$0::spawn_unchecked_::closure_env$1<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0,tuple$<> > >
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  28:     0x7ffda9e7b922 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  29:     0x7ffda9e7b922 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  30:     0x7ffda9e7b922 - core::mem::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
  31:     0x7ffda9e7b922 - std::sys::windows::thread::Thread::new
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
  32:     0x7ffdaa1a28b0 - std::panicking::try::do_call
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
  33:     0x7ffdaa1a28b0 - std::panicking::try
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
  34:     0x7ffdaa1a28b0 - std::panic::catch_unwind
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
  35:     0x7ffdaa1a28b0 - zenoh_runtime::impl$5::drop::closure$1
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
  36:     0x7ffdaa1a28b0 - core::ops::function::impls::impl$4::call_once
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
  37:     0x7ffdaa1a28b0 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
  38:     0x7ffdaa1a28b0 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:103
  39:     0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  40:     0x7ffdaa1a168a - zenoh_runtime::cleanup
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
  41:     0x7ffdd52742d6 - execute_onexit_table
  42:     0x7ffdd52741fb - execute_onexit_table
  43:     0x7ffdd52741b4 - execute_onexit_table
  44:     0x7ffdaa2d8aad - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  45:     0x7ffdaa2d8bd2 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  46:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  47:     0x7ffdd77edcda - LdrShutdownProcess
  48:     0x7ffdd77eda8d - RtlExitUserProcess
  49:     0x7ffdd611e3bb - FatalExit
  50:     0x7ffdd52805bc - exit
  51:     0x7ffdd528045f - exit
  52:     0x7ff656f212c7 - <unknown>
  53:     0x7ffdd6117344 - BaseThreadInitThunk
  54:     0x7ffdd77e26b1 - RtlUserThreadStart
Backtrace when `Runtime::drop` doesn't panic
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: PermissionDenied, message: "Access is denied." }', C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:208:24
stack backtrace:
   0:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffda9beed0b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffda9beed0b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffda9e7ea3b - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffda9e7ea3b - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffda9e7e63e - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffda9e7f594 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffda9e7f025 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:597
  13:     0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
  14:     0x7ffda9e7ef64 - std::panicking::begin_panic_handler
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
  15:     0x7ffdaa2dea85 - core::panicking::panic_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
  16:     0x7ffdaa2defa3 - core::result::unwrap_failed
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\result.rs:1651
  17:     0x7ffdaa1a29a5 - std::panicking::try::do_call
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
  18:     0x7ffdaa1a29a5 - std::panicking::try
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
  19:     0x7ffdaa1a29a5 - std::panic::catch_unwind
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
  20:     0x7ffdaa1a29a5 - zenoh_runtime::impl$5::drop::closure$1
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
  21:     0x7ffdaa1a29a5 - core::ops::function::impls::impl$4::call_once
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
  22:     0x7ffdaa1a29a5 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
  23:     0x7ffdaa1a29a5 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:104
  24:     0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  25:     0x7ffdaa1a168a - zenoh_runtime::cleanup
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
  26:     0x7ffdd52742d6 - execute_onexit_table
  27:     0x7ffdd52741fb - execute_onexit_table
  28:     0x7ffdd52741b4 - execute_onexit_table
  29:     0x7ffdaa2d8aad - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  30:     0x7ffdaa2d8bd2 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  31:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  32:     0x7ffdd77edcda - LdrShutdownProcess
  33:     0x7ffdd77eda8d - RtlExitUserProcess
  34:     0x7ffdd611e3bb - FatalExit
  35:     0x7ffdd52805bc - exit
  36:     0x7ffdd528045f - exit
  37:     0x7ff656f212c7 - <unknown>
  38:     0x7ffdd6117344 - BaseThreadInitThunk
  39:     0x7ffdd77e26b1 - RtlUserThreadStart

To understand what's going on here, let's start with the Windows thread-spawning function of the Rust stdlib:

// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/sys/windows/thread.rs#L33
let ret = c::CreateThread(
    ptr::null_mut(),
    stack,
    Some(thread_start),
    p as *mut _,
    c::STACK_SIZE_PARAM_IS_A_RESERVATION,
    ptr::null_mut(),
);
let ret = HandleOrNull::from_raw_handle(ret);
return if let Ok(handle) = ret.try_into() {
    Ok(Thread { handle: Handle::from_inner(handle) })
} else {
    // The thread failed to start and as a result p was not consumed. Therefore, it is
    // safe to reconstruct the box so that it gets deallocated.
    drop(Box::from_raw(p));
    Err(io::Error::last_os_error())
};

Thus, if the thread creation syscall fails, Rust drops the thread closure, along with everything it captured, before returning the error. In our case the closure owns a Tokio runtime, so this ends up in the Drop implementation of tokio::runtime::blocking::pool::BlockingPool, which .join()s all threads of the runtime:

// https://github.com/tokio-rs/tokio/blob/tokio-1.35.x/tokio/src/runtime/blocking/pool.rs#L269
for (_id, handle) in workers {
    let _ = handle.join();
}
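The failure branch of Thread::new can be modeled without calling CreateThread, to show why a runtime's destructor ends up running on the spawning thread. In this sketch, `model_spawn`, `Captured` and the `os_accepts` flag are hypothetical: `os_accepts == false` simulates CreateThread failing, and the boxed closure is then dropped on the calling thread, mirroring `drop(Box::from_raw(p))` above.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

// Set when the value captured by the thread closure is destroyed.
static CAPTURE_DROPPED: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for the tokio Runtime captured by the closure.
struct Captured;

impl Drop for Captured {
    fn drop(&mut self) {
        // For a real Runtime, this is where BlockingPool::shutdown joins workers.
        CAPTURE_DROPPED.store(true, Ordering::SeqCst);
    }
}

// Hypothetical model of std's spawn path; `os_accepts` simulates
// whether the OS thread-creation call succeeded.
fn model_spawn<F>(f: F, os_accepts: bool) -> io::Result<()>
where
    F: FnOnce() + Send + 'static,
{
    let p: Box<dyn FnOnce() + Send> = Box::new(f);
    if os_accepts {
        // A real thread would eventually run the closure; run it inline here.
        p();
        Ok(())
    } else {
        // Like `drop(Box::from_raw(p))`: the closure and its captures are
        // destroyed on the *calling* thread before the error is returned.
        drop(p);
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "Access is denied."))
    }
}

fn main() {
    let rt = Captured;
    // `let _rt = rt` moves the value into the closure.
    let res = model_spawn(move || { let _rt = rt; }, false);
    println!("spawn result: {:?}", res.map_err(|e| e.kind()));
    println!("captured value dropped: {}", CAPTURE_DROPPED.load(Ordering::SeqCst));
}
```

With a real Tokio Runtime in place of `Captured`, that drop is what runs BlockingPool::shutdown and its `.join()` loop inside Thread::new, as seen in the `Runtime::drop` backtrace.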

So why do we sometimes reach this point in Runtime::drop and why does this make the .join() call panic? The answer lies again in the Rust stdlib thread implementation:

// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/thread/mod.rs#L528
let try_result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
    crate::sys_common::backtrace::__rust_begin_short_backtrace(f)
}));
// SAFETY: `their_packet` as been built just above and moved by the
// closure (it is an Arc<...>) and `my_packet` will be stored in the
// same `JoinInner` as this closure meaning the mutation will be
// safe (not modify it and affect a value far away).
unsafe { *their_packet.result.get() = Some(try_result) };
// Here `their_packet` gets dropped, and if this is the last `Arc` for that packet that
// will call `decrement_num_running_threads` and therefore signal that this thread is
// done.
drop(their_packet);

In the above snippet, f is the closure of the spawned thread. The Packet object is the means by which the spawned thread's result is transferred back to the joining thread. Thus the .join() implementation first calls WaitForSingleObject on Windows (i.e. self.native.join()) and then assumes that the thread has finished execution and dropped its packet:

// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}

Except that when the zenoh-c application (not the DLL) exits, Windows has already signaled all the Tokio runtime threads by the time the DLL's atexit handler runs. So there is a race condition: sometimes a thread stops execution before dropping its packet (or even before setting its result value).

If a runtime thread does manage to drop its packet, the .join() call on its handle succeeds, and the std::thread::spawn call correctly returns the Windows "Access is denied." error. Otherwise, the .join() call panics, misleading us about the origin of the error.
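The packet invariant that JoinInner::join relies on can be illustrated with a simplified model. This is not std's code: `Packet` here is a hypothetical stand-in that uses a `Mutex` instead of std's `UnsafeCell`, `take_result` mirrors the final line of `join` but returns `None` where std calls `.unwrap()`, and the torn-down case is simulated by keeping the worker's `Arc` alive rather than by actually killing a thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for std's private Packet<T>: a slot the spawned
// thread fills with its result before releasing its reference.
struct Packet<T> {
    result: Mutex<Option<T>>,
}

// Mirrors the last line of JoinInner::join, returning None instead of panicking.
fn take_result<T>(mut packet: Arc<Packet<T>>) -> Option<T> {
    // None if another Arc (the worker's `their_packet`) is still alive.
    let slot = Arc::get_mut(&mut packet)?;
    // None if the worker never stored its result.
    slot.result.get_mut().ok()?.take()
}

// Normal path: the worker stores its result and drops its Arc before exiting.
fn normal_join() -> Option<i32> {
    let my_packet = Arc::new(Packet { result: Mutex::new(None) });
    let their_packet = Arc::clone(&my_packet);
    let native = thread::spawn(move || {
        *their_packet.result.lock().unwrap() = Some(42);
        drop(their_packet); // releasing the reference signals "done"
    });
    native.join().unwrap(); // stands in for WaitForSingleObject
    take_result(my_packet)
}

// Broken path: the OS-level wait has returned, but the worker never stored a
// result or released its Arc (modeled by keeping the clone alive).
fn torn_down_join() -> Option<i32> {
    let my_packet: Arc<Packet<i32>> = Arc::new(Packet { result: Mutex::new(None) });
    let their_packet = Arc::clone(&my_packet);
    let result = take_result(my_packet);
    drop(their_packet);
    result
}

fn main() {
    println!("normal join: {:?}", normal_join());
    println!("torn-down join: {:?}", torn_down_join());
}
```

In the normal case `Arc::get_mut` succeeds because the joiner holds the only reference; in the simulated torn-down case it returns `None`, which is exactly where std's `.unwrap()` panics.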

@fuzzypixelz
Member Author

I opened rust-lang/rust#124466 and rust-lang/rust#124468 to discuss/improve the stdlib's handling of this.
