fix!: Perform a clean exit again #2463

fredizzimo · 2024-04-04T00:28:00Z

What kind of change does this PR introduce?

The main loop now waits for Neovim to quit before exiting. This allows error messages to be shown and dealt with, instead of leaking nvim processes, as neovide will block here https://github.com/neovim/neovim/blob/15a2dd9e963784123273ec830e82e24ecea4ad0b/src/nvim/main.c#L802-L807

It primarily waits for the stdio stream to finish, but due to bugs, like neovim/neovim#26743, that does not always happen, so it additionally waits for the process to exit, and forces a quite if the io stream has not finished within 2 seconds after that. If you use Neovim nightly with neovim/neovim#28157, it should not happen

An exit crash that happens on some Wayland systems has also been fixed, by properly shutting down the rendering system before the event loop finishes.

Did this PR introduce a breaking change?

The special handling of just detaching from remote connections when closing the Window, which was added here #687 has been removed. It would uneccessarily complicate this change, and I found that re-attaching causes a lot of bugs, like invisible text, the cursor showing the wrong position, the screen not updating and so on, so it's not very usable anyway.

NOTE: That it's still possible to detach manually by the following command: :call chanclose(g:neovide_channel_id)

Some hangs that @sid-6581 has reported on Windows might also start to appear again. But it will not hang compltely, just for 2 seconds. If that still happens, the actual cause should be determined and fixed. The same applies to other platforms as well of course.

fredizzimo · 2024-04-04T00:36:09Z

src/running_tracker.rs

        info!("Quit with code {}: {}", code, reason);
    }

-    pub fn is_running(&self) -> bool {
-        self.running.load(Ordering::Relaxed)
-    }


The running tracker only tracks the exit code now, so maybe it could be renamed or replaced by variable. But I left that for future refactoring.

fredizzimo · 2024-04-04T18:45:17Z

src/bridge/mod.rs

-        Err(join_error) => error!("Error joining IO loop: '{}'", join_error),
-        Ok(Err(error)) => {
-            if !error.is_channel_closed() {
-                error!("Error: '{}'", error);


I will add this message back. I thought it wasn't important, but it's actually, the message printed in this bug for example:

Neovide crashing while "working" Value not an Array: '{"kind": "syntax", "hi_name": "SignColumn"... } #2452

Kethku · 2024-04-04T21:26:32Z

src/bridge/ui_commands.rs

@@ -276,15 +278,15 @@ impl AsRef<str> for UiCommand {
 }

 struct UIChannels {
-    sender: LoggingSender<UiCommand>,
+    sender: Mutex<Option<LoggingSender<UiCommand>>>,


These Mutex Options over concurrent capable types kill me a little, but after thinking about it for a bit nothing cleaner came to mind so I think its reasonable to leave it.

One thing we might consider is to use a parkinglot RwLock instead since the base case is just reading and we only write when the app closes. But tbh thats a micro optimization to take it or leave it

I simplified this, using OnceLock

sid-6581 · 2024-04-04T21:28:08Z

I just tried this PR, and got a hard lockup immediately. It hangs forever, not just for 2 seconds.

sid-6581 · 2024-04-04T21:36:07Z

src/bridge/mod.rs

+        let neovim_exited = select! {
+            _ = &mut session.io_handle => false,
+            _ = process.wait() => {
+                log::info!("The Neovim process quit before the IO stream, waiting two seconds");
+                true
+            }
+        };


I tried this approach as one of my first attempts when trying to fix this problem, but wait() doesn't detect that the process is gone.

On Linux this workaround is needed for Neovim 0.9.5 without my fix neovim/neovim#28157. There it correctly detects both the stdio stream close and process exit.

sid-6581 · 2024-04-04T21:36:44Z

src/bridge/mod.rs

+            let sleep = sleep(Duration::from_millis(2000));
+            tokio::pin!(sleep);
+            select! {
+                _ = session.io_handle => {}
+                _ = &mut sleep  => {
+                    log::info!("The IO stream was never closed, forcing Neovide to exit");
+                }


Wouldn't tokio::time::timeout() be simpler here?

Maybe, but it looks like the timeout consumes the future, and I want to re-use it after that.

I changed it to a Tokio timeout

sid-6581 · 2024-04-04T21:44:14Z

src/bridge/mod.rs

-        self.runtime.take().unwrap().shutdown_background();
+        log::info!("Starting neovim runtime shutdown");
+        let start = Instant::now();
+        shutdown_ui();


Shutting down the UI explicitly from the runtime drop doesn't feel very clean. IMHO the runtime should be concerned only about itself, and shutting down other things should be done elsewhere.

As an aside, I'm not sure why the runtime is created manually instead of just using tokio::main. I did some refactoring earlier and eliminated the entire NeovimRuntime and just used tokio::main, and it was a lot cleaner.

The function is probably a bit poorly named, it shuts down the UI event handlers, not the whole UI system. And we do have to wait for them here, otherwise they start to print stuff to stderr instead of the logging system, as the rest of the runtime shuts down while they continue running.

I simplified the code and removed this. The UI command tasks are now cancelled with the runtime shutdown.

fredizzimo · 2024-04-04T22:07:35Z

@sid-6581, We really need to find the root cause of not being able to detect either the process exit or the closing of the io streams. That's something that I know for sure works in Windows, tons of programs are absolutely dependent on that, and I have written a few myself and not something we should work around and cause more bugs anymore.

And those two are the only two signals we can rely on, since we need to keep the Neovide UI active, in order to be able to respond to errors that can happen after VimLeave and waits for user input here https://github.com/neovim/neovim/blob/15a2dd9e963784123273ec830e82e24ecea4ad0b/src/nvim/main.c#L802-L807

Unfortunately, I can't help much with that, since I'm unable to repeat it on any of my systems. It does seem to affect a few others, but for most it looks like the problem was with the corrupt shada files, which is now fixed so that the error message is shown and we can send the inputs. So, I think the problem is very rare.

sid-6581 · 2024-04-04T22:33:14Z

@fredizzimo I should have some more free time now to look at this again. I'll try to debug this again, I have a couple of things I'd like to try. At least I can still easily reproduce it, which is good.

fredizzimo · 2024-04-04T22:48:49Z

Great thanks, I would start by checking if it's stuck here:
https://github.com/rust-lang/rust/blob/a4b11c8e6057bd47f8d241baafd5ae1e813d62fd/library/std/src/sys/pal/windows/handle.rs#L250

That's where I think they might violate one or more of the conditions required https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntreadfile

The caller can safely wait on the file handle only if all three of the following conditions are met:

The handle was opened for asynchronous access (that is, neither FILE_SYNCHRONOUS_IO_XXX flag was specified).
The handle is being used for only one I/O operation at a time.
NtReadFile returned STATUS_PENDING.

Furthermore, this document says that asynchronous IO on pipes can only be performed using overlapped operations https://learn.microsoft.com/en-us/windows/win32/ipc/synchronous-and-overlapped-input-and-output#enabling-asynchronous-operation
NtReadFile supports it, but they are not using it for some reason.

sid-6581 · 2024-04-05T00:28:41Z

@fredizzimo I'm in the debugger now, and that's exactly where it's stuck. I need to dig further, but it looks like it's stuck waiting for a handle that the neovide process no longer has open. Here you see it's waiting for handle 0x384:

But it doesn't appear that neovide still has that handle open:

I can't quite track it back to anything specific to see what it's actually trying to wait on, since there are no clues in the current call stack, but I'll keep digging.

sid-6581 · 2024-04-05T00:46:38Z

I forgot to add -a as an argument in the screenshot above. I reproduced the hang again, and when I show all the handles it shows that it hangs on named pipe 350 here:

That leads me to https://github.com/rust-lang/rust/blob/385fa9d845dd326c6bbfd58c22244215e431948a/library/std/src/sys/pal/windows/pipe.rs#L83

I'll backtrack from there.

fredizzimo · 2024-04-05T00:51:06Z

That's most likely one of the pipes created here with piped()

neovide/src/bridge/session.rs

Line 84 in a973abb

let mut child = cmd.stdin(Stdio::piped()).stdout(Stdio::piped()).spawn()?;

You could maybe also try process monitor to log all the file io calls https://learn.microsoft.com/en-us/sysinternals/downloads/procmon

sid-6581 · 2024-04-05T01:02:12Z

Yep, that's what I'll try next. It's interesting that the pipe remains open when the nvim process is gone.

fredizzimo · 2024-04-05T20:46:08Z

I will put this as draft, so that it does not accidentally get merged until the Window issue is resolved or at least identified.

sid-6581 · 2024-04-05T22:59:19Z

I'm still digging, but from what I can tell:

Sometimes both named pipes stay open after nvim exits, sometimes stdin is closed but stdout remains open.

The named pipes are create in overlapped mode (only for neovide, not the ones that are handed to neovim), so WaitForSingleObject should work fine on them. However, the documentation says this:

If this handle is closed while the wait is still pending, the function's behavior is undefined

ProcMon slowed down things enough that I couldn't recreate the hang while it was running.

I'll be traveling this weekend, but I'll get back to it when I get back.

sid-6581 · 2024-04-06T02:05:37Z

Actually, the more I think about, I don't understand why the read from the overlapped named pipe is a blocking operation using WaitForSingleObject.

fredizzimo · 2024-05-11T16:08:29Z

I'm re-opening this. I still think it's better to hang at the exit on a few systems than to leak Neovim processes.

github-actions · 2024-05-11T16:22:10Z

Test Results

6 files ±0 6 suites ±0 20s ⏱️ +2s
110 tests ±0 110 ✅ ±0 0 💤 ±0 0 ❌ ±0
644 runs ±0 644 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 35582e8. ± Comparison against base commit 0eebfbb.

* Don't request exit multiple times * Wait until the Neovim exits before exiting the event loop * Fix exit crash on Wayland * Disable detach when the window is closed for remote connections * Wait for all tasks to exits, so that logs don't go to stderr * Force quit after some timeout, when the neovim process quits * Add some logging * Clean up the io stream wait code * Simplify the code, let the runtime shutdown the UICommand tasks

fredizzimo commented Apr 4, 2024

View reviewed changes

fredizzimo mentioned this pull request Apr 4, 2024

Exiting neovide on windows leaves orphaned neovim processes around #2387

Closed

fredizzimo commented Apr 4, 2024

View reviewed changes

Kethku reviewed Apr 4, 2024

View reviewed changes

sid-6581 reviewed Apr 4, 2024

View reviewed changes

fredizzimo marked this pull request as draft April 5, 2024 20:46

fredizzimo mentioned this pull request Apr 25, 2024

no-fork exits with SIGSEGV #2504

Closed

fredizzimo added 9 commits May 11, 2024 18:58

Don't request exit multiple times

b9ca216

Wait until the Neovim exits before exiting the event loop

66a85d5

Fix exit crash on Wayland

2bee0b4

Disable detach when the window is closed for remote connections

4d164ee

Wait for all tasks to exits, so that logs don't go to stderr

098b6ff

Force quit after some timeout, when the neovim process quits

686d8a1

Add some logging

522375b

Clean up the io stream wait code

07f00c6

Simplify the code, let the runtime shutdown the UICommand tasks

35582e8

fredizzimo force-pushed the fsundvik/wait_for_exit branch from 9e0699e to 35582e8 Compare May 11, 2024 16:07

fredizzimo marked this pull request as ready for review May 11, 2024 16:07

Kethku merged commit 0153bdb into neovide:main May 11, 2024
12 checks passed

fredizzimo mentioned this pull request May 12, 2024

The inconsistency in the behavior of Neovide when closing compared to the closing of nvim leads to the issue of nvim processes remaining active in the background. #2540

Closed

moonfruit mentioned this pull request May 17, 2024

neovide 0.13.0 Homebrew/homebrew-core#171965

Merged

This was referenced May 17, 2024

nvim process sometimes not quitting with Neovide #1595

Closed

Neovide hangs when nvim exits unexpectedly #1823

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix!: Perform a clean exit again #2463

fix!: Perform a clean exit again #2463

fredizzimo commented Apr 4, 2024

fredizzimo Apr 4, 2024

fredizzimo Apr 4, 2024

Kethku Apr 4, 2024

fredizzimo Apr 5, 2024

sid-6581 commented Apr 4, 2024

sid-6581 Apr 4, 2024

fredizzimo Apr 4, 2024

sid-6581 Apr 4, 2024

fredizzimo Apr 4, 2024

fredizzimo Apr 5, 2024

sid-6581 Apr 4, 2024

fredizzimo Apr 4, 2024

fredizzimo Apr 5, 2024

fredizzimo commented Apr 4, 2024

sid-6581 commented Apr 4, 2024

fredizzimo commented Apr 4, 2024

sid-6581 commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

fredizzimo commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

fredizzimo commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

sid-6581 commented Apr 6, 2024

fredizzimo commented May 11, 2024

github-actions bot commented May 11, 2024

fix!: Perform a clean exit again #2463

fix!: Perform a clean exit again #2463

Conversation

fredizzimo commented Apr 4, 2024

What kind of change does this PR introduce?

Did this PR introduce a breaking change?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sid-6581 commented Apr 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fredizzimo commented Apr 4, 2024

sid-6581 commented Apr 4, 2024

fredizzimo commented Apr 4, 2024

sid-6581 commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

fredizzimo commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

fredizzimo commented Apr 5, 2024

sid-6581 commented Apr 5, 2024

sid-6581 commented Apr 6, 2024

fredizzimo commented May 11, 2024

github-actions bot commented May 11, 2024

Test Results