Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: Perform a clean exit again #2463

Merged
merged 9 commits into from May 11, 2024

Conversation

fredizzimo
Copy link
Member

What kind of change does this PR introduce?

The main loop now waits for Neovim to quit before exiting. This allows error messages to be shown and dealt with, instead of leaking nvim processes, as neovide will block here https://github.com/neovim/neovim/blob/15a2dd9e963784123273ec830e82e24ecea4ad0b/src/nvim/main.c#L802-L807

It primarily waits for the stdio stream to finish, but due to bugs, like neovim/neovim#26743, that does not always happen, so it additionally waits for the process to exit, and forces a quite if the io stream has not finished within 2 seconds after that. If you use Neovim nightly with neovim/neovim#28157, it should not happen

An exit crash that happens on some Wayland systems has also been fixed, by properly shutting down the rendering system before the event loop finishes.

Did this PR introduce a breaking change?

The special handling of just detaching from remote connections when closing the Window, which was added here #687 has been removed. It would uneccessarily complicate this change, and I found that re-attaching causes a lot of bugs, like invisible text, the cursor showing the wrong position, the screen not updating and so on, so it's not very usable anyway.

NOTE: That it's still possible to detach manually by the following command: :call chanclose(g:neovide_channel_id)

Some hangs that @sid-6581 has reported on Windows might also start to appear again. But it will not hang compltely, just for 2 seconds. If that still happens, the actual cause should be determined and fixed. The same applies to other platforms as well of course.

info!("Quit with code {}: {}", code, reason);
}

pub fn is_running(&self) -> bool {
self.running.load(Ordering::Relaxed)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The running tracker only tracks the exit code now, so maybe it could be renamed or replaced by variable. But I left that for future refactoring.

Err(join_error) => error!("Error joining IO loop: '{}'", join_error),
Ok(Err(error)) => {
if !error.is_channel_closed() {
error!("Error: '{}'", error);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this message back. I thought it wasn't important, but it's actually, the message printed in this bug for example:

@@ -276,15 +278,15 @@ impl AsRef<str> for UiCommand {
}

struct UIChannels {
sender: LoggingSender<UiCommand>,
sender: Mutex<Option<LoggingSender<UiCommand>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Mutex Options over concurrent capable types kill me a little, but after thinking about it for a bit nothing cleaner came to mind so I think its reasonable to leave it.

One thing we might consider is to use a parkinglot RwLock instead since the base case is just reading and we only write when the app closes. But tbh thats a micro optimization to take it or leave it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified this, using OnceLock

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 4, 2024

I just tried this PR, and got a hard lockup immediately. It hangs forever, not just for 2 seconds.

image

Comment on lines 259 to 265
let neovim_exited = select! {
_ = &mut session.io_handle => false,
_ = process.wait() => {
log::info!("The Neovim process quit before the IO stream, waiting two seconds");
true
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this approach as one of my first attempts when trying to fix this problem, but wait() doesn't detect that the process is gone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux this workaround is needed for Neovim 0.9.5 without my fix neovim/neovim#28157. There it correctly detects both the stdio stream close and process exit.

Comment on lines 274 to 160
let sleep = sleep(Duration::from_millis(2000));
tokio::pin!(sleep);
select! {
_ = session.io_handle => {}
_ = &mut sleep => {
log::info!("The IO stream was never closed, forcing Neovide to exit");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't tokio::time::timeout() be simpler here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but it looks like the timeout consumes the future, and I want to re-use it after that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to a Tokio timeout

self.runtime.take().unwrap().shutdown_background();
log::info!("Starting neovim runtime shutdown");
let start = Instant::now();
shutdown_ui();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shutting down the UI explicitly from the runtime drop doesn't feel very clean. IMHO the runtime should be concerned only about itself, and shutting down other things should be done elsewhere.

As an aside, I'm not sure why the runtime is created manually instead of just using tokio::main. I did some refactoring earlier and eliminated the entire NeovimRuntime and just used tokio::main, and it was a lot cleaner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is probably a bit poorly named, it shuts down the UI event handlers, not the whole UI system. And we do have to wait for them here, otherwise they start to print stuff to stderr instead of the logging system, as the rest of the runtime shuts down while they continue running.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the code and removed this. The UI command tasks are now cancelled with the runtime shutdown.

@fredizzimo
Copy link
Member Author

@sid-6581, We really need to find the root cause of not being able to detect either the process exit or the closing of the io streams. That's something that I know for sure works in Windows, tons of programs are absolutely dependent on that, and I have written a few myself and not something we should work around and cause more bugs anymore.

And those two are the only two signals we can rely on, since we need to keep the Neovide UI active, in order to be able to respond to errors that can happen after VimLeave and waits for user input here https://github.com/neovim/neovim/blob/15a2dd9e963784123273ec830e82e24ecea4ad0b/src/nvim/main.c#L802-L807

Unfortunately, I can't help much with that, since I'm unable to repeat it on any of my systems. It does seem to affect a few others, but for most it looks like the problem was with the corrupt shada files, which is now fixed so that the error message is shown and we can send the inputs. So, I think the problem is very rare.

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 4, 2024

@fredizzimo I should have some more free time now to look at this again. I'll try to debug this again, I have a couple of things I'd like to try. At least I can still easily reproduce it, which is good.

@fredizzimo
Copy link
Member Author

Great thanks, I would start by checking if it's stuck here:
https://github.com/rust-lang/rust/blob/a4b11c8e6057bd47f8d241baafd5ae1e813d62fd/library/std/src/sys/pal/windows/handle.rs#L250

That's where I think they might violate one or more of the conditions required https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntreadfile

The caller can safely wait on the file handle only if all three of the following conditions are met:

The handle was opened for asynchronous access (that is, neither FILE_SYNCHRONOUS_IO_XXX flag was specified).
The handle is being used for only one I/O operation at a time.
NtReadFile returned STATUS_PENDING.

Furthermore, this document says that asynchronous IO on pipes can only be performed using overlapped operations https://learn.microsoft.com/en-us/windows/win32/ipc/synchronous-and-overlapped-input-and-output#enabling-asynchronous-operation
NtReadFile supports it, but they are not using it for some reason.

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 5, 2024

@fredizzimo I'm in the debugger now, and that's exactly where it's stuck. I need to dig further, but it looks like it's stuck waiting for a handle that the neovide process no longer has open. Here you see it's waiting for handle 0x384:

image

But it doesn't appear that neovide still has that handle open:

image

I can't quite track it back to anything specific to see what it's actually trying to wait on, since there are no clues in the current call stack, but I'll keep digging.

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 5, 2024

I forgot to add -a as an argument in the screenshot above. I reproduced the hang again, and when I show all the handles it shows that it hangs on named pipe 350 here:

image

That leads me to https://github.com/rust-lang/rust/blob/385fa9d845dd326c6bbfd58c22244215e431948a/library/std/src/sys/pal/windows/pipe.rs#L83

I'll backtrack from there.

@fredizzimo
Copy link
Member Author

That's most likely one of the pipes created here with piped()

let mut child = cmd.stdin(Stdio::piped()).stdout(Stdio::piped()).spawn()?;

You could maybe also try process monitor to log all the file io calls https://learn.microsoft.com/en-us/sysinternals/downloads/procmon

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 5, 2024

Yep, that's what I'll try next. It's interesting that the pipe remains open when the nvim process is gone.

@fredizzimo
Copy link
Member Author

I will put this as draft, so that it does not accidentally get merged until the Window issue is resolved or at least identified.

@fredizzimo fredizzimo marked this pull request as draft April 5, 2024 20:46
@sid-6581
Copy link
Contributor

sid-6581 commented Apr 5, 2024

I'm still digging, but from what I can tell:

Sometimes both named pipes stay open after nvim exits, sometimes stdin is closed but stdout remains open.

The named pipes are create in overlapped mode (only for neovide, not the ones that are handed to neovim), so WaitForSingleObject should work fine on them. However, the documentation says this:

If this handle is closed while the wait is still pending, the function's behavior is undefined

ProcMon slowed down things enough that I couldn't recreate the hang while it was running.

I'll be traveling this weekend, but I'll get back to it when I get back.

@sid-6581
Copy link
Contributor

sid-6581 commented Apr 6, 2024

Actually, the more I think about, I don't understand why the read from the overlapped named pipe is a blocking operation using WaitForSingleObject.

@fredizzimo fredizzimo marked this pull request as ready for review May 11, 2024 16:07
@fredizzimo
Copy link
Member Author

I'm re-opening this. I still think it's better to hang at the exit on a few systems than to leak Neovim processes.

Copy link

Test Results

  6 files  ±0    6 suites  ±0   20s ⏱️ +2s
110 tests ±0  110 ✅ ±0  0 💤 ±0  0 ❌ ±0 
644 runs  ±0  644 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 35582e8. ± Comparison against base commit 0eebfbb.

@Kethku Kethku merged commit 0153bdb into neovide:main May 11, 2024
12 checks passed
zbyna pushed a commit to zbyna/neovide that referenced this pull request May 17, 2024
* Don't request exit multiple times

* Wait until the Neovim exits before exiting the event loop

* Fix exit crash on Wayland

* Disable detach when the window is closed for remote connections

* Wait for all tasks to exits, so that logs don't go to stderr

* Force quit after some timeout, when the neovim process quits

* Add some logging

* Clean up the io stream wait code

* Simplify the code, let the runtime shutdown the UICommand tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants