Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pty close behavior, signals at exit #2387

Merged
merged 6 commits into from May 15, 2024

Conversation

cgull
Copy link
Contributor

@cgull cgull commented May 11, 2024

There's two big things in here.

When running mosh --local 0, I found that mosh-server would never terminate on iSH. It was waiting for the master side of its pty to close. Most applications that use ptys use other methods to detect child termination, but mosh-server expects that the pty master will act as a closed file after the last program on the pty slave exits-- any waiting reads should return immediately with 0 bytes, and further read calls should return EOF. This is what pipes and sockets do, of course. iSH master ptys stayed in the read call forever, even though the pty slave was gone. This fixes that and ends up better separating generic tty code from pty slaves from pty masters too.

dtach behaves similarly and the fix helps it too.

After I fixed that, I found that CLI ish would exit with signal 9 when mosh-server exited. This is because the README recommends to use /bin/login as the starting process. Unfortunately, that means /bin/login runs as pid 1 (init) and is notified when orphaned processes terminate (which mosh-server is, because it double forks). /bin/login has never been intended as a process group leader or init process, it expects to start and wait() for exactly 1 child, usually an interactive shell.

But running ish -f alpine /bin/sh didn't work correctly either, when the initial shell exited iSH would exit on signal 9. (Or something, this was a couple of months ago and I'm forgetting details. Maybe it was still when mosh-server exited, or when a subshell exited.) That turned out to be a nasty problem with halt_system(). This code runs when pid 1 exits. This code uses pthread_kill() to send a SIGKILL to threads running the emulated process group to kill off emulated processes in iSH. Uhh... that doesn't work. SIGKILL always nukes the entire process, not a single thread.

After I changed that code to do a more kernel-like thing of using deliver_signal() to send first SIGTERM, then SIGKILL, and then finally pthread_kill() with SIGTERM, iSH behaves a lot better about emulated process termination, and exits cleanly (rather than with SIGKILL) when pid 1 terminates.

I changed the README to tell people to use /bin/sh as the starting process for CLI ish, because it can work as init/pid 1 and ignores termination of orphaned processes. iSH.app should probably change to do the same.

I've not been able to build iSH.app in XCode, so all of this is untested there, but I believe it will help there-- the app won't quit when the initial shell exits.

After fixing this and then finding/fixing the procfs problem in #2386, I ran into yet another bug in iSH, I forget what it was. At that point I gave up chasing bugs in iSH.

@cgull
Copy link
Contributor Author

cgull commented May 11, 2024

Tabs removed.

Copy link
Member

@tbodt tbodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@@ -154,6 +154,10 @@ static void log_line(const char *line) {
static void log_line(const char *line) {
os_log_fault(OS_LOG_DEFAULT, "%s", line);
}
#elif LOG_HANDLER_STDERR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on the utility of this since the program in ish may also be writing to stderr, but if you found it useful it doesn't hurt to have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If iSH is running interactively (stdin and stdout are on a tty), then emulated programs are running with both stdout and stderr on the iSH tty driver. (This will be true for iSH.app as well, but writing to stderr might be less useful there.)

I hadn't even thought of CLI iSH supporting non-interactive use but yes, it does support that, and logging to stderr would be inappropriate then.

@@ -51,6 +69,13 @@ static int pty_slave_open(struct tty *tty) {
return 0;
}

static int pty_slave_close(struct tty *tty) {
if (tty->refcount - 1 == (tty->session ? 2 : 1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a really tricky bit. Could you add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -110,16 +110,7 @@ void tty_release(struct tty *tty) {
cond_destroy(&tty->produced);
free(tty);
} else {
// bit of a hack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bravo for cleaning this up

@@ -414,7 +408,7 @@ static bool pty_is_half_closed_master(struct tty *tty) {
struct tty *slave = tty->pty.other;
// only time one tty lock is nested in another
lock(&slave->lock);
bool half_closed = slave->ever_opened && slave->refcount == 1;
bool half_closed = slave->ever_opened && (slave->refcount == 1 || slave->hung_up);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the refcount check is still necessary? Would the new pty_slave_close do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed when pty_slave_cleanup() runs, I think. It calls pty_hangup_other(), which calls tty_hangup() on the master. That calls tty_input_wakeup() on the master, which may result in that user program waking up in tty_read(), or calling read() again, before the pty slave device instance is fully cleaned up and the refcount decremented. That's what I remember, anyway.

cgull added 3 commits May 13, 2024 01:41
Many pty-using programs use wait() or SIGCHLD to detect child
termination, but some expect the pty master to close and return EOF
when all children close the slave pty.  This did not work on iSH.

If the slave pty was closed by the last program exiting, the program on the
master would not receive any read/poll wakeups, and the master pty
would not be hung up.  Reference counting wasn't working properly when
the pty was in a process group or session.  The fix for that is a bit
simplistic, but it's a big improvement.

This also improves the tty/pty code separation and fixes some other
minor things.
Using SIGKILL on iSh itself was never going to end well.  :)
@tbodt tbodt merged commit cee7abd into ish-app:master May 15, 2024
6 checks passed
@francoisvignon
Copy link

francoisvignon commented May 17, 2024 via email

@tbodt
Copy link
Member

tbodt commented May 17, 2024

@francoisvignon do you have an easy way to reproduce? @cgull could you check on this?

@francoisvignon
Copy link

francoisvignon commented May 18, 2024

here is an easy way to reproduce.

using iSH 669 and :

  • gcc-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
  • libgcc-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
  • libstdc++-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
  • minicom-2.8-r0 x86 {minicom} (GPL-2.0-or-later) [installed]
  • musl-dev-1.2.3-r3 x86 {musl} (MIT) [installed]
  • musl-utils-1.2.3-r3 x86 {musl} (MIT BSD GPL2+) [installed]
  • make-4.3-r0 x86 {make} (GPL-3.0-or-later) [installed]

source: main.c

#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
  
int main(int argv, char **argc)
{
	int fd;
	char buf[256];
	fd = posix_openpt(O_RDWR | O_NOCTTY);
	grantpt(fd);
	unlockpt(fd);
	ptsname_r(fd, buf, 256);
	printf("client name %s\n", buf);

	for(;;)
	{
		int l;
		l = read(fd, buf, 256);
		printf("l=%d, errno=%d\n", l, errno);
		sleep(1); // to not have too many traces ;-)
	}
	return(0);
}

compiling by :

make main

executing:

McFv:~/test# ./main 
client name /dev/pts/2
// from other iSH window, connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
l=2, errno=0
// disconnecting minicom by ^A ^X
l=-1, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
// from other iSH window, trying to reconnect by : minicom -D /dev/pts/2
// no way ...
l=0, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=s
// killing main
^C
McFv:
// and restarting main
McFv:~/test# ./main
client name /dev/pts/2 
// from other iSH window, connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one time
1=1, errno=0
// disconnecting minicom by ^A ^X
l=-1, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
...

the same test with iSH 614

McFv:~/test# ./main
client name /dev/pts/2
// connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
l=2, errno=0
// disconnecting minicom by ^A ^X
// read not interrrupted by disconnection, so, as read is blocking ... no new traces
// from other iSH window, trying to reconnect by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
// killing main
^C
// minicom show /dev/pts/2 is not longer available
McFv:

with 614, client disconnection can be detected by other mean (setting read not blocking and checking errno).

please note, the source is voluntary simple to be the shortest code showing the issue (as example, displaying errno if not error is voluntary to not have too many lines, not setting read as not blocking also, ... ).

pelase note also: MacOs and Debian allows reconnection (like 614) but manage disconnection like 669 (but with a little variation about read returning -1 for debian and 0 for MacOs)

@cgull : I'm available to do more testing. thanks for your worlk

@tbodt
Copy link
Member

tbodt commented May 18, 2024

Ah, one bug replaced by another... The behavior on linux is different from either of those.

$ ./main
client name /dev/pts/1
// connect, type stuff
l=1, errno=0
l=3, errno=0
l=1, errno=0
l=6, errno=0
l=6, errno=0
// disconnect, get error 5 repeating
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
// reconnect, errors stop (well it still says `errno=5` but since the actual return code was success ignore that)
l=1, errno=5
l=2, errno=5
// disconnect, errors are back
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5

@francoisvignon
Copy link

francoisvignon commented May 18, 2024 via email

@tbodt
Copy link
Member

tbodt commented May 18, 2024

Linux linux-luka 6.7.6-1-aarch64-ARCH #1 SMP PREEMPT_DYNAMIC Sat Feb 24 10:08:35 MST 2024 aarch64 GNU/Linux

@francoisvignon
Copy link

francoisvignon commented May 18, 2024 via email

@tbodt
Copy link
Member

tbodt commented May 18, 2024

I'm trying to decide if I should revert this PR, if the new bug is worse than the old bug. Obviously ideally both bugs would be fixed, but I don't really have time to do that work. How bad is the new behavior?

@francoisvignon
Copy link

francoisvignon commented May 18, 2024 via email

@tbodt
Copy link
Member

tbodt commented May 18, 2024

Do you run into major issues with the new behavior?

@francoisvignon
Copy link

francoisvignon commented May 18, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants