Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LP hangs in tmux detection #806

Open
Farom opened this issue Nov 13, 2023 · 16 comments
Open

LP hangs in tmux detection #806

Farom opened this issue Nov 13, 2023 · 16 comments
Labels
bug needs information More information needed from the reporter tmux Related to tmux specific implemetation

Comments

@Farom
Copy link

Farom commented Nov 13, 2023

When having different versions of tmux (old server, new client) the tmux detection can hang for an indefinite period.
This blocks all interaction with shell.

Shell: 5.2.15(1)-release
Operating system: Debian bullseye upgrade to bookworm (Kernel 5.10.0-26-amd64)
Liquid Prompt version: b1ea560 (master on 2023-11-13)

Steps to Reproduce

  1. Start a tmux session with tmux version 3.1c (from debian oldstable/bullseye)
  2. upgrade tmux to version 3.3a (from debian bookworm)
  3. All Shells running with LP hang for an indefinite period. CTRL+C without effect.

Expected Behavior

I expect that at some timeout using the shell is possible.

Current Behavior

It hangs forever. No Interaction with shell possible.

Possible Solution / Workaround

in Line

  (( _LP_ENABLE_TMUX )) && count+=$(tmux list-sessions 2> /dev/null | GREP_OPTIONS='' \grep -cv 'attached')

start grep with timeout 1

@Farom Farom added the bug label Nov 13, 2023
@Farom
Copy link
Author

Farom commented Nov 13, 2023

This patch works for me:

diff --git a/liquidprompt b/liquidprompt
index 8a6bfbb..3bcc299 100755
--- a/liquidprompt
+++ b/liquidprompt
@@ -2476,7 +2476,7 @@ _lp_detached_sessions() {
 
     local -i count=0
     (( _LP_ENABLE_SCREEN )) && count=$(screen -ls 2> /dev/null | GREP_OPTIONS='' \grep -c '[Dd]etach[^)]*)$')
-    (( _LP_ENABLE_TMUX )) && count+=$(tmux list-sessions 2> /dev/null | GREP_OPTIONS='' \grep -cv 'attached')
+    (( _LP_ENABLE_TMUX )) && count+=$(tmux list-sessions 2> /dev/null | GREP_OPTIONS='' \timeout 1 \grep -cv 'attached')
     lp_detached_sessions=$count
 
     (( lp_detached_sessions ))

timeout comes from coreutils.

@Rycieos
Copy link
Collaborator

Rycieos commented Nov 13, 2023

Hopefully you still have that tmux session around. Or have a way to reproduce this with your patch intact.

What happens if you run tmux list-sessions? I think this is likely a bug in tmux, where the newer version hangs on a search where it finds an old version still running.

I do not think your patch will be possible to merge. timeout is not something liquidprompt already uses, and I do not want to add more binary dependencies that not all systems will have.

@Rycieos Rycieos added tmux Related to tmux specific implemetation needs information More information needed from the reporter labels Nov 13, 2023
@marckleinebudde
Copy link

Hey @Rycieos,

we debugged the issue and found out the tmux list-sessions writes the session information to an anonymous pipe that is connected to stdin of the grep process. If the tmux list-sessions exits the pipe is somehow (we haven't looked into that) transferred to the running tmux "server" process. The pipe is listed in /proc/TMUX_SERVER_PID/fd.

The grep process first reads the session information, and then blocks on a read() in stdin. As the pipe is never closed by the tmux "server" process the grep will hang on the read().

If we use old tmux executable /proc/TMUX_SERVER_PID/exe list-sessions the grep is not hanging.

@Farom
Copy link
Author

Farom commented Nov 14, 2023

If we use old tmux executable /proc/TMUX_SERVER_PID/exe list-sessions the grep is not hanging.

This is a possibilty from inside tmux. There is TMUX environment var:

TMUX=/tmp/tmux-1044/default,68128,0

2nd field is the pid of the tmux, so /proc/68128/exe list-sessions does the trick.

but:

When you are not inside a tmux sessions, i do not sea how to find the correct executable.

@Farom
Copy link
Author

Farom commented Nov 14, 2023

Hopefully you still have that tmux session around. Or have a way to reproduce this with your patch intact.

I probably can break things for reproducing this bug :-)

What happens if you run tmux list-sessions?

tmux shows the correct output.

I think this is likely a bug in tmux, where the newer version hangs on a search where it finds an old version still running.

Unfortunately it is more complicated and i think it is not worth it to debug tmux detection or make this detection much more complicated.

@Farom
Copy link
Author

Farom commented Nov 14, 2023

I do not think your patch will be possible to merge. timeout is not something liquidprompt already uses, and I do not want to add more binary dependencies that not all systems will have.

There are bash builtin possibilities:
http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3

Maybe it is a good feature to limit the execution time of LP in general?
(there are some git situations where waiting for the prompt longer than 20 seconds is really annoying)

@Farom
Copy link
Author

Farom commented Nov 14, 2023

I do not think your patch will be possible to merge. timeout is not something liquidprompt already uses, and I do not want to add more binary dependencies that not all systems will have.

I just talked to some embedded Linux people and they told me that timeout from the coreutils package is the tool to use, especially if you are already using tools from coreutils like id and uname.

Even busybox has a timeout with all required (short) options.

I hope you overthink your decision.

@Rycieos
Copy link
Collaborator

Rycieos commented Nov 14, 2023

we debugged the issue and found out the tmux list-sessions writes the session information to an anonymous pipe that is connected to stdin of the grep process. If the tmux list-sessions exits the pipe is somehow (we haven't looked into that) transferred to the running tmux "server" process. The pipe is listed in /proc/TMUX_SERVER_PID/fd.

I... find that hard to believe. The tmux process transfers the file handle for its stdout to the tmux server? That seems error prone. And it is causing an issue here.

I still think this is a bug in tmux then. Why is the file handle not being closed? This would hang with any process attached to the stdout, like tmux list-sessions | cat, or tmux list-sessions | less.

Could you capture the process tree once it hangs? If you are right, we should see a bash process as the parent, and only grep as a child; the tmux process would have closed.

@Rycieos
Copy link
Collaborator

Rycieos commented Nov 14, 2023

When you are not inside a tmux sessions, i do not sea how to find the correct executable.

Yes, we need a solution that works in both situations.

What happens if you run tmux list-sessions?

tmux shows the correct output.

What happens if you run tmux list-sessions | cat?

Unfortunately it is more complicated and i think it is not worth it to debug tmux detection or make this detection much more complicated.

Unfortunately I (partially) disagree. I would prefer to see this fixed in tmux. The patch you suggested is not possible for me to merge, as I will explain below.

There are bash builtin possibilities: http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3

That linked solution will not work. The method it uses is creating a subshell that runs the command or function, and killing the subshell if it runs too long. But subshells are separate processes, and cannot set variables that will persist to the main shell. There might be some hack that would allow for returning a value on stdout from the subshell, but that would only work for single commands, not the prompt generation as a whole. And we try to avoid subshells as much as possible as they are very slow on some systems.

Maybe it is a good feature to limit the execution time of LP in general? (there are some git situations where waiting for the prompt longer than 20 seconds is really annoying)

I agree. I think preventing prompt generation from being able to lock up your shell is very important. But as it stands, I do not know of any way to timeout in Bash that will work with functions and passing state around. Neither the timeout binary nor the above linked solution would allow for variables to be set in the shell.

I just talked to some embedded Linux people and they told me that timeout from the coreutils package is the tool to use, especially if you are already using tools from coreutils like id and uname.

Even busybox has a timeout with all required (short) options.

That is interesting, and useful information. However, Liquid Prompt supports more platforms than just Linux. We also support multiple forms of BSD, MacOS, and Windows. MacOS does not have timeout (though it can be installed with brew, but we do not required brew). We can have features that are only supported by some platforms, but a fix like this needs to be portable.

I hope you overthink your decision.

I am always open to rethinking my decisions. I have changed my mind on multiple features and bug fixes in Liquid Prompt in the past.

And to be clear, I am open to adding a fix to Liquid Prompt for this issue (that I am pretty sure is a bug in tmux). The problem is only that your suggested fix is not portable.

@marckleinebudde
Copy link

marckleinebudde commented Nov 14, 2023

I... find that hard to believe. The tmux process transfers the file handle for its stdout to the tmux server? That seems error prone. And it is causing an issue here.

ACK - This only happens if we're using the updated tmux client against the old tmux server.

I still think this is a bug in tmux then. Why is the file handle not being closed? This would hang with any process attached to the stdout, like tmux list-sessions | cat, or tmux list-sessions | less.

Yes, with the above scenario (old server, new client) also causes | cat or | less to hang.

@marckleinebudde
Copy link

marckleinebudde commented Nov 14, 2023

That's an strace of the tmux list-sessions

...
32654 12:37:50.102867 dup(1)            = 6
32654 12:37:50.102903 close(1)          = 0
32654 12:37:50.102927 poll([{fd=3, events=POLLIN}, {fd=6, events=POLLOUT}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 2 ([{fd=6, revents=POLLOUT}, {fd=5, revents=POLLOUT}])
32654 12:37:50.102956 sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="1\1\0\0\30\0\0\0\10\0\0\0\377\377\377\377\1\0\0\0\0\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 24
32654 12:37:50.103150 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}], 2, -1) = 1 ([{fd=5, revents=POLLIN}])
32654 12:37:50.103189 getpid()          = 32654
32654 12:37:50.103212 openat(AT_FDCWD, "/proc/32654/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 1
32654 12:37:50.103240 newfstatat(1, "", {st_mode=S_IFDIR|0500, st_size=0, ...}, AT_EMPTY_PATH) = 0
32654 12:37:50.103268 getdents64(1, 0x5574d0e27dc0 /* 9 entries */, 32768) = 216
32654 12:37:50.103296 getdents64(1, 0x5574d0e27dc0 /* 0 entries */, 32768) = 0
32654 12:37:50.103317 close(1)          = 0
32654 12:37:50.103339 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
32654 12:37:50.103362 recvmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0\1\0\0O\0\0\0\10\0\0\0\377\377\377\377\1\0\0\0000: 9 windows"..., iov_len=65535}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 99
32654 12:37:50.103393 poll([{fd=3, events=POLLIN}, {fd=6, events=POLLOUT}, {fd=5, events=POLLIN}], 3, -1) = 1 ([{fd=6, revents=POLLOUT}])
32654 12:37:50.103419 writev(6, [{iov_base="0: 9 windows (created Sun Nov 12"..., iov_len=59}], 1) = 59
32654 12:37:50.103465 fcntl(0, F_GETFL) = 0x2 (flags O_RDWR)
32654 12:37:50.103514 fcntl(0, F_SETFL, O_RDWR) = 0
32654 12:37:50.103535 fcntl(1, F_GETFL) = -1 EBADF (Bad file descriptor)
32654 12:37:50.103579 fcntl(2, F_GETFL <unfinished ...>
32654 12:37:50.103593 <... fcntl resumed>) = 0x2 (flags O_RDWR)
32654 12:37:50.103604 fcntl(2, F_SETFL, O_RDWR) = 0
32654 12:37:50.103649 exit_group(0)     = ?
32654 12:37:50.103791 +++ exited with 0 +++
32653 12:37:50.103811 <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 32654

stdout is dup()ed to 6, it writev()s to 6 and then exit.

Here the relevant part of grep. It blocks on read()....:

32655 12:37:50.097690 read(0,  <unfinished ...>                                                                                                                                                                                                                 
32655 12:37:50.103504 <... read resumed>"0: 9 windows (created Sun Nov 12"..., 98304) = 59                                                                                                                                                                      
32655 12:37:50.103587 read(0,  <unfinished ...>                                                                                                                                                                                                                 
32655 12:38:19.573170 <... read resumed>0x564accd3a03b, 98304) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)                                                                                                                                           
32655 12:38:19.573219 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---                                                                                                                                                                                   
32655 12:38:19.573253 read(0,                                                                             

Earlier in tmux list sessions, we see 2 sendmsg() with cmsg_type=SCM_RIGHTS, you can use to transfer file descriptors:

32654 12:37:50.101251 sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="d\0\0\0\24\0\0\0\10\0\0\0\377\377\377\377\0\0\1\10", iov_len=20}, {iov_base="o\0\0\0\30\0\0\0\10\0\0\0\377\377\377\377\0\0\1\10\0\0\0\0", iov_len=24}, {iov_base="e\0\0\0\37\0\0\0\10\0\0\0\377\377\377\377xterm-256color\0", iov_len=31}, {iov_base="m\0\0\0\24\0\0\0\10\0\0\0\377\377\377\377\0\0\0\0", iov_len=20}, {iov_base="f\0\0\0\33\0\0\0\10\0\0\0\377\377\377\377/dev/pts/5\0", iov_len=27}, {iov_base="l\0\0\0\26\0\0\0\10\0\0\0\377\377\377\377/root\0", iov_len=22}, {iov_base="p\0\0\0J\0\0\0\10\0\0\0\377\377\377\377acsc=``aaffggiij"..., iov_len=74}, {iov_base="p\0\0\0\25\0\0\0\10\0\0\0\377\377\377\377am=1\0", iov_len=21}, {iov_base="p\0\0\0\25\0\0\0\10\0\0\0\377\377\377\377AX=1\0", iov_len=21}, {iov_base="p\0\0\0\26\0\0\0\10\0\0\0\377\377\377\377bce=1\0", iov_len=22}, {iov_base="p\0\0\0\26\0\0\0\10\0\0\0\377\377\377\377bel=\7\0", iov_len=22}, {iov_base="p\0\0\0\33\0\0\0\10\0\0\0\377\377\377\377blink=\33[5m\0", iov_len=27}, {iov_base="p\0\0\0\32\0\0\0\10\0\0\0\377\377\377\377bold=\33[1m\0", iov_len=26}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377civis=\33[?25l\0", iov_len=29}, {iov_base="p\0\0\0\36\0\0\0\10\0\0\0\377\377\377\377clear=\33[H\33[2J\0", iov_len=30}, {iov_base="p\0\0\0#\0\0\0\10\0\0\0\377\377\377\377cnorm=\33[?12l\33[?2"..., iov_len=35}, {iov_base="p\0\0\0\33\0\0\0\10\0\0\0\377\377\377\377colors=256\0", iov_len=27}, {iov_base="p\0\0\0\32\0\0\0\10\0\0\0\377\377\377\377Cr=\33]112\7\0", iov_len=26}, {iov_base="p\0\0\0\37\0\0\0\10\0\0\0\377\377\377\377Cs=\33]12;%p1%s\7\0", iov_len=31}, {iov_base="p\0\0\0%\0\0\0\10\0\0\0\377\377\377\377csr=\33[%i%p1%d;%p"..., iov_len=37}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377cub=\33[%p1%dD\0", iov_len=29}, {iov_base="p\0\0\0\27\0\0\0\10\0\0\0\377\377\377\377cub1=\10\0", iov_len=23}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377cud=\33[%p1%dB\0", iov_len=29}, {iov_base="p\0\0\0\27\0\0\0\10\0\0\0\377\377\377\377cud1=\n\0", iov_len=23}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377cuf=\33[%p1%dC\0", iov_len=29}, {iov_base="p\0\0\0\31\0\0\0\10\0\0\0\377\377\377\377cuf1=\33[C\0", iov_len=25}, {iov_base="p\0\0\0%\0\0\0\10\0\0\0\377\377\377\377cup=\33[%i%p1%d;%p"..., iov_len=37}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377cuu=\33[%p1%dA\0", iov_len=29}, {iov_base="p\0\0\0\31\0\0\0\10\0\0\0\377\377\377\377cuu1=\33[A\0", iov_len=25}, {iov_base="p\0\0\0 \0\0\0\10\0\0\0\377\377\377\377cvvis=\33[?12;25h\0", iov_len=32}, {iov_base="p\0\0\0\35\0\0\0\10\0\0\0\377\377\377\377dch=\33[%p1%dP\0", iov_len=29}, {iov_base="p\0\0\0\31\0\0\0\10\0\0\0\377\377\377\377dch1=\33[P\0", iov_len=25}, ...], msg_iovlen=207, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[6]}], msg_controllen=24, msg_flags=0}, 0) = 5896                                                                                                                                                                                                                                               
32654 12:37:50.101552 sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="n\0\0\0\20\0\1\0\10\0\0\0\377\377\377\377", iov_len=16}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[7]}], msg_controllen=24, msg_flags=0}, 0) = 16 

@marckleinebudde
Copy link

marckleinebudde commented Nov 14, 2023

Unfortunately I (partially) disagree. I would prefer to see this fixed in tmux. The patch you suggested is not possible for me to merge, as I will explain below.

This particular problem cannot be fixed in tmux, as it's during a Debian update, which updates tmux from some version to another. Both versions are already released.

@Rycieos
Copy link
Collaborator

Rycieos commented Nov 16, 2023

This particular problem cannot be fixed in tmux, as it's during a Debian update, which updates tmux from some version to another. Both versions are already released.

Sure, but a newer version of tmux could be released to prevent this lockup. I just don't know where else this could be fixed. I am open to ideas.

Surely this is not an uncommon scenario to run into since tmux sessions can be so long running.

@Farom
Copy link
Author

Farom commented Nov 20, 2023

Even after sleeping on it for a few nights, I only have one idea how to improve the situation: Disable tmux support in root shells in general or use timeout when on linux.

@Rycieos
Copy link
Collaborator

Rycieos commented Nov 20, 2023

I only have one idea how to improve the situation: Disable tmux support in root shells in general

Wait, what do root shells have to do with this? Does this bug only appear when tmux is run as root? Are you running the tmux session as root, then hanging in a normal user shell? Or the other way around; running tmux session as a normal user, then hanging in a root shell?

@Farom
Copy link
Author

Farom commented Nov 20, 2023

I only have one idea how to improve the situation: Disable tmux support in root shells in general

Wait, what do root shells have to do with this? Does this bug only appear when tmux is run as root?

No this always happens.
But the worst thing is when the root shells no longer work in the middle of something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs information More information needed from the reporter tmux Related to tmux specific implemetation
Projects
None yet
Development

No branches or pull requests

3 participants