-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nonprintable and control characters are not escaped in the prompt #632
Comments
Unix filenames have always been a problem for shells, as they can include any character except the null char, and a filename can't include the forward slash (/) since it is the path delimiter. Other than those two characters, it can include anything, including escape characters. These characters by design either have no visual representation (historically terminal control characters) or their visual representation is whitespace only (newline, tab, etc.). Currently, in Liquidprompt, if a directory includes a special character, it is treated just like any "normal" character, and printed directly to the terminal. For control characters, this either shows nothing, or adds whitespace to the prompt. This can be annoying, like an extra newline, or confusing and/or break the prompt, like a carriage return or vertical tab. This isn't a problem limited to directory names, as any external data source could include any character except null in Bash or even null in Zsh. This isn't a problem unique to Liquidprompt. Bash, when expanding the current working directory directive (\w or \W), strips control characters, not displaying them at all (until Bash 5.1, which prints them literally). I can't find any documentation describing this behavior. Zsh, when expanding the current working directory directive (%~), replaces some with a \<char> code (eg \n, \t), and others with a ^<letter> notation (eg ^A, ^H). This is better, but the inconstancy between the two forms can be confusing, and Zsh does nothing to escape a literal backslash or caret (\ or ^), meaning a directory named with ANSI-C notation $'bar\\nope\nfoo' will print as bar\nope\nfoo, making the first \n sequence appear to be a newline character, when it is actually a backslash followed by the letter n. The lack of bug reports on this topic, and the fact that Bash also chooses to mostly ignore the problem, probably mean that this isn't an issue that many users face. But it isn't always up to the user to name their directories, VCS branch names, and other data sources: often external sources or programs can inject otherwise invisible characters into such strings, and pretending that won't happen won't fix the problem. This commit replaces unprintable characters with string representations using backslashes. A drawback is that there will rarely be a case where this is useful, but the check will always slightly slow down the prompt. Fixes #632
Unix filenames have always been a problem for shells, as they can include any character except the null char, and a filename can't include the forward slash (/) since it is the path delimiter. Other than those two characters, it can include anything, including escape characters. These characters by design either have no visual representation (historically terminal control characters) or their visual representation is whitespace only (newline, tab, etc.). Currently, in Liquidprompt, if a directory includes a special character, it is treated just like any "normal" character, and printed directly to the terminal. For control characters, this either shows nothing, or adds whitespace to the prompt. This can be annoying, like an extra newline, or confusing and/or break the prompt, like a carriage return or vertical tab. This isn't a problem limited to directory names, as any external data source could include any character except null in Bash or even null in Zsh. This isn't a problem unique to Liquidprompt. Bash, when expanding the current working directory directive (\w or \W), strips control characters, not displaying them at all (until Bash 5.1, which prints them literally). I can't find any documentation describing this behavior. Zsh, when expanding the current working directory directive (%~), replaces some with a \<char> code (eg \n, \t), and others with a ^<letter> notation (eg ^A, ^H). This is better, but the inconstancy between the two forms can be confusing, and Zsh does nothing to escape a literal backslash or caret (\ or ^), meaning a directory named with ANSI-C notation $'bar\\nope\nfoo' will print as bar\nope\nfoo, making the first \n sequence appear to be a newline character, when it is actually a backslash followed by the letter n. The lack of bug reports on this topic, and the fact that Bash also chooses to mostly ignore the problem, probably mean that this isn't an issue that many users face. But it isn't always up to the user to name their directories, VCS branch names, and other data sources: often external sources or programs can inject otherwise invisible characters into such strings, and pretending that won't happen won't fix the problem. This commit replaces unprintable characters with string representations using backslashes. A drawback is that there will rarely be a case where this is useful, but the check will always slightly slow down the prompt. Fixes #632
My recommended solution (3 above) uses I don't know of any other way to turn a character into its codepoint without make a forking call, which is needed since this would be called for each character printed to the screen. With a fork, it would be much too slow. As the end of the v2.0 merge window is approaching, I am shoving this to v2.1 at least. Since there is no demand for this right now, this will sit on the backburner until a solution is found that does not have any speed loss. |
This is mostly a theorical problem. While the name of the current directory might contain control characters in theory, fixing this problem in LP requires much code and that code would have much runtime impact (if we can't use shell built-ins), while in practice you never have such characters. When the current directory has bad characters, it's just easier for the user to temporarily disable the not-so-smart-prompt or to switch to another shell which doesn't have LP enabled. Of course, control characters in directory name can be a security problem if they are not escaped. But I think that because this is in an interactive shell the user is responsible for So my opinion is that a big warning in LP documentation is better than the performance impact. |
You are probably right. I also discovered that Fossil allows for almost any character in branch names. This can cause issues if checking out someone else's repository, where they have branch names containing characters. This is slightly less in the control of the terminal user, but I would agree that it is probably not worth the effort or runtime cost. I think the most likely scenario for this happening would be if some program created a directory with a newline or similar character in it. I have seen that before, but it is rare, and I'll probably leave this as a "would be neat" feature for the backlog, and if it ever does get implemented, unless it has no performance impact, it would be disabled by default. |
1/ Problems may happen if the current directory (or branch) contains control chars. Shell security problems exist. But this one is low on the list. |
Shell: all bash and zsh versions (bash 3.2-5.1, zsh 5.5.1)
Liquidprompt version: v_1.11 though v1.21.1
Problem description
Unix filenames have always been a problem for shells, as they can include any character accept the null char, and a filename can't include the forward slash (
/
) since it is the path delimiter. Other than those two characters, it can include anything, including escape characters. These characters by design either have no visual representation (historically terminal control characters) or their visual representation is whitespace only (newline, tab, etc.).Currently, in Liquidprompt, if a directory includes a special character, it is treated just like any "normal" character, and printed directly to the terminal. For control characters, this either shows nothing, or adds whitespace to the prompt. This can be annoying, like an extra newline, or confusing and/or break the prompt, like a carriage return or vertical tab. This isn't a problem limited to directory names, as any external data source could include any character except null in Bash or even null in Zsh.
This isn't a problem unique to Liquidprompt. Bash, when expanding the current working directory directive (
\w
or\W
), strips control characters, not displaying them at all (until Bash 5.1, which prints them literally). I can't find any documentation describing this behavior. Zsh, when expanding the current working directory directive (%~
), replaces some with a\<char>
code (eg\n
,\t
), and others with a^<letter>
notation (eg^A
,^H
). This is better, but the inconstancy between the two forms can be confusing, and Zsh does nothing to escape a literal backslash or caret (\
or^
), meaning a directory named with ANSI-C notation$'bar\\nope\nfoo'
will print asbar\nope\nfoo
, making the first\n
sequence appear to be a newline character, when it is actually a backslash followed by the lettern
.The lack of bug reports on this topic, and the fact that Bash also chooses to mostly ignore the problem, probably mean that this isn't an issue that many users face. But it isn't always up to the user to name their directories, VCS branch names, and other data sources: often external sources or programs can inject otherwise invisible characters into such strings, and pretending that won't happen won't fix the problem.
Possible Options
We have a few options to resolve this issue:
Possible Implementations
2).
This is simple to implement, but see below for encoding issues that might come up.
string=${string//[$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\e\034\035\036\037\177\000']}
Note also that the null character (
\0
) is last, as in Bash that character will terminate a string (and therefore can't be stripped, but also can't be in the input string in the first place).EDIT:
Even simpler would be to use the character class:
string=${string//[[:cntrl:]]}
3).
This is a replacement for the current
_lp_escape()
function that only escapes backslashes on Bash and percents (%
) on Zsh so they are printed literally in the prompt. This would still do that, but would also replace control characters.The difficulty here is that we can't fork out to another tool to help. This function is called multiple times per prompt generation, and would be too slow. The only way we can take this option is if it doesn't add slowdown.
printf
is a builtin, and by using the-v
flag, we can assign the output directly to a variable without using a subshell.The
ls
command has a--quoting-style=shell-escape
option (which seems to be the default on modern systems) that does this string escaping nicely for directories. We can't call out tols
to do this escaping for us for multiple reasons, but it might provide some tips on how to do it correctly.The above implementation displays the characters as
ls
does:\<char>
if it has a shorthand form, otherwise as\0xx
where the number is the character's codepoint in octal.Alternative encodings are still a problem here (see below). The
for
loop isn't ideal, but it is fast.EDIT:
Using a check like below could skip the special encoding loop unless one of these characters exists:
Other complications
Liquidprompt is written in UTF-8, and all text pattern matching is ASCII only. Not every user terminal will be UTF-8 though, so whatever character matching we do needs to match (and maybe replace) control characters in the current encoding. Discovering control characters in the current encoding may be difficult or impossible. All of my suggested implementations are assuming ASCII or UTF-8, and don't deal with any characters outside of the ASCII range.
The text was updated successfully, but these errors were encountered: