bash.exe unusual commandline parsing #1746

Open
TSlivede opened this issue Mar 3, 2017 · 8 comments

@TSlivede

TSlivede commented Mar 3, 2017

If I understand correctly, bash.exe is a Windows command-line executable that starts WSL's /bin/bash and forwards the given arguments.
On Linux (and WSL), executables are started with an argument array, whereas on Windows they are started with a single CommandLine string. bash.exe therefore needs to split the given CommandLine into multiple arguments that can be passed on to /bin/bash.
As bash.exe is a Windows executable, in my opinion it should split the CommandLine into arguments using the usual Windows rules.
However, the rules it actually uses to convert the CommandLine into arguments are quite different. It's very confusing if almost all executables follow these rules, but then Microsoft publishes a new executable that conflicts with them.

Expected results:

bash.exe should split the commandline like this

Actual results (with terminal output if applicable):

bash.exe uses its own strange rules to split the command line, which are only partly compatible with the usual rules.
At least the second of those examples fails: a\\b should stay a\\b and not become a\b.
In cmd:

C:\Windows\System32\bash.exe -c "xargs -0 printf '%s\n' < /proc/$$/cmdline" a\\b d"e f"g h

outputs:

/bin/bash
-c
xargs -0 printf '%s\n' < /proc/2/cmdline
a\b
de fg
h

(I use -c "xargs -0 printf '%s\n' < /proc/$$/cmdline" to print the arguments because using $@ or similar would require passing double quotes to bash, and as mentioned, I don't know how bash.exe parses those...)
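For comparison, here is a rough Python sketch of how the usual Microsoft C runtime rules would split the argument part of that command line. It is only an illustration under stated assumptions: the function name `split_windows_args` is made up, it handles only the text after the program name, and it leaves out the newer `""`-inside-quotes special case.

```python
def split_windows_args(cmdline: str) -> list[str]:
    """Split the argument portion of a Windows CommandLine.

    Sketch of the commonly used MS C runtime rules:
    - spaces/tabs outside double quotes separate arguments
    - 2n backslashes + '"'   -> n backslashes, quote toggles quoting
    - 2n+1 backslashes + '"' -> n backslashes, literal '"'
    - backslashes not followed by '"' are kept literally
    """
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_arg = False  # lets "" produce an empty argument
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            start = i
            while i < n and cmdline[i] == "\\":
                i += 1
            count = i - start
            if i < n and cmdline[i] == '"':
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')        # escaped, literal quote
                else:
                    in_quotes = not in_quotes  # unescaped quote
                i += 1
            else:
                current.append("\\" * count)   # plain backslashes
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


# The argument part of the cmd example above, as a raw string.
tail = r'''-c "xargs -0 printf '%s\n' < /proc/$$/cmdline" a\\b d"e f"g h'''
for arg in split_windows_args(tail):
    print(arg)
```

Under these rules the fourth line printed is a\\b with both backslashes intact, which is the result I would expect from bash.exe as well.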

Windows build number:
Version 10.0.14393 Build 14393

@zrax

zrax commented Mar 3, 2017

Interestingly, it does seem to match MSYS's bash (I wonder if there's some shared code):

> bash.exe -c "echo \"C:\\foo\\bar\" | sed 's,\\\\,/,g'"
C:/foo/bar
> C:\Devel\MSYS2\usr\bin\bash.exe -c "echo \"C:\\foo\\bar\" | /usr/bin/sed.exe 's,\\\\,/,g'"
C:/foo/bar

> bash.exe
$ echo "C:\\foo\\bar" | sed 's,\\\\,/,g'
C:\foo\bar
$ echo "C:\\foo\\bar" | sed 's,\\,/,g'
C:/foo/bar

I don't know about things other than bash, but I expect there's an extra layer of escaping for backslashes going on when running from cmd.exe.

@TSlivede
Author

TSlivede commented Mar 3, 2017

@zrax Your example has a small problem: you echo a string that contains \\ and later replace \\ with something else, so it doesn't matter whether the called bash receives one or two backslashes; the output is identical. If I call

D:\Programs\msys64\rootfs\usr\bin\bash.exe -c "xargs -0 printf '%s\n' < /proc/$$/cmdline" a\\b d"e f"g h

I get this (correctly, see a\\b):

/usr/bin/bash
-c
xargs -0 printf '%s\n' < /proc/$$/cmdline
a\\b
de fg
h

@TSlivede
Author

TSlivede commented Mar 3, 2017

And regarding calling from bash or from cmd:
When calling from cmd, the given command line is passed unchanged to the next executable (except for cmd's variable substitution, parsing of redirection, etc.; cmd does not add any backslashes, quotes, or anything else).

AFAIK, when calling from msys2's bash, a few things happen in the background:

  1. bash interprets the given line and splits it according to bash's rules. (like on linux)
  2. bash calls exec or similar, which takes an array of string arguments. (like on linux)
  3. msys or cygwin runtime creates a single commandline compatible with these rules. (see linebuf::fromargv)
  4. this commandline is given to the next executable.
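The four steps above can be imitated in a few lines of Python. This is only a sketch and not the msys/cygwin code: `shlex.split` stands in for bash's own parsing, `subprocess.list2cmdline` (an undocumented stdlib helper that follows the MS C runtime quoting rules) stands in for linebuf::fromargv, and `native.exe` and the function name are made up for illustration.

```python
import shlex
import subprocess


def msys_style_commandline(shell_line: str) -> str:
    # Steps 1 and 2: the shell splits the line into an argument array.
    argv = shlex.split(shell_line)
    # Step 3: the runtime joins the array into one Windows CommandLine.
    # Step 4 (handing it to the next executable) is not performed here.
    return subprocess.list2cmdline(argv)


line = r"""native.exe 'a\\b' "de fg" h"""
print(msys_style_commandline(line))
# prints: native.exe a\\b "de fg" h
```

An executable that splits with the usual rules gets back exactly the array that the shell produced.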

To view the given CommandLine, one can enable the CommandLine column in the details pane in TaskManager.

@zrax

zrax commented Mar 3, 2017

Hmm, perhaps my example was a bit convoluted (it was related to something else I was working on). I did try the echo without the sed to be sure I was translating the same string beforehand though, but my examples are piping the whole command through bash (rather than letting cmd.exe handle the arguments).

As another side-note, quoting the "a\b" in msys causes it to drop the backslash like WSL's bash, but does not change the output for WSL's bash.

EDIT: Just saw your latest update -- that makes more sense for my cases then.

@TSlivede
Author

TSlivede commented Mar 4, 2017

I saw your answer and assume that by quoting the "a\b" in msys you mean calling msys2's bash from cmd with arguments?
If so: yes, I have to admit that cygwin and msys2 have strange (and to my knowledge undocumented) rules for parsing a CommandLine given from outside their environment, but I thought:
Hey, it's bad in msys2 and cygwin, it doesn't necessarily need to be bad in WSL too 😄

However, the cygwin and msys2 rules are still very close to the MS rules, and if arguments are quoted with ", they seem to try to follow the MS rules.

It's also somewhat strange in cygwin and msys2: when calling from cygwin/msys2 out to a native executable, they follow the common rules, but when calling from outside into cygwin/msys2 they invent their own rules...

@TSlivede
Author

TSlivede commented Jun 2, 2017

@benhillis as a response to your comment:

The goal was to pass the entire command line to /bin/bash as a single argument so bash would be in charge of command line parsing using the Linux rules. Since the commands being run are Linux commands it made sense to use Linux argument parsing rules.

I don't know if there is any chance of getting this changed, but as I think this is really important, I have to lay out the arguments for it.

I don't think there are "Linux argument parsing" rules. Any Linux executable gets its arguments as an already split argv array. Sure, there are POSIX shell or bash parsing rules. But if you call /bin/bash from e.g. Python, argument splitting doesn't depend on bash's rules but on Python's.
Arguments are split in the parent executable.
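A small Python sketch of that point: the parent decides what the array looks like, and the child receives it untouched. To keep it runnable anywhere, the child here is another Python interpreter rather than /bin/bash, and the helper name `child_argv` is made up for illustration.

```python
import json
import subprocess
import sys


def child_argv(args: list[str]) -> list[str]:
    """Start a child process with an argument array and return what it saw."""
    code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", code, *args],  # array in, no shell involved
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


args = ["a\\\\b", "de fg", "h"]  # the first element contains two backslashes
print(child_argv(args) == args)
```

No shell parsing happens anywhere on the way, so no shell quoting is needed.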

On Windows this is sadly not true. On Windows, each executable does its own CommandLine splitting. This leads to a kind of anarchy: it's almost impossible to call executables programmatically without risking that arguments are interpreted incorrectly.

Luckily, almost all modern compilers on Windows generate executables that split arguments using a single set of rules: these rules. Many modern script interpreters follow these rules as well. Therefore, in most cases it's possible to call executables programmatically by generating a CommandLine with the inverse of these rules. This is for example done by:

All those shells/systems currently can't safely call WSL's bash.exe because it doesn't follow the typical rules.

Finally, I'll quote @mklement0 (I hope this is ok for you, @mklement0)
From SO:

I wonder if it's fundamentally possible in the Windows world to ever switch to the Unix model of letting the shell do all the tokenization and quote removal predictably, up front, irrespective of the target program, and then invoke the target program by passing the resulting tokens.

It might not really be possible, but we should definitely try to get as close as possible. But if Microsoft decides to officially release a new executable that massively disrespects all common CommandLine splitting rules, that is a step in the exact opposite direction. (I think it's not too late to change this, as WSL is still in beta.)

@TSlivede
Author

TSlivede commented Jun 2, 2017

Since the commands being run are Linux commands it made sense to use Linux argument parsing rules.

I think that even for Linux users the current behavior is unexpected: if, on Linux, I start shell A from shell B, I need to quote A's arguments according to B's quoting rules, as B splits the arguments into an array.

If we apply this to the current case, a Linux user should expect to have to quote bash.exe's arguments according to cmd.exe's rules. (I know that these rules are not part of cmd, but those are the rules you almost always need to follow if you call executables from cmd.)
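To illustrate the Linux side of this: when shell B is a POSIX shell, the arguments meant for shell A have to be quoted with B's rules, whatever A is. A minimal Python sketch using the standard `shlex` module; the helper name `quote_for_posix_shell` is made up, and `shlex.split` is only an approximation of how a real POSIX shell would split the line.

```python
import shlex


def quote_for_posix_shell(argv: list[str]) -> str:
    """Build one line that an outer POSIX shell (B) splits back into argv."""
    return shlex.join(argv)


# Array intended for the inner shell (A).
inner_argv = ["bash", "-c", 'printf "%s\\n" "$@"', "_", "a\\\\b", "de fg", "h"]

line = quote_for_posix_shell(inner_argv)
print(line)

# Splitting with B's rules gives back exactly the array meant for A.
assert shlex.split(line) == inner_argv
```

The equivalent on Windows would be quoting with the usual Windows rules, which is exactly what does not work with bash.exe.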

@TSlivede
Author

I just noticed that even though the default behavior of wsl.exe is no better than that of bash.exe, using the -e option of wsl.exe actually seems to result in the expected behavior:

C:\>wsl -e bash -c "xargs -0 printf '%s\n' < /proc/$$/cmdline" a\\b d"e f"g h
bash
-c
xargs -0 printf '%s\n' < /proc/$$/cmdline
a\\b
de fg
h

C:\>

Adding a parameter to wsl.exe that calls the default shell with the command-line arguments parsed as in any other normal exe (an alias for wsl -e bash -c, but with bash replaced by whatever the default shell of that distro is) would be a nice addition. It would be best if the default behavior simply worked the way one expects, but I guess it's too late for that...

If you want to argue about the "way that one expects": yes, the current behavior might not be all that unexpected for users of cmd, but for anybody who calls wsl from anywhere else (MSYS bash, PowerShell, or one of the shells mentioned above) the current behavior is disastrous; see for example this issue at the powershell repo. If wsl.exe behaved in a "reasonable" way, then the second example that @bitcrazed gave (or at least something very similar, which could even have been suggested in the error message) could simply have worked.

Keeping the behavior as it currently is discourages WSL users from using any shell other than cmd.exe, which is certainly not what Microsoft had in mind, as AFAIK cmd is deprecated.
