Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ArgumentList when invoking native executables #14747

Closed
JamesWTruher opened this issue Feb 9, 2021 · 18 comments
Closed

Use ArgumentList when invoking native executables #14747

JamesWTruher opened this issue Feb 9, 2021 · 18 comments
Assignees
Labels
Issue-Enhancement the issue is more of a feature request than a bug Resolution-Fixed The issue is fixed. WG-Engine core PowerShell engine, interpreter, and runtime
Milestone

Comments

@JamesWTruher
Copy link
Member

JamesWTruher commented Feb 9, 2021

Handling Parameter Binding in Native Executables

PowerShell generally provides a useful experience when working with native executables, but there are a number of issues:

  • passing embedded quoted strings is very problematic
  • passing an empty string is very problematic (mostly impossible)
  • passing a string which looks like a ScriptBlock requires extra care

A command line such as: msiexec /i testdb.msi INSTALLLEVEL=3 /l* msi.log COMPANYNAME="Acme ""Widgets"" and ""Gizmos""" is received as: msiexec /i testdb.msi INSTALLLEVEL=3 /l* msi.log COMPANYNAME="Acme Widgets and Gizmos". A command line such as: msiexec /i testdb.msi INSTALLLEVEL=3 /l* msi.log COMPANYNAME="Acme ""Widgets"" and ""Gizmos.""" is received as: msiexec /i testdb.msi INSTALLLEVEL=3 /l* msi.log COMPANYNAME="Acme Widgets and Gizmos." which strips the embedded double quotes.

Current behavior does not allow empty strings to be passed, so the following is not possible:

  • useradd -g 501 -u 1001 -p '' sam
  • emacsclient -u ''

These behaviors should be supported as there are many scenarios where an empty string is required as a parameter value.

Handling Quotes

Windows and Non-Windows have divergent behavior with regard to quotes.
While both Windows and Unix shells recognize "one two three" as a single string,
Windows does not recognize single quotes ' as string designator.
Unix shells (and PowerShell) has 2 types of strings;
Expandable strings "$a" where $a is expanded with the value of the variable $a, and literal strings '$a'where the literal string$a` is passed.

In the case of Windows a string such as 'foo bar' is 2 tokens ('foo and bar') where Unix will see a single string foo bar.

Null vs Empty Strings

Most shells don't have the same definition of null that PowerShell does.
In bash, for example, commands are invoked via strings, so naturally, an empty string can be easily notated with '' or "".
Similarly, in CMD.EXE an empty string may be passed with "".

However, the following should not result in empty strings being added:

  • $null
  • Reference to a variable which is unassigned
  • Reference to a variable which is assigned the value $null
  • When a collection wherein an element of the collection has the value $null

Non-null elements of a collection will be bound:

  • @($null, 1, $null) will bind a single value 1 as a parameter value
  • @($null, 1, $null, 2, $null) will bind 2 values (1 and 2) as parameter values
  • @($null, 1, $null, '', 2) will bind 3 values (1, '', 2) as parameter values

examples:

PS> useradd -g 501 -u 1001 -p $null sam # error - "sam" is used as password and no username is passed
PS> $a = $null
PS> useradd -g 501 -u 1001 -p $a sam # error - "sam" is used as password and no username is passed
PS> $a = @()
PS> useradd -g 501 -u 1001 -p $a sam # error - "sam" is used as password and no username is passed
PS> $a = @($null)
PS> useradd -g 501 -u 1001 -p $a sam # error - "sam" is used as password and no username is passed
PS> $a = ''
PS> useradd -g 501 -u 1001 -p $a sam # no error - empty string is passed as password
PS> $a = @('')
PS> useradd -g 501 -u 1001 -p $a sam # no error - empty string is passed as password
PS> $a = @('',"sam")
PS> useradd -g 501 -u 1001 -p $a # no error - empty string is passed as password and same is passed as username value
PS> $a = @("-g",501,"-u",1001,"-p",'',"sam")
PS> useradd $a # no error
PS> $a = @("-g",501,"-u",1001,$null,$null,$null,$null,"-p",'',"sam")
PS> useradd $a # no error - nulls are not passed

Additional Examples

S> msiexec /i A:\Example.msi PROPERTY="Embedded ""Quotes"" White Space" # no error - embedded quotes are passed to native executable
PS> msiexec /i A:\Example.msi PROPERTY="Embedded White Space" # no error - embedded spaces are passed to native executable
PS> msiexec /i A:\Example.msi PROPERTY="" # no error - empty string is passed to native executable

Globbing Considerations

Globbing will need to be done where appropriate. This behavior is platform dependent, so on Windows systems, globbing is not performed but is provided on Linux and Mac systems. This is because most utilities on Windows do their own globbing, but on Linux and Mac globbing is done by the shell. The current behavior for globbing does this and needs to remain unchanged to ensure that the scenarios continue to work on each platform. When a glob fails, the string provided shall be sent to the application without alteration.

The following example shows the difference based on platform:

PS> # on Windows
PS> .\echoit.exe rm c:\tmp\dd\f*
Argument 1 <rm>
Argument 2 <c:\tmp\dd\f*>

# on Mac/Linux
PS> trace-command -pshost -name parameterbinding { /bin/ls /tmp/dd/f* }
DEBUG: 2021-02-08 17:12:03.3886 ParameterBinding Information: 0 : BIND NAMED native application line args [/bin/ls]
DEBUG: 2021-02-08 17:12:03.3887 ParameterBinding Information: 0 :     BIND argument [/tmp/dd/f1 /tmp/dd/f2]
DEBUG: 2021-02-08 17:12:03.3956 ParameterBinding Information: 0 : CALLING BeginProcessing
/tmp/dd/f1
/tmp/dd/f2
PS> trace-command -pshost -name parameterbinding { /bin/ls /tmp/dd/fff* }
DEBUG: 2021-02-08 17:12:23.0600 ParameterBinding Information: 0 : BIND NAMED native application line args [/bin/ls]
DEBUG: 2021-02-08 17:12:23.0601 ParameterBinding Information: 0 :     BIND argument [/tmp/dd/fff*]
DEBUG: 2021-02-08 17:12:23.0670 ParameterBinding Information: 0 : CALLING BeginProcessing
ls: /tmp/dd/fff*: No such file or directory

Unsupported or Requiring Alteration

Some elements may not be used as they represent PowerShell tokens.

  • An embedded semi-colon ; is not allowed. PowerShell will parse this as a statement separator.
  • An open curly-brace will be interpreted as the beginning of a scriptblock

for example:

msiexec /p msipatch.msp;msipatch2.msp /n {00000001-0002-0000-0000-624474736554} /qb

  • This is not allowed because of the embedded ; which PowerShell will turn into 2 commands (; is a statement separator) this string must be quoted.
  • This is also not supported because of the use of ScriptBlock syntax. PowerShell can not determine if the ScriptBlock is a command (or in this case a guid).
    To execute this command, quote the problematic strings or use --% after the executable.

msiexec /p 'msipatch1.msp;msipatch2.msp' /n '{00000001-0002-0000-0000-624474736554}' /qb

or

msiexec --% /p msipatch1.msp;msipatch2.msp /n {00000001-0002-0000-0000-624474736554} /qb

NB: The behavior for supporting ScriptBlocks as strings could be supported via an allow-list for known executables. This may be needed because the amount of GUIDs (with braces) is used in a number of both Windows and Non-Windows utilities. However, managing the list of utilities may be burdensome.

Improved Tracing

The parameter binding tracing code for native executables is not currently implemented which makes debugging issues when execution native applications very difficult. Tracing for current parameter binding and new behavior shall be provided. In the case of the old style, the path to the executable and the string which makes up the Arguments property of the StartInfo object shall be provided. For the new behavior, since the arguments are a list, each element of the list shall be presented.
The following transcript shows how the tracing shall appear.

# new style - native arguments are bound to ArgumentList property
PS > trace-command -PSHOST -Name ParameterBinding { ~/echoit foo="bar ""blob"" bar" zap foo:bar:baz,bip,bar }
DEBUG: 2021-02-04 17:28:54.5674 ParameterBinding Information: 0 : BIND NAMED native application line args [/Users/james/echoit]
DEBUG: 2021-02-04 17:28:54.5674 ParameterBinding Information: 0 :     BIND cmd line arg [foo=bar "blob" bar] to position [0]
DEBUG: 2021-02-04 17:28:54.5675 ParameterBinding Information: 0 :     BIND cmd line arg [zap] to position [1]
DEBUG: 2021-02-04 17:28:54.5675 ParameterBinding Information: 0 :     BIND cmd line arg [foo:bar:baz,bip,bar] to position [2]
DEBUG: 2021-02-04 17:28:54.5728 ParameterBinding Information: 0 : CALLING BeginProcessing
Argument 1 <foo=bar "blob" bar>
Argument 2 <zap>
Argument 3 <foo:bar:baz,bip,bar>

# old style - native arguments are bound to Arguments property
PS > trace-command -PSHOST -Name ParameterBinding { ~/echoit foo="bar ""blob"" bar" zap foo:bar:baz,bip,bar }
DEBUG: 2021-02-04 17:29:01.9987 ParameterBinding Information: 0 : BIND NAMED native application line args [/Users/james/echoit]
DEBUG: 2021-02-04 17:29:01.9987 ParameterBinding Information: 0 :     BIND argument ["foo=bar "blob" bar" zap foo:bar:baz,bip,bar]
DEBUG: 2021-02-04 17:29:02.0058 ParameterBinding Information: 0 : CALLING BeginProcessing
Argument 1 <foo=bar blob bar>
Argument 2 <zap>
Argument 3 <foo:bar:baz,bip,bar>
PS > 

# view of tracing when --% is used
PS /Users/james> trace-command -PSHOST -name parameterbinding { ~/echoit --% 'foo bar' a,b,c "one two" "a\ b\ c"
>> }
DEBUG: 2021-02-08 17:07:10.0199 ParameterBinding Information: 0 : BIND NAMED native application line args [/Users/james/echoit]
DEBUG: 2021-02-08 17:07:10.0199 ParameterBinding Information: 0 :     BIND cmd line arg ['foo] to position [0]
DEBUG: 2021-02-08 17:07:10.0199 ParameterBinding Information: 0 :     BIND cmd line arg [bar'] to position [1]
DEBUG: 2021-02-08 17:07:10.0199 ParameterBinding Information: 0 :     BIND cmd line arg [a,b,c] to position [2]
DEBUG: 2021-02-08 17:07:10.0200 ParameterBinding Information: 0 :     BIND cmd line arg ["one] to position [3]
DEBUG: 2021-02-08 17:07:10.0200 ParameterBinding Information: 0 :     BIND cmd line arg [two"] to position [4]
DEBUG: 2021-02-08 17:07:10.0200 ParameterBinding Information: 0 :     BIND cmd line arg ["a\] to position [5]
DEBUG: 2021-02-08 17:07:10.0200 ParameterBinding Information: 0 :     BIND cmd line arg [b\] to position [6]
DEBUG: 2021-02-08 17:07:10.0200 ParameterBinding Information: 0 :     BIND cmd line arg [c"] to position [7]
DEBUG: 2021-02-08 17:07:10.0261 ParameterBinding Information: 0 : CALLING BeginProcessing
Argument 1 <'foo>
Argument 2 <bar'>
Argument 3 <a,b,c>
Argument 4 <"one>
Argument 5 <two">
Argument 6 <"a\>
Argument 7 <b\>
Argument 8 <c">

Additional considerations for tracing

It may be desirable to add additional tracing which provides information on the parameters as they were provided.
The tracing above is created at the point where the StartInfo object is populated,
and it may be useful to see the parameter before it is altered by globbing, etc.

Current Implementation

When PowerShell starts a new native process it takes all the arguments provided and attempts to stitch together the various parts into a single string (which is assigned to the Arguments property of the StartInfo object). This is done with some problematic behavior; empty strings '' are explicitly stripped, embedded quotes and spaces are "lost" and require addition escaping.

New Behavior

I think we can do better and reduce the effort and internal complexity when calling native applications. Dotnet has added a new property to the StartInfo object called ArgumentList which allows you to provide the arguments to the command as a collection of strings, alleviating the need to stitch the arguments into a single string. We can take advantage of this new API to reduce the complexity of our code. However, we should maintain backward compatibility if we can, so rather than producing new, breaking behavior via an experimental feature, I suggest that we provide a new runtime behavior based on a PowerShell variable. This allows users to change the behavior without restarting the PowerShell process and can be used when desired.
By not changing the default behavior we can provide users an easy way opt-in to the new behavior. Telemetry can be added if desired to capture the count of how many times the current way is used in comparison with this new implementation.

NB: This proposal will actually increase the internal complexity of our code because we'll have 2 ways of calling native applications. Hopefully, this would be temporary and we can use the new APIs exclusively in the future and deprecate the current code.

Tools used

The following utility is used to echo all passed parameters given in the examples above. This does not rely on the CLR runtime, but may be compiled for all platforms.

#include <stdio.h>
int main(int argc, char *argv[])
{
    for(int i = 1; i < argc; i++) {
        printf("Argument %d <%s>\n", i, argv[i]);
    }
    return 0;
}

We can add this to the tools for build if needed (either by binaries checked in for each platform or build)

Related Links:

Pull Requests

Issues

@JamesWTruher JamesWTruher added Issue-Enhancement the issue is more of a feature request than a bug Needs-Triage The issue is new and needs to be triaged by a work group. labels Feb 9, 2021
@SteveL-MSFT SteveL-MSFT added this to the 7.2-Consider milestone Feb 9, 2021
@SteveL-MSFT SteveL-MSFT added the WG-Engine core PowerShell engine, interpreter, and runtime label Feb 9, 2021
@bergmeister
Copy link
Contributor

By not changing the default behavior we can provide users an easy way opt-in to the new behavior

I suggest to rather have the new behavior by default and use the PowerShell variable rather for opting out of the new behavior. Especially with previews, it will allow people to discover whether they need to adapt their code or to give feedback that would allow tweaking of this new feature.

@kilasuit
Copy link
Collaborator

kilasuit commented Feb 9, 2021

Should this not go as an experimental feature as opposed to using an environmental variable to control it?

@sba923
Copy link
Contributor

sba923 commented Feb 16, 2021

By not changing the default behavior we can provide users an easy way opt-in to the new behavior

I suggest to rather have the new behavior by default and use the PowerShell variable rather for opting out of the new behavior. Especially with previews, it will allow people to discover whether they need to adapt their code or to give feedback that would allow tweaking of this new feature.

How often would that break existing code?

@mklement0
Copy link
Contributor

I think the right approach with respect to enabling the new behavior is:

  • Have the new preference variable default to the OLD behavior, and require opt-in to the new behavior; as suggested in Native invocation using ArgumentList #14692 (comment), it could be a $PSNativeCommandArgumentPassing (perhaps $PSNativeArgumentPassing is enough) with possible values Legacy (default, old behavior) and Standard (new behavior).

    • Needing an opt-in is highly unfortunate, given that argument-passing to external programs is a core mandate of any shell, but if backward-compatibility is paramount, it's our only choice.
  • Introduce a temporary experimental feature that makes the engine act as if the opt-in were in place - i.e. it acts as if $PSNativeCommandArgumentPassing were set to Standard (even if it is set to Legacy or the variable is missing); thereby, the new behavior will be in effect by default for preview users.

@mklement0
Copy link
Contributor

mklement0 commented Feb 21, 2021

Update: See #15143 for the most current version of this proposal; however, the post below additionally provides some background information on parameter-passing on Windows.


As for the proposal:

  • Being able to trace native parameter passing will be a welcome addition on Windows (and in legacy mode on Unix).

  • The following behaviors mentioned in the OP are incidental to the proposal:

    • No changes from the current behavior:
      • $null values are already ignored, both as individual arguments and as collection element.
      • native globbing already works (though, as an aside, it is worth mentioning that this globbing is always case-insensitive, unlike the native behavior on Linux, and you get no control over whether hidden items are also matched - they're not)
    • N/A, due to how processes are launched on Unix-like platforms (the quoting styles of POSIX-compatible shells such as bash are irrelevant here):
      • Quoting only applies to shells on Unix, whose job is to parse a command line - a single string encoding a command name and its arguments - into the command name and the array of verbatim arguments to pass to it.
      • Unlike on Windows, (non-shell) processes do not receive a command line - they only ever receive an array of verbatim arguments - that's precisely what .ArgumentList gives us, and it fully solves all problems on Unix.

Unfortunately, things aren't quite so simple on Windows, due to launched processes receiving a command line that they themselves must parse. This very unfortunate design forces processes to take on part of a job that should be exclusively a shell's responsibility: parsing this command line into an array of verbatim arguments.

Aside from placing undue burden on non-shell processes, it opens the door to individual processes interpreting their command line however they want:

While there is a widely adhered-to convention for how to encode multiple arguments in such a process command line - namely the ones used by Microsoft's C / C++ compiler / the CLR (which I'll call the C convention here) - no single encoding is guaranteed to work with all programs:

  • The most prominent exception is cmd.exe - and therefore calls to batch files: they accept only "" as an escaped ", not the \" required by the C convention); while Microsoft compiler-generated executables also support "", there are third-party programs that support only \")

  • An additional problem is that batch files unfortunately and inappropriately parse their arguments as if they had been passed from inside cmd.exe, which causes something like .\foo.cmd http://example.org?foo&bar to break due to & being misinterpreted as a statement separator. Using "http://example.org?foo&bar", i.e. quoting from PowerShell doesn't help, because PowerShell - justifiably - omits the quotes when it rebuilds the process command line behind the scenes, given that value contains neither spaces nor embedded " chars.

    • This is especially problematic given that the CLIs of many high-profile environments (e.g., Azure, Node.js) use batch files as their CLI entry points, so that something like az ... http://example.org?foo&bar predictably fails.
  • Calling cmd.exe /c <command-line> or cmd.exe /k <command-line> directly with a command line to be executed through a happy accident actually currently works as intended, without a workaround - and that behavior must be retained.

  • Finally, many programs are particular about partial quoting of arguments, notably msiexec.exe with property arguments such as PROP="VALUE WITH SPACES"; purely syntactically, "PROP=VALUE WITH SPACES" (which is what PowerShell currently sends) should be equivalent (and if you let C runtime / CLR parse it, is - the resulting verbatim string is PROP=VALUE WITH SPACES in both cases), but in practice it is not.

    • PowerShell should not pay attention to the original quoting on the PowerShell command line in an attempt to emulate it when re-encoding behind the scenes; no such quoting may be present to begin with (e.g., PROP=$someValuePossiblyWithSpaces), and users generally shouldn't have to worry about such intricacies - see below.

It's impossible for PowerShell to fully solve these problems, but it makes sense to make accommodations for these exceptions, so as to make the vast majority of calls just work.

For the remaining, edge cases there is:

  • --% for console applications
  • Start-Process for GUI-subsystem applications with a CLI such as msiexec which allows you to fully control the process command line by passing a single string to -ArgumentList (in a pinch you can also use it with console applications, but you lose stream integration).

In concrete terms this means for the new behavior:

After PowerShell's own parsing, once the array of verbatim arguments - stripped of $nulls - to pass on is available:

  • On Unix-like platform:

    • Pass that array to .ArgumentList - that is all that is ever needed.
  • On Windows:

    • Except for the cases detailed below, also pass that array to .ArgumentList - behind the scenes, .NET performs the necessary re-encoding based on the C conventions for us, and any conventional CLI should interpret the result correctly.

    • The following exceptions may apply independently or in combination, and they require manual re-encoding by PowerShell (with assignment to .Arguments, as currently):

      • If the target command is a batch file:

        • use "" (rather than \") to escape embedded verbatim " (and ensure enclosure in syntactic "...", even if the value has no spaces)
        • "..."-enclose any argument that contain no spaces (such arguments are normally not quoted) but contain any of the following cmd.exe metacharacters: & | < > ^ , ; (while , and ; have no impact on arguments pass-through with %*, they serve as argument separators in intra-batch file argument parsing; this also applies to =, but, unfortunately, passing something like FOO=bar as "FOO=bar" conflicts with the accommodation for msiexec-style CLIs below).
      • Irrespective of the target executable, if any of the arguments have the form of a misexec-style partial-quoting argument, apply double-quoting only to the "value" part (the part after the separator):

        • Specifically, if an argument (a) matches regex ^([/-]\w+[=:]|\w+=)(.+)$, and (b) the part after = or : requires double-quoting (either due to containing spaces and/or an embedded " and/or, in the case of a batch file containing cmd.exe metacharacters), leave the part up to and including = or : unquoted, and double-quote only the remaining part.
        • Examples:
          • The following PowerShell arguments:
            • FOO='bar none', -foo:$value (with $value containing verbatim bar none), /foo:bar` none (and even quoted-in-full variants 'FOO=bar none', ....)
          • would end up in the .Arguments command line as follows:
            • FOO="bar none", -foo:"bar none", /foo:"bar none"
      • Finally, a current behavior that must be retained, i.e. no escaping of embedded " is called for in the very specific case of cmd.exe being called with either the /c or the /k option followed by a single argument (with spaces) representing a cmd.exe command line in full.

        • See below for details, including the proposal for an optional additional accommodation that would make sense.

Again, these are reasonable accommodations to make, which:

  • allow users to focus solely on PowerShell's syntax
  • should make the vast majority of calls just work.
  • are easy to conceptualize and document - the proposed tracing should help too.

I invite everyone to scrutinize these accommodations to see if they're complete, overzealous, ...

This is a chance to finally cure all native quoting / argument-passing headaches - even if only by opt-in.

(To experiment with the proposed behaviors up front (based on my personal implementation that sits on top of the current behavior), you can use Install-Module Native and prepend ie to command lines; if the proposed changes are implemented, such a stopgap will no longer be necessary, although it can still help on earlier versions.)

@SeeminglyScience
Copy link
Collaborator

  • Have the new preference variable

It should be noted that additional runtime behavior altering preference variables is not without risk. With each one added, the amount of ceremony required to guard against inherited preferences is increased heavily.

I'm not weighing in on whether it's the right call in this scenario, just want to make sure that's considered.

@mklement0
Copy link
Contributor

mklement0 commented Feb 22, 2021

Yes, that is a concern with all preference variables, and the lack of lexical scoping is surely a candidate for #6745 (a similar concern arose recently with respect to allowing opt-in to using / as the path separator on Windows, #10509 (comment)).

The - not exactly obvious - workaround is to use the $private scope:

# Predefined preference variable:
#  - defaults to 'Legacy'
#  - is NOT defined with option 'AllScope'
$global:PSNativeArgumentPassing = 'Legacy'

& {

    # Override the global preference variable *for this scope only*.
    $private:PSNativeArgumentPassing = 'Standard'

    "In child scope: $PSNativeArgumentPassing"

    # Thanks to $private, descendant scopes again see the global variable.
    & {
      "In grandchild scope: $PSNativeArgumentPassing"
    }

}

I hope there's no question that:

  • introducing the new behavior is a must.
  • in-session control must be available as to which scopes adopt the new behavior vs. which ones do not.

In light of that, @SeeminglyScience: can you think of a different current-scope-only opt-in mechanism?

@SeeminglyScience
Copy link
Collaborator

Yes, that is a concern with all preference variables

It's a little less of an issue since most of the existing preference variables are only going to change behavior that is already leaking to the user. Like error preference, usually when that changes runtime behavior it's when you're emitting an error to the user.

I do say "usually" there because it can for sure end up forcing a different code path, but doesn't change the meaning of your code. Also either way, the more that are added the harder it is to guard against.

In light of that, @SeeminglyScience: can you think of a different current-scope-only opt-in mechanism?

Current scope isn't necessarily what you want either, too many things create a new scope where most users would not necessarily expect it.

Parser based lexical scoping does solve that problem, but creates a new one in the form of a new dialect which is very expensive in the long term.

I don't know of a way to make a change like this without one of these extra chunks of complexity or breaking changes.

@mklement0
Copy link
Contributor

Thanks, @SeeminglyScience - sounds like a preference variable is our only - imperfect - realistic short-term option.

(A quick aside: $PSDefaultParameterValues is the one preference variable that potentially has the farthest-reaching consequences, especially since modifying it directly in any scope modifies the global copy; fortunately, it doesn't seem to be used all that much in practice).

In practice, in terms of guiding users, this means:

  • Set the global copy of $PSNativeArgumentPassing to 'Standard' if you want all code in your session to exhibit the new behavior.

  • Set a (non-global) scope-specific copy for that scope and all its descendant scopes to exhibit the new behavior - but not across module boundaries.

  • Set $private:PSNativeArgumentPassing to exhibit the new behavior in that scope only - which requires knowledge of what does and doesn't run in a child scope (the rules for which are indeed not always clear - see Do delay-bind scriptblocks / scriptblocks in calculated properties run in a child scope by design? #7157).

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Feb 24, 2021

A good experiment might be to try a whole lot of different build scripts for that use various toolsets and see how often they break with the new setting. That will be a prime example of accidental inheritance with often complex executable arguments.

Personally I'd still lean toward the stop parsing sigil being the way to go, but in all honesty this issue has rarely affected me in a meaningful way so ymmv (Edit: apparently because I coincidentally tend to bypass it with cmd.exe /c).

@mklement0
Copy link
Contributor

mklement0 commented Feb 24, 2021

I'd still lean toward the stop parsing sigil

Unfortunately, --%, the stop-parsing symbol:

this issue has rarely affected me in a meaningful way

I presume that's because you're a developer rather than a sysadmin or DevOps person.

As the following can attest:

the issue at hand is a real, long-standing pain point.

And the issue is an even bigger one on Unix, where - unlike on Windows - many capable native utilities exist, and that for performance reasons and due to lack of binary pipeline support resorting to native utilities (including the native shell) is sometimes a must.

It all comes back to this striking example (run on a Unix-like platform):

PS> /bin/echo '{ "foo" : 1 }'
{ foo : 1 }  # !! double quotes were effectively stripped

This is such blatantly broken behavior that you can't help but wonder why this hasn't been fixed - even if only on an opt-in basis - in the 14+ years of PowerShell's existence.

Stack Overflow
Stack Overflow | The World’s Largest Online Community for Developers
Super User
Q&A for computer enthusiasts and power users

@SeeminglyScience
Copy link
Collaborator

Unfortunately, --%, the stop-parsing symbol:

I mean the new one being proposed in a different issue somewhere.

I presume that's because you're a developer rather than a sysadmin or DevOps person.

I'm a sysadmin.

As the following can attest:

I included that last part as a disclaimer on my opinion, not as a dismissal of the need. Put more casually and a bit exaggerated it would be "I think it should be X but what I do I know, I don't run into this problem".

@mklement0
Copy link
Contributor

mklement0 commented Feb 24, 2021

I mean the new one being proposed in a different issue somewhere.

I assume you mean the aforementioned #13068 ("native operator") - its purpose is different, requires you to apply a different shell's syntax and, without using a (here-)string as enclosure, is subject to the same conceptual headaches as --% while generally making it hard to integrate PowerShell variables / expressions in a given call. In short: it is not meant to address the problem at hand, and it would so poorly - see #13068 (comment)

That said, a new call operator - to be used explicitly, in lieu of & (which also came up in the same thread, at #13068 (comment)) - might be a low-ceremony alternative that avoids the preference-variable scoping headaches.

Finding the right sigil combination (I don't think a single character is an option), may be a challenge (&! was mentioned), and, of course, it would be a very visible reminder in every call that extra effort is needed to get argument-passing to act correctly.

I'm a sysadmin.

Kudos on the extraordinary depth of your programming knowledge (just to be very clear: I mean it).

"I think it should be X but what I do I know, I don't run into this problem".

Understood. I just wanted to complement that with a transpersonal perspective, to leave no doubt that many others do struggle with this.

@mklement0
Copy link
Contributor

mklement0 commented Mar 14, 2021

I made a mistake (since corrected, and it is correct in the ie function) in the list of metacharacters that in batch-file calls should trigger double-quoting even for space-less arguments: the correct list is & | < > ^ , ;

The reason that , and ; are on that list is that cmd.exe interprets them as argument separators in an unquoted token; for instance, in .\foo.cmd 'a,b' and .\foo.cmd 'a;b' the respective argument is currently (justifiably) passed as unquoted a,b and a;b to batch files, and if they do individual parameter processing with %1, ... or shift (as opposed to all-arguments pass-through with %*), they see two arguments: a binds to %1, and b to %2 - that is, the calls are effectively the same as using the usual whitespace-based argument separation: .\foo.cmd a b.

  • Making PowerShell pass "1,2" and "1;2" in this case preserves the ability to pass such arguments as-is, without needing to resort to --% - and if truly two arguments are to be passed, whitespace separation can be used.

  • Note that that above additionally applies to =, but it is not on the list, because it conflicts with the other accommodation, namely the one for msiexec-style programs: <name>=<value> should always leave the <name>= part unquoted.

    • Given the diminishing importance of batch files (other than mere CLI entry points that pass all arguments through), it makes sense to me to prioritize this accommodation.
    • The implication is that either .\foo.cmd --% "a=b" or cmd /c '.\foo.cmd "a=b"' must be used to pass verbatim a=b to a batch that does individual parameter processing.

Again, what we should strive for is accommodations that:

  • make the vast majority of calls "just work"
  • based on rules that are as easy to conceptualize and remember as possible, and are clearly documented along with workarounds for the rare remaining edge cases.

@mklement0
Copy link
Contributor

mklement0 commented Mar 16, 2021

There is another accommodation we need to make (the summary above now links here):

  • Direct cmd.exe calls that pass a command (line) to execute in the form of either cmd.exe [<options>] /c <command> or cmd.exe [<options>] /k <command>

  • For a single-argument <command> - which is ultimately the only robust way to pass a command line - this currently works as-is due to a happy accident:

    • PowerShell's current lack of escaping of embedded " is canceled out by cmd.exe not expecting any escaping and blindly stripping enclosing "...".
  • If we don't keep handling this case the way we do now (see below), we would break something that currently works as-is (no workaround needed).

    • As an aside: Passing a command line as a single string to the platform-native shell - without having to worry about quirks and specific option names - is what the Native module's ins / Invoke-NativeShell cmdlet is designed to do, and I think such a cmdlet is called for as part of PowerShell, as the proper solution to Call native operator #13068 (rationale in Call native operator #13068 (comment)).

Example:

# A command line to pass to cmd.exe for execution.
# If executed correctly, the following should print verbatim:
#       Ready to move on [Y,N]?Y
$cmdLine = ' "C:\WINDOWS\system32\choice.exe" /d Y /t 0 /m "Ready to move on" '

# !! Despite the lack of behind-the-scenes escaping of the embedded double quotes, 
# !! this currently works *as intended*, in both editions:
cmd /c $cmdLine

What PowerShell currently translates the list of arguments to behind the scenes and assigns to .Arguments is the following verbatim string (I've prepended cmd so that it's easy to execute the command interactively from a cmd.exe session):

cmd /c " "C:\WINDOWS\system32\choice.exe" /d Y /t 0 /m "Ready to move on" "

That is, the verbatim content of the string that PowerShell saw was blindly enclosed in overall "..." (because the content contains spaces), without escaping embedded " chars.

As it turns out, that's exactly what cmd.exe expects - and that's the behavior we need to retain in this case.


As a courtesy, we could additionally do the following:

cmd.exe situationally allows you to pass a /c / /k command line as multiple arguments, analogous to the PowerShell CLI's -Command / -c parameter.

However, that breaks - through cmd.exe's own fault - if both the first argument (the executable) and a subsequent argument are double-quoted; e.g.:

# Breaks from both cmd.exe and PowerShell, because both the first and a subsequent argument are double-quoted.
C:[PS]> cmd /c "C:\Program Files\PowerShell\7\pwsh" -noprofile -c " 'hi there' "
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.

This could be avoided if PowerShell - in the event that multiple arguments follow /c or /k - transformed these multiple arguments into a single one enclosed in overall "..." quoting, with the constituent arguments internally double-quoted as needed (but, again, not escaped).

That is, PowerShell could automatically transform the above into the following verbatim string assigned to .Arguments (cmd again prepended to facilitate verification from an interactive cmd.exe session; spaces around the overall enclosing " added for readability):

# OK - transformed to single-argument command line, which works robustly - outputs 'hi there'
cmd /c " "C:\Program Files\PowerShell\7\pwsh" -noprofile -c " 'hi there' " "

In fact, this is what the ie function from the Native module does as of v1.2.1:

# Without `ie`, this breaks.
PS> ie cmd /c 'C:\Program Files\PowerShell\7\pwsh' -noprofile -c " 'hi there' "
hi there

@SeeminglyScience
Copy link
Collaborator

  • Direct cmd.exe calls that pass a command (line) to execute in the form of either cmd.exe [<options>] /c <command> or cmd.exe [<options>] /k <command>

Ahh that explains why I don't run into this issue much. The few times that I need to invoke an executable with enough argument complication, I tend to build the whole command line as a single string and pass it to cmd.exe /c.

@SteveL-MSFT SteveL-MSFT added Resolution-Fixed The issue is fixed. and removed Needs-Triage The issue is new and needs to be triaged by a work group. labels Apr 4, 2023
@SteveL-MSFT
Copy link
Member

Addressed as part of PSNativeCommandArgumentPassing feature

@ghost
Copy link

ghost commented Apr 6, 2023

This issue has been marked as fixed and has not had any activity for 1 day. It has been closed for housekeeping purposes.

@ghost ghost closed this as completed Apr 6, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Enhancement the issue is more of a feature request than a bug Resolution-Fixed The issue is fixed. WG-Engine core PowerShell engine, interpreter, and runtime
Projects
None yet
Development

No branches or pull requests

7 participants