Don’t parse the pipeline as text when it is directed from an EXE to another EXE or file. Keep the bytes as-is. #1908

Closed
be5invis opened this issue Aug 18, 2016 · 67 comments
Labels
Issue-Bug Issue has been identified as a bug in the product WG-Engine core PowerShell engine, interpreter, and runtime WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance

Comments

@be5invis

be5invis commented Aug 18, 2016

Currently PowerShell parses STDOUT as string when piping from an EXE, while in some cases it should be preserved as a byte stream, like this scenario:

curl.exe http://whatever/a.png > a.png

or

node a.js | gzip -c > out.gz

Affected patterns include: native | native, native > file and (maybe) cat file | native.

@be5invis be5invis changed the title Don Don‘t parse the pipeline when it is redirected from an EXE to another EXE. Keep the bytes as-is. Aug 18, 2016
@be5invis be5invis changed the title Don‘t parse the pipeline when it is redirected from an EXE to another EXE. Keep the bytes as-is. Don‘t parse the pipeline when it is redirected from an EXE to another EXE or file saving. Keep the bytes as-is. Aug 18, 2016
@lzybkr lzybkr added Issue-Bug Issue has been identified as a bug in the product WG-Engine core PowerShell engine, interpreter, and runtime labels Aug 18, 2016
@be5invis
Author

be5invis commented Aug 18, 2016

@vors @lzybkr
The current NativeCommandProcessor breaks:

  • LF line endings.
  • Non-ASCII text within UTF-8 without BOM header.
  • Binary file redirects (like curl.exe’s output).
  • > lays out text in 80 columns by default.
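
To illustrate why binary redirects break, here is a minimal sketch (not the actual NativeCommandProcessor code) of what happens when binary data is decoded as UTF-8 text and then encoded back. The bytes are the standard PNG file signature; the variable names are only illustrative.

```powershell
# The 8-byte PNG signature, standing in for the start of curl.exe's output.
$original = [byte[]](0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)

# Decode as UTF-8 text, then encode back to bytes.
$text      = [System.Text.Encoding]::UTF8.GetString($original)
$roundTrip = [System.Text.Encoding]::UTF8.GetBytes($text)

# 0x89 is not valid UTF-8 on its own, so it decodes to U+FFFD and
# re-encodes as the three bytes EF BF BD: the data is no longer the same.
$before = [System.BitConverter]::ToString($original)
$after  = [System.BitConverter]::ToString($roundTrip)
"before: $before"
"after:  $after"
```

The round trip turns 8 bytes into 10, which is enough to make the file an invalid image.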

@be5invis be5invis changed the title Don‘t parse the pipeline when it is redirected from an EXE to another EXE or file saving. Keep the bytes as-is. Don’t parse the pipeline as text when it is directed from an EXE to another EXE or file. Keep the bytes as-is. Aug 18, 2016
@ForNeVeR
Contributor

ForNeVeR commented Aug 25, 2016

Maybe add a cmdlet/operator to call native command and get its raw output (as a byte array / stream?), something like this:

# Consider ^& operator is an alias for Get-CommandRawOutputStream; this is just an example syntax
$output = ^& curl.exe http://whatever/a.png # $output now is a byte array or stream
$output > C:\Temp\file.png # file.png now is a valid image file

# This should be valid, too:
^& curl.exe http://whatever/a.png > C:\Temp\file.png

This opens an opportunity for some additional usage patterns (you can put this raw content into variables, and pipe raw content from native commands to managed cmdlets).

@ForNeVeR
Contributor

ForNeVeR commented Aug 25, 2016

But maybe we could add a special kind of redirection operator (like 2>&1, 3>&1, *>&1 we already have), something like this (where %>&1 is a new redirection operator that redirects command "raw output" without processing it as a string):

$output = curl.exe http://whatever/a.png %>&1
$output > C:\Temp\file.png

# Or even this:
curl.exe http://whatever/a.png %> C:\Temp\file.png # which is just awesome

Overall: I don't think that this kind of redirection should be tied to only native commands or some limited list of usage patterns (e.g. native | native).

@be5invis
Author

@ForNeVeR My proposal is that:

  1. For native | native, keep the bytes as-is. This is already proposed by @vors.
  2. For ps | native, add a set of cmdlets which encodes PS objects into bytes, perhaps ps | encode-text utf-8 | native.
  3. For native | ps, we can use the type system to identify whether a cmdlet accepts “raw input”. For cmdlets like out-file or maybe decode-text, it will keep the bytes from native, and other cmdlets will use the parsed string as its input.

@ForNeVeR
Contributor

@be5invis okay, it seems like this proposal also supports all the relevant use cases I can imagine.

@GeeLaw

GeeLaw commented Sep 2, 2016

Shouldn't this open up an RFC since this is a breaking change (changes the observed behaviour)?

A workaround for this is to provide a cmdlet that stores the content in a temporary file. A working example is Use-RawPipeline in PowerShell Gallery. The current implementation is to store the file, but it could also be streamlined so that the file doesn't have to be stored.

@jhclark

jhclark commented Sep 6, 2016

See also #559, where this appears to be actively discussed and worked on by @vors on the PowerShell team.

@vors vors self-assigned this Sep 12, 2016
@vors vors added this to the 6.0.0 milestone Sep 12, 2016
@vors vors added the WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance label Sep 12, 2016
@vors
Collaborator

vors commented Sep 12, 2016

Great discussion! Thank you all for the feedback.

I'd like to share my plans about this work:

  • In the scope of this issue we will address only the native | native and native > file behavior. Note that although this could be seen as a breaking change, it is not one for text output: that behavior is preserved. Byte output becomes much more reliable because bytes are no longer wrapped in PS strings. We agreed with @lzybkr that it's not breaking, hence no RFC process will be applied.
  • I don't see an immediate need to enhance the native | ps case, since PS can consume only strings from native commands. However, if somebody wants to write a function like
function foo
{
  param([byte[]]$rawBytes)
}

they can achieve it with a temp file or some other technique, as @GeeLaw pointed out.

  • Similarly, the ps | native case has a well-established pattern: when PS objects need to be passed to a native command, we apply an implicit Out-String and pass everything as text.
    Because PS doesn't use byte streams as a pipeline primitive, I don't think we should develop special sugar to support them in the language directly. If a case arises where this needs to be done, similar workarounds can be used.

We can revisit the last two parts later, but I'd like to set expectations about scope of this issue.
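
The temp-file technique mentioned for the native | ps case could look roughly like the sketch below. It assumes the native tool can write its output to a file by itself (for example curl.exe with its -o option); here a WriteAllBytes call stands in for that step so the snippet is self-contained. Measure-RawBytes is a hypothetical function name, not an existing cmdlet.

```powershell
# Hypothetical consumer that wants raw bytes rather than strings.
function Measure-RawBytes {
    param([byte[]]$RawBytes)
    $RawBytes.Length
}

$tmp = [System.IO.Path]::GetTempFileName()
try {
    # Stand-in for the native tool writing the file itself,
    # e.g. curl.exe -o $tmp http://whatever/a.png
    [System.IO.File]::WriteAllBytes($tmp, [byte[]](0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A))

    # Read the bytes back without going through the text pipeline.
    $raw   = [System.IO.File]::ReadAllBytes($tmp)
    $count = Measure-RawBytes -RawBytes $raw
}
finally {
    Remove-Item -LiteralPath $tmp -ErrorAction SilentlyContinue
}
```

Because the bytes never travel through the pipeline as strings, they reach the function unchanged, at the cost of a round trip through the file system.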

@be5invis
Author

@vors However the current “>” is identical to out-file, so you have to add a special version of out-file which takes raw bytes. So why don’t you give the ability to everyone?

@ravindUwU

sleepy cat with the PowerShell logo on its little head, captioned "I WOULD BE UNSTOPPABLE" "IF NOT FOR THE BYTE STREAMS"

@Shayan-To

We shouldn't need a new operator or some kind of special syntax ...

If PowerShell is ever to introduce a way to get the binary output stream of a binary command (or pipe it into a Cmdlet), through, say, a Stream object, then maybe a new operator would be needed, and that might complicate things here as well.

@btjgit

btjgit commented Oct 26, 2022

We shouldn't need a new operator or some kind of special syntax ...

If PowerShell is ever to introduce a way to get the binary output stream of a binary command (or pipe it into a Cmdlet), through, say, a Stream object, then maybe a new operator would be needed, and that might complicate things here as well.

@Shayan-To, you left out the rest of my paragraph, which I believe encapsulates my counter-argument:

...It seems the challenge is the implementation and historical PowerShell design more than the objective. Let's not lose track of the objective due to implementation detail.

Let's elevate this to user experience, and put aside the implementation challenges for the moment...

My crude analogy: most GUI systems have a clipboard mechanism for transferring data from one application to another.

One approach would be to say only UTF-16 text is supported, and would be captured via a Ctrl+C keystroke. If the source data wasn't really text, then the data would be "transformed" into that format, even if it resulted in a loss/change of information. The end user would get something, but it might not be what they expect. The burden of dealing with that could be put on the user, and (for example) we could expect them to learn that if they want to copy HTML and paste it to an HTML-aware application and preserve the mark-up, they should know to use Alt+Ctrl+C instead to do the copy, and so on for each possible data format. That would be roughly equivalent to creating an assortment of new pipe operators in PowerShell.

However, another approach would be something similar to what Windows has done with its clipboard: if the sender has a preferred/native non-text format, and the receiver has indicated that it can accept that format, then a direct transfer is done in that format. And, the user can just use Ctrl+C (or the | operator) in all cases, and in most cases will get the expected result.

(I'm intentionally leaving out the implementation detail of how the clipboard accomplishes this - it's a user experience example, not an implementation example, and a different implementation would likely be more appropriate for PowerShell.)

So, my understanding of this baby-step proposal is:

  • If both the sender and receiver are binary executables, use a byte stream (or equivalent).
  • If the sender is binary and the receiver is a file, then use whatever the sender prefers (i.e., connect its output directly to the file, or equivalent).
  • Otherwise, continue to use the current "powerful" PowerShell behavior.
  • Most importantly, don't force the casual PowerShell user to manually make that choice by choosing different operators, options, or syntax when an as-is transfer is possible. Transforming/inferring the data type should be a last resort rather than a required step.

To be clear: I'm NOT proposing that a full general anything-to-anything system needs to be designed to take this first tiny step (a general solution might not be needed in the end). That could be deferred until if/when something came up in the future, such as the use case you describe. This baby step is just to fix an existing bug, which is that data output from traditional executables is being modified in certain well-defined cases where it doesn't need to be, and that is an unfortunate impediment to PowerShell adoption. Let's make PowerShell less of an annoyance so it can be unstoppable. (Poor kitten! 😉)

@SeeminglyScience
Collaborator

There's some great discussion happening here about native | script pipes but it may be better suited in a new issue. This thread and the associated PR are explicitly about the native | native scenario. This issue will likely be closed when that PR is merged and I don't want to see the discussion lost in that.

@SeidChr

SeidChr commented Nov 29, 2022

so i came by this issue, and wondered how it is not possible to send bytes to the input stream of another command.
while its not possible via the pipeline (yet), there are other ways:

script | native

function Send-Bytes {
    param(
        [string] $To, 
        [string] $Arguments
    )
 
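    begin {
        $startInfo = [System.Diagnostics.ProcessStartInfo]::new()
        # Redirecting stdin requires UseShellExecute = $false
        # (already the default on .NET Core, but not on .NET Framework).
        $startInfo.UseShellExecute = $false
        $startInfo.RedirectStandardInput = $true
        $startInfo.FileName = $To
        $startInfo.Arguments = $Arguments
 
        $process = [System.Diagnostics.Process]::Start($startInfo)
        # Write to the underlying stream so that no text encoding is applied.
        $inputStream = $process.StandardInput.BaseStream
    }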
 
    process {
        $inputStream.WriteByte([byte]$_)
    }
 
    end {
        $inputStream.Flush()
        $inputStream.Close()
        
        $process.WaitForExit()
    }
}

enables you to do this:
[byte]0xf4 | Send-Bytes -To xxd
script | Send-Bytes -To native
or maybe with an alias
script |§ native

unfortunately i wasn't really able to capture the standard output properly, to pipe it back into powershell (.ReadByte() never returns), but maybe someone more capable wants to try? 😃

If this would pipe the output back into powershell, you could even chain multiple native commands

script | Send-Bytes -To native -Out | Send-Bytes -To native2
script |§ native -§|§ native2
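
For the stdout side, one possible approach (a sketch, not a tested general solution) is to start draining the child's stdout asynchronously before writing to its stdin, and to close stdin once all input has been sent. Invoke-NativeBytes and its parameter names are hypothetical; it takes the whole input as a byte array instead of streaming it through the pipeline.

```powershell
function Invoke-NativeBytes {
    param(
        [string] $FilePath,
        [string] $Arguments = '',
        [byte[]] $InputBytes = [byte[]]@()
    )

    $startInfo = [System.Diagnostics.ProcessStartInfo]::new()
    $startInfo.FileName = $FilePath
    $startInfo.Arguments = $Arguments
    $startInfo.UseShellExecute = $false          # required for redirection
    $startInfo.RedirectStandardInput = $true
    $startInfo.RedirectStandardOutput = $true

    $process = [System.Diagnostics.Process]::Start($startInfo)
    $buffer = [System.IO.MemoryStream]::new()

    # Begin copying stdout right away so the child never blocks on a full pipe.
    $copyTask = $process.StandardOutput.BaseStream.CopyToAsync($buffer)

    # Feed the raw bytes and close stdin to signal end of input.
    $stdin = $process.StandardInput.BaseStream
    $stdin.Write($InputBytes, 0, $InputBytes.Length)
    $stdin.Close()

    $copyTask.Wait()
    $process.WaitForExit()

    # The leading comma keeps the byte[] from being unrolled by the pipeline.
    return ,$buffer.ToArray()
}
```

Assuming a cat executable is on PATH, `Invoke-NativeBytes -FilePath cat -InputBytes ([byte[]](0xF4, 0x00, 0xFF))` should hand back the same three bytes. Chaining two native commands then means passing one call's result as the next call's -InputBytes, which buffers everything in memory instead of streaming.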

@huettenhain

Hey @SeidChr, workarounds like this do exist, see for example Use-RawPipeline. Unfortunately, I believe that many of the people supporting this issue are not looking for a workaround. Speaking for myself, I can assure you that no workaround would be sufficient for me to consider using PowerShell as my daily driver. Literally the only way it can work for me is if I can pipe byte streams from one application to another with a single pipe operator like, honestly, in any other shell.

@aetonsi

aetonsi commented Jan 26, 2023

Fixing pwsh piping/redirecting is crucially needed... I cannot use pwsh professionally if i can't rely on basic redirection or piping of data...
Even using workarounds like cmd.exe /c myapp `| anotherapp has problems due to buffering and whatnot... Using third party modules like use-rawpipeline also has its problems. And they all require rewriting every invocation that pipes or redirects data, in all of your scripts

@potatoqualitee

Indeed, it's the one thing stopping many Linux users from pwsh adoption as a primary shell. Has there been any progress on this front, pwsh team?

@SeeminglyScience
Collaborator

Indeed, it's the one thing stopping many Linux users from pwsh adoption as a primary shell. Has there been any progress on this front, pwsh team?

@potatoqualitee yeah, my PR is ready for review now (#17857). It is a large change in a sensitive area of the code base, so don't expect a speedy review, but it's comin'.

@fjh1997

fjh1997 commented Mar 5, 2023

PowerShell treats the output as a string, but Windows-1252 maps every byte to a character, so you can try using this encoding.
It's a trick, though it is not efficient.

[console]::outputencoding=[console]::inputencoding=[System.Text.Encoding]::GetEncoding(1252)
$result=python zipbomb --mode=quoted_overlap --num-files=250 --compressed-size=21179
[IO.File]::WriteAllText("$pwd\result.zip",$result,[System.Text.Encoding]::GetEncoding(1252))
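
The idea behind this trick can be checked in isolation with a small sketch. It uses ISO-8859-1 (code page 28591) instead of Windows-1252, which is a substitution on my part: Latin-1 maps bytes 0 to 255 onto U+0000 to U+00FF by definition, so the round trip is easy to reason about. The variable names are illustrative.

```powershell
# A single-byte encoding in which every byte value has a character.
$latin1 = [System.Text.Encoding]::GetEncoding(28591)

# Every possible byte value, decoded to text and encoded back.
$allBytes = [byte[]](0..255)
$asText   = $latin1.GetString($allBytes)
$back     = $latin1.GetBytes($asText)

# Compare the two byte sequences via their hex representations.
$identical = [System.BitConverter]::ToString($allBytes) -eq [System.BitConverter]::ToString($back)
"round trip lossless: $identical"
```

This only demonstrates the character mapping. Output captured from a native command is still split into lines by PowerShell, so newline bytes inside binary data remain a separate concern.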

@potatoqualitee

Great news, @SeeminglyScience !! Thank you, looking very forward to it.

@SeeminglyScience
Collaborator

SeeminglyScience commented Jun 5, 2023

Forgot to mark the PR as resolving this issue! The PR was merged on 04/27, should be in the next preview 🎉 Closing

Oh also when it does make it to preview, remember to run Enable-ExperimentalFeature PSNativeCommandPreserveBytePipe and file any feedback if you find issues ❤️

Linux/Mac Usability automation moved this from Priority-High (???) to Done Jun 5, 2023
@potatoqualitee

HECK YES! 🚀 Such great news for the non-Windows communities. Thanks so much

@ghost

ghost commented Jun 8, 2023

to anyone interested, this is the commit:

2424ad8

personally I will be waiting until the next proper release (not preview) but regardless this is great news - thanks (I guess 7 years late is better than never 😀)

@mitchcapper

wooooooooooo this works: zstd --decompress Digest-BLAKE-0.05.tar.zst --stdout | gzip -9 | gzip -dc | tar -axvf -

.\\Digest-BLAKE-0.05/
.\\Digest-BLAKE-0.05/BLAKE.xs
.\\Digest-BLAKE-0.05/Changes
.\\Digest-BLAKE-0.05/ex/
.\\Digest-BLAKE-0.05/ex/benchmark.pl
.\\Digest-BLAKE-0.05/lib/
...

For me though, with this feature enabled, you can't actually redirect stdout to a file; with native commands it always seems to go to the console. A follow-up bug has been filed, though: #19836
