Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argument Escaping #233

Open
3 of 4 tasks
wasabii opened this issue Mar 19, 2024 · 1 comment
Open
3 of 4 tasks

Argument Escaping #233

wasabii opened this issue Mar 19, 2024 · 1 comment

Comments

@wasabii
Copy link

wasabii commented Mar 19, 2024

Details

This isn't exactly a specific bug, because I don't know the intentions of CliWrap here. But I'd like to talk about better argument parsing, since I've hit a few issues where arguments are not done properly. Namely trailing quotes on Windows, and double backslashes with single and double quotes on Unix.

Right now, when you add arguments as a list, you get the option to set escape as a bool. And when you do so, it escapes them immediately upon adding, without consultig which OS it runs on. It then eventually builds this into a string simply by appending with spaces, and adding to ProcessStartInfo.Arguments.

So, issue. First, whether arguments need to be escape, and how they need to be escaped, depends on the OS, and the version of .NET you are running.

.NET Core added ProcessStartInfo.ArgumentList.

On Windows, the Win32 CreateProcess API only accepts a pointer to a string for the arguments. The process can only receive a single string as an argument. msvcrt and such in each program handles parsing this string and expanding it to argv* for the C main function. Hence, parsing rules are not part of the OS, they are part of each individual program. There is mostly agreement that the VC rules are used.

On Unix, the fork/exec functions take a pointer to an array of pointers to null-terminated characters. This is nice, because it means there are no parsing rules on Unix. The process natively takes a vector of arguments.

.NET handles both of these. First, if you provide Arguments, and are running on Windows, it doesn't actually need to escape anything: the user is expected to provide escaping of this string. And it's directly handed off to CreateProcess. In fact, this s why PSI.Arguments existed as a single-string to begin with: they assumed Windows.

.NET Core running on Unix however, operates differently. If you send Arguments (the single string), it tries to parse and split it internally BEFORE passing it to exec, because it has to, because the Unix process accepts a vector. This parsing logic is pretty bad. It is basically just splitting on string, and tab for some reason, and building the vector that way. It doesn't implement either the VC logic, nor any Unix shell logic. Its it's third new thing.

.NET Core when providing ArgumentList passes the list directly through to Unix by turning it into an argv without escaping anything. Exactly as you'd want.

So the problem here is there is no format you should use for Arguments which is correctly handled on both OS's. If you want to include quotes, for instance, those get passed to CreateProcess on Windows, then to the process, which handles quotes. If you pass quotes to Arguments on Unix, those don't get parsed, and they become part of the vector, and sent to the app.

So the only way to do is right is to use either ArgumentList on both, or Arguments on Windows with your own VC-style escaping and Arguments on Unix with a different escaping style. And this get's worse: because ArgumentList doesn't exist in .NET Standard 2.0, but does exist when running ON .NET Core, reflection needs to be used to check and set it.

So, kind of a terrible picture.

The problem here is CliWrap uses it's own escaping method, which is cribbed from .NET (ParseArguments). This implementation tries to convert the vector list CliWrap has (when you use Add(IEnumerable)), into VC-style arguments EVEN ON UNIX, and then hand them to ProcessStartInfo.Arguments, EVEN ON UNIX, which doesn't quite handle that format correctly.

And CliWrap doesn't provide a way to use ArgumentList.

So, two fold issue. When you pass true for escape your escaping things to a Windows-only format. Even on Unix. And two, when you use CliWrap, there are arguments you can add using Add(IEnumerable) that get messed up on Unix.

I don't realy see the point of a boolean to escape on Add. It's like, escape to what? Which OS' rules?

And, I think CliWrap should use ArgumentsList where available, and where the user has added an IEnumerable. Including .NET Standard (which requires Reflection).

All of this comes from me trying to write a custom MSBuild task that is better than Exec and runs on both Windows and Unix without weird rules. And I wanted to use Cliwrap.

Checklist

  • I have looked through existing issues to make sure that this feature has not been requested before
  • I have provided a descriptive title for this issue
  • I am aware that even valid feature requests may be rejected if they do not align with the project's goals
  • I have sponsored this project
@Tyrrrz
Copy link
Owner

Tyrrrz commented Mar 19, 2024

Yeah, there are 3 main reasons why the ArgumentList approach is not used:

  1. It's not available on all the frameworks that CliWrap targets. It's not just .NET Standard, but also the legacy .NET Framework. That means that, if running on a Unix platform but targeting .NET Standard or .NET Fx (via Mono), you still need to find a way to shove the arguments into a single string.
  2. .NET users are very, very used to be able to just do process.Arguments = "foo --bar baz" without initializing arrays and whatnot. As much as I'd prefer if nobody used the string overload of WithArguments(...), it's still really popular.
  3. CliWrap needs the final command line string (after all escaping is done) for exception messages and debugging purposes. It's important that, if something fails, the user can see which exact command was executed. That means that even if we were able to use ArgumentList (despite point 1), we'd still need to resolve the argument string somehow -- and we'd have to do that without creating an instance of Process.

Ultimately, because one way or another we'd still have to join the arguments into a single string, it makes sense to have that as the convergence point between different run-time environments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants