Create parser builder pipeline. #302

Mike111177 · 2020-08-15T21:41:04Z

Currently the only way to configure yargs-parser is by the following. (Excuse my mixed syntax)

require('yargs-parser")(args, opts) => Arguments

I propose creating a builder pipeline that allows you to preconfigure your opts before running the command by adding an opts function to the Parser interface

export interface Parser {
   //other stuff
   opts: (opts: Partial<Options>) => Parser;
}

This has a bunch of advantages:

Allows programmer to not keep track of opts variable when used multiple times.

const parse = require('yargs-parser').opts({...})
function myfunc1(){parse("no opts variable here")}
function myfunc2(){parse("no opts variable here")}

Allows parser to check options ahead of time. This means that if after you create your parser with opts, yargs-parser doesn't have to do any of the setup of validation twice. So if you do myParser("mystring") there is no need to
any of the options pre-processing twice. The first time the args are even referenced on the parser function are like 150 lines of code in:

yargs-parser/lib/yargs-parser.ts

Line 203 in 5f987aa

for (let i = 0; i < args.length; i++) {

Since this is not a breaking change you can also now throw an exception on checkConfiguration() errors, since this is a new opt-in feature anyway.
Really good interop with ideas from Hopefully ending any and all discussion about how to handle quotes.... #300 if considered:

//Oldway
const oldway = require('yargs-parser)
oldway("my command", {opts...})
//Can't throw opts error because that's breaking change
//Warning only happens when parser is used, instead of when options are defined

//To make a preconfigured bashlike parser
const parse = require('yargs-parser').opts({...}).bash
//Returns (arg: string, opts?: Partial<Options>) =>Arguments
//Can immediately throw if your options are invalid, faster debugging

parse("myweird option with a literal \\\" in it")
//Is able to skip option validation since it was already done.

parse("myweird option with a literal \\\" in it", {additional_opts})
//Can't skip validation, but original options will still have thrown error if malformed

parse(["my", "bunch", "of", "tokens"])
//Compiler error: bash tokenizer expects string, tokenizing tokens makes no sense

These are just a couple of thoughts. Other potential things that could come from this is some of the functionality of yargs being possible in yargs parser, maybe in the future the idea of commands.

The text was updated successfully, but these errors were encountered:

bcoe · 2020-09-20T21:12:52Z

These are just a couple of thoughts. Other potential things that could come from this is some of the functionality of yargs being possible in yargs parser, maybe in the future the idea of commands.

@Mike111177 I appreciate you taking the time to write this up 👍

The goal of the yargs-parser library is to be a (relatively) lightweight library, that given process.argv provides are parsed version, which a higher level library like yargs or meow (with both use the library) can build on top of.

I think if we were to extend too much on the API surface like this, we start to blur the line too much between what the goals are of yargs, vs., the goals of yargs-parser.

I think a good way to think of it might be the responsibilities of Node.js, vs., v8.

Mike111177 · 2020-10-26T17:29:24Z

Hmm, I kinda regret not including #300 in this issue, because this is kind of a response to both. I'll be quoting stuff from there too, I don't know if it is possible to merge issues.

@bcoe I totally agree with keeping the functionality of yargs, and yargs-parser separate and we'll defined. My concern is that the current responsibilities of yargs-parser is already pretty muddy. It serves as two roles currently:

Given a token array (usually passed via yargs) create an object that contains the arguments.
When used as a stand-alone package, tokenize a string and then feed into role 1.

Because most shells differ in behavior from yargs-parser tokenizer this can cause unexpected results from the user.

In an ideal world yargs parser would be split into two packages, something like yargs-tokenizer, and yargs-interpreter. This would make it so the yargs package doesn't need to include the yargs-tokenizer package, making yargs more lightweight. It would only need to include yargs-interpreter, which would be what yargs-parser is sans tokenizer. Obviously this would break backwards compatibility. But you could also for legacy support have yargs parser just include both the interpreter and the tokenizer. You would have a dependency tree that looks like this (including meow):

#300

I'm a proponent of decomposing libraries, but have definitely gotten some flack for yargs' footprint.

Honestly, in my opinion, it requires no extra effort to install a package with dependencies, especially if you are also the maintainer of that dependency. The only real argument I can think of against it is that maybe if you are using some preemptable cloud VM thing where small code footprint is critical for speed of download during initialization. And if that is the problem then why wouldn't you be webpacking your app anyway to minimize the size?

I think that with this method, you can both reduce the logic complexity and code footprint of yargs, as well as expand the functionality of yargs-parser, giving it a stricter conformance and improve user expectations.

Ultimately, in order to maximize the flexibility and user control while minimizing code footprint (and maintaining backwards compat) I think you need a 3 stage pipeline in which any component can be swapped out.

bcoe · 2020-10-27T15:04:36Z

@Mike111177 I appreciate these detailed posts 👏

To manage expectations, a complete overhaul of the yargs-parser library is not work that this project has the cycles to take on right now (or to help someone take on):

I only have a small amount of my spare time to guide the yargs project, and a complete overhaul of the core parser is not how I would like to spend it (the next major project I'd like to take on is figuring out how to make yargs itself better manage async commands).
As you've point out, we do have open bugs for the tokenizer, but in practice yargs is used successfully by millions of people each week -- I'm not getting bugs opened related to tokenization issues frequently enough that I believe it's worth the risk of a complete overhaul of the parsers' guts (the parser has changed little in the last 7 years from minimist, which it was forked from).
Generally, I prefer incremental changes, moving towards a new design, vs., a sweeping overhaul (especially once a project has reached a certain level of maturity).

Where I would rather start would to be to see if we could make the existing tokenizer a bit smarter, to address some of the bugs you bring up in #300:

Perhaps it will become clear that we can move some logic from the parser into the tokenizer, through looking at patterns in the bugs.

I would also be interested in seeing if we could simplify the logic in yargs-parser.ts, e.g., could we make the handling of arrays, dot notation, etc., handled in a more abstract way.

If I see a few incremental wins, that make the library more malleable, I'm certainly going to be more supportive of larger refactors in the future.

bcoe · 2020-10-27T15:07:17Z

@Mike111177 I'm not trying to discourage you, and would appreciate and support your contributions 🎉

I would just love for us to start with some initial wins, perhaps we can get a test suite in place for some other shells (using GitHub actions), this would be a great starting point.

Mike111177 · 2020-10-27T21:32:37Z

Ok, I think what I will do then is more precisely document each tokenizer (That being yargs and the various shells). This way at a minimum we can try to enforce the yargs tokenizer can emulate all of the common behavior of the four most common shells (bash, cmd, powers hell, and sh). This should also allow us to determine better what is the responsibility of the tokenizer vs the parser. Currently I am not totally certain on how to do the tests against shells it in a CI friendly manner (See #300), but with more precise documentation on the differences between existing tokenizers, we can make more precise tests.

Once that is done, we can determine the token style varients and generalize syntax for the parser.

bcoe · 2020-10-29T13:28:14Z

@Mike111177 that sounds like a great start to me.

Mike111177 · 2020-10-31T18:37:19Z

@bcoe should I copy #300 into here and close #300? These two issues are a lot more coupled than I originally realized.

bcoe · 2020-11-02T17:03:57Z

@bcoe should I copy #300 into here and close #300? These two issues are a lot more coupled than I originally realized.

Yes, let's combine the conversations.

bcoe added the feature label Aug 19, 2020

verhovsky mentioned this issue Jul 30, 2021

support bash's ANSI-C quoted strings #366

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create parser builder pipeline. #302

Create parser builder pipeline. #302

Mike111177 commented Aug 15, 2020 •

edited

bcoe commented Sep 20, 2020

Mike111177 commented Oct 26, 2020 •

edited

bcoe commented Oct 27, 2020 •

edited

bcoe commented Oct 27, 2020

Mike111177 commented Oct 27, 2020 •

edited

bcoe commented Oct 29, 2020

Mike111177 commented Oct 31, 2020 •

edited

bcoe commented Nov 2, 2020

Create parser builder pipeline. #302

Create parser builder pipeline. #302

Comments

Mike111177 commented Aug 15, 2020 • edited

bcoe commented Sep 20, 2020

Mike111177 commented Oct 26, 2020 • edited

bcoe commented Oct 27, 2020 • edited

bcoe commented Oct 27, 2020

Mike111177 commented Oct 27, 2020 • edited

bcoe commented Oct 29, 2020

Mike111177 commented Oct 31, 2020 • edited

bcoe commented Nov 2, 2020

Mike111177 commented Aug 15, 2020 •

edited

Mike111177 commented Oct 26, 2020 •

edited

bcoe commented Oct 27, 2020 •

edited

Mike111177 commented Oct 27, 2020 •

edited

Mike111177 commented Oct 31, 2020 •

edited