Add anonymous token support. #42

pragmaticpandy · 2021-04-30T01:16:42Z

Work in progress; haven't fixed all the tests yet. Threw this together and wanted to confirm you are interested in the patch and get any other feedback before I spend more time on it.

See the changes to the readme and OrTest to get a quick idea of how this changes the public API.

#36

h0tk3y · 2021-05-02T20:30:32Z

@pragmaticpandy Thanks a lot for looking into it! I like the idea overall, but what actually bothers me is that anonymous tokens (or tokens collected from the grammar's parsers in general) are ordered implicitly. When the set of tokens is ambiguous, their order matters, and the easiest way to order tokens in the way you want is to declare them explicitly so that the precedence comes from the order of declaration.

What do you think about sorting the anonymous (parser-provided) tokens by the order of their enclosing parsers? That is, if an anonymous token is declared and used in a parser that comes before another parser in the grammar, that token takes precedence over the other parser's anonymous tokens. I'm not sure this kind of ordering is intuitive enough, but I'd say it's a bit more intuitive than ordering the tokens by their occurrences in the nested, complex parser structure.
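
To illustrate why the order matters for an ambiguous token set, here is a minimal sketch of a first-match tokenizer. It assumes the tokenizer tries the definitions in list order and takes the first one that matches at the current position. `TokenDef`, `TokenMatch`, and `tokenize` are hypothetical names made up for this example, not the library's API.

```kotlin
// Hypothetical types for illustration only; not the library's classes.
data class TokenDef(val name: String, val regex: Regex)
data class TokenMatch(val name: String, val text: String)

// At each position, the first definition (in list order) that matches wins.
fun tokenize(input: String, defs: List<TokenDef>): List<TokenMatch> {
    val result = mutableListOf<TokenMatch>()
    var pos = 0
    while (pos < input.length) {
        var hit: TokenMatch? = null
        for (def in defs) {
            val m = def.regex.find(input, pos)
            // Accept only non-empty matches that start exactly at pos.
            if (m != null && m.range.first == pos && m.value.isNotEmpty()) {
                hit = TokenMatch(def.name, m.value)
                break
            }
        }
        val token = hit ?: throw IllegalArgumentException("No token matches at position $pos")
        result += token
        pos += token.text.length
    }
    return result
}

val keyword = TokenDef("keyword", Regex("if"))
val ident = TokenDef("ident", Regex("[a-z]+"))

fun main() {
    // Both definitions match "if"; only the order decides which one is used.
    println(tokenize("if", listOf(keyword, ident)).map { it.name }) // [keyword]
    println(tokenize("if", listOf(ident, keyword)).map { it.name }) // [ident]
}
```

With explicit declarations the list order is visible in the source. With tokens collected from parsers, the same list has to be derived by some rule, which is the concern raised above.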

I'm also thinking about experimenting with token-less parsing, where a Parser<T> can directly match characters of the input string. If I manage to get it to work fast enough, then the problem of explicit token declaration goes away, too, because there will be no need to tokenize the input sequence in the first place.
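
As a rough picture of what token-less parsing could look like, the sketch below models a parser as a function from an input string and a position to a result. It is not the design proposed in this thread: `CharParser`, `Parsed`, `regex`, `map`, and `then` are all hypothetical names, and the sketch ignores the performance question entirely.

```kotlin
// Hypothetical result type: the parsed value and the position after the match.
data class Parsed<T>(val value: T, val next: Int)

// A token-less parser reads characters directly; null means "no match here".
typealias CharParser<T> = (input: String, pos: Int) -> Parsed<T>?

// Matches a regular expression exactly at the current position.
fun regex(pattern: String): CharParser<String> {
    val re = Regex(pattern)
    return { input, pos ->
        re.find(input, pos)
            ?.takeIf { it.range.first == pos }
            ?.let { Parsed(it.value, pos + it.value.length) }
    }
}

// Transforms the value of a successful match.
fun <A, B> CharParser<A>.map(f: (A) -> B): CharParser<B> = { input, pos ->
    this.invoke(input, pos)?.let { Parsed(f(it.value), it.next) }
}

// Runs this parser, then the other one from where the first stopped.
infix fun <A, B> CharParser<A>.then(other: CharParser<B>): CharParser<Pair<A, B>> = { input, pos ->
    this.invoke(input, pos)?.let { a ->
        other(input, a.next)?.let { b -> Parsed(a.value to b.value, b.next) }
    }
}

val number: CharParser<Int> = regex("\\d+").map { it.toInt() }
val plus: CharParser<String> = regex("\\+")
val sum: CharParser<Int> = (number then plus then number).map { (left, right) -> left.first + right }

fun main() {
    println(sum("12+30", 0)) // Parsed(value=42, next=5)
    println(sum("12-30", 0)) // null
}
```

No token list exists in this style, so there is nothing to declare or order; each parser decides for itself what it consumes.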

BenjaminHolland · 2021-07-15T04:42:55Z

@h0tk3y re: Tokenless parsers
I believe this is what Superpower (C#) does. Out of the box, it provides string-based tokenless parsing, with the tokenization layer built on top of it.

pragmaticpandy · 2022-12-04T05:53:57Z

Apologies for abandoning this PR for so long.

Totally agree that explicit declaration is the clearest.

I was burned on this issue again today because I tried to use something like the following:
val positiveInt by regexToken("\\d+") use { text.toInt() }
Being so simple, I expected it to just work; I typed it alongside my other tokens and didn't give it a second thought.

Then, I was confused for a while when I got a NoMatchingToken error.

What I really want is to have not been confused by such a situation; I have been at least twice now. My original draft PR suggests dealing with this by supporting anonymous tokens, but perhaps a better solution is simply to detect any anonymous tokens, and throw an exception that explains the situation and prompts the user to add explicit token declarations.
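
The detection idea could be sketched roughly as follows, assuming the grammar can enumerate both the tokens it has registered and the tokens its parsers actually reference. `Token`, `UndeclaredTokenException`, and `requireDeclared` are hypothetical names for this illustration; how the real grammar class would collect the two sets is left open.

```kotlin
// Hypothetical stand-in for a token; the library's class differs.
class Token(val pattern: String)

// Hypothetical exception type carrying an explanatory message.
class UndeclaredTokenException(message: String) : IllegalStateException(message)

// Fails fast if a parser references a token the grammar never registered.
fun requireDeclared(declared: Collection<Token>, usedByParsers: Collection<Token>) {
    // Compare by identity: two tokens with the same pattern are still distinct tokens.
    val missing = usedByParsers.filter { used -> declared.none { it === used } }
    if (missing.isNotEmpty()) {
        val patterns = missing.joinToString { "'${it.pattern}'" }
        throw UndeclaredTokenException(
            "Tokens used in parsers but never declared: $patterns. " +
                "Declare each one as its own property so that its precedence is explicit."
        )
    }
}

fun main() {
    val plus = Token("\\+")
    val digits = Token("\\d+") // created inline in a parser, never declared on its own
    try {
        requireDeclared(declared = listOf(plus), usedByParsers = listOf(plus, digits))
    } catch (e: UndeclaredTokenException) {
        println(e.message)
    }
}
```

A check like this would turn the confusing NoMatchingToken error at parse time into a message that names the undeclared token up front.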

Add anonymous token support and some test refactoring.

e23a6f5
