Localized ignore (Grammar Composition) #1272

erezsh · 2023-05-03T08:58:29Z

Suggestion

The global ignore mechanism is the only thing stopping us from having perfect grammar composition. (in theory, at least)

If we allow each rule to have different ignore terminals, we would be able to import any rule from any grammar.

It can replace the global ignore mechanism, if we rewrite the global %ignore directive as applying to each rule individually.

Optional:

We can also add new syntax for a localize ignore, for example:

conf_file: conf_statement+ %ignore CONF_COMMENT

Provide a clear and concise description of what the problem is, and what you would like to happen.

Describe alternatives you've considered

We could keep the global ignore mechanism as optional, but I don't see any reason to do so.

MegaIng · 2023-05-03T09:45:03Z

Didn't we previously try something into this direction? Wasn't the problem that is ambiguous what should be ignored on the borders between rules when they have different ignore sets?

I.e. does the %ignore apply between the rules in the rule, or also before and after the rule?

erezsh · 2023-05-03T09:54:09Z

That's a good point, but I think it's reasonable to say that each rule is surrounded by its own ignore terminals on both sides. And since ignoring something twice is the same as doing so once, having two rules with the same ignore set means you only need to match once. So you end up with:

i1 S1 i1  i2 S2 i2     when i1 != i2
i1 S1 i1 S2 i1         when i1 == i2

and of course the logic chains.

erezsh · 2023-05-03T10:04:44Z

I think the main challenge in implementing this would be that in LR parsers it's not clear which rule we are parsing until we reach a reduce. So if the two rules being considered have different ignore sets, there is no trivial way to resolve this at the rule level.

But I think it should work if we do it at the symbol level.

One option is to have each symbol aware of its ignore set. That can get a little tricky when trying to support things like i1 i2 S2, but maybe we can just concat the ignores into one regex, and get i1,2 S2.

Second option is to concat the regexes of the ignores to each terminal directly . The re engine should be able to handle it, and we actually get the added bonus of higher performance, because we're doing less re calls. (I did some experiments with the json parser, and concating the ignore makes it ~20% faster.)

However, it might have some unintended consequences, for example it might complicate the "unless" mechanism we use for keywords, or make it slower since we'll have to match the ignored twice.

Edit: actually for both options, probably the main challenge would be making sure that the "ignore_before" and "ignore_after" sets are minimal, and we're not matching the same regex twice on the borders of rules. But sounds like it should be possible.

erezsh added the enhancement label May 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Localized ignore (Grammar Composition) #1272

Localized ignore (Grammar Composition) #1272

erezsh commented May 3, 2023

MegaIng commented May 3, 2023

erezsh commented May 3, 2023 •

edited

erezsh commented May 3, 2023 •

edited

Localized ignore (Grammar Composition) #1272

Localized ignore (Grammar Composition) #1272

Comments

erezsh commented May 3, 2023

MegaIng commented May 3, 2023

erezsh commented May 3, 2023 • edited

erezsh commented May 3, 2023 • edited

erezsh commented May 3, 2023 •

edited

erezsh commented May 3, 2023 •

edited