Localized ignore (Grammar Composition) #1272

Open
erezsh opened this issue May 3, 2023 · 3 comments

@erezsh
Member

erezsh commented May 3, 2023

Suggestion

The global ignore mechanism is the only thing stopping us from having perfect grammar composition. (in theory, at least)

If we allow each rule to have different ignore terminals, we would be able to import any rule from any grammar.

It can replace the global ignore mechanism, if we rewrite the global %ignore directive as applying to each rule individually.

Optional:

We can also add new syntax for a localized ignore, for example:

conf_file: conf_statement+ %ignore CONF_COMMENT


Describe alternatives you've considered

We could keep the global ignore mechanism as optional, but I don't see any reason to do so.

@MegaIng
Member

MegaIng commented May 3, 2023

Didn't we previously try something in this direction? Wasn't the problem that it is ambiguous what should be ignored on the borders between rules when they have different ignore sets?

I.e. does the %ignore apply only between the symbols inside the rule, or also before and after the rule?

@erezsh
Member Author

erezsh commented May 3, 2023

That's a good point, but I think it's reasonable to say that each rule is surrounded by its own ignore terminals on both sides. And since ignoring something twice is the same as doing so once, having two rules with the same ignore set means you only need to match once. So you end up with:

i1 S1 i1  i2 S2 i2     when i1 != i2
i1 S1 i1 S2 i1         when i1 == i2

and of course the logic chains.
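
A minimal sketch of that border rule, just to make the chaining concrete. The function name `ignore_layout` and the (symbol, ignore set) pair representation are made up for illustration and are not part of Lark:

```python
def ignore_layout(rules):
    """Lay out ignore slots around a sequence of rules.

    `rules` is a list of (symbol_name, ignore_set_name) pairs.
    Each rule is surrounded by its own ignore set on both sides;
    when two neighbours share the same ignore set, the slot on
    their border is emitted only once.
    """
    out = []
    for symbol, ignore in rules:
        # out[-1] is always the trailing ignore slot of the previous rule.
        # Only add a leading slot if it differs from that one.
        if not out or out[-1] != ignore:
            out.append(ignore)
        out.append(symbol)
        out.append(ignore)
    return out


different = ignore_layout([("S1", "i1"), ("S2", "i2")])
same = ignore_layout([("S1", "i1"), ("S2", "i1")])
chained = ignore_layout([("S1", "i1"), ("S2", "i1"), ("S3", "i2")])
```

Here `different` comes out as `i1 S1 i1 i2 S2 i2` and `same` as `i1 S1 i1 S2 i1`, matching the two cases above.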

@erezsh
Member Author

erezsh commented May 3, 2023

I think the main challenge in implementing this would be that in LR parsers it's not clear which rule we are parsing until we reach a reduce. So if the two rules being considered have different ignore sets, there is no trivial way to resolve this at the rule level.

But I think it should work if we do it at the symbol level.

One option is to have each symbol aware of its ignore set. That can get a little tricky when trying to support things like i1 i2 S2, but maybe we can just concat the ignores into one regex, and get i1,2 S2.
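
To illustrate what merging two ignore sets into one regex could look like, here is a rough sketch using only the standard `re` module. It is one possible reading of "concat": the patterns are joined as a repeated alternation, which also accepts the two ignore sets interleaved in any order, so it is slightly more permissive than a strict `i1` then `i2` sequence. The patterns `WS` and `COMMENT` and the helper `combine_ignores` are hypothetical examples, not Lark internals:

```python
import re


def combine_ignores(*patterns):
    """Merge several ignore patterns into one regex (the 'i1,2' idea).

    Each pattern is wrapped in a non-capturing group so its own
    alternations cannot leak into the combined expression.
    """
    alternation = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(f"(?:{alternation})*")


WS = r"[ \t]+"          # hypothetical ignore set i1: inline whitespace
COMMENT = r"#[^\n]*"    # hypothetical ignore set i2: line comments

i12 = combine_ignores(WS, COMMENT)

text = "  # note\nkey"
skipped = i12.match(text).end()   # number of characters ignored before the next symbol
```

With this input the combined regex skips the two spaces and the comment, and stops at the newline, since neither example pattern covers it.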

The second option is to concat the regexes of the ignores onto each terminal directly. The re engine should be able to handle it, and we actually get the added bonus of higher performance, because we're doing fewer re calls. (I did some experiments with the json parser, and concating the ignore makes it ~20% faster.)
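
A toy sketch of the second option, to show the shape of the idea rather than how Lark's lexer actually works. The ignore pattern is prepended to the terminal alternation, so each token costs a single `re` match instead of one match for the ignored text and another for the terminal. The terminal names and patterns below are assumptions chosen for the example:

```python
import re

IGNORE = r"\s*"  # hypothetical ignore set: any whitespace

# Hypothetical terminals, loosely inspired by a JSON-like list syntax.
TERMINALS = {
    "NUMBER": r"\d+",
    "COMMA": r",",
    "LSQB": r"\[",
    "RSQB": r"\]",
}

# The ignore pattern is folded into the token regex itself.
TOKEN_RE = re.compile(
    IGNORE + "(?:" + "|".join(f"(?P<{name}>{pat})" for name, pat in TERMINALS.items()) + ")"
)
TRAILING_IGNORE = re.compile(IGNORE)


def tokenize(text):
    """Return (terminal_name, value) pairs, skipping ignored text."""
    tokens = []
    pos = 0
    while True:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            break
        # Only the terminals are named groups, so lastgroup is the terminal name.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    # Consume ignored text after the last token, then require end of input.
    pos = TRAILING_IGNORE.match(text, pos).end()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return tokens


tokens = tokenize("[1, 22 ,3 ]")
```

Note that the trailing ignore still needs its own match at the end of the input, which is a small version of the border problem discussed in this thread.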

However, it might have some unintended consequences, for example it might complicate the "unless" mechanism we use for keywords, or make it slower since we'll have to match the ignored terminals twice.

Edit: actually for both options, probably the main challenge would be making sure that the "ignore_before" and "ignore_after" sets are minimal, and we're not matching the same regex twice on the borders of rules. But sounds like it should be possible.
