Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leakage when applying recursive combinators #884

Closed
A4-Tacks opened this issue Apr 13, 2024 · 6 comments
Closed

Memory leakage when applying recursive combinators #884

A4-Tacks opened this issue Apr 13, 2024 · 6 comments

Comments

@A4-Tacks
Copy link

grammar;
A<I> = { "x" I "y" I "z", A<("." I)> }
P = A<()>;

lalrpop version: 0.20.0 ~ 0.20.2

@yannham
Copy link
Contributor

yannham commented Apr 19, 2024

Hello,

Thanks for the report! Would you have, by any chance, more information? For example, and simple self-contained main.rs using this grammar and the tool used to measure the memory leak would help assess, reproduce and fix the issue.

@dburgener
Copy link
Contributor

I've been looking at this a bit. The "leak" looks to me more like an infinite recursion, and it's during parser generation, rather than an issue in the generated parser.

Just dropping the provided grammar in a file, and running lalrpop on it should reproduce the issue.

It seems to be something about the fact that the recursive macro expansion isn't just passing through the args. Replacing A<("." I)> in the above grammar with A<I> "q" for example seems to work fine, as expected.

I haven't looked at the macro expansion code much in the past, so I'm still trying to get my head around what we do there to figure out what's going wrong.

@dburgener
Copy link
Contributor

I think this is a fundamental limitation of how lalrpop implements macros.

lalrpop macros resolve the generic arguments into the actual callers, then treats every resolved generic as a grammar nonterminal. So, for example A<()> becomes a nonterminal after macro expansion in the above grammar. But we also need to resolve the generic in the recursive call. So we need to add nonterminals for A<("." ())>, A<("." ("." ()))>, and so on infinitely.

Common recursive macros, such as the ones http://lalrpop.github.io/lalrpop/tutorial/006_macros.html terminate because once we resolve the generics in the macro definition, we end up with a nonterminal we already have (for example the macro def for Tier<Op, NextTier> references Tier<Op, NextTier>, which resolves the generics to the callers Tier<ExprOp, Factor> and Tier<FactorOp, Term>, but we already already have those nonterminals from resolving the macro name, so we don't need to keep resolving them.

I think that rewriting the grammar to treat the generic as a nonterminal would work. Something like:

grammar;
I = { ("." I), () }
A = { "x" I "y" I "z" }
P = A;

That errors, because the original grammar is invalid totally besides this issue, but it doesn't recurse infinitely.

I'm not really sure about the following two questions:

  1. Is detecting this case as simple as seeing if there's a macro call in a macro definition that would need to resolve as a new non-terminal (ie the resolve call didn't occur outside a macro definition)? Or does that miss something? (I haven't thought about the case of multiple macros calling each other yet. That is definitely something to consider...)
  2. Does rewriting the generics as nonterminals actually make sense, or do we just want to error and recommend that the user do the rewriting themselves? I'm concerned that it might generate confusing errors if we later detect a conflict based on these generated nonterminals. We already generate nonterminals for the macros, so maybe doing it for the generics isn't that different? Not sure...

I'm out of time for today, but I might try to put together a proof of concept for the above approach in a few days, and we could discuss the above questions in PR discussion. I'm curious if anyone has any other thoughts or things I'm missing in the interim though.

@Pat-Lafon
Copy link
Contributor

If we took inspiration from macro_rules macros, a solution might be to just add a macro recursion limit which elides any complexities in trying to detect these cases. It's a little niche for there to be a familiarity benefit though seeing a similar error come up in the context of macro_rules might help with understanding the error. I haven't looked into how to implement this but I imagine it would be minimally invasive. It's not the most principled solution though.

@dburgener
Copy link
Contributor

Yeah, it's crude, and I was initially opposed, but as I thought on it, I think trying to do true detection is probably too big of a can of worms, and as you said, just detecting deep recurion is minimally invasive. I submitted #896 doing the recursion counting approach. Let me know what you think.

@dburgener
Copy link
Contributor

Fixed with #896

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants