Memory leakage when applying recursive combinators #884

A4-Tacks · 2024-04-13T05:48:42Z

grammar;
A<I> = { "x" I "y" I "z", A<("." I)> }
P = A<()>;

lalrpop version: 0.20.0 ~ 0.20.2

The text was updated successfully, but these errors were encountered:

yannham · 2024-04-19T16:00:24Z

Hello,

Thanks for the report! Would you have, by any chance, more information? For example, and simple self-contained main.rs using this grammar and the tool used to measure the memory leak would help assess, reproduce and fix the issue.

dburgener · 2024-04-19T16:12:37Z

I've been looking at this a bit. The "leak" looks to me more like an infinite recursion, and it's during parser generation, rather than an issue in the generated parser.

Just dropping the provided grammar in a file, and running lalrpop on it should reproduce the issue.

It seems to be something about the fact that the recursive macro expansion isn't just passing through the args. Replacing A<("." I)> in the above grammar with A<I> "q" for example seems to work fine, as expected.

I haven't looked at the macro expansion code much in the past, so I'm still trying to get my head around what we do there to figure out what's going wrong.

dburgener · 2024-04-19T18:38:44Z

I think this is a fundamental limitation of how lalrpop implements macros.

lalrpop macros resolve the generic arguments into the actual callers, then treats every resolved generic as a grammar nonterminal. So, for example A<()> becomes a nonterminal after macro expansion in the above grammar. But we also need to resolve the generic in the recursive call. So we need to add nonterminals for A<("." ())>, A<("." ("." ()))>, and so on infinitely.

Common recursive macros, such as the ones http://lalrpop.github.io/lalrpop/tutorial/006_macros.html terminate because once we resolve the generics in the macro definition, we end up with a nonterminal we already have (for example the macro def for Tier<Op, NextTier> references Tier<Op, NextTier>, which resolves the generics to the callers Tier<ExprOp, Factor> and Tier<FactorOp, Term>, but we already already have those nonterminals from resolving the macro name, so we don't need to keep resolving them.

I think that rewriting the grammar to treat the generic as a nonterminal would work. Something like:

grammar;
I = { ("." I), () }
A = { "x" I "y" I "z" }
P = A;

That errors, because the original grammar is invalid totally besides this issue, but it doesn't recurse infinitely.

I'm not really sure about the following two questions:

Is detecting this case as simple as seeing if there's a macro call in a macro definition that would need to resolve as a new non-terminal (ie the resolve call didn't occur outside a macro definition)? Or does that miss something? (I haven't thought about the case of multiple macros calling each other yet. That is definitely something to consider...)
Does rewriting the generics as nonterminals actually make sense, or do we just want to error and recommend that the user do the rewriting themselves? I'm concerned that it might generate confusing errors if we later detect a conflict based on these generated nonterminals. We already generate nonterminals for the macros, so maybe doing it for the generics isn't that different? Not sure...

I'm out of time for today, but I might try to put together a proof of concept for the above approach in a few days, and we could discuss the above questions in PR discussion. I'm curious if anyone has any other thoughts or things I'm missing in the interim though.

Pat-Lafon · 2024-04-20T00:50:18Z

If we took inspiration from macro_rules macros, a solution might be to just add a macro recursion limit which elides any complexities in trying to detect these cases. It's a little niche for there to be a familiarity benefit though seeing a similar error come up in the context of macro_rules might help with understanding the error. I haven't looked into how to implement this but I imagine it would be minimally invasive. It's not the most principled solution though.

dburgener · 2024-05-10T18:17:52Z

Yeah, it's crude, and I was initially opposed, but as I thought on it, I think trying to do true detection is probably too big of a can of worms, and as you said, just detecting deep recurion is minimally invasive. I submitted #896 doing the recursion counting approach. Let me know what you think.

dburgener · 2024-05-17T14:11:04Z

Fixed with #896

dburgener mentioned this issue May 10, 2024

Detect deep recursion when expanding macros and error out #896

Merged

dburgener closed this as completed May 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory leakage when applying recursive combinators #884

Memory leakage when applying recursive combinators #884

A4-Tacks commented Apr 13, 2024

yannham commented Apr 19, 2024

dburgener commented Apr 19, 2024

dburgener commented Apr 19, 2024

Pat-Lafon commented Apr 20, 2024

dburgener commented May 10, 2024

dburgener commented May 17, 2024

Memory leakage when applying recursive combinators #884

Memory leakage when applying recursive combinators #884

Comments

A4-Tacks commented Apr 13, 2024

yannham commented Apr 19, 2024

dburgener commented Apr 19, 2024

dburgener commented Apr 19, 2024

Pat-Lafon commented Apr 20, 2024

dburgener commented May 10, 2024

dburgener commented May 17, 2024