With lalrpop 0.19.9, for certain grammars the expected tokens given for `UnrecognizedToken` / `UnrecognizedEOF` errors frequently include invalid entries: tokens that could not be parsed successfully at the site of the error, regardless of what input follows. This was also the case on 0.19.8 and probably earlier releases. I found a minimal-ish grammar that reproduces the problem:
This is a simplified subset of the Unix shell grammar. Legal productions are things like `X; case Q in R) A; B; ;; esac;`. Given the input `X` we get an `UnrecognizedEOF` error. The only token that could turn this into a valid production is `";"`, but the expected tokens given by lalrpop are `")", ";", "in"`.
This seems to be happening because the top state in the parser when the error is hit could handle any of those lookaheads; it would reduce `r"[A-Z]+" => Word` and leave the lookahead token to be shifted by another rule. In the case of the input `X` only `";"` could subsequently be accepted, but the expected token generation doesn't examine the state stack to determine this. The infrastructure to perform the full check does exist (for the parse table based codegen, although not for recursive ascent).
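To make the difference concrete, here is a rough sketch of the two strategies on a toy table. This is not lalrpop's generated code: the grammar (`Start = Word ";" | "case" Word "in" Word ")"`, `Word = r"[A-Z]+"`), the state numbers, and every name below are hypothetical and built by hand, assuming LALR-style merging so that the state after shifting `r"[A-Z]+"` is shared and carries the union of lookaheads.

```rust
// Hypothetical hand-built LALR table for illustration only:
//   Start = Word ";" | "case" Word "in" Word ")"
//   Word  = r"[A-Z]+"
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tok { Upper, Semi, Case, In, RParen, Eof }

const ALL_TOKENS: [Tok; 6] =
    [Tok::Upper, Tok::Semi, Tok::Case, Tok::In, Tok::RParen, Tok::Eof];

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nt { Start, Word }

#[derive(Clone, Copy, Debug)]
enum Action { Shift(usize), Reduce { lhs: Nt, len: usize }, Accept, Error }

fn action(state: usize, tok: Tok) -> Action {
    use Action::*;
    match (state, tok) {
        (0, Tok::Upper) | (2, Tok::Upper) | (7, Tok::Upper) => Shift(1),
        (0, Tok::Case) => Shift(2),
        // State 1 is `Word = r"[A-Z]+" .`; it is reached from states 0, 2
        // and 7, so its lookahead set is the union of all three contexts.
        (1, Tok::Semi) | (1, Tok::In) | (1, Tok::RParen) => {
            Reduce { lhs: Nt::Word, len: 1 }
        }
        (3, Tok::Semi) => Shift(6),
        (4, Tok::Eof) => Accept,
        (5, Tok::In) => Shift(7),
        (6, Tok::Eof) => Reduce { lhs: Nt::Start, len: 2 },
        (8, Tok::RParen) => Shift(9),
        (9, Tok::Eof) => Reduce { lhs: Nt::Start, len: 5 },
        _ => Error,
    }
}

fn goto_state(state: usize, nt: Nt) -> Option<usize> {
    match (state, nt) {
        (0, Nt::Word) => Some(3),
        (0, Nt::Start) => Some(4),
        (2, Nt::Word) => Some(5),
        (7, Nt::Word) => Some(8),
        _ => None,
    }
}

/// Top-state-only strategy: report every token with a non-error action
/// in the state on top of the stack.
fn expected_top_state_only(stack: &[usize]) -> Vec<Tok> {
    let top = *stack.last().expect("stack always holds the start state");
    ALL_TOKENS
        .iter()
        .copied()
        .filter(|&t| !matches!(action(top, t), Action::Error))
        .collect()
}

/// Full check: replay the pending reductions on a copy of the stack and
/// keep the token only if it is eventually shifted (or accepted).
fn can_shift(stack: &[usize], tok: Tok) -> bool {
    let mut stack = stack.to_vec();
    loop {
        let top = *stack.last().expect("stack always holds the start state");
        match action(top, tok) {
            Action::Shift(_) | Action::Accept => return true,
            Action::Error => return false,
            Action::Reduce { lhs, len } => {
                // Pop one state per right-hand-side symbol, then follow goto.
                stack.truncate(stack.len() - len);
                let below = *stack.last().expect("start state is never popped");
                match goto_state(below, lhs) {
                    Some(next) => stack.push(next),
                    None => return false,
                }
            }
        }
    }
}

fn expected_with_simulation(stack: &[usize]) -> Vec<Tok> {
    ALL_TOKENS.iter().copied().filter(|&t| can_shift(stack, t)).collect()
}
```

In this toy table, the stack after reading `X` is `[0, 1]`. `expected_top_state_only(&[0, 1])` reports `Semi`, `In` and `RParen`, mirroring the `")", ";", "in"` output above, while `expected_with_simulation(&[0, 1])` reports only `Semi`. The cost of the full check is one simulated run of reductions per candidate token, which only happens on the error path.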
wabain changed the title from "Expected tokens in error output sometimes includes invalid entries" to "Expected tokens in error output sometimes include invalid entries" on Apr 8, 2023