Expected tokens in error output sometimes include invalid entries #736

Open
wabain opened this issue Apr 8, 2023 · 1 comment

wabain commented Apr 8, 2023

With lalrpop 0.19.9, for certain grammars the expected tokens reported in UnrecognizedToken / UnrecognizedEOF errors frequently include invalid entries: tokens that could not be parsed successfully at the site of the error, regardless of what input follows. This was also the case on 0.19.8 and probably earlier releases. I found a minimal-ish grammar that reproduces the problem:

grammar;

pub Program: Vec<()> = Commands;

Commands: Vec<()> = (<Command> ";")*;

Command: () = {
    "case" <_matched:Word> "in" <_cases: (<CaseItem> ";;")*> <_last:CaseItem?> "esac" => (),
    Word => (),
};

CaseItem: () = {
    "("? <_pattern:Word> ")" <_block:Commands> => (),
};

Word: () = {
    r"[A-Z]+" => ()
};

This is a simplified subset of the Unix shell grammar. Legal inputs are things like `X; case Q in R) A; B; ;; esac;`. Given the input `X` we get an UnrecognizedEOF error. The only token that could continue this into a valid parse is ";", but the expected tokens reported by lalrpop are ")", ";", "in".

This seems to be happening because the top state in the parser when the error is hit could handle any of those lookaheads; it would reduce r"[A-Z]+" => Word and leave the lookahead token to be shifted by another rule. In the case of the input X only ";" could subsequently be accepted, but the expected token generation doesn't examine the state stack to determine this. The infrastructure to perform the full check does exist (for the parse table based codegen, although not for recursive ascent).
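To make the distinction concrete, here is a minimal sketch of the two ways of computing expected tokens. It does not use lalrpop's generated tables or internals: the grammar (S → W ";" | "(" W ")", W → "x"), the state numbers, and the names `action`, `goto`, `naive_expected`, and `precise_expected` are all hypothetical, hand-written for illustration. As in the report, the state reached after shifting a word has LALR-merged lookaheads, so looking only at the top state over-reports.

```rust
use std::collections::BTreeSet;

// Toy grammar (hypothetical, not lalrpop output):
//   S -> W ";" | "(" W ")"
//   W -> "x"
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Tok { X, LParen, RParen, Semi, Eof }

const ALL: [Tok; 5] = [Tok::X, Tok::LParen, Tok::RParen, Tok::Semi, Tok::Eof];

#[derive(Clone, Copy)]
enum Nt { S, W }

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Action { Shift(usize), Reduce { pops: usize, nt: Nt }, Accept, Error }

// Hand-built LALR(1) action table. State 3 (after "x") reduces on both
// ";" and ")" because its lookahead sets were merged.
fn action(state: usize, tok: Tok) -> Action {
    use Action::*;
    match (state, tok) {
        (0, Tok::X) | (2, Tok::X) => Shift(3),
        (0, Tok::LParen) => Shift(2),
        (1, Tok::Semi) => Shift(4),
        (3, Tok::Semi) | (3, Tok::RParen) => Reduce { pops: 1, nt: Nt::W },
        (4, Tok::Eof) => Reduce { pops: 2, nt: Nt::S },
        (5, Tok::RParen) => Shift(7),
        (6, Tok::Eof) => Accept,
        (7, Tok::Eof) => Reduce { pops: 3, nt: Nt::S },
        _ => Error,
    }
}

fn goto(state: usize, nt: Nt) -> Option<usize> {
    match (state, nt) {
        (0, Nt::S) => Some(6),
        (0, Nt::W) => Some(1),
        (2, Nt::W) => Some(5),
        _ => None,
    }
}

// Looks only at the top state: any token with a non-error action counts.
fn naive_expected(stack: &[usize]) -> BTreeSet<Tok> {
    let top = *stack.last().expect("state stack is never empty");
    ALL.iter().copied()
        .filter(|&t| !matches!(action(top, t), Action::Error))
        .collect()
}

// Simulates reductions on a copy of the stack until the token is
// actually shifted (or accepted), or an error state is reached.
fn can_shift(stack: &[usize], tok: Tok) -> bool {
    let mut stack = stack.to_vec();
    loop {
        let top = match stack.last() { Some(&s) => s, None => return false };
        match action(top, tok) {
            Action::Shift(_) | Action::Accept => return true,
            Action::Error => return false,
            Action::Reduce { pops, nt } => {
                stack.truncate(stack.len().saturating_sub(pops));
                let below = match stack.last() { Some(&s) => s, None => return false };
                match goto(below, nt) {
                    Some(next) => stack.push(next),
                    None => return false,
                }
            }
        }
    }
}

fn precise_expected(stack: &[usize]) -> BTreeSet<Tok> {
    ALL.iter().copied().filter(|&t| can_shift(stack, t)).collect()
}

fn main() {
    let stack = [0, 3]; // state stack after reading just "x"
    println!("naive:   {:?}", naive_expected(&stack));
    println!("precise: {:?}", precise_expected(&stack));
}
```

With the stack `[0, 3]`, the naive version reports both `RParen` and `Semi`, while the simulated version reports only `Semi`, which mirrors the ")" and "in" entries being spurious for the input X above. The cost of the precise check is one stack copy and a short reduction walk per candidate token, paid only when an error is being reported.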

wabain commented Apr 8, 2023

This is likely the same issue reported in #533, although I haven't checked.
