Expected tokens in error output sometimes include invalid entries #736

wabain · 2023-04-08T04:34:00Z

With lalrpop 0.19.9, for certain grammars the expected tokens given for UnrecognizedToken / UnrecognizedEOF errors frequently include invalid entries: tokens that could not be parsed successfully at the site of the error, regardless of what input follows. This was also the case on 0.19.8 and probably earlier releases. I found a minimal-ish grammar that reproduces the problem:

grammar;

pub Program: Vec<()> = Commands;

Commands: Vec<()> = (<Command> ";")*;

Command: () = {
    "case" <_matched:Word> "in" <_cases: (<CaseItem> ";;")*> <_last:CaseItem?> "esac" => (),
    Word => (),
};

CaseItem: () = {
    "("? <_pattern:Word> ")" <_block:Commands> => (),
};

Word: () = {
    r"[A-Z]+" => ()
};

This is a simplified subset of the Unix shell grammar. Valid inputs include `X;` and `case Q in R) A; B; ;; esac;`. Given the input `X` we get an UnrecognizedEOF error. The only token that could continue this into a valid program is ";", but the expected tokens reported by lalrpop are ")", ";", and "in".

This seems to happen because the state on top of the parser's stack when the error is hit has actions for all of those lookaheads: it would reduce r"[A-Z]+" to Word and leave the lookahead token to be shifted by another rule. For the input `X`, only ";" could actually be accepted once those reductions are carried out, but expected-token generation does not examine the rest of the state stack to determine this. The infrastructure to perform the full check does exist for the parse-table-based codegen, although not for recursive ascent.
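To make the difference concrete, here is a toy Rust sketch of the "full check": for each candidate terminal, simulate the parser's reductions on a copy of the state stack and keep the terminal only if it is eventually shifted or accepted. The tables are hand-written and hypothetical (the state numbering and the names `Term`, `NonTerm`, `Tables`, `accepts`, `naive_expected`, and `filtered_expected` are not lalrpop internals). State 1 mimics a reduce state whose lookahead set includes tokens that are only valid in other contexts.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Term { Id, Semi, RParen, In, Eof }

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum NonTerm { P, C, W }

#[allow(dead_code)] // the shift target is not needed by this sketch
#[derive(Clone, Copy, Debug)]
enum Action { Shift(usize), Reduce(usize), Accept }

struct Production { lhs: NonTerm, len: usize }

struct Tables {
    actions: HashMap<(usize, Term), Action>,
    gotos: HashMap<(usize, NonTerm), usize>,
    productions: Vec<Production>,
}

const ALL: [Term; 5] = [Term::Id, Term::Semi, Term::RParen, Term::In, Term::Eof];

impl Tables {
    /// Naive: every terminal that has any action in the top state.
    fn naive_expected(&self, stack: &[usize]) -> Vec<Term> {
        let top = *stack.last().unwrap();
        ALL.iter().copied().filter(|t| self.actions.contains_key(&(top, *t))).collect()
    }

    /// Full check: replay reductions on a copy of the stack until `t` is
    /// shifted/accepted (valid) or no action exists (invalid).
    fn accepts(&self, stack: &[usize], t: Term) -> bool {
        let mut stack = stack.to_vec();
        loop {
            let top = *stack.last().unwrap();
            match self.actions.get(&(top, t)) {
                None => return false,
                Some(Action::Shift(_)) | Some(Action::Accept) => return true,
                Some(Action::Reduce(p)) => {
                    let prod = &self.productions[*p];
                    stack.truncate(stack.len() - prod.len);
                    let top = *stack.last().unwrap();
                    match self.gotos.get(&(top, prod.lhs)) {
                        Some(&next) => stack.push(next),
                        None => return false,
                    }
                }
            }
        }
    }

    fn filtered_expected(&self, stack: &[usize]) -> Vec<Term> {
        ALL.iter().copied().filter(|t| self.accepts(stack, *t)).collect()
    }
}

/// Hypothetical tables for: P -> C ";" ; C -> W ; W -> Id
fn build_tables() -> Tables {
    use Action::*;
    let mut actions = HashMap::new();
    actions.insert((0, Term::Id), Shift(1));
    // State 1 (W -> Id .): reduces on lookaheads merged from several contexts.
    for t in [Term::Semi, Term::RParen, Term::In] {
        actions.insert((1, t), Reduce(2));
    }
    actions.insert((2, Term::Semi), Shift(4));
    actions.insert((3, Term::Semi), Reduce(1)); // C -> W . is only valid before ";"
    actions.insert((4, Term::Eof), Reduce(0));
    actions.insert((5, Term::Eof), Accept);
    let mut gotos = HashMap::new();
    gotos.insert((0, NonTerm::P), 5);
    gotos.insert((0, NonTerm::C), 2);
    gotos.insert((0, NonTerm::W), 3);
    let productions = vec![
        Production { lhs: NonTerm::P, len: 2 }, // 0: P -> C ";"
        Production { lhs: NonTerm::C, len: 1 }, // 1: C -> W
        Production { lhs: NonTerm::W, len: 1 }, // 2: W -> Id
    ];
    Tables { actions, gotos, productions }
}

fn main() {
    let t = build_tables();
    let stack = [0, 1]; // after shifting the single identifier `X`
    println!("naive: {:?}", t.naive_expected(&stack)); // prints "naive: [Semi, RParen, In]"
    println!("filtered: {:?}", t.filtered_expected(&stack)); // prints "filtered: [Semi]"
}
```

The naive approach reports Semi, RParen, and In because state 1 reduces on all three; after replaying the reduction, state 3 only has an action for Semi, so the filtered list keeps just that token. The extra cost is one stack copy and a few reductions per candidate token, paid only when an error is being reported.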


wabain · 2023-04-08T04:36:27Z

This is likely the same issue reported in #533, although I haven't checked.

wabain changed the title ~~Expected tokens in error output sometimes includes invalid entries~~ Expected tokens in error output sometimes include invalid entries Apr 8, 2023

wabain mentioned this issue Apr 8, 2023

Avoid spurious expected tokens in error output #737

Merged
