Better "found" information on syntax errors #464

YarnSphere · 2016-10-17T15:42:02Z

I believe that the found property of SyntaxError instances could/should be improved.
I'll use your example JavaScript grammar to illustrate the issue.

Assume the following input (I'm using the online version of PEG.js to check the errors):

var if = 0;

We get: Line 1, column 5: Expected comment, end of line, or whitespace but "i" found.¹

What I argue here is that we should be getting a message with ... but "if" found., i.e. the found property in the SyntaxError instance should contain "if" instead of "i".

Another example:

1 + 2a

We get: Line 1, column 5: Expected "!", "(", "+", "++", "-", "--", "[", "delete", "false", "function", "new", "null", "this", "true", "typeof", "void", "{", "~", comment, end of line, identifier, number, regular expression, string, or whitespace but "2" found. Note that the message says that it is expecting a number, but 2 was found instead (this is at least weird). In my opinion ... but "2a" found. would be a much better error message.

¹ As a side note, your JavaScript grammar should probably have a human-readable name on the Identifier rule (together with the name on the IdentifierName rule), this would make the error print Line 1, column 5: Expected comment, end of line, identifier, or whitespace but "i" found. (note the added identifier on the list of what is expected).

Why change what appears in `found`?

When printing syntax errors to a user, many times we don't want to show what the parser is expecting (because there might be huge list of possibilities), but simply what the parser found that wasn't expected.

Many real-life parsers would show us something like: [1:5] unexpected "if" for the first example; [1:5] unexpected "1a" or even [1:6] unexpected "a" for the second.

Using the information we obtain from PEG.js we would produce a message such as [1:5] unexpected "i" or [1:5] unexpected "2".

Such messages are misleading: the i itself is not the problem, the problem is that if is a reserved word that should not appear where an identifier is expected; the same for the 2, the problem lies in the a.

How to solve this?

I don't know if what I propose is possible to implement but I think that, in practice, the issue arises when defining rules such as:

Identifier =
  !ReservedWord name:IdentifierName { return name; }

or

NumericLiteral "number" =
  literal:HexIntegerLiteral !(IdentifierStart / DecimalDigit) { return literal; }

I.e. rules where there is an explicit !, stating that something is not expected.

I believe that PEG.js should somehow remember that it failed to parse some input because of one of these explicit ! in some rule; the part of the input that failed against such a rule is what should appear in the found property of the error.

What are your thoughts on this? Do you have any alternatives?

The text was updated successfully, but these errors were encountered:

YemSalat · 2016-10-17T20:18:54Z

I agree that this could be improved (especially the foundoption)
In my own project I work around this by using the location of where the error occurred and "manually" extracting the token that caused the error.

PS I also find that in my project the found property is more useful in reporting errors then expected

dmajda · 2016-11-14T10:29:16Z

What are your thoughts on this? Do you have any alternatives?

I acknowledge that what you describe is a problem and in many cases automatically produced error messages are not ideal.

I’ll have a deeper look at this after 1.0.0 — any solution here needs to be thought-through carefully and coordinated with other changes (#11 and #311 come to my mind), but my focus is now elsewhere.

Mingun · 2017-11-12T11:36:58Z

I think that this can must be done with plugins, but not in pegjs core. pegjs do not known about tokens -- it works with characters. But you can easy improve error message with such technique. After you catch SyntaxError you need try parse reserved tokens in location of error occurs and, if it is success, use captured information in error message. Something like this:

try {
  parser.parse(input, { ... });
} catch (e) {
  if (!(e instanceof parser.SyntaxError)) throw e;

  let token = parser.parse(
    input.substr(e.location.start.offset),
    { startRule: 'errorToken', ... }
  );
  if (token) {
    ...build error message with token...
  } else {
    ...leave standard error message...
  }
}

and something in grammar:

...
errorToken = ('if' / 'while' / ...)?;

When annotations (#256) will be introduced it can be made even more elegantly:

@plugin('error-token');
{
...initializer...
}
start = ...;
...
@errorToken
myInternalName = 'if' / 'while' / ...;

dmajda added the feature label Nov 14, 2016

dmajda added this to the post-1.0.0 milestone Nov 14, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better "found" information on syntax errors #464

Better "found" information on syntax errors #464

YarnSphere commented Oct 17, 2016

YemSalat commented Oct 17, 2016 •

edited

dmajda commented Nov 14, 2016

Mingun commented Nov 12, 2017 •

edited

Better "found" information on syntax errors #464

Better "found" information on syntax errors #464

Comments

YarnSphere commented Oct 17, 2016

Why change what appears in found?

How to solve this?

YemSalat commented Oct 17, 2016 • edited

dmajda commented Nov 14, 2016

Mingun commented Nov 12, 2017 • edited

Why change what appears in `found`?

YemSalat commented Oct 17, 2016 •

edited

Mingun commented Nov 12, 2017 •

edited