Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better "found" information on syntax errors #464

Open
YarnSphere opened this issue Oct 17, 2016 · 3 comments
Open

Better "found" information on syntax errors #464

YarnSphere opened this issue Oct 17, 2016 · 3 comments
Labels
Milestone

Comments

@YarnSphere
Copy link

I believe that the found property of SyntaxError instances could/should be improved.
I'll use your example JavaScript grammar to illustrate the issue.

Assume the following input (I'm using the online version of PEG.js to check the errors):

var if = 0;

We get: Line 1, column 5: Expected comment, end of line, or whitespace but "i" found.¹

What I argue here is that we should be getting a message with ... but "if" found., i.e. the found property in the SyntaxError instance should contain "if" instead of "i".

Another example:

1 + 2a

We get: Line 1, column 5: Expected "!", "(", "+", "++", "-", "--", "[", "delete", "false", "function", "new", "null", "this", "true", "typeof", "void", "{", "~", comment, end of line, identifier, number, regular expression, string, or whitespace but "2" found. Note that the message says that it is expecting a number, but 2 was found instead (this is at least weird). In my opinion ... but "2a" found. would be a much better error message.

¹ As a side note, your JavaScript grammar should probably have a human-readable name on the Identifier rule (together with the name on the IdentifierName rule), this would make the error print Line 1, column 5: Expected comment, end of line, identifier, or whitespace but "i" found. (note the added identifier on the list of what is expected).

Why change what appears in found?

When printing syntax errors to a user, many times we don't want to show what the parser is expecting (because there might be huge list of possibilities), but simply what the parser found that wasn't expected.

Many real-life parsers would show us something like: [1:5] unexpected "if" for the first example; [1:5] unexpected "1a" or even [1:6] unexpected "a" for the second.

Using the information we obtain from PEG.js we would produce a message such as [1:5] unexpected "i" or [1:5] unexpected "2".

Such messages are misleading: the i itself is not the problem, the problem is that if is a reserved word that should not appear where an identifier is expected; the same for the 2, the problem lies in the a.

How to solve this?

I don't know if what I propose is possible to implement but I think that, in practice, the issue arises when defining rules such as:

Identifier =
  !ReservedWord name:IdentifierName { return name; }

or

NumericLiteral "number" =
  literal:HexIntegerLiteral !(IdentifierStart / DecimalDigit) { return literal; }

I.e. rules where there is an explicit !, stating that something is not expected.

I believe that PEG.js should somehow remember that it failed to parse some input because of one of these explicit ! in some rule; the part of the input that failed against such a rule is what should appear in the found property of the error.

What are your thoughts on this? Do you have any alternatives?

@YemSalat
Copy link

YemSalat commented Oct 17, 2016

I agree that this could be improved (especially the foundoption)
In my own project I work around this by using the location of where the error occurred and "manually" extracting the token that caused the error.

PS I also find that in my project the found property is more useful in reporting errors then expected

@dmajda
Copy link
Contributor

dmajda commented Nov 14, 2016

What are your thoughts on this? Do you have any alternatives?

I acknowledge that what you describe is a problem and in many cases automatically produced error messages are not ideal.

I’ll have a deeper look at this after 1.0.0 — any solution here needs to be thought-through carefully and coordinated with other changes (#11 and #311 come to my mind), but my focus is now elsewhere.

@dmajda dmajda added the feature label Nov 14, 2016
@dmajda dmajda added this to the post-1.0.0 milestone Nov 14, 2016
@Mingun
Copy link
Contributor

Mingun commented Nov 12, 2017

I think that this can must be done with plugins, but not in pegjs core. pegjs do not known about tokens -- it works with characters. But you can easy improve error message with such technique. After you catch SyntaxError you need try parse reserved tokens in location of error occurs and, if it is success, use captured information in error message. Something like this:

try {
  parser.parse(input, { ... });
} catch (e) {
  if (!(e instanceof parser.SyntaxError)) throw e;

  let token = parser.parse(
    input.substr(e.location.start.offset),
    { startRule: 'errorToken', ... }
  );
  if (token) {
    ...build error message with token...
  } else {
    ...leave standard error message...
  }
}

and something in grammar:

...
errorToken = ('if' / 'while' / ...)?;

When annotations (#256) will be introduced it can be made even more elegantly:

@plugin('error-token');
{
...initializer...
}
start = ...;
...
@errorToken
myInternalName = 'if' / 'while' / ...;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants