I believe that the found property of SyntaxError instances could/should be improved.
I'll use your example JavaScript grammar to illustrate the issue.
Assume the following input (I'm using the online version of PEG.js to check the errors):
var if = 0;
We get: Line 1, column 5: Expected comment, end of line, or whitespace but "i" found.¹
What I argue here is that we should be getting a message ending in ... but "if" found. In other words, the found property in the SyntaxError instance should contain "if" instead of "i".
Another example:
1 + 2a
We get: Line 1, column 5: Expected "!", "(", "+", "++", "-", "--", "[", "delete", "false", "function", "new", "null", "this", "true", "typeof", "void", "{", "~", comment, end of line, identifier, number, regular expression, string, or whitespace but "2" found.
Note that the message says it is expecting a number, yet it reports that "2" was found, which is odd to say the least. In my opinion, ... but "2a" found. would be a much better error message.
¹ As a side note, your JavaScript grammar should probably have a human-readable name on the Identifier rule (in addition to the name on the IdentifierName rule). This would make the error read Line 1, column 5: Expected comment, end of line, identifier, or whitespace but "i" found. (note the added identifier in the list of what is expected).
Why change what appears in found?
When printing syntax errors to a user, we often don't want to show what the parser is expecting (because there might be a huge list of possibilities), but simply what the parser found that wasn't expected.
Many real-life parsers would show us something like: [1:5] unexpected "if" for the first example; [1:5] unexpected "2a" or even [1:6] unexpected "a" for the second.
Using the information we obtain from PEG.js we would produce a message such as [1:5] unexpected "i" or [1:5] unexpected "2".
Such messages are misleading: the i itself is not the problem; the problem is that if is a reserved word that should not appear where an identifier is expected. The same goes for the 2: the problem lies in the a that follows it.
How to solve this?
I don't know if what I propose is possible to implement but I think that, in practice, the issue arises when defining rules such as:
I.e. rules where there is an explicit !, stating that something is not expected.
I believe that PEG.js should somehow remember that it failed to parse some input because of one of these explicit ! in some rule; the part of the input that failed against such a rule is what should appear in the found property of the error.
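As a rough illustration of the idea, here is a toy sketch in plain JavaScript. It is not how PEG.js works internally; the matcher functions, the `state.rejected` field, and the abbreviated rules are all hypothetical and exist only to show a negative predicate remembering the text that made it fail.

```javascript
// Toy matchers, NOT PEG.js internals. Each matcher takes (input, pos, state)
// and returns the position after the match, or -1 on failure.
const regex = (source) => (input, pos) => {
  const sticky = new RegExp(source, "y"); // "y": match only at lastIndex
  sticky.lastIndex = pos;
  const m = sticky.exec(input);
  return m ? pos + m[0].length : -1;
};

const seq = (...matchers) => (input, pos, state) => {
  for (const matcher of matchers) {
    pos = matcher(input, pos, state);
    if (pos < 0) return -1;
  }
  return pos;
};

// Negative predicate (the grammar's "!"): succeeds without consuming input
// when `matcher` fails. When `matcher` succeeds, the predicate fails and
// records the text that triggered the failure.
const not = (matcher) => (input, pos, state) => {
  const end = matcher(input, pos, state);
  if (end < 0) return pos;
  state.rejected = { offset: pos, text: input.slice(pos, end) };
  return -1;
};

// Hypothetical, abbreviated rules (assumptions made for this sketch).
const identifierPart = regex("[A-Za-z0-9_$]");
const reservedWord = seq(regex("if|var|function"), not(identifierPart));
const identifier = seq(not(reservedWord), regex("[A-Za-z_$][A-Za-z0-9_$]*"));
const number = seq(regex("[0-9]+"), not(regex("[A-Za-z_$]")));

// Returns what an improved `found` could contain, or null if the rule
// matched or failed for a reason other than a negative predicate.
function rejectedText(rule, input, pos) {
  const state = { rejected: null };
  return rule(input, pos, state) < 0 && state.rejected ? state.rejected.text : null;
}

console.log(rejectedText(identifier, "if = 0;", 0)); // if
console.log(rejectedText(number, "2a", 0));          // a
console.log(rejectedText(identifier, "iffy", 0));    // null
```

For the second example this sketch reports "a" (the [1:6] variant). Reporting "2a" instead would presumably require extending the recorded span back to the start of the enclosing rule.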
What are your thoughts on this? Do you have any alternatives?
I agree that this could be improved (especially the found property).
In my own project I work around this by using the location of where the error occurred and "manually" extracting the token that caused the error.
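A minimal sketch of that kind of workaround follows. It assumes the error gives you a zero-based character offset into the input; how that offset is exposed depends on the PEG.js version, so it is passed in as a plain number here. The helper name `tokenAt` and the definition of a token as a run of identifier-like characters are illustrative assumptions, not part of PEG.js.

```javascript
// Hypothetical helper: returns the "token" starting at `offset` in `input`.
// A token is assumed to be a run of identifier-like characters; anything
// else falls back to the single character at that position.
function tokenAt(input, offset) {
  if (offset >= input.length) {
    return null; // error at end of input: nothing was found
  }
  const word = /[A-Za-z0-9_$]+/y; // sticky: the match must start at lastIndex
  word.lastIndex = offset;
  const match = word.exec(input);
  return match ? match[0] : input.charAt(offset);
}

console.log(tokenAt("var if = 0;", 4)); // if
console.log(tokenAt("1 + 2a", 4));      // 2a
```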
PS I also find that in my project the found property is more useful for reporting errors than the expected property.
What are your thoughts on this? Do you have any alternatives?
I acknowledge that what you describe is a problem and in many cases automatically produced error messages are not ideal.
I'll have a deeper look at this after 1.0.0. Any solution here needs to be thought through carefully and coordinated with other changes (#11 and #311 come to mind), but my focus is now elsewhere.
I think that this can (and must) be done with plugins, not in the pegjs core. pegjs does not know about tokens; it works with characters. But you can easily improve the error message with the following technique: after you catch a SyntaxError, try to parse the reserved tokens at the location where the error occurred and, if that succeeds, use the captured information in the error message. Something like this:
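The following is a sketch of that technique, not the commenter's original code. It assumes the caught error can be reduced to a plain object with `offset`, `line`, `column`, and `found` fields (the exact shape of PEG.js errors differs between versions). `RESERVED`, `refineFound`, and `formatError` are hypothetical names, and the reserved-word list is deliberately abbreviated.

```javascript
// Abbreviated, illustrative list; a real grammar would list every reserved word.
const RESERVED = ["if", "else", "var", "function", "return", "while", "for"];

// Try to match a whole reserved word at the error offset. On success, return a
// copy of the error whose `found` is that word; otherwise return it unchanged.
function refineFound(input, error) {
  const rest = input.slice(error.offset);
  const word = RESERVED
    // a word only counts if it is not followed by more identifier characters
    .filter((w) => rest.startsWith(w) && !/^[A-Za-z0-9_$]/.test(rest.slice(w.length)))
    // prefer the longest match
    .sort((a, b) => b.length - a.length)[0];
  return word ? Object.assign({}, error, { found: word }) : error;
}

// Render the short "[line:column] unexpected ..." form.
function formatError(error) {
  const found = error.found === null ? "end of input" : JSON.stringify(error.found);
  return "[" + error.line + ":" + error.column + "] unexpected " + found;
}

// Stand-in for the error caught from parsing "var if = 0;".
const caught = { offset: 4, line: 1, column: 5, found: "i" };
console.log(formatError(refineFound("var if = 0;", caught))); // [1:5] unexpected "if"
```

Checking that the reserved word is not followed by another identifier character keeps an ordinary identifier such as `iffy` from being reported as `if`.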