Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign error reporting #198

Closed
dmajda opened this issue Aug 14, 2013 · 7 comments
Closed

Redesign error reporting #198

dmajda opened this issue Aug 14, 2013 · 7 comments
Assignees
Labels
Milestone

Comments

@dmajda
Copy link
Contributor

dmajda commented Aug 14, 2013

I see three problems with the current error reporting system in PEG.js:

  1. Actions report errors by returning null

    Reporting using null is inflexible (it doesn't allow to carry any information about the kind of error) and makes it impossible to return null as a regular value from actions, which would be useful sometimes (for example in a JSON parser).

    See also null indicates failure - workaround for a grammar that should produce nulls? #17.

  2. There is no way to produce errors with completely custom messages

    For example, imagine a HTTP 1.1 parser that wants to report an error about missing Host header in a HTTP request with a message like "A HTTP 1.1 request must contain a Host header". Currently the only way to do this is to throw an exception with desired message manually. This is cumbersome and it does not interact well with the backtracking behavior (throwing an exception halts the parsing completely, even when it is possible to backtrack and try another alternatives).

  3. The expected property of SyntaxError is hard to process mechanically

    The expected property of SyntaxError is either an array of expectations (each describing one alternative which the parser expected at the position of error) or null (meaning that the end of input was expected).

    Each expectation is represented by a string. This string can mean an expected literal (represented using its PEG.js syntax — a quoted string), character class (represented using its PEG.js syntax — [...]), or any character (represented by string "any"). To differentiate between these cases, one has to parse the string representations manually, which makes building tools based on PEG.js such as autocompleters unnecessarily hard.

I plan to redesign the error reporting system to fix these problems. Before I describe the changes I am thinking about, a little description how the current system works is needed.

How Error Reporting Currently Works?

When a PEG.js parser tries to match a string literal, character class, or ., and fails, it produces a match failure. It consists of a position and an expectation (description of what the parser tried to match).

After producing a match failure, the parser backtracks and possibly tries another alternatives. If none of them succeeds and there is nothing more to try, the parser throws the SyntaxError exception. When setting its properties (such as position, expectations, error message, etc.), the parser considers only the furthest match failure (the one with the greatest position). If there are more such failures, their expectations are combined.

The situation is little bit more complicated if named rules are used, but this is irrelevant for this discussion.

Proposed Changes

I am thinking about the following changes in the error reporting system:

  1. Expectations in SyntaxError.expected won't be represented by strings, but by objects. The objects will have a type property (with values like "literal", "class", "any", "eof") which will determine expectation type. For some expectation types, other properties will conain the details (e.g. for "literal" expectation where would be a value property containing the expected string). This will simplify mechanical processing of the expectations.

    An alternative to the type property is to use classses. But I think the type property will be easier to handle for users.

  2. Actions will be allowed to return null. It will be a regular value and won't signify an error.

  3. Actions will be able to trigger a match failure using a new error function. It will take an error message as its parameter. Failures triggered by this function (called custom match failures) will consist of a position and an error message. They won't have any expectation.

    The error function won't interrupt the action execution, it will just mark it as failing and save the error message. The actual failure will be produced only after the action execution finishes. If the error function is invoked multiple times, the last invocation will win (its error message will be used).

    Custom match failures will be handled like regular match failures, meaning they won't halt the parsing completely and let the parser backtrack and possibly try another alternatives. There is one difference though — when finally throwing the SyntaxError exception, the expectation combination rule which applies to regular match failures won't apply to the custom ones. If in the set of match failures with the furthest position there is at least one custom one, it will simply override the regular ones completely. If there are more custom failures, the one that was produced last will win.

    A SyntaxError exception based on a custom match failure will differ from exceptions based on a regular failure. It's message property will be equal to the failure's error message and its expected property will be null.

    Example

    start = sign:[+-]? digits:[0-9]+ {
      var result = parseInt((sign || "") + digits.join(""), 10);
    
      if (result % 2 == 0) {
        error("The number must be an odd integer.");
      }
    
      return result;
    }
    

    On input 2, the parser generated from the grammar above will produce a SyntaxError with message set to "The number must be an odd integer." and expected set to null

  4. Actions will also be able to to trigger a regular match failure using the expected function. It will take an expected value description as a parameter. This function will be provided mainly as a convenience for situations where one doesn't need to generate a full error message and automatically generated form "Expected X but "2" found." is enough.

    A SyntaxError exception based on a match failure generated by the expected function will be similar to exceptions based on a regular failure.

    Example

    start = sign:[+-]? digits:[0-9]+ {
      var result = parseInt((sign || "") + digits.join(""), 10);
    
      if (result % 2 == 0) {
        expected("odd integer");
      }
    
      return result;
    }
    

    On input 2, the parser generated from the grammar above will produce a SyntaxError with message set to "Expected odd integer but "2" found." and expected set to [ { type: "user", description: "odd integer" } ]

Next Steps

I welcome any notes to the proposed changes — please add them as comments. I plan to start implementing the proposal (or some modified version based on feedback) soon.

@otac0n
Copy link

otac0n commented Aug 14, 2013

It would be better if multiple invocations of the error function lead to multiple errors. You could then continue parsing as much as possible.

string = '"' value:(!(eol / '"') .)+ '"' { return value; }
       / '"' value:(!(eol / '"') .)+     { error('unterminated string constant'); return value; }

I would also recommend adding support for warnings as well.

@dmajda
Copy link
Contributor Author

dmajda commented Aug 16, 2013

It would be better if multiple invocations of the error function lead to multiple errors. You could then continue parsing as much as possible.

Could you name some use case(s) where you would need this? It seems to me that the example you provided would work with my proposal too.

I already have some use cases, but I don't know how representative they are, so I'd like to see some more.

I would also recommend adding support for warnings as well.

I am also thinking about it.

A problem with multiple errors and warnings is that they would need a different interface than simple and intuitive “parse returns some value on success or an exception on error”. The parser would need to report multiple errors on unsuccessful parse, plus warnings on both successful and unsuccessful parse.

What kind of API would you find most inutitive here? Again, I have some ideas, but I'd like to see what others think.

@otac0n
Copy link

otac0n commented Aug 18, 2013

Could you name some use case(s) where you would need this? It seems to me that the example you provided would work with my proposal too.

My primary use case is for non-fatal parse errors. Non-terminated strings, identifiers that start with a number, nested comments, missing semicolons, etc.

Generally, it is best to let the parser advance as far as possible, so that as many errors can be shown to the user as possible. If the parser could even finish the parse, and let the next steps (e.g. compilation) report errors as well, that would be great.

dmajda added a commit that referenced this issue Dec 1, 2013
Before this commit, the |expected| property of an exception object
thrown when a generated parser encountered an error contained
expectations as strings. These strings were in a human-readable format
suitable for displaying in the UI but not suitable for machine
processing. For example, expected string literals included quotes and a
string "any character" was used when any character was expected.

This commit makes expectations structured objects. This makes the
machine processing easier, while still allowing to generate a
human-readable representation if needed.

Implements part of #198.

Speed impact
------------
Before:     1180.41 kB/s
After:      1165.31 kB/s
Difference: -1.28%

Size impact
-----------
Before:     863523 b
After:      950817 b
Difference: 10.10%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
dmajda added a commit that referenced this issue Dec 1, 2013
Using a special value to indicate match failure instead of |null| allows
actions to return |null| as a regular value. This simplifies e.g. the
JSON parser.

Note the special value is internal and intentionally undocumented. This
means that there is currently no official way how to trigger a match
failure from an action. This is a temporary state which will be fixed
soon.

The negative performance impact (see below) is probably caused by
changing lot of comparisons against |null| (which likely check the value
against a fixed constant representing |null| in the interpreter) to
comparisons against the special value (which likely check the value
against another value in the interpreter).

Implements part of #198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
dmajda added a commit that referenced this issue Dec 1, 2013
Before this commit, the |?| operator returned an empty string upon
unsuccessful match. This commit changes the returned value to |null|. It
also updates the PEG.js grammar and the example grammars, which used the
value returned by |?| quite often.

Returning |null| is possible because it no longer indicates a match
failure.

I expect that this change will simplify many real-world grammars, as an
empty string is almost never desirable as a return value (except some
lexer-level rules) and it is often translated into |null| or some other
value in action code.

Implements part of #198.
dmajda added a commit that referenced this issue Dec 1, 2013
After making the |?| operator return |null| instead of an empty string
in the previous commit, empty strings were still returned from
predicates. This didn't make much sense.

Return value of a predicate is unimportant (if you have one in hand, you
already know the predicate succeeded) and one could even argue that
predicates shouldn't return any value at all. The closest thing to
"return no value" in JavaScript is returning |undefined|, so I decided
to make predicates return exactly that.

Implements part of #198.
dmajda added a commit that referenced this issue Dec 1, 2013
The |expected| function allows users to report regular match failures
inside actions.

If the |expected| function is called, and the reported match failure
turns out to be the cause of a parse error, the error message reported
by the parser will be in the usual "Expected ... but found ..." format
with the description specified in the |expected| call used as part of
the message.

Implements part of #198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
dmajda added a commit that referenced this issue Dec 1, 2013
This is in anticipation of |peg$error|. The |peg$expected| and
|peg$error| internal functions will nicely mirror the |expected| and
|error| functions available to user code in actions.

Implements part of #198.
dmajda added a commit that referenced this issue Dec 1, 2013
The exception-creating code will get somewhat hairy soon, so let's make
sure them mess will be contained.

Implements part of #198.
dmajda added a commit that referenced this issue Dec 1, 2013
It will be possible to create errors with user-supplied messages soon.
The |SyntaxError| class needs to be ready for that.

Implements part of #198.
dmajda added a commit that referenced this issue Dec 1, 2013
The |error| function allows users to report custom match failures inside
actions.

If the |error| function is called, and the reported match failure turns
out to be the cause of a parse error, the error message reported by the
parser will be exactly the one specified in the |error| call.

Implements part of #198.

Speed impact
------------
Before:     999.83 kB/s
After:      1000.84 kB/s
Difference: 0.10%

Size impact
-----------
Before:     1017212 b
After:      1019968 b
Difference: 0.27%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
dmajda added a commit that referenced this issue Dec 1, 2013
@dmajda
Copy link
Contributor Author

dmajda commented Dec 1, 2013

This is now implemented. The tools/impact script reports the following performance impact of the whole set of commits:

Speed impact
------------
Before:     1144.21 kB/s
After:      999.89 kB/s
Difference: -12.62%

Size impact
-----------
Before:     863523 b
After:      1019968 b
Difference: 18.11%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

I think that 12.62% speed penalty and 18.11% size penalty is fine for resolving such a long-standing set of problems.

Closing.

@dmajda dmajda closed this as completed Dec 1, 2013
@michaelficarra
Copy link

@dmajda: That's great news! I'm so glad null no longer signals failure.

andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
Before this commit, the |expected| property of an exception object
thrown when a generated parser encountered an error contained
expectations as strings. These strings were in a human-readable format
suitable for displaying in the UI but not suitable for machine
processing. For example, expected string literals included quotes and a
string "any character" was used when any character was expected.

This commit makes expectations structured objects. This makes the
machine processing easier, while still allowing to generate a
human-readable representation if needed.

Implements part of pegjs#198.

Speed impact
------------
Before:     1180.41 kB/s
After:      1165.31 kB/s
Difference: -1.28%

Size impact
-----------
Before:     863523 b
After:      950817 b
Difference: 10.10%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
Using a special value to indicate match failure instead of |null| allows
actions to return |null| as a regular value. This simplifies e.g. the
JSON parser.

Note the special value is internal and intentionally undocumented. This
means that there is currently no official way how to trigger a match
failure from an action. This is a temporary state which will be fixed
soon.

The negative performance impact (see below) is probably caused by
changing lot of comparisons against |null| (which likely check the value
against a fixed constant representing |null| in the interpreter) to
comparisons against the special value (which likely check the value
against another value in the interpreter).

Implements part of pegjs#198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
Before this commit, the |?| operator returned an empty string upon
unsuccessful match. This commit changes the returned value to |null|. It
also updates the PEG.js grammar and the example grammars, which used the
value returned by |?| quite often.

Returning |null| is possible because it no longer indicates a match
failure.

I expect that this change will simplify many real-world grammars, as an
empty string is almost never desirable as a return value (except some
lexer-level rules) and it is often translated into |null| or some other
value in action code.

Implements part of pegjs#198.
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
After making the |?| operator return |null| instead of an empty string
in the previous commit, empty strings were still returned from
predicates. This didn't make much sense.

Return value of a predicate is unimportant (if you have one in hand, you
already know the predicate succeeded) and one could even argue that
predicates shouldn't return any value at all. The closest thing to
"return no value" in JavaScript is returning |undefined|, so I decided
to make predicates return exactly that.

Implements part of pegjs#198.
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
The |expected| function allows users to report regular match failures
inside actions.

If the |expected| function is called, and the reported match failure
turns out to be the cause of a parse error, the error message reported
by the parser will be in the usual "Expected ... but found ..." format
with the description specified in the |expected| call used as part of
the message.

Implements part of pegjs#198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
This is in anticipation of |peg$error|. The |peg$expected| and
|peg$error| internal functions will nicely mirror the |expected| and
|error| functions available to user code in actions.

Implements part of pegjs#198.
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
The exception-creating code will get somewhat hairy soon, so let's make
sure them mess will be contained.

Implements part of pegjs#198.
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
It will be possible to create errors with user-supplied messages soon.
The |SyntaxError| class needs to be ready for that.

Implements part of pegjs#198.
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
The |error| function allows users to report custom match failures inside
actions.

If the |error| function is called, and the reported match failure turns
out to be the cause of a parse error, the error message reported by the
parser will be exactly the one specified in the |error| call.

Implements part of pegjs#198.

Speed impact
------------
Before:     999.83 kB/s
After:      1000.84 kB/s
Difference: 0.10%

Size impact
-----------
Before:     1017212 b
After:      1019968 b
Difference: 0.27%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014
sminnee pushed a commit to sminnee/dysphasia that referenced this issue Aug 20, 2015
Discussion on pegjs/pegjs#198 indicated that
PegJS works best if descriptions are only applied to basic symbols and
not complex rules. As expected, this leads to much more useful error
messages.
@eraxillan
Copy link

So, there is no "warning" function yet?

@StoneCypher
Copy link

The warning function topic is tracked at #325

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants