Redesign error reporting #198

dmajda · 2013-08-14T10:07:31Z

I see three problems with the current error reporting system in PEG.js:

Actions report errors by returning null

Reporting using null is inflexible (it doesn't allow to carry any information about the kind of error) and makes it impossible to return null as a regular value from actions, which would be useful sometimes (for example in a JSON parser).

See also null indicates failure - workaround for a grammar that should produce nulls? #17.
There is no way to produce errors with completely custom messages

For example, imagine a HTTP 1.1 parser that wants to report an error about missing Host header in a HTTP request with a message like "A HTTP 1.1 request must contain a Host header". Currently the only way to do this is to throw an exception with desired message manually. This is cumbersome and it does not interact well with the backtracking behavior (throwing an exception halts the parsing completely, even when it is possible to backtrack and try another alternatives).
The expected property of SyntaxError is hard to process mechanically

The expected property of SyntaxError is either an array of expectations (each describing one alternative which the parser expected at the position of error) or null (meaning that the end of input was expected).

Each expectation is represented by a string. This string can mean an expected literal (represented using its PEG.js syntax — a quoted string), character class (represented using its PEG.js syntax — [...]), or any character (represented by string "any"). To differentiate between these cases, one has to parse the string representations manually, which makes building tools based on PEG.js such as autocompleters unnecessarily hard.

I plan to redesign the error reporting system to fix these problems. Before I describe the changes I am thinking about, a little description how the current system works is needed.

How Error Reporting Currently Works?

When a PEG.js parser tries to match a string literal, character class, or ., and fails, it produces a match failure. It consists of a position and an expectation (description of what the parser tried to match).

After producing a match failure, the parser backtracks and possibly tries another alternatives. If none of them succeeds and there is nothing more to try, the parser throws the SyntaxError exception. When setting its properties (such as position, expectations, error message, etc.), the parser considers only the furthest match failure (the one with the greatest position). If there are more such failures, their expectations are combined.

The situation is little bit more complicated if named rules are used, but this is irrelevant for this discussion.

Proposed Changes

I am thinking about the following changes in the error reporting system:

Expectations in SyntaxError.expected won't be represented by strings, but by objects. The objects will have a type property (with values like "literal", "class", "any", "eof") which will determine expectation type. For some expectation types, other properties will conain the details (e.g. for "literal" expectation where would be a value property containing the expected string). This will simplify mechanical processing of the expectations.

An alternative to the type property is to use classses. But I think the type property will be easier to handle for users.
Actions will be allowed to return null. It will be a regular value and won't signify an error.
Actions will be able to trigger a match failure using a new error function. It will take an error message as its parameter. Failures triggered by this function (called custom match failures) will consist of a position and an error message. They won't have any expectation.

The error function won't interrupt the action execution, it will just mark it as failing and save the error message. The actual failure will be produced only after the action execution finishes. If the error function is invoked multiple times, the last invocation will win (its error message will be used).

Custom match failures will be handled like regular match failures, meaning they won't halt the parsing completely and let the parser backtrack and possibly try another alternatives. There is one difference though — when finally throwing the SyntaxError exception, the expectation combination rule which applies to regular match failures won't apply to the custom ones. If in the set of match failures with the furthest position there is at least one custom one, it will simply override the regular ones completely. If there are more custom failures, the one that was produced last will win.

A SyntaxError exception based on a custom match failure will differ from exceptions based on a regular failure. It's message property will be equal to the failure's error message and its expected property will be null.

Example
```
start = sign:[+-]? digits:[0-9]+ {
  var result = parseInt((sign || "") + digits.join(""), 10);

  if (result % 2 == 0) {
    error("The number must be an odd integer.");
  }

  return result;
}
```
On input 2, the parser generated from the grammar above will produce a SyntaxError with message set to "The number must be an odd integer." and expected set to null
Actions will also be able to to trigger a regular match failure using the expected function. It will take an expected value description as a parameter. This function will be provided mainly as a convenience for situations where one doesn't need to generate a full error message and automatically generated form "Expected X but "2" found." is enough.

A SyntaxError exception based on a match failure generated by the expected function will be similar to exceptions based on a regular failure.

Example
```
start = sign:[+-]? digits:[0-9]+ {
  var result = parseInt((sign || "") + digits.join(""), 10);

  if (result % 2 == 0) {
    expected("odd integer");
  }

  return result;
}
```
On input 2, the parser generated from the grammar above will produce a SyntaxError with message set to "Expected odd integer but "2" found." and expected set to [ { type: "user", description: "odd integer" } ]

Next Steps

I welcome any notes to the proposed changes — please add them as comments. I plan to start implementing the proposal (or some modified version based on feedback) soon.

The text was updated successfully, but these errors were encountered:

otac0n · 2013-08-14T10:30:09Z

It would be better if multiple invocations of the error function lead to multiple errors. You could then continue parsing as much as possible.

string = '"' value:(!(eol / '"') .)+ '"' { return value; }
       / '"' value:(!(eol / '"') .)+     { error('unterminated string constant'); return value; }

I would also recommend adding support for warnings as well.

dmajda · 2013-08-16T08:24:47Z

It would be better if multiple invocations of the error function lead to multiple errors. You could then continue parsing as much as possible.

Could you name some use case(s) where you would need this? It seems to me that the example you provided would work with my proposal too.

I already have some use cases, but I don't know how representative they are, so I'd like to see some more.

I would also recommend adding support for warnings as well.

I am also thinking about it.

A problem with multiple errors and warnings is that they would need a different interface than simple and intuitive “parse returns some value on success or an exception on error”. The parser would need to report multiple errors on unsuccessful parse, plus warnings on both successful and unsuccessful parse.

What kind of API would you find most inutitive here? Again, I have some ideas, but I'd like to see what others think.

otac0n · 2013-08-18T17:53:10Z

Could you name some use case(s) where you would need this? It seems to me that the example you provided would work with my proposal too.

My primary use case is for non-fatal parse errors. Non-terminated strings, identifiers that start with a number, nested comments, missing semicolons, etc.

Generally, it is best to let the parser advance as far as possible, so that as many errors can be shown to the user as possible. If the parser could even finish the parse, and let the next steps (e.g. compilation) report errors as well, that would be great.

Before this commit, the |expected| property of an exception object thrown when a generated parser encountered an error contained expectations as strings. These strings were in a human-readable format suitable for displaying in the UI but not suitable for machine processing. For example, expected string literals included quotes and a string "any character" was used when any character was expected. This commit makes expectations structured objects. This makes the machine processing easier, while still allowing to generate a human-readable representation if needed. Implements part of #198. Speed impact ------------ Before: 1180.41 kB/s After: 1165.31 kB/s Difference: -1.28% Size impact ----------- Before: 863523 b After: 950817 b Difference: 10.10% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Using a special value to indicate match failure instead of |null| allows actions to return |null| as a regular value. This simplifies e.g. the JSON parser. Note the special value is internal and intentionally undocumented. This means that there is currently no official way how to trigger a match failure from an action. This is a temporary state which will be fixed soon. The negative performance impact (see below) is probably caused by changing lot of comparisons against |null| (which likely check the value against a fixed constant representing |null| in the interpreter) to comparisons against the special value (which likely check the value against another value in the interpreter). Implements part of #198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Before this commit, the |?| operator returned an empty string upon unsuccessful match. This commit changes the returned value to |null|. It also updates the PEG.js grammar and the example grammars, which used the value returned by |?| quite often. Returning |null| is possible because it no longer indicates a match failure. I expect that this change will simplify many real-world grammars, as an empty string is almost never desirable as a return value (except some lexer-level rules) and it is often translated into |null| or some other value in action code. Implements part of #198.

After making the |?| operator return |null| instead of an empty string in the previous commit, empty strings were still returned from predicates. This didn't make much sense. Return value of a predicate is unimportant (if you have one in hand, you already know the predicate succeeded) and one could even argue that predicates shouldn't return any value at all. The closest thing to "return no value" in JavaScript is returning |undefined|, so I decided to make predicates return exactly that. Implements part of #198.

The |expected| function allows users to report regular match failures inside actions. If the |expected| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be in the usual "Expected ... but found ..." format with the description specified in the |expected| call used as part of the message. Implements part of #198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

The exception-creating code will get somewhat hairy soon, so let's make sure them mess will be contained. Implements part of #198.

It will be possible to create errors with user-supplied messages soon. The |SyntaxError| class needs to be ready for that. Implements part of #198.

The |error| function allows users to report custom match failures inside actions. If the |error| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be exactly the one specified in the |error| call. Implements part of #198. Speed impact ------------ Before: 999.83 kB/s After: 1000.84 kB/s Difference: 0.10% Size impact ----------- Before: 1017212 b After: 1019968 b Difference: 0.27% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Implements part of #198.

dmajda · 2013-12-01T15:20:36Z

This is now implemented. The tools/impact script reports the following performance impact of the whole set of commits:

Speed impact
------------
Before:     1144.21 kB/s
After:      999.89 kB/s
Difference: -12.62%

Size impact
-----------
Before:     863523 b
After:      1019968 b
Difference: 18.11%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

I think that 12.62% speed penalty and 18.11% size penalty is fine for resolving such a long-standing set of problems.

Closing.

michaelficarra · 2013-12-02T03:15:13Z

@dmajda: That's great news! I'm so glad null no longer signals failure.

Before this commit, the |expected| property of an exception object thrown when a generated parser encountered an error contained expectations as strings. These strings were in a human-readable format suitable for displaying in the UI but not suitable for machine processing. For example, expected string literals included quotes and a string "any character" was used when any character was expected. This commit makes expectations structured objects. This makes the machine processing easier, while still allowing to generate a human-readable representation if needed. Implements part of pegjs#198. Speed impact ------------ Before: 1180.41 kB/s After: 1165.31 kB/s Difference: -1.28% Size impact ----------- Before: 863523 b After: 950817 b Difference: 10.10% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Using a special value to indicate match failure instead of |null| allows actions to return |null| as a regular value. This simplifies e.g. the JSON parser. Note the special value is internal and intentionally undocumented. This means that there is currently no official way how to trigger a match failure from an action. This is a temporary state which will be fixed soon. The negative performance impact (see below) is probably caused by changing lot of comparisons against |null| (which likely check the value against a fixed constant representing |null| in the interpreter) to comparisons against the special value (which likely check the value against another value in the interpreter). Implements part of pegjs#198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Before this commit, the |?| operator returned an empty string upon unsuccessful match. This commit changes the returned value to |null|. It also updates the PEG.js grammar and the example grammars, which used the value returned by |?| quite often. Returning |null| is possible because it no longer indicates a match failure. I expect that this change will simplify many real-world grammars, as an empty string is almost never desirable as a return value (except some lexer-level rules) and it is often translated into |null| or some other value in action code. Implements part of pegjs#198.

After making the |?| operator return |null| instead of an empty string in the previous commit, empty strings were still returned from predicates. This didn't make much sense. Return value of a predicate is unimportant (if you have one in hand, you already know the predicate succeeded) and one could even argue that predicates shouldn't return any value at all. The closest thing to "return no value" in JavaScript is returning |undefined|, so I decided to make predicates return exactly that. Implements part of pegjs#198.

The |expected| function allows users to report regular match failures inside actions. If the |expected| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be in the usual "Expected ... but found ..." format with the description specified in the |expected| call used as part of the message. Implements part of pegjs#198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

The exception-creating code will get somewhat hairy soon, so let's make sure them mess will be contained. Implements part of pegjs#198.

It will be possible to create errors with user-supplied messages soon. The |SyntaxError| class needs to be ready for that. Implements part of pegjs#198.

The |error| function allows users to report custom match failures inside actions. If the |error| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be exactly the one specified in the |error| call. Implements part of pegjs#198. Speed impact ------------ Before: 999.83 kB/s After: 1000.84 kB/s Difference: 0.10% Size impact ----------- Before: 1017212 b After: 1019968 b Difference: 0.27% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)

Implements part of pegjs#198.

Discussion on pegjs/pegjs#198 indicated that PegJS works best if descriptions are only applied to basic symbols and not complex rules. As expected, this leads to much more useful error messages.

eraxillan · 2017-05-31T06:13:11Z

So, there is no "warning" function yet?

StoneCypher · 2020-02-03T09:10:33Z

The warning function topic is tracked at #325

This was referenced Aug 14, 2013

null indicates failure - workaround for a grammar that should produce nulls? #17

Closed

Fix example JSON parser null issue #91

Closed

Return null from JSON parser #191

Closed

ghost assigned dmajda Aug 16, 2013

This was referenced Aug 16, 2013

Report unused rules #193

Closed

Wrong column and line numbers in error message. #163

Closed

dmajda mentioned this issue Nov 12, 2013

Create a new release and publish it as NPM module #220

Closed

arcanis mentioned this issue Nov 23, 2013

Error field too small arcanis/peg.arcanis.fr#2

Open

dmajda added a commit that referenced this issue Dec 1, 2013

Error handling: Extract exception building into its own function

3fe6aba

The exception-creating code will get somewhat hairy soon, so let's make sure them mess will be contained. Implements part of #198.

dmajda added a commit that referenced this issue Dec 1, 2013

Error handling: Build error message out of |SyntaxError|'s constructor

dd74ea4

It will be possible to create errors with user-supplied messages soon. The |SyntaxError| class needs to be ready for that. Implements part of #198.

dmajda added a commit that referenced this issue Dec 1, 2013

Error handling: Use the new |error| function in PEG.js's grammar itself

f8b5e04

Implements part of #198.

dmajda closed this as completed Dec 1, 2013

dmajda mentioned this issue Jan 18, 2014

Use special object instead of null #182

Closed

andreineculau pushed a commit to for-GET/pegjs that referenced this issue Apr 17, 2014

Error handling: Use the new |error| function in PEG.js's grammar itself

5c9254d

Implements part of pegjs#198.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redesign error reporting #198

Redesign error reporting #198

dmajda commented Aug 14, 2013

otac0n commented Aug 14, 2013

dmajda commented Aug 16, 2013

otac0n commented Aug 18, 2013

dmajda commented Dec 1, 2013

michaelficarra commented Dec 2, 2013

eraxillan commented May 31, 2017

StoneCypher commented Feb 3, 2020

Redesign error reporting #198

Redesign error reporting #198

Comments

dmajda commented Aug 14, 2013

How Error Reporting Currently Works?

Proposed Changes

Next Steps

otac0n commented Aug 14, 2013

dmajda commented Aug 16, 2013

otac0n commented Aug 18, 2013

dmajda commented Dec 1, 2013

michaelficarra commented Dec 2, 2013

eraxillan commented May 31, 2017

StoneCypher commented Feb 3, 2020