"unexpected" rule #607

log4b0at · 2019-03-23T20:03:31Z

Feature Request:

Add "unexpected" rule to override standard error message.

unexpected = 
      keywords { return "Unexpected keyword "+text()+"."; }
    / expression { return "Unexpected expression «"+text()+"»."; }
    / lamdba_function { return "Unexpected lambda function."; } ;

To implement it just change peg$buildError by:

  function peg$buildError() {
    var expected = peg$expected[0];
    var failPos = expected.pos;

    // if "unexpected" rule exist this might throw the appropriate error.
    if (typeof peg$parseunexpected !== 'undefined') {
      // make peg$expect temporary unavailable, set the cursor position to the fail position
      // next get the output of the rule, if it's a string, return it.
      const tmp = peg$expect;
      peg$expect = new Function();
      peg$currPos = failPos;
      const unexpected = peg$parseunexpected();
      peg$expect = tmp;
      if (typeof unexpected === 'string') {
        const length = peg$currPos - failPos;
        const location = failPos < input.length
        ? peg$computeLocation(failPos, failPos + length)
        : peg$computeLocation(failPos, failPos);
        return new peg$SyntaxError(unexpected, expected.variants, unexpected, location);
      }
    }
    // else return standard error.
    const unexpected = input.charAt(failPos);
    const location = failPos < input.length
        ? peg$computeLocation(failPos, failPos + 1)
        : peg$computeLocation(failPos, failPos);
    return new peg$SyntaxError(
      peg$SyntaxError.buildMessage(expected.variants, unexpected), expected.variants, unexpected, location);
  }

Expected behavior:
Improve error handling.

If it's not ethical, is there anyone who can tell me how to create a plugin that would make the change?
Thanks a lot

norech · 2019-04-02T18:11:41Z

I agree that this feature could be added, but until now a way to do it is using the error(...) function.

For example, you have something could not match:

AnythingThatCanFail
  = FirstMatch
  / AnotherMatch

You could add the following expression, matching every char to be sure that no other match will be done:

UnexpectedThing
  =   .+
  {
    error("Unexpected thing: " + text().substring(0,1));
  }

And then you could call it when something that should match do not match.

AnythingThatCanFail
  = FirstMatch
  / AnotherMatch
  / UnexpectedThing

log4b0at · 2019-04-03T14:56:26Z

your UnexpectedThing rule cannot handle keywords, expressions or others atoms of your grammar, it just return a single character: text().substring(0,1), so what you do is exactly the same thing that the pegjs parser produce. The feature that i proposed is a quite bit different because it handle all specified rules inside the unexpected rule at any point of the input, fully implicitly and in a non-redondant way.

Here is an example of what the code above will produce for my typescript parser:
Input:

public function some_func(){
     public function wtf_here_func(){
     }
}

Output error:

Line X, Column 4: Expected ... but "p" found.

Output error: (with unexpected rule feature)

Line X, Column 4: Unexpected "wtf_here_func" method declaration.

The only thing that i added to my grammar is:

unexpected = m:method_declaration { return `Unexpected "${m.identifier}" method declaration.` };

I notice that using the error function is more elegant than return statement, you're right. So the above should become:

unexpected = m:method_declaration { error(`Unexpected "${m.identifier}" method declaration.`) };

Try it yourself and you will see all that it can bring to our parsers.

Mingun · 2019-04-03T20:06:13Z

You always can write UnexpectedThing as required. Actually, you already do this, just name it unexpected. But in your implicit variant you can not have different unexpected rules for different parts of grammar

log4b0at · 2019-04-04T18:16:45Z

@Mingun
What you say is wrong, you can override my implicit "unexpected rule" by adding an "UnexpectedThing" in the concerned rule, as you said yourself.
An implicit rule is fully necessary, here is a concrete example, the rule for method declaration:

method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}';

With the explicit UnexpectedThing at the end:

method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}' / UnexpectedThing;

This does'nt work if the unexpected thing appear between "function" and the identifier, or between args and ')' etc.
So you should do this:

method_declaration = (privacy __)? ("function"/UnexpectedThing) _ identifier _ ('('/UnexpectedThing) _ args _ (')'/UnexpectedThing) _ ('{'/UnexpectedThing) _ instruction* _ ('}'/UnexpectedThing) / UnexpectedThing;

privacy = ... / UnexpectedThing;
_ = ... / UnexpectedThing;
identifier = ... / UnexpectedThing;
instruction = ... / UnexpectedThing;

...

Why this is bad?

The grammar complexity grow O(n*2) instead of O(n)
The error message confuses the rules expected for and the rules unexpected, like that:

Expected ... or UnexpectedThing1, UnexpectedThing2..., but found "x".

Do you understand my argument now?
It does not matter if you use my code or not, we need an implicit rule for our parsers.

Mingun · 2019-04-05T13:35:18Z

If you want just instead of one symbol in the message Expected... but found X to see a word Expected... but found XXX, it becomes elementary and does not demand any changes. Just catch SyntaxError and reparse input from error position with special "lexer" parser. You can even define it in the same file as main grammar and reuse some rules. In the code:

let parser = PEG.generate(<main grammar>);
// For correct work this parser must parse any input and return string as result
let lexer = PEG.generate(<lexer grammar>);
try {
  return parser.parse(<input>);
} catch (e) {
  if (!(e instanceof parser.SyntaxError)) throw e;

  // lexer must return string
  let found = lexer.parse(input.substr(e.location.start.offset));
  // Or you can use specific rule from the same parser
  //let found = parser.parse(input.substr(e.location.start.offset), { startRule: "unexpected" });
  throw new parser.SyntaxError(
    parser.SyntaxError.buildMessage(e.expected, found),
    e.expected,
    found,
    e.location
  );
  // or you can throw you own exception type
}

To introduce some special support in the generator for this purpose to me sees excessive though I not against that there was a annotation which will mark the rule as a lexer entry point ~~when~~ if annotations will be implemented.

log4b0at · 2019-04-06T10:14:41Z

It is a solution certainly, but too little accessible and difficult to maintain. Anyway thank you for your code, it's always good to take.

I agree that the direct implementation of a lexer in the generator would be welcome.
What concerns do you have so that annotations cannot be implemented?

Mingun · 2019-04-06T20:30:43Z

What concerns do you have so that annotations cannot be implemented?

Unfortunately, as you can see, the project is dead or, at least, in a deep stazis

futagoza · 2019-04-15T17:29:41Z

I actually like the idea of an unexpected rule, but I'm thinking of putting it in as optional via the features option. Is that fine with you @log4b0at?

log4b0at · 2019-04-20T16:44:16Z

@futagoza Surely

StoneCypher · 2020-02-03T00:36:56Z

@log4b0at - The unexpected rule already exists.

The problem with having an unexpected rule specialize on the nature of the thing it failed to parse is that it doesn't know what that is, because it failed to parse it.

Consider the following grammar:

Document = (DadJoke "\n"?)+

Kind = [a-zA-Z0-9 ]+

Car    = "car: "    Kind ".";
Insect = "insect: " Kind ".";
Annoy  = "annoy: "  Kind ".";

DadJoke = Car / Insect / Annoy

(This is a dad joke because, obviously, the correct answer for every rule is "bug.")

This should without problems parse input like

car: Honda
insect: Beetle
annoy: Whine

There are two ways to read handling the unexpected. Either it's that the carrier rule is wrong and you'll handle that, or the subsumed rule is wrong and you'll handle that.

As I understand it, what you're asking for is a rule for unexpected that lets you give different output based on whether what's unexpected was a car, an insect, or an annoyance.

Let's suppose you have a specialization like:

const UnexpectedCustoms = { // no, not shoes off
  'annoy'  : 'Unexpected annoyance',
  'insect' : 'Unexpected insect',
  'car'    : 'Unexpected car'
};

So what should it give as an error message when I give it this input?

defect: bug
microphone: bug
disease: bug
hobbyist: bug

Which of those are cars?

What should the error messages there be?

This isn't solvable. This problem is equivalent to saying "hey peg, given that the next thing can't be interpreted, why don't you tell me what it is so I can tell someone?"

First you need to teach it how to interpret that. Next you don't need anything.

This is, in essence, the reason some things are set up as a tokenizer then a lexer. Just use peg as a tokenizer, in this case, then write a lexer that parses the generated AST and says "uh, you're ... you're not allowed to have a close parenthesis without an open one"

Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form

car: car: car: ...

Is there any way for pegjs to know that that's a car, other than writing the rule you can already write to match that out, then handling it?

That's pretty straightforward to handle in-grammar today, and many grammars do. Why do you want extra features for that?

Just write a rule with the name of the feature you're asking for. Pow: done.

No extra peg.js complexity needed.

Here's the other way to say it.

In order to give an error message for the specific incorrect parsing, you'd need a correct parsing of the incorrect stuff to interpret. Either write a parser that accepts the wrong things and rejects them in the handlers, or write a secondary parser to handle the partial, or write an AST that can accept the wrong thing then interpret the AST to be wrong."

Finally, this really shouldn't be done, because unknown is the fifth or sixth most common existing name for a rule, after things like document and operator

I would warrant more than a third of grammars already have this, because the language is already able to express it without features

If you try to add this, all you do is break the existing grammars to add something we already have

This should be declined

StoneCypher · 2020-02-03T00:37:45Z

@Mingun - I want to resurrect this project. There's no good reason for it to be dead

norech · 2020-02-05T01:10:19Z

I think this feature is meant to make the errors more clear, even if an unexpected rule might not be the right way to implement this sort of thing.

The point is that having an Unexpected X error seems to make it more practical to spot the parse errors (at least in many cases) rather than having Expected A, B, C, D, E, X, Y or Z or Expected expression.

That would be great to see PEG.js being able to do such a thing.

StoneCypher · 2020-02-05T01:11:32Z

So how do you identify what the X that's unexpected is?

norech · 2020-02-05T01:30:19Z

That’s probably the issue of this feature: being able to clearly identify what’s wrong.

The problem here is not trying to identify what X could be, but being sure of what X is.

StoneCypher · 2020-02-05T02:01:43Z

Like I tried to explain above, that's called "parsing," and the way to do that is to specify it in the grammar

log4b0at · 2020-02-05T11:03:30Z

I dont understand well the problem that you trying to raise, can you give me more examples about ?

StoneCypher · 2020-02-05T15:36:53Z

They'll be literally identical to the existing one.

Try answering the question. It's there socratically and rhetorically; you should learn what the problem is in trying to answer.

Consider the following grammar:

Document = (DadJoke "\n"?)+

Kind = [a-zA-Z0-9 ]+

Car    = "car: "    Kind ".";
Insect = "insect: " Kind ".";
Annoy  = "annoy: "  Kind ".";

DadJoke = Car / Insect / Annoy

So what should it give as an error message when I give it this input?

defect: bug
microphone: bug
disease: bug
hobbyist: bug

As I understand your request, the parser is supposed to say something like "I found a disease when I was expecting a car, an insect, or an annoyance."

How's it supposed to know that's a disease?

Parsing fails when the parser doesn't know what the next thing is.

An error message for parsing failure that requires it to know what the next thing is is contradictory to the contextual situation

log4b0at · 2020-02-05T15:51:50Z

defect: bug
microphone: bug
disease: bug
hobbyist: bug

"What should the error messages there be?"

You get that with actual error handling:

Line 1, column 1: Expected "annoy: ", "car: ", or "insect: " but "d" found.

If i define an unexpected rule like that

Identifier = [a-zA-Z]+;

unexpected = 
    DadJoke { error("Unexpected dad joke here"); }
/ i:$Identifier { error(`Unexpected identifier "${i}"`); };

I will get

Line 1, column 1: Unexpected identifier "defect"

"Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form"
car: car: car:

Here you will get the default message, because no unexpected-rule match ":" punctuator

Line 1, column 8: Expected ... blabla ... but found ":"

Moreover the processus of detecting unexpected things is totally passive and happen only when pegjs detect an error, and don't add any overhead in term of performances.

Does this answer your question correctly?

log4b0at · 2020-02-05T16:25:47Z

if you want try yourself, I quickly made a code for 0.10 version of pegjs, replace (in your parser) peg$buildStructuredError by

function peg$buildStructuredError(expected, found, location) {
    if (typeof peg$parseunexpected !== 'undefined') {
        peg$fail = new Function();
        peg$currPos = location.start.offset;
        peg$parseunexpected();
    }
    return new peg$SyntaxError(peg$SyntaxError.buildMessage(expected, found), expected, found, location);
}

StoneCypher · 2020-02-05T17:02:35Z

Isn't that roughly what mingun said in 2019?

Now I worry that I'm misunderstanding something here

StoneCypher · 2020-02-05T17:02:57Z

In this comment

log4b0at · 2020-02-05T19:41:53Z

Use a tokenizer has a cost.
such feature is simple to implement and cost nothing to nobody

StoneCypher · 2020-02-05T23:33:02Z

okay, that's a fair point

Seb35 · 2020-04-03T08:57:58Z

Just some bikeshedding, but when such feature is implemented, it should be let to the user the choice of the rule to be defined as the “unexpected” rule, e.g. with

pegjs.generate( grammar, { unexpected: "unrecognised_token" } )

log4b0at · 2021-02-28T17:33:42Z

Hello, I just made a pull request for this functionality, following your advices, namely the use of the error function, much more consistent than a return, suggested by @norech.
And the use of an option to change the name of the rule, suggested by @Seb35
As @futagoza said, the feature is optional and is disabled by default. (I don't know about the features option but by default there is no unexpected rule)
See #661 pull

futagoza added the feature label Apr 15, 2019

futagoza added this to the 0.12.0 milestone Apr 15, 2019

This was referenced Apr 2, 2020

Partial match #645

Open

NodeJS streams #221

Closed

czerwinskilukasz1 mentioned this issue Jul 26, 2020

Provide more user-friendly parser errors CatchTheTornado/askql#358

Open

log4b0at mentioned this issue Feb 28, 2021

Add unexpectedRule option and feature #661

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"unexpected" rule #607

"unexpected" rule #607

log4b0at commented Mar 23, 2019

norech commented Apr 2, 2019

log4b0at commented Apr 3, 2019 •

edited

Mingun commented Apr 3, 2019

log4b0at commented Apr 4, 2019

Mingun commented Apr 5, 2019

log4b0at commented Apr 6, 2019

Mingun commented Apr 6, 2019

futagoza commented Apr 15, 2019

log4b0at commented Apr 20, 2019

StoneCypher commented Feb 3, 2020 •

edited

StoneCypher commented Feb 3, 2020

norech commented Feb 5, 2020 •

edited

StoneCypher commented Feb 5, 2020

norech commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020 •

edited

log4b0at commented Feb 5, 2020 •

edited

StoneCypher commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020 •

edited

StoneCypher commented Feb 5, 2020

Seb35 commented Apr 3, 2020

log4b0at commented Feb 28, 2021

"unexpected" rule #607

"unexpected" rule #607

Comments

log4b0at commented Mar 23, 2019

norech commented Apr 2, 2019

log4b0at commented Apr 3, 2019 • edited

Mingun commented Apr 3, 2019

log4b0at commented Apr 4, 2019

Mingun commented Apr 5, 2019

log4b0at commented Apr 6, 2019

Mingun commented Apr 6, 2019

futagoza commented Apr 15, 2019

log4b0at commented Apr 20, 2019

StoneCypher commented Feb 3, 2020 • edited

StoneCypher commented Feb 3, 2020

norech commented Feb 5, 2020 • edited

StoneCypher commented Feb 5, 2020

norech commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020 • edited

log4b0at commented Feb 5, 2020 • edited

StoneCypher commented Feb 5, 2020

StoneCypher commented Feb 5, 2020

log4b0at commented Feb 5, 2020 • edited

StoneCypher commented Feb 5, 2020

Seb35 commented Apr 3, 2020

log4b0at commented Feb 28, 2021

log4b0at commented Apr 3, 2019 •

edited

StoneCypher commented Feb 3, 2020 •

edited

norech commented Feb 5, 2020 •

edited

log4b0at commented Feb 5, 2020 •

edited

log4b0at commented Feb 5, 2020 •

edited

log4b0at commented Feb 5, 2020 •

edited