Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"unexpected" rule #607

Open
log4b0at opened this issue Mar 23, 2019 · 25 comments
Open

"unexpected" rule #607

log4b0at opened this issue Mar 23, 2019 · 25 comments
Labels
Milestone

Comments

@log4b0at
Copy link

  • Feature Request:

Add "unexpected" rule to override standard error message.

unexpected = 
      keywords { return "Unexpected keyword "+text()+"."; }
    / expression { return "Unexpected expression «"+text()+"»."; }
    / lamdba_function { return "Unexpected lambda function."; } ;

To implement it just change peg$buildError by:

  function peg$buildError() {
    var expected = peg$expected[0];
    var failPos = expected.pos;

    // if "unexpected" rule exist this might throw the appropriate error.
    if (typeof peg$parseunexpected !== 'undefined') {
      // make peg$expect temporary unavailable, set the cursor position to the fail position
      // next get the output of the rule, if it's a string, return it.
      const tmp = peg$expect;
      peg$expect = new Function();
      peg$currPos = failPos;
      const unexpected = peg$parseunexpected();
      peg$expect = tmp;
      if (typeof unexpected === 'string') {
        const length = peg$currPos - failPos;
        const location = failPos < input.length
        ? peg$computeLocation(failPos, failPos + length)
        : peg$computeLocation(failPos, failPos);
        return new peg$SyntaxError(unexpected, expected.variants, unexpected, location);
      }
    }
    // else return standard error.
    const unexpected = input.charAt(failPos);
    const location = failPos < input.length
        ? peg$computeLocation(failPos, failPos + 1)
        : peg$computeLocation(failPos, failPos);
    return new peg$SyntaxError(
      peg$SyntaxError.buildMessage(expected.variants, unexpected), expected.variants, unexpected, location);
  }

Expected behavior:
Improve error handling.

If it's not ethical, is there anyone who can tell me how to create a plugin that would make the change?
Thanks a lot

@norech
Copy link

norech commented Apr 2, 2019

I agree that this feature could be added, but until now a way to do it is using the error(...) function.

For example, you have something could not match:

AnythingThatCanFail
  = FirstMatch
  / AnotherMatch

You could add the following expression, matching every char to be sure that no other match will be done:

UnexpectedThing
  =   .+
  {
    error("Unexpected thing: " + text().substring(0,1));
  }

And then you could call it when something that should match do not match.

AnythingThatCanFail
  = FirstMatch
  / AnotherMatch
  / UnexpectedThing

@log4b0at
Copy link
Author

log4b0at commented Apr 3, 2019

your UnexpectedThing rule cannot handle keywords, expressions or others atoms of your grammar, it just return a single character: text().substring(0,1), so what you do is exactly the same thing that the pegjs parser produce. The feature that i proposed is a quite bit different because it handle all specified rules inside the unexpected rule at any point of the input, fully implicitly and in a non-redondant way.

Here is an example of what the code above will produce for my typescript parser:
Input:

public function some_func(){
     public function wtf_here_func(){
     }
}

Output error:

Line X, Column 4: Expected ... but "p" found.

Output error: (with unexpected rule feature)

Line X, Column 4: Unexpected "wtf_here_func" method declaration.

The only thing that i added to my grammar is:

unexpected = m:method_declaration { return `Unexpected "${m.identifier}" method declaration.` };

I notice that using the error function is more elegant than return statement, you're right. So the above should become:

unexpected = m:method_declaration { error(`Unexpected "${m.identifier}" method declaration.`) };

Try it yourself and you will see all that it can bring to our parsers.

@Mingun
Copy link
Contributor

Mingun commented Apr 3, 2019

You always can write UnexpectedThing as required. Actually, you already do this, just name it unexpected. But in your implicit variant you can not have different unexpected rules for different parts of grammar

@log4b0at
Copy link
Author

log4b0at commented Apr 4, 2019

@Mingun
What you say is wrong, you can override my implicit "unexpected rule" by adding an "UnexpectedThing" in the concerned rule, as you said yourself.
An implicit rule is fully necessary, here is a concrete example, the rule for method declaration:

method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}';

With the explicit UnexpectedThing at the end:

method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}' / UnexpectedThing;

This does'nt work if the unexpected thing appear between "function" and the identifier, or between args and ')' etc.
So you should do this:

method_declaration = (privacy __)? ("function"/UnexpectedThing) _ identifier _ ('('/UnexpectedThing) _ args _ (')'/UnexpectedThing) _ ('{'/UnexpectedThing) _ instruction* _ ('}'/UnexpectedThing) / UnexpectedThing;

privacy = ... / UnexpectedThing;
_ = ... / UnexpectedThing;
identifier = ... / UnexpectedThing;
instruction = ... / UnexpectedThing;

...

Why this is bad?

  • The grammar complexity grow O(n*2) instead of O(n)
  • The error message confuses the rules expected for and the rules unexpected, like that:

Expected ... or UnexpectedThing1, UnexpectedThing2..., but found "x".

Do you understand my argument now?
It does not matter if you use my code or not, we need an implicit rule for our parsers.

@Mingun
Copy link
Contributor

Mingun commented Apr 5, 2019

If you want just instead of one symbol in the message Expected... but found X to see a word Expected... but found XXX, it becomes elementary and does not demand any changes. Just catch SyntaxError and reparse input from error position with special "lexer" parser. You can even define it in the same file as main grammar and reuse some rules. In the code:

let parser = PEG.generate(<main grammar>);
// For correct work this parser must parse any input and return string as result
let lexer = PEG.generate(<lexer grammar>);
try {
  return parser.parse(<input>);
} catch (e) {
  if (!(e instanceof parser.SyntaxError)) throw e;

  // lexer must return string
  let found = lexer.parse(input.substr(e.location.start.offset));
  // Or you can use specific rule from the same parser
  //let found = parser.parse(input.substr(e.location.start.offset), { startRule: "unexpected" });
  throw new parser.SyntaxError(
    parser.SyntaxError.buildMessage(e.expected, found),
    e.expected,
    found,
    e.location
  );
  // or you can throw you own exception type
}

To introduce some special support in the generator for this purpose to me sees excessive though I not against that there was a annotation which will mark the rule as a lexer entry point when if annotations will be implemented.

@log4b0at
Copy link
Author

log4b0at commented Apr 6, 2019

It is a solution certainly, but too little accessible and difficult to maintain. Anyway thank you for your code, it's always good to take.

I agree that the direct implementation of a lexer in the generator would be welcome.
What concerns do you have so that annotations cannot be implemented?

@Mingun
Copy link
Contributor

Mingun commented Apr 6, 2019

What concerns do you have so that annotations cannot be implemented?

Unfortunately, as you can see, the project is dead or, at least, in a deep stazis

@futagoza futagoza added this to the 0.12.0 milestone Apr 15, 2019
@futagoza
Copy link
Member

I actually like the idea of an unexpected rule, but I'm thinking of putting it in as optional via the features option. Is that fine with you @log4b0at?

@log4b0at
Copy link
Author

@futagoza Surely

@StoneCypher
Copy link

StoneCypher commented Feb 3, 2020

@log4b0at - The unexpected rule already exists.

The problem with having an unexpected rule specialize on the nature of the thing it failed to parse is that it doesn't know what that is, because it failed to parse it.

Consider the following grammar:

Document = (DadJoke "\n"?)+

Kind = [a-zA-Z0-9 ]+

Car    = "car: "    Kind ".";
Insect = "insect: " Kind ".";
Annoy  = "annoy: "  Kind ".";

DadJoke = Car / Insect / Annoy

(This is a dad joke because, obviously, the correct answer for every rule is "bug.")

This should without problems parse input like

car: Honda
insect: Beetle
annoy: Whine

There are two ways to read handling the unexpected. Either it's that the carrier rule is wrong and you'll handle that, or the subsumed rule is wrong and you'll handle that.

As I understand it, what you're asking for is a rule for unexpected that lets you give different output based on whether what's unexpected was a car, an insect, or an annoyance.

Let's suppose you have a specialization like:

const UnexpectedCustoms = { // no, not shoes off
  'annoy'  : 'Unexpected annoyance',
  'insect' : 'Unexpected insect',
  'car'    : 'Unexpected car'
};

So what should it give as an error message when I give it this input?

defect: bug
microphone: bug
disease: bug
hobbyist: bug

Which of those are cars?

What should the error messages there be?

This isn't solvable. This problem is equivalent to saying "hey peg, given that the next thing can't be interpreted, why don't you tell me what it is so I can tell someone?"

First you need to teach it how to interpret that. Next you don't need anything.

This is, in essence, the reason some things are set up as a tokenizer then a lexer. Just use peg as a tokenizer, in this case, then write a lexer that parses the generated AST and says "uh, you're ... you're not allowed to have a close parenthesis without an open one"


Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form

car: car: car: ...

Is there any way for pegjs to know that that's a car, other than writing the rule you can already write to match that out, then handling it?

That's pretty straightforward to handle in-grammar today, and many grammars do. Why do you want extra features for that?

Just write a rule with the name of the feature you're asking for. Pow: done.

No extra peg.js complexity needed.


Here's the other way to say it.

In order to give an error message for the specific incorrect parsing, you'd need a correct parsing of the incorrect stuff to interpret. Either write a parser that accepts the wrong things and rejects them in the handlers, or write a secondary parser to handle the partial, or write an AST that can accept the wrong thing then interpret the AST to be wrong."


Finally, this really shouldn't be done, because unknown is the fifth or sixth most common existing name for a rule, after things like document and operator

I would warrant more than a third of grammars already have this, because the language is already able to express it without features

If you try to add this, all you do is break the existing grammars to add something we already have

This should be declined

@StoneCypher
Copy link

@Mingun - I want to resurrect this project. There's no good reason for it to be dead

@norech
Copy link

norech commented Feb 5, 2020

I think this feature is meant to make the errors more clear, even if an unexpected rule might not be the right way to implement this sort of thing.

The point is that having an Unexpected X error seems to make it more practical to spot the parse errors (at least in many cases) rather than having Expected A, B, C, D, E, X, Y or Z or Expected expression.

That would be great to see PEG.js being able to do such a thing.

@StoneCypher
Copy link

So how do you identify what the X that's unexpected is?

@norech
Copy link

norech commented Feb 5, 2020

That’s probably the issue of this feature: being able to clearly identify what’s wrong.

The problem here is not trying to identify what X could be, but being sure of what X is.

@StoneCypher
Copy link

Like I tried to explain above, that's called "parsing," and the way to do that is to specify it in the grammar

@log4b0at
Copy link
Author

log4b0at commented Feb 5, 2020

I dont understand well the problem that you trying to raise, can you give me more examples about ?

@StoneCypher
Copy link

They'll be literally identical to the existing one.

Try answering the question. It's there socratically and rhetorically; you should learn what the problem is in trying to answer.


Consider the following grammar:

Document = (DadJoke "\n"?)+

Kind = [a-zA-Z0-9 ]+

Car    = "car: "    Kind ".";
Insect = "insect: " Kind ".";
Annoy  = "annoy: "  Kind ".";

DadJoke = Car / Insect / Annoy

So what should it give as an error message when I give it this input?

defect: bug
microphone: bug
disease: bug
hobbyist: bug

As I understand your request, the parser is supposed to say something like "I found a disease when I was expecting a car, an insect, or an annoyance."

How's it supposed to know that's a disease?

Parsing fails when the parser doesn't know what the next thing is.

An error message for parsing failure that requires it to know what the next thing is is contradictory to the contextual situation

@log4b0at
Copy link
Author

log4b0at commented Feb 5, 2020

defect: bug
microphone: bug
disease: bug
hobbyist: bug

"What should the error messages there be?"

You get that with actual error handling:

Line 1, column 1: Expected "annoy: ", "car: ", or "insect: " but "d" found.

If i define an unexpected rule like that

Identifier = [a-zA-Z]+;

unexpected = 
    DadJoke { error("Unexpected dad joke here"); }
/ i:$Identifier { error(`Unexpected identifier "${i}"`); };

I will get

Line 1, column 1: Unexpected identifier "defect"

"Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form"
car: car: car:

Here you will get the default message, because no unexpected-rule match ":" punctuator

Line 1, column 8: Expected ... blabla ... but found ":"

Moreover the processus of detecting unexpected things is totally passive and happen only when pegjs detect an error, and don't add any overhead in term of performances.

Does this answer your question correctly?

@log4b0at
Copy link
Author

log4b0at commented Feb 5, 2020

if you want try yourself, I quickly made a code for 0.10 version of pegjs, replace (in your parser) peg$buildStructuredError by

function peg$buildStructuredError(expected, found, location) {
    if (typeof peg$parseunexpected !== 'undefined') {
        peg$fail = new Function();
        peg$currPos = location.start.offset;
        peg$parseunexpected();
    }
    return new peg$SyntaxError(peg$SyntaxError.buildMessage(expected, found), expected, found, location);
}

@StoneCypher
Copy link

Isn't that roughly what mingun said in 2019?

Now I worry that I'm misunderstanding something here

@StoneCypher
Copy link

In this comment

@log4b0at
Copy link
Author

log4b0at commented Feb 5, 2020

Use a tokenizer has a cost.
such feature is simple to implement and cost nothing to nobody

@StoneCypher
Copy link

okay, that's a fair point

This was referenced Apr 2, 2020
@Seb35
Copy link

Seb35 commented Apr 3, 2020

Just some bikeshedding, but when such feature is implemented, it should be let to the user the choice of the rule to be defined as the “unexpected” rule, e.g. with

pegjs.generate( grammar, { unexpected: "unrecognised_token" } )

@log4b0at
Copy link
Author

Hello, I just made a pull request for this functionality, following your advices, namely the use of the error function, much more consistent than a return, suggested by @norech.
And the use of an option to change the name of the rule, suggested by @Seb35
As @futagoza said, the feature is optional and is disabled by default. (I don't know about the features option but by default there is no unexpected rule)
See #661 pull

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants