"unexpected" rule #607
Comments
I agree that this feature could be added, but until now a way to do it is using the For example, you have something could not match:
You could add the following expression, matching every char to be sure that no other match will be done:
And then you could call it when something that should match does not match.
|
Here is an example of what the code above will produce for my TypeScript parser:
public function some_func(){
public function wtf_here_func(){
}
}
Output error:
Output error: (with unexpected rule feature)
The only thing that I added to my grammar is:
unexpected = m:method_declaration { return `Unexpected "${m.identifier}" method declaration.` };
I notice that using the
unexpected = m:method_declaration { error(`Unexpected "${m.identifier}" method declaration.`) };
Try it yourself and you will see all that it can bring to our parsers. |
You always can write |
@Mingun
method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}';
With the explicit UnexpectedThing at the end:
method_declaration = (privacy __)? "function" _ identifier _ '(' _ args _ ')' _ '{' _ instruction* _ '}' / UnexpectedThing;
This doesn't work if the unexpected thing appears between "function" and the identifier, or between args and ')', etc.
method_declaration = (privacy __)? ("function"/UnexpectedThing) _ identifier _ ('('/UnexpectedThing) _ args _ (')'/UnexpectedThing) _ ('{'/UnexpectedThing) _ instruction* _ ('}'/UnexpectedThing) / UnexpectedThing;
privacy = ... / UnexpectedThing;
_ = ... / UnexpectedThing;
identifier = ... / UnexpectedThing;
instruction = ... / UnexpectedThing;
...
Why is this bad?
Do you understand my argument now? |
If all you want is for the message to show a whole word (Expected... but found XXX) instead of a single symbol (Expected... but found X), that is elementary and does not require any changes. Just catch
let parser = PEG.generate(<main grammar>);
// For correct work this parser must parse any input and return string as result
let lexer = PEG.generate(<lexer grammar>);
try {
return parser.parse(<input>);
} catch (e) {
if (!(e instanceof parser.SyntaxError)) throw e;
// lexer must return string
let found = lexer.parse(input.substr(e.location.start.offset));
// Or you can use specific rule from the same parser
//let found = parser.parse(input.substr(e.location.start.offset), { startRule: "unexpected" });
throw new parser.SyntaxError(
parser.SyntaxError.buildMessage(e.expected, found),
e.expected,
found,
e.location
);
// or you can throw your own exception type
}
Introducing special support in the generator for this purpose seems excessive to me, though I am not against an annotation that marks a rule as a lexer entry point. |
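The catch-and-relex idea in this comment can be tried without PEG.js at all. The following is a minimal sketch, assuming a hypothetical hand-written `parseGreeting` in place of the generated parser and a regex-based `wordAt` in place of the lexer grammar. `ParseError`, `parseGreeting`, `wordAt` and `parseWithWordErrors` are illustration names only, not PEG.js APIs.

```javascript
// Hypothetical stand-in for parser.SyntaxError: carries what was expected,
// what was found, and the failure offset.
class ParseError extends Error {
  constructor(expected, found, offset) {
    const shown = found === null ? "end of input" : JSON.stringify(found);
    super(`Expected ${expected} but ${shown} found.`);
    this.expected = expected;
    this.found = found;
    this.offset = offset;
  }
}

// Hypothetical stand-in for the generated parser: accepts only "hello <name>".
// Like a PEG parser, it reports a single offending character on failure.
function parseGreeting(input) {
  const keyword = "hello ";
  for (let i = 0; i < keyword.length; i++) {
    if (input[i] !== keyword[i]) {
      throw new ParseError('"hello "', i < input.length ? input[i] : null, i);
    }
  }
  const name = input.slice(keyword.length);
  if (!/^[a-z]+$/.test(name)) {
    // Offset of the first character that is not part of a valid name.
    const bad = keyword.length + /^[a-z]*/.exec(name)[0].length;
    throw new ParseError("a name", bad < input.length ? input[bad] : null, bad);
  }
  return { greet: name };
}

// Stand-in for the "lexer": the run of non-space characters at the offset.
function wordAt(input, offset) {
  const m = /^\S+/.exec(input.slice(offset));
  return m ? m[0] : null;
}

// Catch the failure, re-lex at the failure offset, rethrow with the whole word.
function parseWithWordErrors(input) {
  try {
    return parseGreeting(input);
  } catch (e) {
    if (!(e instanceof ParseError)) throw e;
    throw new ParseError(e.expected, wordAt(input, e.offset), e.offset);
  }
}
```

With this sketch, `parseWithWordErrors("goodbye bob")` throws an error whose `found` is `"goodbye"` rather than `"g"`. The grammar itself stays untouched; the cost is a second pass over the input, paid only on failure.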
It is certainly a solution, but it is not very accessible and is difficult to maintain. Anyway, thank you for your code; it's always good to have. I agree that a direct implementation of a lexer in the generator would be welcome. |
Unfortunately, as you can see, the project is dead or, at least, in a deep stasis |
I actually like the idea of an |
@futagoza Surely |
@log4b0at - The The problem with having an unexpected rule specialize on the nature of the thing it failed to parse is that it doesn't know what that is, because it failed to parse it. Consider the following grammar:
(This is a dad joke because, obviously, the correct answer for every rule is "bug.") This should without problems parse input like
There are two ways to read handling the
As I understand it, what you're asking for is a rule for
Let's suppose you have a specialization like:
const UnexpectedCustoms = { // no, not shoes off
  'annoy' : 'Unexpected annoyance',
  'insect' : 'Unexpected insect',
  'car' : 'Unexpected car'
};
So what should it give as an error message when I give it this input?
Which of those are cars? What should the error messages there be? This isn't solvable. This problem is equivalent to saying "hey peg, given that the next thing can't be interpreted, why don't you tell me what it is so I can tell someone?" First you need to teach it how to interpret that. Next you don't need anything. This is, in essence, the reason some things are set up as a tokenizer then a lexer. Just use
Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form
Is there any way for
That's pretty straightforward to handle in-grammar today, and many grammars do. Why do you want extra features for that? Just write a rule with the name of the feature you're asking for. Pow: done. No extra
Here's the other way to say it. In order to give an error message for the specific incorrect parsing, you'd need a correct parsing of the incorrect stuff to interpret. Either write a parser that accepts the wrong things and rejects them in the handlers, or write a secondary parser to handle the partial, or write an AST that can accept the wrong thing then interpret the AST to be wrong.
Finally, this really shouldn't be done, because I would warrant more than a third of grammars already have this, because the language is already able to express it without features. If you try to add this, all you do is break the existing grammars to add something we already have. This should be declined. |
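The first option named in this comment, a parser that accepts the wrong thing and rejects it in a handler, can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not PEG.js output: the mini-language (a method declaration with an empty argument list whose body may contain further declarations) and the names `tokenize`, `parseMethod`, `checkMethod` and `parseProgram` are all hypothetical.

```javascript
// Split the source into identifiers and punctuation; whitespace is skipped.
function tokenize(src) {
  return src.match(/[A-Za-z_]\w*|[(){}]/g) || [];
}

// method := ("public" | "private")? "function" IDENT "(" ")" "{" method* "}"
// Deliberately permissive: nested declarations are parsed, not rejected here.
function parseMethod(tokens, pos) {
  const expect = (value) => {
    if (tokens[pos] !== value) {
      throw new Error(`Expected "${value}" but found "${tokens[pos]}"`);
    }
    pos++;
  };
  if (tokens[pos] === "public" || tokens[pos] === "private") pos++;
  expect("function");
  const identifier = tokens[pos++];
  if (!/^[A-Za-z_]\w*$/.test(identifier || "")) {
    throw new Error(`Expected identifier but found "${identifier}"`);
  }
  expect("(");
  expect(")");
  expect("{");
  const body = [];
  while (pos < tokens.length && tokens[pos] !== "}") {
    const nested = parseMethod(tokens, pos);
    body.push(nested.node);
    pos = nested.pos;
  }
  expect("}");
  return { node: { identifier, body }, pos };
}

// The "handler": the wrong construct parsed fine, so it can be named precisely.
function checkMethod(node) {
  if (node.body.length > 0) {
    throw new Error(`Unexpected "${node.body[0].identifier}" method declaration.`);
  }
  return node;
}

function parseProgram(src) {
  const tokens = tokenize(src);
  const { node, pos } = parseMethod(tokens, 0);
  if (pos !== tokens.length) {
    throw new Error(`Expected end of input but found "${tokens[pos]}"`);
  }
  return checkMethod(node);
}
```

Given the nested `some_func` / `wtf_here_func` input quoted earlier in the thread, `parseProgram` throws `Unexpected "wtf_here_func" method declaration.` The specific message is possible only because the parser was taught to recognise the nested declaration, which is the point this comment argues.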
@Mingun - I want to resurrect this project. There's no good reason for it to be dead |
I think this feature is meant to make the errors more clear, even if an The point is that having an It would be great to see PEG.js able to do such a thing. |
So how do you identify what the X that's unexpected is? |
That’s probably the issue of this feature: being able to clearly identify what’s wrong. The problem here is not trying to identify what X could be, but being sure of what X is. |
Like I tried to explain above, that's called "parsing," and the way to do that is to specify it in the grammar |
I don't quite understand the problem you are trying to raise. Can you give me more examples? |
They'll be literally identical to the existing one. Try answering the question. It's there socratically and rhetorically; you should learn what the problem is in trying to answer.
As I understand your request, the parser is supposed to say something like "I found a disease when I was expecting a car, an insect, or an annoyance." How's it supposed to know that's a disease? Parsing fails when the parser doesn't know what the next thing is. An error message for parsing failure that requires it to know what the next thing is is contradictory to the contextual situation |
"What should the error messages there be?" You get that with actual error handling:
If I define an unexpected rule like this:
Identifier = [a-zA-Z]+;
unexpected =
DadJoke { error("Unexpected dad joke here"); }
/ i:$Identifier { error(`Unexpected identifier "${i}"`); };
I will get
"Alternately, if it's about the subsumed rule, rather than the carrier rule, then instead you have an input of the form" Here you will get the default message, because no alternative of the unexpected rule matches the ":" punctuator.
Moreover, the process of detecting unexpected things is totally passive: it happens only when pegjs detects an error, and it doesn't add any overhead in terms of performance. Does this answer your question correctly? |
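The behaviour described in this comment (ordered alternatives tried at the failure position, with the default message kept when none match) can be sketched outside PEG.js. This is a rough illustration, assuming sticky regular expressions as stand-ins for grammar rules. `unexpectedAlternatives` and `describeFailure` are hypothetical names, and the `function` pattern stands in for `method_declaration`.

```javascript
// Ordered alternatives, mirroring the `/` choice of the proposed rule.
// Each pattern uses the sticky flag so it matches only at the given offset.
const unexpectedAlternatives = [
  {
    pattern: /function\s+([a-zA-Z_]+)/y,
    message: (m) => `Unexpected "${m[1]}" method declaration.`,
  },
  {
    pattern: /[a-zA-Z]+/y,
    message: (m) => `Unexpected identifier "${m[0]}"`,
  },
];

// Called only after the main parse has failed, so a successful parse
// never pays for these extra matches.
function describeFailure(input, offset, defaultMessage) {
  for (const { pattern, message } of unexpectedAlternatives) {
    pattern.lastIndex = offset; // anchor the sticky match at the failure offset
    const m = pattern.exec(input);
    if (m) return message(m);
  }
  // No alternative recognised the input: keep the standard message.
  return defaultMessage;
}
```

For the input `x = foo : bar`, a failure at offset 4 yields `Unexpected identifier "foo"`, while a failure at offset 8 (the `:`) falls through to the default message. Order matters: if the identifier alternative came first, it would capture the word `function` before the more specific alternative was tried.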
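If you want to try it yourself, I quickly wrote some code for version 0.10 of pegjs. Replace (in your parser):
function peg$buildStructuredError(expected, found, location) {
  // Only run when the grammar defines a rule named "unexpected".
  if (typeof peg$parseunexpected !== 'undefined') {
    // Turn failure tracking into a no-op while the rule is tried.
    peg$fail = new Function();
    // Re-run from the position where the original parse failed.
    peg$currPos = location.start.offset;
    // If the rule calls error(), the exception it throws replaces the default one.
    peg$parseunexpected();
  }
  return new peg$SyntaxError(peg$SyntaxError.buildMessage(expected, found), expected, found, location);
} |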
Isn't that roughly what mingun said in 2019? Now I worry that I'm misunderstanding something here |
Using a tokenizer has a cost.
okay, that's a fair point |
Just some bikeshedding, but when such a feature is implemented, the choice of which rule is defined as the “unexpected” rule should be left to the user, e.g. with
pegjs.generate( grammar, { unexpected: "unrecognised_token" } ) |
Hello, I just made a pull request for this functionality, following your advice, namely using the error function, which is much more consistent than a return, as suggested by @norech.
Add "unexpected" rule to override standard error message.
To implement it just change peg$buildError by:
Expected behavior:
Improve error handling.
If it's not ethical, is there anyone who can tell me how to create a plugin that would make the change?
Thanks a lot