Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding descriptions seems to mess up error reporting #302

Open
pragdave opened this issue Jun 2, 2021 · 5 comments
Open

Adding descriptions seems to mess up error reporting #302

pragdave opened this issue Jun 2, 2021 · 5 comments

Comments

@pragdave
Copy link

pragdave commented Jun 2, 2021

Given a simple grammar:

Myopic {
  Program  = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   ExpressionList "}" 
  number = digit+
}

I feed it a program with a deliberate error, and I get a beautiful error:

image

This is correct, the "?" is not a valid expression, and it is flagged as such.

However, if I add a description (to any of the rules),

Myopic {
  Program (p) = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   ExpressionList "}" 
  number = digit+
}

the error now gets shuffled up to the top-level:

image

Am I misunderstanding how this is supposed to work?

Cheers (and thanks for a very nice-looking project)

Dave Thomas
@pragdave

@alexwarth
Copy link
Contributor

Hi @pragdave,

When a rule R has a description, it "swallows up" any parse failures that happen down the call stack and instead reports errors using its description, at the position where R was applied. But if R succeeds, it will simply discard the swallowed-up failures that were backtracked over.

That's why you stopped seeing a helpful error message after you attached a description to Program. A Program is an ExpressionList, which in turn is zero or more Expressions. Your sample input block { 1 ? 2 } doesn't begin with a valid expression and so Program succeeds without consuming any input. (The empty string is a valid Program.)

So Program succeeded and discarded all of those failures that would have been helpful to see (}, digit, and block). Then Ohm expected to see the end of the input stream but instead it found block { 1 ? 2 }. And because of the failure-discarding semantics that I described above, Expected end of input is all we can say.

This is certainly not desirable here, and we're going to have to rethink the discard-failures-on-success thing. I don't have a solution in mind yet, but I'll think on it and have a chat with @pdubroy to see if we can come up with something that works.

Best,
Alex

p.s. Thanks for the kind words :)

@pragdave
Copy link
Author

pragdave commented Jun 2, 2021

Wow: I didn't expect a response so soon, and such a detailed one at that.

I see the motivation for what it does, but as you know it doesn't help if the grammar can recurse from the lowest levels back up to the top, because all errors become global.

Let me bounce a half-formed idea off you. Rather than having errors get caught it the highest possible level, would it be possible to define a one-way fence (à la Prolog or Snobol). This would allow the parse to search forward through it, but would act as an error boundary during backtracking.

For example, in my block example, once I've seen the opening brace following block, I'm committed. There is no other possible parse except for a bunch of expressions followed by a closing brace.

In that case, any error inside that block (or, I guess, below that block) is local to the context of that error. An invalid expression means that the expression is wrong, and not that the overall block is wrong.

Clearly, this is something that depends on the grammar being written. But being able to prevent backtracking can both increase performance and also give better error contexts.

Maybe something like:

Myopic {
  Program  = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   |>   ExpressionList "}" 
  number = digit+
}

where |> is meant to show the one-way flow through the parse at that point.

I am a total bystander when it comes to parsing theory, and I'm sure there are a gazillion holes in this. But I'm hoping that perhaps it might trigger a neuron and lead to some interesting discussions.

Dave

@pdubroy
Copy link
Contributor

pdubroy commented Jul 4, 2021

@pragdave I think having some kind of cut operator could be a really interesting idea. The paper Packrat Parsers Can Handle Practical Grammars in Mostly Constant Space explores the automatic insertion of cut points as a performance enhancement, but maybe there are also some insights here that could help improve our error handling.

@pdubroy
Copy link
Contributor

pdubroy commented Jul 4, 2021

Thinking about this a bit more...a cut usually only prevents backtracking within its enclosing rule, but it seems like what @pragdave proposes would be more like a global cut. I wonder how this would work inside a lookahead expression.

I also wonder how this would work in the presence of grammar extension. It seems to me that you could only write add these cuts with a global knowledge of the whole grammar, which prevent other authors from extending the grammar in certain ways.

BTW, I also found Error reporting in Parsing Expression Grammars which is a nice summary of some different approaches for error handling in PEGs.

@pragdave
Copy link
Author

pragdave commented Jul 5, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants