Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(<|>) is not associative #412

Open
gelisam opened this issue May 11, 2020 · 9 comments
Open

(<|>) is not associative #412

gelisam opened this issue May 11, 2020 · 9 comments
Assignees
Labels

Comments

@gelisam
Copy link

gelisam commented May 11, 2020

Here is an example demonstrating that we get different error messages depending on how we parethesize a <|> b <|> c.

import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char.Lexer


type Parser = Parsec Void String

sym :: String -> Parser ()
sym s = do
  _ <- symbol (pure ()) s
  pure ()

-- consumes one character, then rolls back
a :: Parser ()
a = try (sym "a" >> empty) <?> "a"

-- fails immediately (on the given input)
b :: Parser ()
b = sym "b"

-- succeeds immediately
c :: Parser ()
c = pure ()

-- |
-- >>> parseTest (leftAssociative >> sym "d") "aaa"
-- 1:1:
--   |
-- 1 | aaa
--   | ^
-- unexpected 'a'
-- expecting 'd'
leftAssociative :: Parser ()
leftAssociative = (a <|> b) <|> c

-- |
-- >>> parseTest (rightAssociative >> sym "d") "aaa"
-- 1:1:
--   |
-- 1 | aaa
--   | ^
-- unexpected 'a'
-- expecting 'b' or 'd'
rightAssociative :: Parser ()
rightAssociative = a <|> (b <|> c)
@gelisam
Copy link
Author

gelisam commented May 11, 2020

Personally I would prefer if the error message was expecting a, 'b', or 'd', because the user has three ways to try to fix the problem: they can fix the input so that a well-formed a begins at that character (which would require the user to understand why "aaa" is not a valid a even though it begins with an 'a'), so that a valid sym "b" begins at that character (which would require that character to be a 'b'), or so that a valid sym "d" begins at that character (which would require that character to be a 'd').

But since we wouldn't want e.g. a = try (sym "a" >> sym "z") to cause the error message to incorrectly suggest replacing the first 'a' with a 'z', I understand that it's probably easier not to include the suggestions which involve a try which has consumed characters. Maybe sym "a" sets a flag indicating that it has consumed a character, causing try to clear the suggestions when it rolls back, but try doesn't clear that flag, thus causing sym "b"'s suggestion to also get cleared?

@Ailrun
Copy link

Ailrun commented May 11, 2020

The problem is caused by hint/error distinction and how error merge works.

(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1) <|> pure () -- the errors are merged (the second is ignored because of its offset)
===> hint [] -- since @pure ()@ succeeds, it tries to promote the error to a hint, and ignores it because of its offset.

try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]

@mrkkrp mrkkrp added the bug label May 18, 2020
@mrkkrp
Copy link
Owner

mrkkrp commented May 18, 2020

This is a tricky one. Would be nice to have it fixed, but I'm not sure how.

@gelisam
Copy link
Author

gelisam commented May 19, 2020

How about keeping track of both the error message at the latest-consumed position and at the rightmost-observed position? Here is a long explanation of why I think it's a good idea, a prototype, and a proof that my prototype's implementation of (<|>) is associative: https://gist.github.com/gelisam/bca49348986f119457837cdbfa746cd4

@mrkkrp
Copy link
Owner

mrkkrp commented May 20, 2020

@gelisam Thank you! I'll get to reading this soon.

@mrkkrp
Copy link
Owner

mrkkrp commented Jun 9, 2020

Sorry for the delay. I did not forget. Will try to look at this today.

@gelisam
Copy link
Author

gelisam commented Jun 9, 2020

I'm not in a hurry, take your time!

@mrkkrp
Copy link
Owner

mrkkrp commented Jun 10, 2020

OK, I looked at this today. @gelisam I appreciate your write up and the propotype, but what we want here is one minimal change to the existing code base, a change that would not affect performance or anything else. I added your test to the test suite in the choice-associativity branch.

Looking at the two cases it seems to me that the "right associative" one works as it should. "Left associative" should do better. More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints). I think it could be preserved somehow "just in case" and then restored. But well, it is just an idea. A ParseError can have only one offset to begin with, so with current code base this path seems impossible. Let's face it too, it is nice to have simpler representation for parse errors. There doesn't seem to be another way to save that error at offset 0.

In the end, Parsec-like parser are stateful monsters. The current situation is the result of the compromises that we've made over time and in most cases the library works nicely. I'd welcome a PR which attempts to solve this in an non-intrusive way, but I myself will not allocate time for working on this issue because it is not clear how we could do better without a major re-write.

tomjaguarpaw pushed a commit to tomjaguarpaw/megaparsec that referenced this issue Sep 29, 2022
)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from v2.2.1 to v2.2.2.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v2.2.1...e448a9b)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@Lev135
Copy link
Contributor

Lev135 commented Jun 5, 2023

More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints).

Maybe we should convert errors with less offsets into hints (and store hints not only for ok branches, but also for error)? If I understood the problem correctly, it should be fixed:

(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1, hint ["b"]) <|> pure () -- the errors are merged (the second is converted to hint because of its offset)
===> hint ["b"]

try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]

@mrkkrp mrkkrp self-assigned this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants