(<|>) is not associative #412

gelisam · 2020-05-11T02:14:42Z

Here is an example demonstrating that we get different error messages depending on how we parethesize a <|> b <|> c.

import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char.Lexer


type Parser = Parsec Void String

sym :: String -> Parser ()
sym s = do
  _ <- symbol (pure ()) s
  pure ()

-- consumes one character, then rolls back
a :: Parser ()
a = try (sym "a" >> empty) <?> "a"

-- fails immediately (on the given input)
b :: Parser ()
b = sym "b"

-- succeeds immediately
c :: Parser ()
c = pure ()

-- |
-- >>> parseTest (leftAssociative >> sym "d") "aaa"
-- 1:1:
--   |
-- 1 | aaa
--   | ^
-- unexpected 'a'
-- expecting 'd'
leftAssociative :: Parser ()
leftAssociative = (a <|> b) <|> c

-- |
-- >>> parseTest (rightAssociative >> sym "d") "aaa"
-- 1:1:
--   |
-- 1 | aaa
--   | ^
-- unexpected 'a'
-- expecting 'b' or 'd'
rightAssociative :: Parser ()
rightAssociative = a <|> (b <|> c)

The text was updated successfully, but these errors were encountered:

gelisam · 2020-05-11T02:36:43Z

Personally I would prefer if the error message was expecting a, 'b', or 'd', because the user has three ways to try to fix the problem: they can fix the input so that a well-formed a begins at that character (which would require the user to understand why "aaa" is not a valid a even though it begins with an 'a'), so that a valid sym "b" begins at that character (which would require that character to be a 'b'), or so that a valid sym "d" begins at that character (which would require that character to be a 'd').

But since we wouldn't want e.g. a = try (sym "a" >> sym "z") to cause the error message to incorrectly suggest replacing the first 'a' with a 'z', I understand that it's probably easier not to include the suggestions which involve a try which has consumed characters. Maybe sym "a" sets a flag indicating that it has consumed a character, causing try to clear the suggestions when it rolls back, but try doesn't clear that flag, thus causing sym "b"'s suggestion to also get cleared?

Ailrun · 2020-05-11T02:48:32Z

The problem is caused by hint/error distinction and how error merge works.

(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1) <|> pure () -- the errors are merged (the second is ignored because of its offset)
===> hint [] -- since @pure ()@ succeeds, it tries to promote the error to a hint, and ignores it because of its offset.

try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]

mrkkrp · 2020-05-18T22:00:53Z

This is a tricky one. Would be nice to have it fixed, but I'm not sure how.

gelisam · 2020-05-19T17:42:14Z

How about keeping track of both the error message at the latest-consumed position and at the rightmost-observed position? Here is a long explanation of why I think it's a good idea, a prototype, and a proof that my prototype's implementation of (<|>) is associative: https://gist.github.com/gelisam/bca49348986f119457837cdbfa746cd4

mrkkrp · 2020-05-20T08:54:57Z

@gelisam Thank you! I'll get to reading this soon.

mrkkrp · 2020-06-09T10:13:36Z

Sorry for the delay. I did not forget. Will try to look at this today.

gelisam · 2020-06-09T11:53:03Z

I'm not in a hurry, take your time!

mrkkrp · 2020-06-10T18:22:09Z

OK, I looked at this today. @gelisam I appreciate your write up and the propotype, but what we want here is one minimal change to the existing code base, a change that would not affect performance or anything else. I added your test to the test suite in the choice-associativity branch.

Looking at the two cases it seems to me that the "right associative" one works as it should. "Left associative" should do better. More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints). I think it could be preserved somehow "just in case" and then restored. But well, it is just an idea. A ParseError can have only one offset to begin with, so with current code base this path seems impossible. Let's face it too, it is nice to have simpler representation for parse errors. There doesn't seem to be another way to save that error at offset 0.

In the end, Parsec-like parser are stateful monsters. The current situation is the result of the compromises that we've made over time and in most cases the library works nicely. I'd welcome a PR which attempts to solve this in an non-intrusive way, but I myself will not allocate time for working on this issue because it is not clear how we could do better without a major re-write.

) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from v2.2.1 to v2.2.2. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v2.2.1...e448a9b) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Lev135 · 2023-06-05T12:38:58Z

More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints).

Maybe we should convert errors with less offsets into hints (and store hints not only for ok branches, but also for error)? If I understood the problem correctly, it should be fixed:

(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1, hint ["b"]) <|> pure () -- the errors are merged (the second is converted to hint because of its offset)
===> hint ["b"]

try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]

mrkkrp added the bug label May 18, 2020

mrkkrp added the awaiting-pr label Jun 10, 2020

mrkkrp self-assigned this Jun 12, 2023

mrkkrp removed the awaiting-pr label Jun 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(<|>) is not associative #412

(<|>) is not associative #412

gelisam commented May 11, 2020

gelisam commented May 11, 2020

Ailrun commented May 11, 2020 •

edited

mrkkrp commented May 18, 2020

gelisam commented May 19, 2020

mrkkrp commented May 20, 2020

mrkkrp commented Jun 9, 2020

gelisam commented Jun 9, 2020

mrkkrp commented Jun 10, 2020

Lev135 commented Jun 5, 2023

(<|>) is not associative #412

(<|>) is not associative #412

Comments

gelisam commented May 11, 2020

gelisam commented May 11, 2020

Ailrun commented May 11, 2020 • edited

mrkkrp commented May 18, 2020

gelisam commented May 19, 2020

mrkkrp commented May 20, 2020

mrkkrp commented Jun 9, 2020

gelisam commented Jun 9, 2020

mrkkrp commented Jun 10, 2020

Lev135 commented Jun 5, 2023

Ailrun commented May 11, 2020 •

edited