How to report "lexical" errors? #387

Open
banacorn opened this issue Nov 26, 2019 · 3 comments

Comments

@banacorn

I'm building a parser that accepts a custom token stream.
I've made TokenStream (from lexer-applicative) an instance of Stream:

instance Stream (TokenStream (L tok)) where

That's wonderful, and everything worked as expected until a "lexical error" appeared in my token stream:

-- | A stream of tokens
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError

The parser complained about unexpected end of input. That's because Stream gives me no way to signal a lexical error, so I had no choice but to treat TsError like TsEof.

I think there are 3 ways of solving this:

  1. Make Stream "aware" of these lexical errors: for example, let take1_ return an Either value instead of just a Maybe value.
  2. Make the parser incremental, so that users can check whether the next token is TsError before feeding it to the parser.
  3. The "happy" way, which sits somewhere between 1. and 2.
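
To make option 1 concrete, here is a minimal sketch of what an error-aware stream class could look like. It is not megaparsec's Stream class: FallibleStream, Tok, StreamError and takeOne are hypothetical names invented for illustration, and LexicalError carries a plain Int as a stand-in for the position type that lexer-applicative uses.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Stand-in for lexer-applicative's LexicalError (assumption: Int position).
newtype LexicalError = LexicalError Int
  deriving (Eq, Show)

-- | A stream of tokens, as in the issue text.
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError

-- | Hypothetical class, NOT part of megaparsec.
class FallibleStream s where
  type Tok s
  type StreamError s
  -- | Left e           : the stream itself is broken
  --   Right Nothing    : genuine end of input
  --   Right (Just ...) : next token and the remaining stream
  takeOne :: s -> Either (StreamError s) (Maybe (Tok s, s))

instance FallibleStream (TokenStream tok) where
  type Tok (TokenStream tok) = tok
  type StreamError (TokenStream tok) = LexicalError
  takeOne (TsToken t rest) = Right (Just (t, rest))
  takeOne TsEof            = Right Nothing
  takeOne (TsError e)      = Left e
```

The point of the extra Either layer is that TsError no longer has to be folded into the end-of-input case.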

Here is how it works in happy.

Happy also lets users choose their own token stream type (usually produced by alex), as long as we tell happy which token marks eof:

%lexer { <lexer> } { <eof> }

and what to do when a token comes in:

lexer :: (Token -> P a) -> P a

For example, this is how to deal with a token stream from lexer-applicative:

lexer :: (Token -> P a) -> P a
lexer f = scanNext >>= f

scanNext :: P Token
scanNext = do
  stream <- gets tokenStream
  case stream of
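    TsToken (L _ tok) rest -> do
      -- advance past the token (assumes tokenStream is a record field of the state)
      modify $ \s -> s { tokenStream = rest }
      return tok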
    TsEof -> return TokenEOF
    TsError (LexicalError pos) -> throwError $ Lexical pos

I think this is the best of the 3 solutions, because it lets users handle lexical errors however they like, and it isn't overkill the way making megaparsec incremental would be.

But I'm still not sure how to incorporate this into the Stream class, if we decide to do it.

@mrkkrp
Owner

mrkkrp commented Nov 27, 2019

Should a token stream with an error in it be fed into a parser? You could just report the error because parsing won't succeed anyway.

@banacorn
Author

Ideally, you wouldn't know whether a token stream contains an error until you have extracted tokens from it up to the point where the error occurs.

The workaround I'm using now is to force the whole stream into a list, and see if there's any error.
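
A minimal sketch of that workaround, assuming the TokenStream type from the first comment. The name collectTokens is made up for this example, and LexicalError carries a plain Int as a stand-in for the real position type.

```haskell
-- Stand-in for lexer-applicative's LexicalError (assumption: Int position).
newtype LexicalError = LexicalError Int
  deriving (Eq, Show)

-- | A stream of tokens, as in the issue text.
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError LexicalError

-- | Walk the whole stream up front (hypothetical helper).
--   Returns the first lexical error if there is one,
--   otherwise every token in order.
collectTokens :: TokenStream tok -> Either LexicalError [tok]
collectTokens (TsToken t rest) = (t :) <$> collectTokens rest
collectTokens TsEof            = Right []
collectTokens (TsError e)      = Left e
```

The resulting list can be handed to the parser only in the Right case. The cost is that the whole input has to be lexed before parsing starts, so the laziness of the stream is lost.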

@1Computer1
Contributor

I don't know if you still need this, but another workaround is to define type Token s = Either String tok and throw a parser error whenever you get a Left. Unfortunately, the expected tokens in error messages will then always show up as Right _, so you may want to use a label for them instead.
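
A rough sketch of this idea, assuming megaparsec 7 or later and a plain list of Either String Tok as the input stream. The Tok type and the helpers lexicalGuard, tok, number, plus and sumExpr are invented for the example; only token, anySingle, lookAhead, optional, many, eof, parse and <?> come from megaparsec.

```haskell
import Data.Void (Void)
import qualified Data.Set as Set
import Text.Megaparsec

-- Example token type (hypothetical).
data Tok = TNum Int | TPlus
  deriving (Eq, Ord, Show)

-- Left carries the lexer's error message, Right a real token.
type Parser = Parsec Void [Either String Tok]

-- | Fail with the lexer's message if the next token is a Left.
lexicalGuard :: Parser ()
lexicalGuard = do
  next <- optional (lookAhead anySingle)
  case next of
    Just (Left msg) -> fail ("lexical error: " ++ msg)
    _               -> pure ()

-- | Match one real token; the label replaces "Right _" in expectations.
tok :: String -> (Tok -> Maybe a) -> Parser a
tok name f =
  lexicalGuard *> (token (either (const Nothing) f) Set.empty <?> name)

number :: Parser Int
number = tok "number" (\t -> case t of TNum n -> Just n; _ -> Nothing)

plus :: Parser ()
plus = tok "'+'" (\t -> case t of TPlus -> Just (); _ -> Nothing)

-- | number ('+' number)* followed by end of input.
sumExpr :: Parser Int
sumExpr = do
  n  <- number
  ns <- many (plus *> number)
  lexicalGuard  -- check again, since 'many' swallows the earlier failure
  eof
  pure (n + sum ns)
```

With this setup, parse sumExpr "" applied to a list that contains a Left yields a parse error rather than a result. The explicit lexicalGuard before eof is there because a failure inside many that consumes no input just ends the repetition.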
