Lexer error cuts input within a codepoint #313

Open
pandaman64 opened this issue Dec 19, 2021 · 3 comments

Comments

@pandaman64
Contributor

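When the lexer encounters an illegal token that starts with a non-ASCII character, it reports only the first byte of that character as the cause of the error, so the message contains an invalid UTF-8 sequence and is displayed mangled.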

How to reproduce

Compile the following (non-conforming) source with satysfi (SATySFi version 0.0.6).

Then we get an error message containing an invalid byte sequence. (In the output below, the invalid byte is shown as U+FFFD � REPLACEMENT CHARACTER; the actual output contains only the first byte of あ, i.e. 0xE3 of its UTF-8 encoding E3 81 82.)

$ satysfi -o /dev/null a.saty
 ---- ---- ---- ----
  target file: 'null'
  dump file: 'a.satysfi-aux' (will be created)
  parsing 'a.saty' ...
! [Syntax Error at Lexer] at "a.saty", line 1, characters 0-1:
    illegal token '�' in a program area
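
As a rough illustration of the behaviour reported above, the following OCaml sketch contrasts taking one byte at the error offset with taking the whole UTF-8 encoded character. The helper names (`utf8_length`, `first_byte`, `whole_char`) are hypothetical and are not taken from the SATySFi source; the sketch only inspects the lead byte and does not validate continuation bytes.

```ocaml
(* Hypothetical helper, not from the SATySFi source: number of bytes in the
   UTF-8 sequence introduced by lead byte [c]. Anything that is not a valid
   lead byte (e.g. a stray continuation byte) is treated as length 1. *)
let utf8_length (c : char) : int =
  let b = Char.code c in
  if b land 0x80 = 0x00 then 1
  else if b land 0xE0 = 0xC0 then 2
  else if b land 0xF0 = 0xE0 then 3
  else if b land 0xF8 = 0xF0 then 4
  else 1

(* What the report describes: only the byte at the error offset is kept. *)
let first_byte (src : string) (pos : int) : string =
  String.sub src pos 1

(* Sketch of a Unicode-aware alternative: keep the whole encoded character,
   clamped to the end of the input in case the sequence is truncated. *)
let whole_char (src : string) (pos : int) : string =
  let len = min (utf8_length src.[pos]) (String.length src - pos) in
  String.sub src pos len

let () =
  let src = "\xE3\x81\x82" in (* UTF-8 encoding of U+3042, the character あ *)
  Printf.printf "first byte: %d byte(s), whole char: %d byte(s)\n"
    (String.length (first_byte src 0))
    (String.length (whole_char src 0))
```

With the three-byte input above, `first_byte` yields a lone 0xE3, which is not valid UTF-8 on its own, while `whole_char` yields the complete character. A real fix would presumably also need to report positions in characters rather than bytes.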

Related Issues

#312 reports an issue with the error position, but the root cause is the same: Unicode-aware treatment of errors.

@leque
Contributor

leque commented Dec 19, 2021

I think this and #312 could be fixed by reimplementing the lexer with sedlex. The required modifications would conflict heavily with #294, though.

@puripuri2100
Contributor

I have reimplemented the SATySFi lexer with sedlex [1].

Footnotes

  1. https://github.com/puripuri2100/satysfifmt/blob/master/src/frontend/lexer.ml

@leque
Contributor

leque commented Dec 19, 2021

> I have reimplemented SATySFi lexer with sedlex [1].

Awesome! That will be a great starting point to resolve this issue.
