
Use separate Tokenizer/Lexer? #241

Open
Philipp-M opened this issue Mar 29, 2023 · 1 comment

Comments

@Philipp-M

Hi,

I ran into a few issues when trying to parse the IFC EXPRESS schema.

One of them is that an identifier like TrueNorth gets parsed (e.g. in an expression rule) as a boolean literal, because the literal rule has higher priority than the simple_id rule and matches the True prefix. Swapping the order of these rules, however, means that something like True is no longer parsed as a boolean literal.
So I think there are two ways to solve the core issue (which I suspect is just a symptom of further problems that ambiguous parsing may cause):

Either every basic parsing rule checks that its input isn't also another basic rule (e.g. simple_id checks that it isn't a literal or anything else that could also be read as a simple_id), or a separate lexer/tokenizer weeds these cases out beforehand.
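A minimal Rust sketch of the first option, for illustration only (it uses neither chumsky nor the linked lexer). The functions parse_bool_naive and parse_bool_checked are hypothetical hand-written rules, and it assumes that simple_id characters are ASCII letters, digits and underscores. The naive rule accepts the True prefix of TrueNorth; the checked rule rejects a keyword that runs on into more identifier characters, leaving the input for simple_id.

```rust
/// Characters that may continue a simple_id (assumption: ASCII letters, digits, '_').
fn is_id_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Naive literal rule: matches a boolean keyword as a case-insensitive prefix
/// and returns (value, remaining input).
fn parse_bool_naive(input: &str) -> Option<(bool, &str)> {
    for (kw, value) in [("TRUE", true), ("FALSE", false)] {
        // `get` returns None instead of panicking on a non-char-boundary slice.
        if let Some(head) = input.get(..kw.len()) {
            if head.eq_ignore_ascii_case(kw) {
                return Some((value, &input[kw.len()..]));
            }
        }
    }
    None
}

/// Checked literal rule: only succeeds if the keyword is not followed by
/// another identifier character, so "TrueNorth" is left for simple_id.
fn parse_bool_checked(input: &str) -> Option<(bool, &str)> {
    let (value, rest) = parse_bool_naive(input)?;
    match rest.chars().next() {
        Some(c) if is_id_char(c) => None,
        _ => Some((value, rest)),
    }
}

fn main() {
    // The naive rule splits "TrueNorth" into TRUE followed by "North".
    assert_eq!(parse_bool_naive("TrueNorth"), Some((true, "North")));
    // The checked rule refuses, so a later simple_id rule can take it.
    assert_eq!(parse_bool_checked("TrueNorth"), None);
    // A standalone keyword is still a boolean literal.
    assert_eq!(parse_bool_checked("True;"), Some((true, ";")));
    println!("ok");
}
```

The downside of this approach is that the same boundary check has to be repeated in every rule that could overlap with another one.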

I personally prefer using a lexer: it makes it easier to restrict the problem space and to build the parser as an abstraction on top of the token stream. I have also run into weird parsing ambiguities in the past when not using a separate lexer (in much simpler languages). I think the BNF grammars of STEP and EXPRESS should allow tokenizing the whole input without having to think about modal lexing etc., but I'm not sure yet.
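To illustrate the lexer option, here is a minimal Rust sketch (not the linked express-parser lexer). The Token enum and lex function are hypothetical names; the sketch assumes ASCII identifiers that start with a letter and case-insensitive keywords, and covers only a tiny subset of tokens. The key idea is maximal munch: the lexer consumes the entire identifier run first and only then checks whether it is a reserved word, so TrueNorth can never be split into TRUE followed by North.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    /// TRUE / FALSE / UNKNOWN (None stands for UNKNOWN).
    Logical(Option<bool>),
    /// A reserved word from a tiny illustrative subset, stored upper-cased.
    Keyword(String),
    /// Any identifier that is not a reserved word.
    SimpleId(String),
    /// Unsigned integer literal.
    Integer(u64),
    /// Any other single character, e.g. ';' or '='.
    Symbol(char),
}

fn is_id_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

fn is_id_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Classify a complete identifier run; keywords are matched case-insensitively.
fn classify(word: &str) -> Token {
    match word.to_ascii_uppercase().as_str() {
        "TRUE" => Token::Logical(Some(true)),
        "FALSE" => Token::Logical(Some(false)),
        "UNKNOWN" => Token::Logical(None),
        kw @ ("ENTITY" | "END_ENTITY" | "SCHEMA" | "END_SCHEMA") => Token::Keyword(kw.to_string()),
        _ => Token::SimpleId(word.to_string()),
    }
}

/// Tokenize the whole input up front; the only error is an integer overflow.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if is_id_start(c) {
            // Maximal munch: consume the whole identifier before classifying it.
            let mut end = start;
            while let Some(&(i, ch)) = chars.peek() {
                if !is_id_char(ch) {
                    break;
                }
                end = i + ch.len_utf8();
                chars.next();
            }
            tokens.push(classify(&src[start..end]));
        } else if c.is_ascii_digit() {
            let mut end = start;
            while let Some(&(i, ch)) = chars.peek() {
                if !ch.is_ascii_digit() {
                    break;
                }
                end = i + 1;
                chars.next();
            }
            let n = src[start..end].parse::<u64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Integer(n));
        } else {
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    Ok(tokens)
}

fn main() {
    let tokens = lex("TrueNorth = TRUE;").unwrap();
    // TrueNorth stays a single SimpleId; only the standalone TRUE is a literal.
    println!("{:?}", tokens);
}
```

A parser built on top of this token stream can match Token::Logical and Token::SimpleId directly, so rule order no longer decides how TrueNorth is read. A real EXPRESS lexer would additionally need multi-character operators such as := and <=, string literals, and (* ... *) comments.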

I have actually started writing a parser/lexer for the EXPRESS language, though I'm not sure yet whether I will take this project much further (I guess I underestimated the scope of fully supporting STEP).
My original motivation was better error recovery and error messages (by using a parser combinator library such as chumsky).

I think the lexer is almost complete, so you may be interested in this:
https://github.com/Philipp-M/express-parser/blob/6464b29e5eb14d70b0445b84567ed58fdfd144b6/src/lexer.rs

@Philipp-M
Author

Btw. this may also be helpful in case you want to go with a lexer:

https://github.com/stepcode/stepcode/blob/develop/src/express/expscan.l
