Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to parse indents? #37

Closed
notramo opened this issue Jul 5, 2022 · 4 comments
Closed

How to parse indents? #37

notramo opened this issue Jul 5, 2022 · 4 comments

Comments

@notramo
Copy link

notramo commented Jul 5, 2022

I want to parse a format where indents are used instead of brackets, like in Nim or Python.
What's the best approach to this task using this library?

Considering later stages of the parsing (AST generation), probably it would be the easiest approach to have an Indent and a Dedent token type. If one line has 3 levels of indentation, then the next only has 2 levels, then it would generate a Dedent token before the first token on the next line.
This will probably need some state machine which tracks the level of the last indentation. Is this library capable of doing this?

@Hejsil
Copy link
Owner

Hejsil commented Jul 5, 2022

Hmmm. I've never really written an indent sensitive parser, since it sounds like a pain, so I'm not sure what techniques are used.

Had a look around, and other similar parser libraries (PEG and parser combinators are closely related) have had similar discussions on this. Have a look: pegjs/pegjs#217

It doesn't seem like mecha would be the best fit for this

@notramo
Copy link
Author

notramo commented Jul 5, 2022

What about using a converter function to push/pop from a state and emit the respective tokens? Is it reliable, or the order of function execution could mess the state machine up?

Another idea is to parse all space at the start of the line to an PrefixSpace token, which includes ^\s*, and do post processing on the tokens to swap the PrefixSpace tokens out to Indent or Dedent.

@Hejsil
Copy link
Owner

Hejsil commented Jul 5, 2022

🤷

One thing to keep in mind, is that mecha basically does backtracking. If one branch fails to parse, the next one is tried. You would need functions that make sure the state machine is restored to some sensible state before mecha tries to parse a different "branch" of syntax.

Edit: To be clear, it is not really mecha per say, that "backtracks", but that is how the mecha.oneOf function does what it does. It tries all parses, one after another, until one succeeds.

@notramo
Copy link
Author

notramo commented Jul 5, 2022

I think parsing it to an Indent token, then post processing it is sufficient.

Thank you!

@notramo notramo closed this as completed Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants