How to parse indents? #37

notramo · 2022-07-05T13:06:17Z

I want to parse a format where indents are used instead of brackets, like in Nim or Python.
What's the best approach to this task using this library?

Considering later stages of the parsing (AST generation), probably it would be the easiest approach to have an Indent and a Dedent token type. If one line has 3 levels of indentation, then the next only has 2 levels, then it would generate a Dedent token before the first token on the next line.
This will probably need some state machine which tracks the level of the last indentation. Is this library capable of doing this?

Hejsil · 2022-07-05T13:19:10Z

Hmmm. I've never really written an indent sensitive parser, since it sounds like a pain, so I'm not sure what techniques are used.

Had a look around, and other similar parser libraries (PEG and parser combinators are closely related) have had similar discussions on this. Have a look: pegjs/pegjs#217

It doesn't seem like mecha would be the best fit for this

notramo · 2022-07-05T13:30:27Z

What about using a converter function to push/pop from a state and emit the respective tokens? Is it reliable, or the order of function execution could mess the state machine up?

Another idea is to parse all space at the start of the line to an PrefixSpace token, which includes ^\s*, and do post processing on the tokens to swap the PrefixSpace tokens out to Indent or Dedent.

Hejsil · 2022-07-05T13:41:07Z

🤷

One thing to keep in mind, is that mecha basically does backtracking. If one branch fails to parse, the next one is tried. You would need functions that make sure the state machine is restored to some sensible state before mecha tries to parse a different "branch" of syntax.

Edit: To be clear, it is not really mecha per say, that "backtracks", but that is how the mecha.oneOf function does what it does. It tries all parses, one after another, until one succeeds.

notramo · 2022-07-05T15:34:50Z

I think parsing it to an Indent token, then post processing it is sufficient.

Thank you!

notramo closed this as completed Jul 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to parse indents? #37

How to parse indents? #37

notramo commented Jul 5, 2022

Hejsil commented Jul 5, 2022

notramo commented Jul 5, 2022

Hejsil commented Jul 5, 2022 •

edited

notramo commented Jul 5, 2022

How to parse indents? #37

How to parse indents? #37

Comments

notramo commented Jul 5, 2022

Hejsil commented Jul 5, 2022

notramo commented Jul 5, 2022

Hejsil commented Jul 5, 2022 • edited

notramo commented Jul 5, 2022

Hejsil commented Jul 5, 2022 •

edited