Stability of the Lexer's internal API #143

eemeli · 2020-06-07T19:28:58Z

I'm writing a parser that uses moo as a lexer. In certain situations, it would be highly beneficial to have access to the lexer's current state and/or stack when determining the meaning of some of the tokens generated by the lexer.

Currently, the info I want is available as lexer.state and lexer.stack, but this does not appear to be documented anywhere. Is there a reason why I shouldn't use them, and/or is there a chance that their API will change between patch updates?

I would've asked about changes between minor versions, but as moo is still at 0.5 that's not really relevant.


tjvr · 2020-06-09T15:29:12Z

Hmm. Can you get the information you need by calling lexer.save()? I think I'd recommend doing that instead if you can.
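As a rough illustration of that suggestion, the sketch below reads the current state through `save()` instead of touching `lexer.state` directly. The helper name `currentState` is hypothetical, and the stand-in lexer exists only so the snippet runs without moo installed. It assumes the object returned by `save()` carries a `state` field; whether the stack is also exposed that way is not settled in this thread.

```javascript
// Hypothetical helper (not part of moo): read the lexer's current state
// through the public save() method rather than the lexer.state property.
// Assumes save() returns an object that includes a `state` field.
function currentState(lexer) {
  const info = lexer.save()
  return info.state
}

// Stand-in lexer used only to make this sketch runnable without moo;
// a real moo lexer would be passed in its place.
const fakeLexer = {
  save() {
    return {line: 1, col: 1, state: 'inner'}
  },
}

console.log(currentState(fakeLexer)) // prints "inner"
```

With a real lexer, the call would presumably be `currentState(lexer)` after each `lexer.next()`.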

Moo is probably more stable than the version number implies, although of course we reserve the right to change it if we feel the need to -- hence the 0.5 version number :-)

Out of interest, what sort of ambiguity are you using this information to resolve?

mcous · 2020-06-09T15:46:41Z

To piggyback a tiny amount: the userland solution to the question I raised in #142 relies on the Lexer.index property, so I, too, would be curious which properties are stable and which are not, if that isn't signalled by the "standard" _prefix notation.

eemeli · 2020-06-09T16:56:08Z

My issue was with needing different handling for a terminating } token depending on whether the stack was empty or not. I ended up working around the issue by adding an atRoot argument to the handler function, as that also made it easier to get return type overloading to work in TypeScript.

Separately I also could've made use of lexer.buffer to get the original source, but ended up working around that issue as well. If the API for these internals were, well, less internal, I'm sure that they'd find plenty of use.

nathan · 2020-06-10T23:04:46Z

@eemeli If I understand what you're describing, you can do that without modifying moo or using its internals:

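const moo = require('moo')

// Rules for the root state: an opening brace pushes 'inner';
// a closing brace seen here has nothing to close.
const main = {
  left: {match: /{/, push: 'inner'},
  unmatched: /}/,
}
// Inside braces, a closing brace pops back to the previous state.
const inner = {
  right: {match: /}/, pop: 1},
}
const lexer = moo.states({
  main,
  // inner's rules are listed first so that 'right' is tried before 'unmatched'
  inner: Object.assign({}, inner, main, inner),
})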

lexer.reset('{{}}}')
console.log(Array.from(lexer, t => t.type))
// [ 'left', 'left', 'right', 'right', 'unmatched' ]

(@tjvr there's a bug in the single-character fast case that breaks this if you use '{' and '}' instead of /{/ and /}/.)
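One way to read the `Object.assign({}, inner, main, inner)` call above: the first `inner` fixes the key order so that `right` comes before `unmatched`, and the trailing `inner` makes sure its values win if `main` ever defines a rule with the same name. The sketch below checks only the plain JavaScript behaviour; that moo tries rules in object key order is an assumption here, and the `clashing` object is a hypothetical case not taken from the thread.

```javascript
const main = {
  left: {match: /{/, push: 'inner'},
  unmatched: /}/,
}
const inner = {
  right: {match: /}/, pop: 1},
}

// String keys keep their first-insertion order, so inner's key comes first.
const merged = Object.assign({}, inner, main, inner)
console.log(Object.keys(merged)) // [ 'right', 'left', 'unmatched' ]

// Hypothetical clash: if the middle object also had a 'right' rule,
// the trailing `inner` would overwrite it while keeping the position.
const clashing = Object.assign({}, inner, {right: /x/, left: /{/}, inner)
console.log(clashing.right === inner.right) // true
```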

eemeli · 2020-06-11T02:17:04Z

Oh, something like that would certainly work as well.

I guess my general point here is that sometimes it's useful (but clearly not essential) to access the lexer's context when processing tokens. Right now that's not possible without using an internal API, which isn't guaranteed to stay stable between major (or 0.x minor) versions. It's practically certain that any such use can be worked around with the current public API, but those solutions aren't always as elegant.
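As one possible shape for such a workaround, the sketch below rebuilds the nesting depth from the token stream alone, so the consumer never reads `lexer.stack`. The function name `withDepth` and the `depth` and `atRoot` fields are hypothetical; the `left` and `right` type names are borrowed from the example earlier in the thread. It assumes every push and pop corresponds to exactly one token type, which will not hold for every grammar.

```javascript
// Hypothetical wrapper: tag each token with the nesting depth in effect
// when it was produced, using only the token types seen so far.
function* withDepth(tokens, openType = 'left', closeType = 'right') {
  let depth = 0
  for (const token of tokens) {
    // Report the depth before this token takes effect.
    yield Object.assign({}, token, {depth, atRoot: depth === 0})
    if (token.type === openType) {
      depth++
    } else if (token.type === closeType && depth > 0) {
      depth--
    }
  }
}

// Plain objects stand in for lexer tokens so the sketch runs without moo.
const sample = ['left', 'left', 'right', 'right', 'unmatched'].map(type => ({type}))
console.log(Array.from(withDepth(sample), t => t.depth))
// [ 0, 1, 2, 1, 0 ]
```

Since the thread's example already iterates the lexer with `Array.from(lexer, ...)`, passing the lexer itself as `tokens` should work the same way.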

Having been on the other side of this sort of request a number of times, I fully understand not wanting to make any API any more complicated than it minimally needs to be. I think you have a terrific tool here, so I'd just like to point out that there are additional usage patterns for it that could be enabled by a relatively small change just to its documentation.

As a bit of background, the previous implementation for the parser I'm replacing used PEG.js, and coming from there I'm finding that I need to adjust my thinking somewhat to account for the lack of arguments for the lexer, and for the lack of custom state variables. Both are perfectly understandable, just requiring a bit of a mental shift to handle.
