Stability of the Lexer's internal API #143

Open
eemeli opened this issue Jun 7, 2020 · 5 comments

@eemeli

eemeli commented Jun 7, 2020

I'm writing a parser that uses moo as a lexer. In certain situations, it would be highly beneficial to have access to the lexer's current state and/or stack when determining the meaning of some of the tokens generated by the lexer.

Currently, the info I want is available as lexer.state and lexer.stack, but this does not appear to be documented anywhere. Is there a reason why I shouldn't use them, and/or is there a chance that their API will change between patch updates?

I would've asked about changes between minor versions, but as moo is still at 0.5 that's not really relevant.

@tjvr
Collaborator

tjvr commented Jun 9, 2020

Hmm. Can you get the information you need by calling lexer.save()? I think I'd recommend doing that instead if you can.

Moo is probably more stable than the version number implies, although of course we reserve the right to change it if we feel the need to -- hence the 0.5 version number :-)

Out of interest, what sort of ambiguity are you using this information to resolve?

@mcous

mcous commented Jun 9, 2020

To piggyback a tiny amount: the userland solution to the question I raised in #142 relies on the Lexer.index property, so I, too, am curious which properties are stable and which are not, if that isn't indicated by the "standard" _prefix notation.

@eemeli
Author

eemeli commented Jun 9, 2020

My issue was with needing different handling for a terminating } token depending on whether the stack was empty or not. I ended up working around the issue by adding an atRoot argument to the handler function, as that also made it easier to get return type overloading to work in TypeScript.

Separately I also could've made use of lexer.buffer to get the original source, but ended up working around that issue as well. If the API for these internals were, well, less internal, I'm sure that they'd find plenty of use.
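
A minimal sketch of what the atRoot workaround described above might look like. This is not the actual parser code: handleCloseBrace, classify, and the token shape are hypothetical names chosen for illustration. The caller tracks nesting depth itself and passes atRoot into the handler, so nothing is read from lexer.stack.

```javascript
// Hypothetical handler: `atRoot` is supplied by the caller instead of
// being derived from the lexer's internal stack.
function handleCloseBrace(token, atRoot) {
  return atRoot
    ? {kind: 'stray-brace', value: token.value}
    : {kind: 'end-block', value: token.value}
}

// Hypothetical driver: counts open braces while walking the tokens.
function classify(tokens) {
  let depth = 0
  const out = []
  for (const token of tokens) {
    if (token.value === '{') {
      depth += 1
      out.push({kind: 'start-block', value: token.value})
    } else if (token.value === '}') {
      out.push(handleCloseBrace(token, depth === 0))
      if (depth > 0) depth -= 1
    }
  }
  return out
}

// Stand-in tokens; a real lexer would produce richer objects.
const tokens = Array.from('{{}}}', value => ({value}))
console.log(classify(tokens).map(t => t.kind))
// [ 'start-block', 'start-block', 'end-block', 'end-block', 'stray-brace' ]
```

The trade-off is that the parser duplicates bookkeeping the lexer already does, which is the inelegance being pointed out here.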

@nathan
Collaborator

nathan commented Jun 10, 2020

@eemeli If I understand what you're describing, you can do that without modifying moo or using its internals:

const moo = require('moo')

const main = {
  left: {match: /{/, push: 'inner'},
  unmatched: /}/,
}
const inner = {
  right: {match: /}/, pop: 1},
}
const lexer = moo.states({
  main,
  inner: Object.assign({}, inner, main, inner),
})

lexer.reset('{{}}}')
console.log(Array.from(lexer, t => t.type))
// [ 'left', 'left', 'right', 'right', 'unmatched' ]

(@tjvr there's a bug in the single-character fast case that breaks this if you use '{' and '}' instead of /{/ and /}/.)
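
The repeated inner in Object.assign({}, inner, main, inner) can look redundant. As far as I can tell it is about rule order: moo gives precedence to rules defined earlier, and a JavaScript object keeps string keys in insertion order, so copying inner first puts right ahead of unmatched in the inner state. The sketch below uses plain objects only (no moo needed) to show the resulting key order; innerFirst and mainFirst are names made up for this illustration.

```javascript
const main = {
  left: {match: /{/, push: 'inner'},
  unmatched: /}/,
}
const inner = {
  right: {match: /}/, pop: 1},
}

// Copying `inner` first places `right` before the keys from `main`.
// The trailing `inner` only re-assigns `right`; it does not move the key.
const innerFirst = Object.keys(Object.assign({}, inner, main, inner))

// Copying `main` first would leave `unmatched` ahead of `right`.
const mainFirst = Object.keys(Object.assign({}, main, inner))

console.log(innerFirst) // [ 'right', 'left', 'unmatched' ]
console.log(mainFirst)  // [ 'left', 'unmatched', 'right' ]
```

With these particular rules the trailing inner changes nothing, since main and inner share no rule names; it would only matter if main defined a rule called right, in which case the final copy restores inner's definition.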

@eemeli
Author

eemeli commented Jun 11, 2020

Oh, something like that would certainly work as well.

I guess my general point here is that sometimes it's useful (but clearly not essential) to access the lexer's context when processing tokens. Right now that's not possible without using an internal API, which isn't guaranteed to stay the same between major (or 0.x minor) versions. It's practically certain that any such use can be worked around with the current public API, but those solutions aren't always as elegant.

Having been on the other side of this sort of request a number of times, I fully understand not wanting to make any API any more complicated than it minimally needs to be. I think you have a terrific tool here, so I'd just like to point out that there are additional usage patterns for it that could be enabled by a relatively small change just to its documentation.

As a bit of background, the previous implementation for the parser I'm replacing used PEG.js, and coming from there I'm finding that I need to adjust my thinking somewhat to account for the lack of arguments for the lexer, and for the lack of custom state variables. Both are perfectly understandable, just requiring a bit of a mental shift to handle.
