Maximum munch? #159

Open
molnarp opened this issue May 29, 2021 · 3 comments

molnarp commented May 29, 2021

Hi,

This is more of a question than an issue about Moo, so here goes:

I have the following lexer:

```javascript
const moo = require("moo");

const lexer = moo.compile({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
});
```

On the input `moo`, this returns:

```
{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}
```

On the input `moo*`, I want it to return a single PREFIXTERM, but I get this instead:

```
{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}
{"type":"PREFIXTERM","value":"*","text":"*","offset":3,"lineBreaks":0,"line":1,"col":4}
```

How can I get it to produce a single PREFIXTERM?

tjvr (Collaborator) commented May 29, 2021

Have you tried swapping the order of the rules? Earlier rules take precedence.

tjvr added the question label May 29, 2021
molnarp (Author) commented May 30, 2021

I can't really do that, because I also have:

```javascript
WILDTERM: /(?:[a-z*?]+)/,
```

which matches a superset of what TERM matches. In this setup, on the input `mo*o`, TERM consumes the prefix `mo`, then PREFIXTERM consumes the asterisk, and so on.

This would work if the longest match were picked, but the first matching rule is picked instead. I was wondering how to get around this issue.
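For illustration only, here is a sketch of the maximal-munch behavior asked about above, in plain JavaScript without Moo. `tokenizeLongest` and `munchRules` are hypothetical names: at each offset the helper runs every rule as a separate sticky regexp and keeps the longest match, letting the earlier rule win a tie. With the three rules from this thread, each input becomes a single token.

```javascript
// Hypothetical maximal-munch tokenizer (NOT how Moo works): at each offset,
// try every rule separately and keep the longest match. On a tie, the rule
// listed first wins.
function tokenizeLongest(rules, input) {
  const compiled = Object.entries(rules).map(([type, re]) => ({
    type,
    // Sticky flag: each rule may only match starting exactly at lastIndex.
    re: new RegExp(re.source, re.flags.replace("y", "") + "y"),
  }));
  const tokens = [];
  let offset = 0;
  while (offset < input.length) {
    let best = null;
    for (const { type, re } of compiled) {
      re.lastIndex = offset;
      const m = re.exec(input);
      // Only a strictly longer match replaces the current best, so ties
      // go to the earlier rule. Empty matches are ignored.
      if (m !== null && m[0].length > 0 && (best === null || m[0].length > best.text.length)) {
        best = { type, text: m[0], offset };
      }
    }
    if (best === null) throw new Error("no rule matches at offset " + offset);
    tokens.push(best);
    offset += best.text.length;
  }
  return tokens;
}

const munchRules = {
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
  WILDTERM: /(?:[a-z*?]+)/,
};

for (const input of ["moo", "moo*", "mo*o"]) {
  const types = tokenizeLongest(munchRules, input).map((t) => t.type + ":" + t.text);
  console.log(input, "->", types.join(" "));
}
// moo -> TERM:moo
// moo* -> PREFIXTERM:moo*   (ties with WILDTERM at length 4; PREFIXTERM is listed first)
// mo*o -> WILDTERM:mo*o
```

Trying every rule at every offset costs one regexp execution per rule per token; that per-rule work is what a single combined regexp avoids.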

tjvr (Collaborator) commented Jun 4, 2021

I'm afraid I don't exactly understand what you're trying to do.

Moo doesn't choose the regexp with the longest match. Because it combines all the regexps into a single JS regexp for speed, it can't do this. Instead, the first regexp that matches wins: earlier rules take precedence.
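A minimal sketch of why a combined regexp picks the first rule rather than the longest match, in plain JavaScript rather than Moo. The `tokenizeFirst` and `show` helpers are hypothetical and only imitate the idea of joining the rules into one sticky (`y`) alternation, assuming the rules contain no capturing groups of their own. A regexp alternation commits to the first alternative that succeeds, so with the question's rule order `moo*` splits exactly as reported, and swapping the two rules yields a single PREFIXTERM.

```javascript
// Hypothetical helper imitating a combined-regexp lexer (not Moo's source).
// Each rule becomes one capturing group in a single sticky alternation; the
// regex engine tries the groups left to right and stops at the first match.
// Assumes the rules contain no capturing groups of their own.
function tokenizeFirst(rules, input) {
  const names = Object.keys(rules);
  const combined = new RegExp(
    names.map((name) => "(" + rules[name].source + ")").join("|"),
    "y" // sticky: every match must start exactly at lastIndex
  );
  const tokens = [];
  while (combined.lastIndex < input.length) {
    const start = combined.lastIndex;
    const m = combined.exec(input);
    if (m === null || m[0] === "") {
      throw new Error("no rule matches at offset " + start);
    }
    // The first capture group that took part in the match names the winning rule.
    const group = m.findIndex((g, i) => i > 0 && g !== undefined);
    tokens.push({ type: names[group - 1], text: m[0], offset: start });
  }
  return tokens;
}

const show = (tokens) => tokens.map((t) => t.type + ":" + t.text).join(" ");

// Rule order from the question: TERM is tried before PREFIXTERM.
console.log(show(tokenizeFirst({ TERM: /[a-z]+/, PREFIXTERM: /\*|(?:[a-z]+\*)/ }, "moo*")));
// TERM:moo PREFIXTERM:*

// Swapped order: PREFIXTERM is tried first.
console.log(show(tokenizeFirst({ PREFIXTERM: /\*|(?:[a-z]+\*)/, TERM: /[a-z]+/ }, "moo*")));
// PREFIXTERM:moo*
```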

It's hard to provide a solid recommendation without knowing more about the language you're trying to parse. But usually people seem to solve problems that sound like this by:

  • varying the order of the rules
  • using keywords
  • using (negative) lookahead.
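Here is a sketch of ordering plus negative lookahead applied to the three rules from this thread, again using a hypothetical first-match helper (`tokenizeFirstMatch`) that only imitates a combined-regexp lexer. The narrower rules each get the lookahead `(?![a-z*?])`, so they refuse to match when more term characters follow; an input such as `mo*o` then falls through to WILDTERM, which is listed last. The `WS` rule is an assumption, since the question does not say how terms are separated. Because the thread suggests lookahead as a way to solve this in Moo, the same rule object could presumably be passed to `moo.compile` in this order.

```javascript
// Hypothetical first-match helper imitating a combined-regexp lexer: rules
// are joined into one sticky alternation, so the first rule that matches wins.
function tokenizeFirstMatch(rules, input) {
  const names = Object.keys(rules);
  const combined = new RegExp(
    names.map((name) => "(" + rules[name].source + ")").join("|"),
    "y"
  );
  const tokens = [];
  while (combined.lastIndex < input.length) {
    const start = combined.lastIndex;
    const m = combined.exec(input);
    if (m === null || m[0] === "") {
      throw new Error("no rule matches at offset " + start);
    }
    // The first capture group that took part in the match names the winning rule.
    const group = m.findIndex((g, i) => i > 0 && g !== undefined);
    tokens.push({ type: names[group - 1], text: m[0] });
  }
  return tokens;
}

const lookaheadRules = {
  WS: / +/, // assumption: terms are separated by spaces
  // A whole word with nothing from [a-z*?] directly after it.
  TERM: /[a-z]+(?![a-z*?])/,
  // A trailing "*" (alone or after letters) that ends the term.
  PREFIXTERM: /(?:\*|[a-z]+\*)(?![a-z*?])/,
  // The general rule goes last and catches everything else, e.g. "mo*o".
  WILDTERM: /[a-z*?]+/,
};

const terms = tokenizeFirstMatch(lookaheadRules, "moo moo* mo*o *")
  .filter((t) => t.type !== "WS")
  .map((t) => t.type + ":" + t.text);
console.log(terms.join(" "));
// TERM:moo PREFIXTERM:moo* WILDTERM:mo*o PREFIXTERM:*
```

The lookahead keeps the first-match strategy but makes each narrow rule check that the whole term is consumed, which gives the same result as longest-match for these inputs without trying every rule separately.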
