Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty matches appearing unnecessarily when repeating empty rules ambiguously #1312

Open
redruin1 opened this issue Jul 30, 2023 · 3 comments
Labels
bug Earley Issues regarding the Earley parser

Comments

@redruin1
Copy link

I'm writing a simple assembly-like language and I'm using Lark to parse it's AST, but I'm having trouble with ambiguity. Here's a boiled-down MinRe:

grammar ="""
program: statement* // zero or more statements

?statement: instruction

instruction: pneumonic [parameter]* // match zero or more expected parameters
pneumonic: CNAME
parameter: ESCAPED_STRING | INT

SPACING: /[ \t\f]+/  
%ignore SPACING

%import common.CNAME
%import common.ESCAPED_STRING
%import common.INT
"""

asm_parser = Lark(grammar, start="program", ambiguity="explicit")

example = """PNEUMONIC "text" 10"""

syntax_tree = asm_parser.parse(example)

print(syntax_tree.pretty())

The above produces this output, which shows there's some phantom parameter between the pneumonic and the explicit strings and ints:

program
  _ambig
    instruction
      pneumonic PNEUMONIC
      parameter "text"
      parameter 10
    instruction
      pneumonic PNEUMONIC
      None
      parameter "text"
      parameter 10

At first, I thought this was the parser matching the whitespace between the pneumonic and the first parameter, but removing this whitespace doesn't seem to help (especially since this whitespace is seemingly ignored anyway):

example = """PNEUMONIC"text" 10""" # produces the same output as above

What did work however was to remove the brackets within the instruction rule:

instruction: pneumonic parameter*

which resolves the ambiguity:

program
  instruction
    pneumonic   PNEUMONIC
    parameter   "text"
    parameter   10

From what I understand the brackets indicate an "expected value" and the parser supplies None when nothing is found, but what is the parser actually matching in-between the pneumonic and the first parameter in this case?

@erezsh
Copy link
Member

erezsh commented Jul 30, 2023

This isn't a problem with brackets. It might be an issue with ambiguous repetition.

Note that this grammar produces the same result:

program: instruction*
instruction: pneumonic parameter*
pneumonic: CNAME
parameter: [ESCAPED_STRING | INT]

And this grammar produces a similar result:

program: instruction*
instruction: pneumonic parameter*
pneumonic: CNAME
parameter: (ESCAPED_STRING | INT)?

Which means that repeating empty rules is probably creating this pattern.

@erezsh erezsh changed the title Phantom captures appearing when using brackets? Empty matches appearing unnecessarily when repeating empty rules ambiguously Jul 30, 2023
@erezsh erezsh added bug and removed question labels Jul 30, 2023
@MegaIng
Copy link
Member

MegaIng commented Jul 30, 2023

Probably a duplicate of #1283. The solution is to make sure that you don't have ambiguites, and ideally you always want to use parser='lalr'. In this case if you just use parameter* (no brackets, since that is a duplication of the empty match possibility), it works and is even lalr compatible.

@erezsh
Copy link
Member

erezsh commented Jul 30, 2023

@MegaIng I tried disabling ChildFilterLALR and the ForestToParseTree cache, as that issue suggests, but this artifact still persists.

@erezsh erezsh added the Earley Issues regarding the Earley parser label Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Earley Issues regarding the Earley parser
Projects
None yet
Development

No branches or pull requests

3 participants