Symbols with empty match are not preserved #59

Open
zetaraku opened this issue Jul 28, 2019 · 3 comments

@zetaraku
Contributor

Here is a minimal example.

Given the grammar:

abc = a b c
a = "a"
b = r'b?'
c = "c"

When it parses "abc", the result is as expected,
but when it parses "ac", the NonTerminal b simply disappears from the parse tree:

>>> print(parse_tree)
a | c

The result is the same when using b = "b"? instead.

Since b is actually matched, shouldn't it be in the parse tree (like a or c) with node.value == ''?
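
To make the two behaviours concrete, here is a minimal sketch in plain Python. It is not Arpeggio's implementation: `match_sequence` and its `keep_empty` flag are hypothetical names, and the grammar is reduced to an ordered list of regex rules. It only illustrates the difference between dropping and retaining a rule that matched without consuming input.

```python
import re

# Toy stand-in for the grammar above: an ordered sequence of (rule_name, regex).
# Illustration only; this is not how Arpeggio builds its parse tree.
GRAMMAR = [("a", r"a"), ("b", r"b?"), ("c", r"c")]


def match_sequence(rules, text, keep_empty=False):
    """Match each rule in order and return a list of (rule_name, value) nodes.

    With keep_empty=False, rules that consumed no input are dropped,
    which mirrors the behaviour reported in this issue.
    """
    nodes = []
    pos = 0
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            raise ValueError(f"rule {name!r} failed at position {pos}")
        if m.group(0) or keep_empty:
            nodes.append((name, m.group(0)))
        pos = m.end()
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return nodes


dropped = match_sequence(GRAMMAR, "ac")
kept = match_sequence(GRAMMAR, "ac", keep_empty=True)
print(dropped)  # [('a', 'a'), ('c', 'c')]
print(kept)     # [('a', 'a'), ('b', ''), ('c', 'c')]
```

With `keep_empty=True` the result always has three children, so a consumer could rely on `kept[1]` being the node for b.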

@igordejanovic
Member

igordejanovic commented Jul 29, 2019

That was an early design decision: elements that consume no input are removed from the tree. IIRC the motivation was to keep the parse tree minimal and thus lower memory consumption, but it led to difficulties in processing parse trees, because you can't rely on a constant number of child nodes.

I don't think this behavior will change any time soon as all users of Arpeggio depend on it (e.g. textX) so it would be a very disruptive change.

There is a way to access the nodes of a non-terminal by name, which might be a good general solution that wouldn't require a change in the current behavior. I haven't checked whether that works at the moment for non-existing nodes, but I guess that returning None for optional matches on access by name would be the way to go.
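
As a rough sketch of that idea (again in plain Python, not Arpeggio code; the `Node` class and its `optional_rules` parameter are hypothetical), access by name could return None when a rule that is allowed to match nothing has no node in the tree:

```python
class Node:
    """Toy parse-tree node whose children are (rule_name, value) pairs.

    Hypothetical sketch; Arpeggio's NonTerminal is a different class.
    """

    def __init__(self, children, optional_rules=()):
        self._children = list(children)
        # Rule names that may legitimately be absent from the tree.
        self._optional = set(optional_rules)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        values = [value for rule, value in self._children if rule == name]
        if values:
            return values
        if name in self._optional:
            return None  # the rule exists in the grammar but matched nothing
        raise AttributeError(name)


tree = Node([("a", "a"), ("c", "c")], optional_rules={"b"})
print(tree.a)  # ['a']
print(tree.b)  # None
```

Unknown names still raise AttributeError, so a typo in a rule name is not silently turned into None.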

@zetaraku
Contributor Author

zetaraku commented Aug 2, 2019

@igordejanovic Hi! I tried the approach you suggested.

The empty NonTerminals still don't appear when accessed by rule name.

That causes a problem with a grammar like this:

a = b c "/" b c
b = r'b?'
c = r'c?'

>>> parse_tree = parser.parse("c / b")
>>> print(parse_tree)
c | / | b
>>> print(parse_tree.b)
b
>>> print(parse_tree.c)
c

Now I have to inspect the contents of the parse tree for every possible combination to find out which b is present.

Would it be possible to have an option to retain all the children?
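
For this particular grammar, one possible workaround is to split the flattened children at the "/" literal and look up each rule on either side. The sketch below is plain Python over (rule_name, value) pairs; `split_at_separator`, `find_rule`, and the "sep" rule name are hypothetical, and it assumes the separator itself is always present in the tree.

```python
def split_at_separator(children, separator="/"):
    """Split a flat list of (rule_name, value) nodes at the first separator."""
    values = [value for _, value in children]
    idx = values.index(separator)
    return children[:idx], children[idx + 1:]


def find_rule(side, rule_name):
    """Return the matched value for rule_name on one side, or None."""
    for name, value in side:
        if name == rule_name:
            return value
    return None


# Flattened tree for the input "c / b": the empty matches were dropped.
children = [("c", "c"), ("sep", "/"), ("b", "b")]
left, right = split_at_separator(children)
print(find_rule(left, "b"), find_rule(right, "b"))  # None b
print(find_rule(left, "c"), find_rule(right, "c"))  # c None
```

This only works because a mandatory literal separates the two halves; it does not generalize to grammars without such an anchor, which is why an option to retain all children would help.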

@igordejanovic
Member

> Would it be possible to have an option to retain all the children?

Probably it would. I'm trying to figure out a general solution.
For example, you would have the same problem if you used the optional rule operator (?) instead of ? inside the regular expressions:

a = b? c "/" b c?
b = r'b'
c = r'c'

There is also the ZeroOrMore rule (*), which can match zero times:

a = b* c "/" b c*
b = r'b'
c = r'c'

All the grammars above match your input and return the same tree.

Let's leave this open as a feature request. It seems that this needs more analysis.

igordejanovic changed the title from "Symbols with empty string are not preserved" to "Symbols with empty match are not preserved" on Aug 4, 2019