example: reconstruct_python.py failing in half cases #1300

zarnovican · 2023-07-14T14:04:08Z

Short description

When running the example examples/advanced/reconstruct_python.py, the code fails on AssertionError in approx 50% of cases.

Traceback (most recent call last):
  File "/home/zarnovic/git/lark/examples/advanced/reconstruct_python.py", line 86, in <module>
    test()
  File "/home/zarnovic/git/lark/examples/advanced/reconstruct_python.py", line 80, in test
    assert tree == tree_new
           ^^^^^^^^^^^^^^^^
AssertionError

To Reproduce

Execute the example in a loop. "1" indicate AssertionError. "0" is success.

$ cd examples/advanced/
$ for i in `seq 10`; do python reconstruct_python.py 2>&1 | grep -c AssertionError; done
1
1
1
1
0
1
0
0
1
0

Test environment

OS: Linux
Python: 3.11
Lark: 1.1.5

I have tested few older Pythons and older Larks as well. No change.

Long Description

As I understand the example, it is converting text Python to AST, then back to Python via "Reconstructor" and then, again to AST. The assertion is that the first and second AST trees should be the same.

After some debugging, I was able to isolate much smaller reproducer:

from lark import Token, Lark
from lark.reconstruct import Reconstructor
from lark.indenter import PythonIndenter

# Official Python grammar by Lark
python_parser3 = Lark.open_from_package('lark', 'python.lark', ['grammars'],
                                        parser='lalr', postlex=PythonIndenter(), start='file_input',
                                        maybe_placeholders=False    # Necessary for reconstructor
                                        )

def special(sym):
    return Token('SPECIAL', sym.name)

def test():
    r = Reconstructor(python_parser3, {'_NEWLINE': special, })
    tree = python_parser3.parse('''foo('a', 'b')\n''')
    output = r.reconstruct(tree)
    print(output)

if __name__ == '__main__':
    test()

The above code is converting foo('a', 'b') to AST and back to python. The Reconstructor produces four variations of the code

$ for i in `seq 10`; do python reconstruct_python_test.py; done | sort -u
foo('a''b',)_NEWLINE
foo('a','b',)_NEWLINE
_NEWLINE foo('a''b',)_NEWLINE
_NEWLINE foo('a','b',)_NEWLINE

Notice the missing comma "," between the two arguments. 'a''b' are strings concatenated into one argument, while 'a','b' are two arguments. The same problem happens in the original example on line:

python_parser3 = Lark.open_from_package('lark', 'python.lark', ['grammars'],

Hence the AssertionError.

Problem: Python grammar

I'm not experienced enough to understand how the Reconstructor works. Maybe the problem is ambiguity in the Python grammar defined in Lark. I cannot fathom, however, how could the Reconstructor arrive from a tree node "arguments" with two children to string concat 😕

Problem: non-determism

My initial motivation for root-cause analysis was actually finding the source of non-determinism. Why, if I put the loop inside the Python process, it returns consistent results, while the loop outside the process differ ? I haven't found "the line" where it is happening. The closest I came is this line, where the output unreduced_tree from the parser is already "random", while the input is not AFAIK.

In my opinion, the source of entrophy is Lark's usage of set()s. Every time the Python process is executed, the elements in sets are iterated over in a different order. This is expected behavior from Python point-of-view. But it has cascading effects on Larks processing, when, for example, rules are inspected in random order.

🤔 Maybe, one way of fixing it is to switch from set() to dict() and use some dummy value. AFAIK Python guarantees that the dict keys will be iterated in the same order.

The text was updated successfully, but these errors were encountered:

erezsh · 2023-10-23T08:22:24Z

The new release (1.1.8) should solve Earley non-determinism.

erezsh added Earley Issues regarding the Earley parser bug labels Aug 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

example: reconstruct_python.py failing in half cases #1300

example: reconstruct_python.py failing in half cases #1300

zarnovican commented Jul 14, 2023

erezsh commented Oct 23, 2023

example: reconstruct_python.py failing in half cases #1300

example: reconstruct_python.py failing in half cases #1300

Comments

zarnovican commented Jul 14, 2023

Short description

To Reproduce

Test environment

Long Description

Problem: Python grammar

Problem: non-determism

erezsh commented Oct 23, 2023