Short description
When running the example examples/advanced/reconstruct_python.py, the code fails with an AssertionError in approximately 50% of cases.
Traceback (most recent call last):
  File "/home/zarnovic/git/lark/examples/advanced/reconstruct_python.py", line 86, in <module>
    test()
  File "/home/zarnovic/git/lark/examples/advanced/reconstruct_python.py", line 80, in test
    assert tree == tree_new
           ^^^^^^^^^^^^^^^^
AssertionError
To Reproduce
Execute the example in a loop. "1" indicates an AssertionError; "0" indicates success.
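A minimal sketch of such a loop, assuming it is started from the root of a lark checkout so that the relative path examples/advanced/reconstruct_python.py resolves. The helper name run_repeatedly is hypothetical and not part of Lark. Each run is a separate interpreter process, which matters here because the results only vary between processes.

```python
import subprocess
import sys


def run_repeatedly(args, runs):
    """Run `python <args>` `runs` times, each in a fresh interpreter process.

    Returns one entry per run: 1 if the process exited with a non-zero
    status (an uncaught AssertionError exits with status 1), 0 otherwise.
    """
    results = []
    for _ in range(runs):
        # A new process per run, so each run starts from a clean interpreter.
        completed = subprocess.run([sys.executable, *args], capture_output=True)
        results.append(0 if completed.returncode == 0 else 1)
    return results
```

For example, `print("".join(map(str, run_repeatedly(["examples/advanced/reconstruct_python.py"], 20))))` prints a string of 0s and 1s. Note that any non-zero exit code is counted as 1, not only an AssertionError.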
Test environment
OS: Linux
Python: 3.11
Lark: 1.1.5
I have tested a few older Python and Lark versions as well, with no change.
Long Description
As I understand it, the example converts Python source text to an AST, then back to Python source via the Reconstructor, and then to an AST again. The assertion checks that the first and second ASTs are the same.
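The same round-trip check can be sketched without Lark, using the standard-library ast module as a stand-in: ast.parse plays the role of the Lark parser and ast.unparse (Python 3.9+) plays the role of the Reconstructor. This is an illustration of the idea under that assumption, not the example's actual code; round_trip_holds is a hypothetical helper.

```python
import ast


def round_trip_holds(source, regenerate=ast.unparse):
    """Parse `source`, regenerate text from the tree, re-parse it, and compare.

    ast.parse stands in for the Lark parser; `regenerate` (ast.unparse by
    default) stands in for the Reconstructor.
    """
    tree = ast.parse(source)
    new_source = regenerate(tree)
    tree_new = ast.parse(new_source)
    # ast nodes do not compare structurally with ==, so compare their dumps.
    return ast.dump(tree) == ast.dump(tree_new)


print(round_trip_holds("foo('a', 'b')"))  # True
```

Passing a regenerate function that drops the comma, such as `lambda tree: "foo('a''b')"`, makes the check return False, which is the same kind of failure as the assertion in the example.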
After some debugging, I was able to isolate a much smaller reproducer:
The above code converts foo('a', 'b') to an AST and back to Python. The Reconstructor produces four variations of the code. Notice the missing comma (",") between the two arguments: 'a''b' is two string literals concatenated into a single argument, while 'a','b' is two arguments. The same problem happens in the original example on line:
Hence the AssertionError.
Problem: Python grammar
I'm not experienced enough to understand how the Reconstructor works. Maybe the problem is ambiguity in the Python grammar defined in Lark. I cannot fathom, however, how the Reconstructor could get from an "arguments" tree node with two children to a string concatenation 😕
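To see why the missing comma changes the tree and not just the text, here is a small check with CPython's own parser. It uses the standard-library ast module as a stand-in, which is not Lark's Python grammar, and count_call_args is a hypothetical helper. Adjacent string literals are concatenated at parse time, so two arguments become one.

```python
import ast


def count_call_args(source):
    """Return the number of positional arguments in a single call expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a call expression")
    return len(node.args)


print(count_call_args("foo('a', 'b')"))  # 2: two separate arguments
print(count_call_args("foo('a''b')"))    # 1: adjacent literals are concatenated
print(repr(ast.literal_eval("'a''b'")))  # 'ab'
```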
Problem: non-determinism
My initial motivation for the root-cause analysis was actually finding the source of the non-determinism: why does a loop inside one Python process return consistent results, while a loop that starts a new process each time gives differing results? I haven't found "the line" where it happens. The closest I came is this line, where the unreduced_tree output from the parser is already "random", while the input is not, AFAIK.
In my opinion, the source of entropy is Lark's use of set()s. Every time the Python process is executed, sets of strings are iterated over in a different order, because string hashing is randomized per process. This is expected behavior from Python's point of view, but it has cascading effects on Lark's processing when, for example, rules are inspected in random order.
🤔 Maybe one way of fixing it is to switch from set() to dict() and use some dummy value. AFAIK Python guarantees that dict keys are iterated in insertion order.
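A quick way to probe this hypothesis, sketched in Python under the assumption that the relevant sets contain strings. String hashing is randomized per process unless the PYTHONHASHSEED environment variable is set, so a set of strings can iterate in a different order in each new interpreter while staying stable within one process. The sketch does not locate the set() usage inside Lark. The set contents and the helper names set_order and ordered_unique are hypothetical. It also shows the dict-as-ordered-set idea: dict.fromkeys uses None as the dummy value, and dicts preserve insertion order (guaranteed since Python 3.7).

```python
import os
import subprocess
import sys

# Arbitrary example strings; any set of str values behaves the same way.
CHILD = "print(list({'arguments', 'argvalue', 'string', 'comma', 'expr'}))"


def set_order(hash_seed):
    """Return the iteration order of CHILD's set in a fresh interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    completed = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


def ordered_unique(items):
    """Deduplicate `items`, keeping first-seen order (a dict used as an ordered set)."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    # A fixed seed gives the same order in every new process.
    print(set_order("0") == set_order("0"))  # True
    # Different seeds may give different orders (not guaranteed for every pair).
    print(set_order("1"))
    print(set_order("2"))
    print(ordered_unique(["b", "a", "b", "c"]))  # ['b', 'a', 'c']
```

If the hypothesis holds, running the example with a fixed seed (for example PYTHONHASHSEED=0) should make its outcome repeatable from run to run. A fixed seed only hides the ordering dependence, though. It does not fix it.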