Req: option for watching token streams and candidate targets #1320
For LALR, this is very easy to do using the … See this recipe for an example: https://lark-parser.readthedocs.io/en/latest/recipes.html#adding-a-progress-bar-to-parsing-with-tqdm I'm not sure this is relevant for Earley, since it matches and considers many different tokens that are eventually thrown away, i.e. it's not exactly a stream of tokens.
@erezsh Thanks for that. I have been meaning to check out … and I'll make a point of trying it out today. I'm guessing it will pay off big when I hit more of the ancient syntax's nitty-gritty details.
I just tried it out, but noticed that the … Sample code is below, and it has got me exactly where I need to be. :)
As a takeaway, there might be merit in adding a couple of properties to the … But for now, I'm delighted to have so much transparency into the parser's activity. Thanks again!
It looks like the recipe isn't entirely correct. To get the result, you should call … As for the line and column numbers, why don't you just take them from the token?
Suggestion
Requesting constructor keyword options that allow logging the lexer's token stream and also, if feasible, the potential target fulfilments in the current context.
Describe alternatives you've considered
The PyCharm debugger has sophisticated breakpoint options, including the ability to set a breakpoint to:
Additional context
Printing the token stream via the IDE debugger breakpoint technique above has been a huge help in my current project.
(FYI, this requires carefully retro-implementing a parser for an archaic, convoluted and very non-standard programming/configuration language from the 1980s. Its original parser was hand-crafted in C, incrementally coded, patched and extended in a silo over the decades, and there is no formal grammar specification, not even a YACC grammar. Getting its various cryptic nuances to parse and correctly feed into my transformer is a massively challenging undertaking, but I'm getting there.)
I would really like to be able to watch or log the Lark parser's token stream without relying on the IDE. Even a constructor option that accepted an open writable file object, a logger object, and/or the pathname of a file to write to would be very helpful.
In a perfect world, each token fetched and logged would also show the line/column numbers in the input at which it was matched.
I acknowledge that logging the parser state would be a much harder venture, especially in a readable form, so even just token-stream logging would be quite a boost.