Python 3.12 Peg Parser Grammar has a small problem #3957

Open
hasii2011 opened this issue Feb 5, 2024 · 5 comments

@hasii2011

PyCharm and Python 3.12 in general accept lines in this format:

class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore
    pass

However, the Python parser generated from this grammar chokes on the '# type: ignore' comment and reports it as a syntax error. I am not sure why.
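For reference, the CPython side of this claim can be checked with the standard `ast` module. The sketch below is only a minimal check, assuming a CPython 3.8 or later interpreter: it parses the same two lines once with default settings and once with `type_comments=True`, in which case the ignore comment is collected in `Module.type_ignores` rather than being treated as a syntax error. The variable names are illustrative.

```python
import ast

# The two lines from the report; the base classes are never resolved,
# because ast.parse only parses the source and does not execute it.
SOURCE = (
    "class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore\n"
    "    pass\n"
)

# Default mode: the comment is skipped like any other comment.
plain = ast.parse(SOURCE)

# Type-comment mode: "# type: ignore" is recorded separately on the module.
typed = ast.parse(SOURCE, type_comments=True)

print(type(plain.body[0]).__name__)
print([ignore.lineno for ignore in typed.type_ignores])
```

Both calls succeed, which suggests that CPython does not hand `# type: ignore` to its grammar as an ordinary type comment.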

@kaby76
Contributor

kaby76 commented Feb 5, 2024

The grammar is scraped from the pegen grammar 3.12.1. If it doesn't parse, it's caused by one of the following:

  • The pegen grammar is incorrect.
  • The lexical structure is not being implemented correctly from the documentation.
  • The lexical structure is not being described correctly in the documentation.

Pegen does not take a description of the lexical structure. The description is reverse-engineered from an implementation and then placed into the Python documentation, which is really bad practice. We have to read the source code of the Python 3 compiler to find out which of these is the cause.

@hasii2011
Author

Ok, thanks for the quick response. Was hoping it was a quick fix. I am not familiar with g4 files, mainly just a consumer.

@KvanTTT added the python3 label Feb 5, 2024

@RobEin
Contributor

RobEin commented Feb 5, 2024

The lexical analysis documentation does not mention the generation of the TYPE_COMMENT token.
Maybe some other documentation describes it, but I haven't found it so far.
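One way to see what the documented lexical layer does with this comment is to run the standard-library `tokenize` module over the example. This is only a sketch, assuming Python 3.8 or later (where `token.TYPE_COMMENT` exists); `comment_tokens` is a hypothetical helper name, not something from the grammar repository.

```python
import io
import token
import tokenize

SOURCE = (
    "class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore\n"
    "    pass\n"
)


def comment_tokens(source):
    """Return (token name, text) pairs for every comment-like token."""
    wanted = {token.COMMENT, token.TYPE_COMMENT}
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in wanted
    ]


print(comment_tokens(SOURCE))
```

The `tokenize` module reports the comment as a plain COMMENT token here. The `token` module documentation notes that TYPE_COMMENT tokens are only produced when `ast.parse()` is invoked with `type_comments=True`.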

The solution for the comment in the example (# type: ignore) could be that the lexer would recognize it as a plain COMMENT (or perhaps as a hidden TYPE_COMMENT).
In other words, the tokenizer would be statement-sensitive.
I'm afraid that this cannot be implemented in the lexer and that's probably why they don't write about it in the lexical analysis documentation.
I still have to think about that.

Until then, I temporarily set the TYPE_COMMENT tokens to hidden in my own repository.
This way, there is no parsing for type comments, but no errors are generated either.
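The idea of treating `# type: ignore` as a plain COMMENT can be sketched outside the lexer as a small classifier. This is a hypothetical illustration only: `classify_comment` and both regular expressions are assumptions made for the example, not code from the grammar or from CPython, and the sketch does not address the statement-sensitivity concern for other type comments.

```python
import re

# Hypothetical patterns: a "# type:" prefix with optional blanks, and the
# word "ignore" not followed by another ASCII letter or digit.
_TYPE_PREFIX = re.compile(r"#[ \t]*type:[ \t]*(?P<body>.*)")
_IGNORE = re.compile(r"ignore(?![0-9A-Za-z])")


def classify_comment(comment: str) -> str:
    """Return the token name a lexer could assign to a single comment."""
    match = _TYPE_PREFIX.match(comment)
    if match is None:
        return "COMMENT"  # ordinary comment
    if _IGNORE.match(match.group("body")):
        return "COMMENT"  # "# type: ignore...", safe to hide from the parser
    return "TYPE_COMMENT"  # a real type comment such as "# type: List[int]"


for text in ("# type: ignore", "# type: ignore[misc]", "# type: List[int]", "# note"):
    print(text, "->", classify_comment(text))
```

With a split like this, only the ignore form would be hidden, so other type comments could still reach the parser.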

@kaby76
Contributor

kaby76 commented Feb 5, 2024

> In other words, the tokenizer would be statement-sensitive.

We might be able to define an Antlr4 "lexer mode" to work around a parser-state-dependent lex, but "lexer modes" are basically a hack compared to the real thing. It would be best to have a parser-state-aware lexer option in Antlr5. @ericvergnaud

@hasii2011
Author

So I generated new lexer/parser files this morning and verified them using the modified file provided by @RobEin. They worked great and got me around this issue with the mypy-commented files. This resolves the issue for me for now, as I don't look at that construct in my visitor code.

Thanks very much for this.
