Python 3.12 Peg Parser Grammar has a small problem #3957

hasii2011 · 2024-02-05T02:15:18Z

PyCharm and Python 3.12 in general accept lines in this format:

class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore
    pass

However, the Python parser generated from this grammar chokes on the '# type: ignore' comment and reports a syntax error. I am not sure why.
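A quick way to check the claim that Python itself accepts this line is to feed it to the standard `ast` module. The sketch below only parses the source, so the base class names do not need to exist. It assumes a CPython version that supports the `type_comments` argument of `ast.parse` (3.8 or later). With that flag on, the comment appears in the module's `type_ignores` list, which suggests that CPython treats `# type: ignore` differently from other type comments.

```python
import ast

# The line from the report; it is parsed, never executed.
SOURCE = (
    "class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore\n"
    "    pass\n"
)

# Default mode: comments are dropped and the class statement parses normally.
plain_tree = ast.parse(SOURCE)

# With type comments enabled, the "# type: ignore" comment is recorded
# on the Module node as a TypeIgnore entry carrying its line number.
typed_tree = ast.parse(SOURCE, type_comments=True)
ignored_lines = [ignore.lineno for ignore in typed_tree.type_ignores]

print(type(plain_tree.body[0]).__name__)
print(ignored_lines)
```

If both calls succeed, the syntax error comes from the generated lexer or parser, not from the Python source.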

kaby76 · 2024-02-05T11:09:00Z

The grammar is scraped from the pegen grammar 3.12.1. If it doesn't parse, it's caused by one of the following:

The pegen grammar is incorrect.
The lexical structure is not being implemented correctly from the doc.
The lexical structure is not being described correctly in the documentation.

Pegen does not take a description of the lexical structure. That description is reverse-engineered from an implementation and then placed into the Python documentation, which is really bad practice. We have to read the source code for the Python 3 compiler to find out which of these it is.

hasii2011 · 2024-02-05T12:09:28Z

Ok, thanks for the quick response. Was hoping it was a quick fix. I am not familiar with g4 files, mainly just a consumer.

RobEin · 2024-02-05T12:50:52Z

The lexical analysis documentation does not mention the generation of the TYPE_COMMENT token.
Maybe some other documentation describes it, but I haven't found it so far.
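One place that does mention it, as far as I can tell, is the documentation of the standard `token` module, which says that `TYPE_COMMENT` tokens are only produced when `ast.parse()` is invoked with `type_comments=True`. The sketch below is consistent with that: it runs the standard `tokenize` module over the line from the report and collects every comment-like token. The helper name `comment_tokens` is made up for this example.

```python
import io
import token
import tokenize

SOURCE = "class C(A, B):  # type: ignore\n    pass\n"


def comment_tokens(source):
    """Return (token name, text) pairs for COMMENT and TYPE_COMMENT tokens."""
    readline = io.StringIO(source).readline
    wanted = (token.COMMENT, token.TYPE_COMMENT)
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


print(comment_tokens(SOURCE))
```

Here `tokenize` reports the comment as a plain `COMMENT`, so a lexer that always emits `TYPE_COMMENT` for it is stricter than the default behavior of Python's own tools.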

The solution for the comment in the example (# type: ignore) could be that the lexer would recognize it as a plain COMMENT (or perhaps as a hidden TYPE_COMMENT).
In other words, the tokenizer would be statement-sensitive.
I'm afraid that this cannot be implemented in the lexer and that's probably why they don't write about it in the lexical analysis documentation.
I still have to think about that.

Until then, I temporarily set the TYPE_COMMENT tokens to hidden in my own repository.
This way, there is no parsing for type comments, but no errors are generated either.
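To illustrate what hiding the tokens means, here is a minimal sketch of the idea in plain Python. It is not the ANTLR grammar or its runtime. The `Token` class, the channel constants, and the regular expressions are all assumptions made up for the example, and indentation is treated as ordinary whitespace, with no INDENT/DEDENT handling.

```python
import re
from dataclasses import dataclass

DEFAULT_CHANNEL = 0  # tokens the parser sees
HIDDEN_CHANNEL = 1   # tokens kept in the stream but skipped by the parser


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    channel: int = DEFAULT_CHANNEL


# Hypothetical, simplified token rules; TYPE_COMMENT must be tried before COMMENT.
TOKEN_RE = re.compile(
    r"(?P<TYPE_COMMENT>#[ \t]*type:[^\r\n]*)"
    r"|(?P<COMMENT>#[^\r\n]*)"
    r"|(?P<NEWLINE>\r?\n)"
    r"|(?P<WS>[ \t]+)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<OP>[():,])"
)
HIDDEN_KINDS = {"TYPE_COMMENT", "COMMENT", "WS"}


def lex(source):
    """Split source into tokens, routing comments and whitespace to the hidden channel."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup
        channel = HIDDEN_CHANNEL if kind in HIDDEN_KINDS else DEFAULT_CHANNEL
        tokens.append(Token(kind, match.group(), channel))
        pos = match.end()
    return tokens


def parser_view(tokens):
    """Return only the tokens a parser reading the default channel would receive."""
    return [tok for tok in tokens if tok.channel == DEFAULT_CHANNEL]


SOURCE = "class C(A, B):  # type: ignore\n    pass\n"
visible = [tok.text for tok in parser_view(lex(SOURCE))]
print(visible)
```

The type comment is still recognized by the lexer, so tooling that wants it can read the hidden channel, but the parser never has to match it against a rule.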

kaby76 · 2024-02-05T13:53:47Z

> In other words, the tokenizer would be statement-sensitive.

We might be able to define an Antlr4 "lexer mode" to work around a parser-state-dependent lex, but "lexer modes" are basically hacks for the real deal. It would be best to have a parser-state aware lexer option for Antlr5. @ericvergnaud

hasii2011 · 2024-02-05T14:15:44Z

So I generated new lexer/parser files this morning and verified them using the modified file provided by @RobEin. They worked great and got me around this issue with the files that carry mypy comments. This resolves the issue for me for now, as I don't look at that construct in my visitor code.

Thanks very much for this.

KvanTTT added the python3 label Feb 5, 2024

hasii2011 mentioned this issue Feb 5, 2024

Get updated Python lexer/parser hasii2011/pyutplugins#108

Closed
