Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Inheritance on Grammars and Visitors, enhance Visitor Definition #69

Open
bitranox opened this issue Nov 26, 2019 · 2 comments
Open

Comments

@bitranox
Copy link

bitranox commented Nov 26, 2019

I love Arpeggio, but now when really working with it, I would like to have certain features :

since I do Parsers for many different Formats, there are some Grammars which are repeating over and Over again - so i want to put them into a class, inherit them, monkey patch them etc. - thats working at the moment like this :

class GrammarBase(object):
    grammar = arpeggio.ParsingExpression()
    whitespace = '\t '

    class Visitor(arpeggio.PTNodeVisitor):
        pass


class GrammarBasic(GrammarBase):
    # PROBLEM1 - it works only with staticmethods
    @staticmethod
    def grammar_basic_1():
        return arpeggio.RegExMatch(r"...")

    @staticmethod
    def grammar_basic_2():
        return arpeggio.RegExMatch(r"...")

    # grammar is the default I use - like in pypeg2
    @staticmethod
    def grammar():
        return arpeggio.Sequence(arpeggio.RegExMatch(r"..."), GrammarBasic.grammar_basic_1, GrammarBasic.grammar_basic_2)


    class Visitor(GrammarBase.Visitor):
        # PROBLEM2 - the prefix "visit_" is really useless here
        def visit_grammar_basic_1(self, node, children):
            return <something useful>        
        def visit_grammar_basic_2(self, node, children):
            return <something useful>        

class GrammarSpecific1(GrammarBasic): 
    # override the default whitespace with a specific one
    whitespace = '\t \n'
    @staticmethod
    def grammar_specific_1():
        return arpeggio.Sequence(GrammarBasic.grammar_basic_1, GrammarBasic.grammar_basic_2)

    # grammar is the default I use - like in pypeg2
    @staticmethod
   def grammar():
        return arpeggio.Sequence(arpeggio.RegExMatch(r"..."), GrammarSpecific1.grammar_specific_1, GrammarBasic.grammar_basic_1)


    class Visitor(GrammarBasic.Visitor, <and maybe some more>):
        def visit_grammar_specific_1(self, node, children):
            return <something useful>
        # PROBLEM3 - Visitor gets confused, because it does not know
        # which one to use - GrammarBasic.grammar or GrammarSpecific1.grammar
        # so this is not possible !
        # def visit_grammar(self, node, children):
        #     return <something useful>

def parse_file(path_file: Union[str, pathlib.Path], grammar: GrammarBase):
    """
    >>> read_file(path_file='/some_file.txt', grammar=GrammarSpecific1())
    """

    with open(str(path_file), 'r') as data_file:
        string_data = data_file.read()
    parser = arp.ParserPython(grammar.grammar, ws=grammar.whitespace)
    parse_tree = parser.parse(string_data)
    data = arp.visit_parse_tree(parse_tree, grammar.Visitor())
    return data

So my proposals are :

  1. do not only store the function name of grammar definitions in the parse nodes, but the Object specification - so we dont get name confusions. At the moment You can not use the same names in GrammarBasic and GrammarSpecific.

  2. get rif of the visit_ prefix in the Visitor Class - its useless. You can look out for both for backward compatibility (with deprecation warning)

  3. Grammars should be able to be defined as standard methods of the class, not as @staticmethod

  4. Visitors definition should be better like :

class Visitor(arpeggio.PTNodeVisitor):
    # if grammar "grammarobject" like class.method is hit, some_name gots called
    def some_name(self, grammarobject, node, children ):
        return <something useful>

So the name of the Visitor.method can be chosen as You like, and the correct method of the Visitor should be called, even it is inherited from another Visitor super().....

In short :
grammars defined in Classes should not collide, even if they have the same name as a method in their Base Class
Visitors definition should be not ambiguouse , by not using the Method Name, but the rule object to define/map it.

I hope I made myself clear, here an example :

class Visitor(arpeggio.PTNodeVisitor):
    # if grammar "grammarobject" like class.method is hit, some_name gots called
    def some_name(self, grammarobject1, node, children ):
        return <something useful>

    def some_name2(self, grammarobject2, node, children ):
        return <something useful>

class Visitor2(Visitor):
    def some_name3(self, grammarobject3, node, children ):
        return <something useful>

    # on grammarobject4, Visitor2.some_name should be called
    def some_name(self, grammarobject4, node, children ):
        return <something useful>

    # on grammarobject1, Visitor2.some_name5 should be called,
    # because it is overwritten with that definition
    def some_name5(self, grammarobject1, node, children ):
        return <something useful>

what do You developers think - any chances for that ?
do You think its a bad/good idea, and why ?
Maybe there are easier ways and I use it wrong ?
yours sincerely and
желим вам пријатан дан Igor !
Robert

@igordejanovic
Copy link
Member

igordejanovic commented Nov 27, 2019

Hi Robert,

Thanks for the suggestions. I think that many of what you proposed make sense. Grammar/Visitor inheritance is definitely something that would be useful. Also I agree that visit_ prefix in Visitor method names is something that we could get rid of.

I'm not sure that I'm following your thoughts on point 4. What would be the purpose of grammarobject<x> param in the Visitor calls? Are you proposing to select the right method based on the name of the param?

At first thought, one way to attack on this would be to create a new parser class. Like we have now ParserPython and ParserPEG we could introduce ParserClass/ParserObject which would be a class based grammar definition with inheritance. That way we should stay mostly backward compatible.

I'll leave this issue open as a feature request so will play with it when find some time or if you or someone from the community wants to contribute please comment here.

@bitranox
Copy link
Author

bitranox commented Nov 27, 2019

Dear Igor,
yes, I agree, another ParserClass/ParserObject definitely makes sense.

for point 4 - I tried to archive that Visitors can be subclassed, and that the name of the methods do not collide.
The names should have nothing to do with the method called - that means somehow we need to make a mapping between the grammar, and the visitor method to call.
My first suggested approach was to put the grammar object as parameter . but I am not sure how to build a hashed list with that, which Visitor Method to be called when a grammar is found in the peg-tree.

maybe something like that :

# the new classed Visitor  PTNodeClassVisitor ;-)
class Visitor(arpeggio.PTNodeClassVisitor):
    # the mapping between grammars and Visitor Methods - so we can choose 
    # the visitor method names as we like - and we can change the mapping as we like
    def visitor_mapping(self):
        map=dict()
        map[GrammarClass1.grammarmethod1] = self.some_name
        map[GrammarClass1.grammarmethod2] = self.some_name2
        return map

    def some_name(self, node, children ):
        return <something useful>

    def some_name2(self,node, children ):
        return <something useful>

class Visitor2(Visitor):
    def visitor_mapping(self):
        map = super().visitor_mapping()  # get the mappings of the parent class
        map[GrammarClass2.grammarmethod3]=self.some_name3
        map[GrammarClass2.grammarmethod4]=self.some_name
        map[GrammarClass1.grammarmethod1]=self.some_name5 # overwrite a mapping from the parent class
        return map

    def some_name3(self, node, children ):
        return <something useful>

    # even if the method has the same name as in Class "Visitor", it will not collide,
    # because in the mapping dict we have: 
    # Visitor.somename and Visitor1.some_name mapping dict values
    def some_name(self, node, children ):
        return <something useful>

    def some_name5(self, node, children ):
        return <something useful>

I think that can be a good solution, what do You think ?
хвала на труду, игор
искрено Ваш
Robert

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants