Support for injecting extra rules written in Python into PEG grammars #29

Open
ninmesara opened this issue Oct 5, 2016 · 5 comments

@ninmesara

ninmesara commented Oct 5, 2016

It would be useful to be able to inject rules written as Python functions into PEG grammars.
This would accomplish two things:

  1. Greater portability for libraries. I could publish a library of Python functions which anyone could use regardless of whether they're using the peg, cleanpeg or Python parsers. Python functions, although more cumbersome to write, are more composable.
  2. It would allow the user to write special rules that respect whitespace in PEG files, while skipping whitespace in the rest of the rules. I believe this is currently impossible without rewriting the whole grammar in Python.

I'd suggest the following API:

from lib.external import rule1, rule2
from arpeggio.cleanpeg import ParserPEG

# extra_rules is the parameter proposed in this issue, not an existing one.
parser = ParserPEG(calc_grammar,
                   "calc",
                   extra_rules={'rule_name1': rule1, 'rule_name2': rule2})

The user could then use 'rule_name1' and 'rule_name2' in the grammar file, and the rules would be resolved automatically. There might be a problem with name clashes between user-defined rules and inner rules defined by the external functions, though. I'm not familiar enough with Arpeggio's internals to be sure.
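
The name-clash concern can be illustrated with a minimal sketch that does not use Arpeggio at all. It assumes a grammar is just a table mapping rule names to rule objects; `merge_extra_rules` and the sample tables are hypothetical names made up for illustration, and this is not how Arpeggio resolves rules internally.

```python
def merge_extra_rules(grammar_rules, extra_rules):
    """Merge hypothetical extra rules into a grammar's rule table.

    Both arguments map rule names to arbitrary rule objects. A name defined
    in both tables is reported instead of being silently overridden.
    """
    clashes = sorted(set(grammar_rules) & set(extra_rules))
    if clashes:
        raise ValueError("rule name clash: " + ", ".join(clashes))
    merged = dict(grammar_rules)
    merged.update(extra_rules)
    return merged


# Stand-ins for rules read from a PEG file and for Python rule functions.
peg_rules = {"calc": "expression+ EOF", "expression": "term (('+' / '-') term)*"}
python_rules = {"rule_name1": lambda: "rule1 body"}

table = merge_extra_rules(peg_rules, python_rules)
```

Raising on a clash is only one possible policy; silently preferring one side would also work but makes mistakes harder to spot.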

@igordejanovic
Member

igordejanovic commented Oct 7, 2016

I'm planning a more general approach for parser composability.

Something like this:

from lib.external import rule1, rule2
from arpeggio import GrammarPython, GrammarPEG, GrammarCPEG, Parser

...
parser = Parser(GrammarPython(calc), GrammarPEG(calc_override_in_peg),
                GrammarPython(rule1, rule2), GrammarCPEG(clean_peg_addition))

Grammar* callables will know how to read grammars written in different styles and
transform them into the internal grammar representation known to the Parser class.
The Parser will do grammar composition and full resolution using some predetermined
override rule (e.g. rules that come later in the grammar list will override
earlier rules with the same name).

In this approach you could mix and match grammars using different styles. E.g.,
you could do the override in PEG or in clean PEG or in some other form. You
could write your own Grammar* wrapper, specify the grammar however you see fit,
and still be able to compose it with other grammars.

Grammars could be incomplete, i.e. rules could reference nonexistent rules, thus
providing a kind of extension point. Of course, when forming the final parser all
the rules must be available.
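
The override rule and the extension-point idea can be sketched in plain Python. This is a toy model under stated assumptions, not Arpeggio code: each grammar is a dict mapping a rule name to the list of rule names it references, and `compose` and `unresolved` are hypothetical helpers.

```python
def compose(*grammars):
    """Merge rule tables left to right; later definitions override earlier ones."""
    rules = {}
    for grammar in grammars:
        rules.update(grammar)
    return rules


def unresolved(rules):
    """Return the names that some rule references but no rule defines."""
    missing = set()
    for refs in rules.values():
        missing.update(ref for ref in refs if ref not in rules)
    return sorted(missing)


# An incomplete base grammar: "number" is left open as an extension point.
base = {"calc": ["expression"], "expression": ["term"], "term": ["number"]}
# A later grammar overrides "term"; another supplies the missing rules.
override = {"term": ["factor"]}
addition = {"factor": ["number"], "number": []}

composed = compose(base, override, addition)
```

In this model a final parser would only be built when `unresolved(composed)` is empty, while `unresolved(base)` still reports the open extension point.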

Additionally, in the list of grammars you should be able to use
ParsingExpressions directly, thus enabling work in a parser combinator style.

All of this requires some non-trivial changes to the core, though.

@ninmesara
Author

It sounds excellent! Although I like the possibility of referring to rules of
a different grammar, I think there should be a "blackbox" option that allows
you to hide the inner rules of a grammar. This way you could use rules
written by different authors without worrying about name collisions.
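
One way to picture such a "blackbox" option is to rename every inner rule with a private prefix and expose only one entry rule. The sketch below is a toy model, not Arpeggio code: grammars are dicts mapping a rule name to the names it references, and `blackbox` with its `namespace` and `exported` parameters is hypothetical.

```python
def blackbox(namespace, rules, exported):
    """Prefix every rule name except `exported` with `namespace`.

    References to rules defined inside the grammar are renamed consistently;
    references to names it does not define are left untouched.
    """
    def rename(name):
        return name if name == exported else f"{namespace}.{name}"

    return {
        rename(name): [rename(ref) if ref in rules else ref for ref in refs]
        for name, refs in rules.items()
    }


# Two libraries by different authors both define an inner rule "number".
date_lib = blackbox("a", {"date": ["number"], "number": []}, exported="date")
price_lib = blackbox("b", {"price": ["number"], "number": []}, exported="price")

# The inner rules no longer collide when the tables are merged.
merged = {**date_lib, **price_lib}
```

Prefixing is only one possible design; keeping each grammar's rules in a separate scope would achieve the same isolation without renaming.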

Anyway, thanks for writing Arpeggio and making it available for free. It's
a great library and the documentation is among the best I've ever read.

@vuvova

vuvova commented Feb 20, 2018

Just FYI, this is how I did it: https://github.com/vuvova/gdb-tools/blob/64a9280/duel/parser.py

The main grammar starts at line 52; note the token cast on line 72. See above it how that token is created as a separate Arpeggio parser, which later tries line 26: if that succeeds, the token matches; otherwise it doesn't.

It'd be cleaner to inherit from Match, not to monkey-patch it, but Arpeggio doesn't allow it at the moment.

@igordejanovic
Member

Thanks. It would indeed be better to use a new class that inherits from Match. What do you get if you try to inherit? I haven't tried it myself, but it should generally work, or at least be easy to fix if it doesn't work at the moment. I looked into the implementation of parser construction, and instances of classes inheriting from Match should be handled at this line.

@vuvova

vuvova commented Feb 20, 2018

Maybe I used an older version? There was an isinstance(..., Match) check, as far as I remember.

You can try to inherit with a dummy class, like

from arpeggio import Match

class MatchChild(Match):
    pass

and see where it won't work. It should be easily fixable, I agree.
