Support repetition qualifiers for closures #214

Open
mgrazebrook opened this issue Jul 15, 2021 · 2 comments

Comments

@mgrazebrook

Could you support:
rule = {expression}{7} ;
or
rule = {expression}{2,5} ;
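As a rough illustration of the requested semantics, `{e}{n}` can be read as "exactly n matches" and `{e}{m,n}` as "between m and n matches", consumed greedily the way PEG closures are. This is a hypothetical sketch, not TatSu's implementation: `repeat`, `integer`, the `Parser` type, and the token-list input are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical parser type: given tokens and a start position, return
# (value, next_position) on success, or None on failure.
Parser = Callable[[List[str], int], Optional[Tuple[object, int]]]


def repeat(p: Parser, lo: int, hi: int) -> Parser:
    """Match p at least lo and at most hi times, greedily (PEG style)."""
    def parse(tokens: List[str], pos: int) -> Optional[Tuple[object, int]]:
        values = []
        # Keep matching until p fails or the upper bound is reached.
        while len(values) < hi:
            result = p(tokens, pos)
            if result is None:
                break
            value, pos = result
            values.append(value)
        if len(values) < lo:
            return None  # too few repetitions: the whole closure fails
        return values, pos
    return parse


def integer(tokens: List[str], pos: int) -> Optional[Tuple[object, int]]:
    """Hypothetical terminal: a single token made only of decimal digits."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    return None
```

Here `repeat(integer, 7, 7)` plays the role of `{expression}{7}` and `repeat(integer, 2, 5)` the role of `{expression}{2,5}`. Note that the upper bound only stops the closure; any extra items are left for whatever follows in the rule.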

For comparison, see Python's re syntax:
https://docs.python.org/3/library/re.html#regular-expression-syntax (search for "Repetition qualifiers").

I'm sometimes parsing log files, textified PDFs, scanned documents, or other things not designed to be parsed. One of the reasons I like TatSu for this is that you can be sure you really understood the format within a section, and you can occasionally explain what you're doing to a non-programmer. In contrast, when I do the same with regular expressions, I sometimes find myself silently skipping bits (and it's very hard to read!). Such formats often have fixed numbers of repetitions, and it's useful to know whether one's assumption about the number of repetitions always holds.

There are also cases with a repetitions, followed by up to b repetitions, followed by c repetitions, where each group is of a different kind. That is possibly a harder case to manage.

rule = {int}{4} {int}{2,4} {int}{2} ;

Of course, I can just measure the list length in the semantics, but I feel this properly belongs in the grammar. So this is low priority.
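The semantics workaround might look roughly like the following. This is a hypothetical sketch in the style of a TatSu semantics class (one method per rule name, receiving that rule's AST); the rule name `fields`, the count of 7, and the `check_count` helper are illustrative assumptions, and the class is written standalone so it runs without TatSu installed.

```python
from typing import List, Optional, Sequence


def check_count(items: Sequence, lo: int, hi: Optional[int] = None) -> List:
    """Return items as a list, or raise ValueError unless lo <= len(items) <= hi.

    hi defaults to lo, i.e. "exactly lo items".
    """
    hi = lo if hi is None else hi
    if not lo <= len(items) <= hi:
        raise ValueError(f"expected {lo}..{hi} items, got {len(items)}")
    return list(items)


class LogSemantics:
    """Hypothetical semantics class with one method per grammar rule.

    Assumes a grammar rule named `fields`, written as `{int}+`, whose AST
    arrives here as a list; the repetition count is enforced after parsing.
    """

    def fields(self, ast):
        return check_count(ast, 7)
```

The drawback is the one described in the comment: the grammar itself still accepts any number of items, and the count is only checked after the closure has already matched.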

@apalala
Collaborator

apalala commented Jul 16, 2021

I think this is a good idea!

The syntax would have to be different (not regex-like), because TatSu already gives meaning to {} (and also to () and []). There's already a lot of syntax around {}.

Perhaps it could be:

rule = {int}<4> {int}<2,4> {int}<2> ;

I think that TatSu only allows * after {}, so the new syntax could also be:

rule = int*4 int*2-4  (int string)*2 ;

We need to review the current syntax to choose a new one that makes the intention clear and doesn't collide with current semantics.

We should probably first provide an implementation, and decide about the syntax after.
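In the spirit of settling the implementation before the syntax, the proposed spellings can all reduce to the same (min, max) pair, which keeps the surface notation a separate decision. This is a hypothetical sketch: `parse_quantifier` and the exact set of accepted forms (`<4>`, `<2,4>`, `*4`, `*2-4`) are assumptions for illustration, not TatSu code.

```python
import re
from typing import Optional, Tuple

# Hypothetical quantifier suffixes taken from the two proposals.
_ANGLE = re.compile(r"<(\d+)(?:,(\d+))?>")   # <4> or <2,4>
_STAR = re.compile(r"\*(\d+)(?:-(\d+))?")    # *4 or *2-4


def parse_quantifier(text: str) -> Optional[Tuple[int, int]]:
    """Return (min, max) for a proposed quantifier suffix, or None if it isn't one."""
    for pattern in (_ANGLE, _STAR):
        m = pattern.fullmatch(text)
        if m:
            lo = int(m.group(1))
            # A single number means "exactly n": min == max.
            hi = int(m.group(2)) if m.group(2) else lo
            if hi < lo:
                raise ValueError(f"empty range in {text!r}")
            return lo, hi
    return None
```

A bare `*` (the existing closure) is not matched by either pattern and returns None, so in this sketch it is left to the current closure syntax.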

@mgrazebrook
Author

I just spent half an hour trying to find out what other syntaxes do, and the only one I could find was 're'! To be fair, it's probably the only repetition-qualifier syntax most of your users know. And I understand your reason for rejecting it.

It may be necessary to constrain it so that a sequence of repetition qualifiers can include only one range. So:
rule = int*4 int*2:4 int*2:5 int*3
might not be allowed, or its behaviour might be formally defined so that either the left-hand or the right-hand range is greedy.
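To see why several adjacent ranges are awkward, assume standard PEG semantics, where each repetition consumes as many matches as it can and never gives any back. The hypothetical helpers below compare that with a backtracking matcher (like `re`) on `int*4 int*2:4 int*2:5 int*3`, writing each quantifier as a (min, max) pair and reducing the input to a count of ints. How TatSu would actually behave depends on how the feature is implemented; this is only a model.

```python
from typing import List, Tuple

Bounds = List[Tuple[int, int]]


def peg_greedy(bounds: Bounds, n: int) -> bool:
    """PEG-style: each repetition takes as many items as it can, with no backtracking."""
    remaining = n
    for lo, hi in bounds:
        take = min(hi, remaining)
        if take < lo:
            return False
        remaining -= take
    return remaining == 0  # the whole input must be consumed


def backtracking(bounds: Bounds, n: int) -> bool:
    """Regex-style: try every way of splitting n items across the repetitions."""
    if not bounds:
        return n == 0
    lo, hi = bounds[0]
    return any(backtracking(bounds[1:], n - k) for k in range(lo, min(hi, n) + 1))


rule = [(4, 4), (2, 4), (2, 5), (3, 3)]  # int*4 int*2:4 int*2:5 int*3
```

With 11 ints, `backtracking(rule, 11)` is True (4 + 2 + 2 + 3) but `peg_greedy(rule, 11)` is False: the greedy ranges take 4 and then 3 items, leaving nothing for `int*3`. In this model the greedy version accepts only 16 ints, while the backtracking version accepts anything from 11 to 16.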

Did you notice I experimented with a colon in 2:4? I thought it had a more Pythonic flavour, though repetition isn't much like a slice. Of the two you offer, I mostly prefer the latter, but the '-' sign grates a little because my mind reads it as subtraction. Too bad the ellipsis isn't on a standard keyboard.
