Ability to specify repetition count (like in regexps) #30

Open
ghost opened this issue Aug 11, 2011 · 22 comments

@ghost

ghost commented Aug 11, 2011

It would be helpful if the PEG.js grammar allowed something like range expressions of POSIX basic regular expressions to be used. E.g.:

  • "a"\{1,7\} matches a, aa, ..., aaaaaaa
  • "a"\{0,1\} matches the empty string and a
  • "a"\{,6\} matches a string with up to (and including) six a's
  • "a"\{6,\} matches a string of six or more a's
  • "a"\{3\} matches only aaa, being equivalent to "a"\{3,3\}
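For comparison, here is a minimal sketch of the same five ranges written as JavaScript regular expressions, which use unescaped braces. The `ranges` table and the `matches` helper are illustrative names only and are not part of PEG.js.

```javascript
// Each range from the list above, written as an anchored JavaScript RegExp.
// The ^ and $ anchors force the whole input to match.
const ranges = {
  "{1,7}": /^a{1,7}$/, // one to seven a's
  "{0,1}": /^a{0,1}$/, // the empty string or a single a
  "{0,6}": /^a{0,6}$/, // up to six a's, with the lower bound written out
  "{6,}": /^a{6,}$/,   // six or more a's
  "{3}": /^a{3}$/,     // exactly three a's
};

// Test a string made of `count` a's against the named range.
function matches(range, count) {
  return ranges[range].test("a".repeat(count));
}

console.log(matches("{1,7}", 7)); // true
console.log(matches("{1,7}", 8)); // false
console.log(matches("{0,1}", 0)); // true
console.log(matches("{6,}", 5));  // false
console.log(matches("{3}", 3));   // true
```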

@dmajda
Contributor

dmajda commented Aug 12, 2011

I will not implement this feature.

The main reason is that there is no room in the PEG.js grammar for the {m,n} syntax: braces are already taken for actions, and I don't want to use backslashes as you suggest (they are ugly and incompatible with Perl regexps, which are the most widely used ones now and also the source of other PEG.js syntax) or other delimiters (that would be confusing).

In my experience, this kind of limited repetition occurs mainly in the "lexical" parts of the grammar (rules like color = "#" hexdigit hexdigit hexdigit hexdigit hexdigit hexdigit), and not that often. I think it's OK to just use sequences of expressions and the existing repetition operators (*, +, ?) there.
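The workaround described here can be automated when a grammar is generated from a script. The following is only a sketch: `expandRepetition` is a hypothetical helper, not a PEG.js API, and it assumes `expr` is an atomic expression (a rule name, a literal, or a parenthesized group).

```javascript
// Expand a bounded repetition into a PEG.js expression that uses only
// sequences and the existing ?, * and + operators.
// `max === null` means "no upper bound"; omitting `max` means "exactly min".
function expandRepetition(expr, min, max = min) {
  if (!Number.isInteger(min) || min < 0) {
    throw new RangeError("min must be a non-negative integer");
  }
  if (max !== null && (!Number.isInteger(max) || max < min)) {
    throw new RangeError("max must be null or an integer >= min");
  }

  if (max === null) {
    // Unbounded: (min - 1) mandatory copies followed by expr+, or expr* for min = 0.
    if (min === 0) return expr + "*";
    return [...Array(min - 1).fill(expr), expr + "+"].join(" ");
  }

  // Bounded: min mandatory copies, then (max - min) optional ones.
  // PEG optionals are greedy, so each one consumes a match if it can.
  const parts = [
    ...Array(min).fill(expr),
    ...Array(max - min).fill(expr + "?"),
  ];
  // An empty sequence (min = max = 0) is written as the empty literal.
  return parts.length > 0 ? parts.join(" ") : '""';
}

console.log('color = "#" ' + expandRepetition("hexdigit", 6));
// color = "#" hexdigit hexdigit hexdigit hexdigit hexdigit hexdigit
console.log(expandRepetition('"a"', 1, 3)); // "a" "a"? "a"?
```

The output grows linearly with the bounds, which is fine for short lexical rules like the color example but becomes unwieldy for large counts.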

@dmajda dmajda closed this as completed Aug 12, 2011
@dmajda dmajda reopened this Jan 15, 2012
@dmajda
Contributor

dmajda commented Jan 15, 2012

I've reconsidered and I am reopening this issue. It seems that many users want the ability to specify an arbitrary number of repetitions.

I'd like to avoid regexp-like {m,n} syntax because { and } are already taken for actions and re-using them would create ambiguity. I am currently thinking about something like this:

"foo" @ 1..10   // repeat 1 to 10 times
"foo" @ 1..     // repeat at least once
"foo" @ ..10    // repeat at most 10 times

The biggest question is what the separating character(s) should be and how to mark up ranges.

As for the separating character, @ seems nice to me. I was considering % and #, but in my mind the first one is already associated with string interpolation (e.g. in Python) and the second one with comments (in various languages). I am also thinking about skipping the separator entirely:

"foo" 1..10   // repeat 1 to 10 times
"foo" 1..     // repeat at least once
"foo" ..10    // repeat at most 10 times

As for the range markup, I took inspiration from Ruby. I was also thinking about -, but it looks too much like a minus sign. On the other hand, the Python-like : also looks nice to me.

I am not sure about half-open ranges. Maybe it would be better to mark them up using + and - like this:

"foo" @ 1+    // repeat at least once
"foo" @ 10-   // repeat at most 10 times

Any ideas or comments?
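To make the double-dot notation concrete, here is a small sketch that turns the range part (the text after the `@`) into explicit bounds. `parseRange` is a hypothetical helper written for this illustration. Treating an omitted minimum as 0 and rejecting a bare `..` are assumptions, since the proposal above only lists the three forms shown.

```javascript
// Parse the range part of the proposed `"foo" @ 1..10` notation.
// Returns { min, max }, where max === null means "no upper bound".
function parseRange(text) {
  const m = /^\s*(\d*)\s*\.\.\s*(\d*)\s*$/.exec(text);
  if (m === null) {
    throw new SyntaxError(`not a range: ${JSON.stringify(text)}`);
  }
  const [, lo, hi] = m;
  if (lo === "" && hi === "") {
    // Assumption: at least one bound is required.
    throw new SyntaxError("a range needs at least one bound");
  }
  const min = lo === "" ? 0 : Number(lo);    // "..10" starts at zero
  const max = hi === "" ? null : Number(hi); // "1.." has no upper bound
  if (max !== null && max < min) {
    throw new RangeError("upper bound is smaller than lower bound");
  }
  return { min, max };
}

console.log(parseRange("1..10")); // { min: 1, max: 10 }
console.log(parseRange("1.."));   // { min: 1, max: null }
console.log(parseRange("..10"));  // { min: 0, max: 10 }
```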

@ghost ghost assigned dmajda Jan 15, 2012
@izuzak

izuzak commented Jan 16, 2012

Really cool that you plan to support this feature!

I like your (default) suggestion:
"foo" @ 1..10 // repeat 1 to 10 times
"foo" @ 1.. // repeat at least once
"foo" @ ..10 // repeat at most 10 times

I don't like the +/- syntax for half-open ranges, the double-dot syntax is much more intuitive and readable IMO.

The only thing I had second thoughts about was using "#" vs "@", because IMO "#" naturally implies numbers/counting, whereas "@" naturally implies a reference, so "#" may be a bit more intuitive and readable (and perhaps you could use the "@" in the future for something?). But that's really a minor issue, and I would be happy with the "@" syntax.

Cheers!

@ghost
Author

ghost commented Jan 16, 2012

Just a quick comment: I think that @ and % are better choices than # because syntax highlighters that do not support the PEG.js grammar, especially those that attempt to guess the syntax (e.g. Stack Overflow's code highlighter), will likely interpret # as the start of a comment, causing it to be shown—annoyingly—from that point until EOL in the "comment color". This is not a preference based on logic and reasoning, of course, but on pragmatism.

@curvedmark

How about special-casing {num, num} and similar forms? They would always mean repetition, since { , num} and { num, } aren't valid JS code, and {num, num} and { num } are pointless as actions.

They aren't likely to be meaningful even if the action is written in another language.

@shamansir

I like these variants among suggested (but this is up to you of course to choose, since you're the author :) ):

// why do we need a separator, anyway? to me this looks very cool and simple to understand
"foo" 1..10   // repeat 1 to 10 times
"foo" 1..     // repeat at least once
"foo" ..10    // repeat at most 10 times

or

"foo"@1..10   // repeat 1 to 10 times
"foo"@1..     // repeat at least once
"foo"@..10    // repeat at most 10 times

but the second is less preferable

The x..y / ..y / x.. idea looks very cool, since it makes .. look like one consistent operator.

+/- are not OK for me, because they are confusing and become additional operators on top of .. (and + is already used).

@curvedmark

Thinking about it again. Will these work?

'foo'<1,5>
'foo'< ,3>
'foo'<2, >

since < and > are currently unused by the grammar

@otac0n

otac0n commented Sep 15, 2012

👍 from me, that looks good.

of course, <,3> is equivalent to <0,3>, so we may as well just require the min number. This would be congruent with what ECMA has done for JavaScript regular expressions.
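The JavaScript behavior referred to here can be checked directly. The sketch below shows how a brace expression with the minimum omitted behaves in a JavaScript regular expression: without the `u` flag it is matched literally, and with the `u` flag the pattern is rejected. The variable names are illustrative only.

```javascript
// Without the `u` flag, `{,3}` is not a quantifier in JavaScript:
// the braces, comma and digit are matched as literal characters.
const withoutMin = /^a{,3}$/;
// With the minimum written out, the braces form a normal quantifier.
const withMin = /^a{0,3}$/;

console.log(withoutMin.test("aa"));    // false
console.log(withoutMin.test("a{,3}")); // true
console.log(withMin.test("aa"));       // true
console.log(withMin.test("aaaa"));     // false

// With the `u` flag, the same pattern is a syntax error instead.
let unicodeModeRejects = false;
try {
  new RegExp("^a{,3}$", "u");
} catch (e) {
  unicodeModeRejects = e instanceof SyntaxError;
}
console.log(unicodeModeRejects); // true
```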

@dignifiedquire

I like the <,>. But I would also suggest the use of <3> being the same as <3,3>.

@otac0n

otac0n commented Sep 18, 2012

I agree, the <> syntax should map directly to the behavior of {} in RegExp as much as possible.

@pygy

pygy commented Oct 11, 2012

If I'm not mistaken, there's no need to add any delimiter, unless you want to allow variable names in the ranges.

foo 1,2 fighter
bar ,3 tender
baz 4, lurhmann
qux 5 quux

are all unambiguous.

@otac0n

otac0n commented Oct 12, 2012

@pygy, the problem with not using a delimiter is that it potentially stifles evolution of the syntax of the language.

For example, if we wanted to use comma for something else later on down the road, we would now have issues with syntax collisions all over the place. Constraining it to within <> brackets reduces the surface area of commas and numbers substantially.

Plus, people are used to using the {1,6} style in RegExps anyways.

@rgrove

rgrove commented Jan 23, 2013

I don't feel strongly about the syntax, but I do want this feature, and it'd be great if an expression could be used as a range value.

My use case: parsing literals in IMAP server responses, which look like {42}\r\n..., where 42 is the number of characters after the newline that represent a string (shown here as an ellipsis). Since there's no ending delimiter for an IMAP literal, character counting is the only way to parse this response.
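As an illustration of the character counting this use case needs, here is a hand-written sketch outside of any grammar. `parseLiteral` is a hypothetical helper, and it counts JavaScript string characters (UTF-16 code units) as the comment describes. A real IMAP implementation would have to count octets, which this sketch does not attempt.

```javascript
// Parse a literal of the form {N}\r\n followed by N characters.
// Returns the literal's value and the offset just past it.
function parseLiteral(input, start = 0) {
  const header = /\{(\d+)\}\r\n/y; // sticky: must match exactly at `start`
  header.lastIndex = start;
  const m = header.exec(input);
  if (m === null) {
    throw new SyntaxError(`expected a literal header at offset ${start}`);
  }
  const length = Number(m[1]);
  const begin = header.lastIndex; // first character after \r\n
  const end = begin + length;
  if (end > input.length) {
    throw new SyntaxError(
      `literal announces ${length} characters, only ${input.length - begin} available`
    );
  }
  return { value: input.slice(begin, end), end };
}

const literal = parseLiteral("{5}\r\nhello world");
console.log(literal.value); // hello
console.log(literal.end);   // 10
```

The length is only known after the header has been parsed, which is why a fixed {m,n}-style count is not enough here and a boundary taken from an earlier match is needed.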

@Mingun
Contributor

Mingun commented Jun 18, 2013

How about variables as range boundaries? This is very useful for messages whose header contains the message length. For example, the grammar

start
  = len:number message:.<len,len> .* { return message; }
number
  = n:[0-9] { return parseInt(n, 10); }

must parse

4[__] -> ['[', '_', '_', ']']
4[___] -> ['[', '_', '_', '_']
4[_] -> Error: expected 4 chars, got 3

This is useful for many protocols.
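The intended behavior of that grammar can be sketched by hand in plain JavaScript. `parseMessage` is a hypothetical function written only to mirror the three examples above. It assumes a single-digit length prefix, as in the `number` rule.

```javascript
// Hand-written equivalent of the grammar above: one digit giving the
// length, then exactly that many characters. Anything after them is
// consumed and ignored, like the trailing `.*`.
function parseMessage(input) {
  if (!/^[0-9]/.test(input)) {
    throw new SyntaxError("expected a digit");
  }
  const len = parseInt(input[0], 10);
  const body = input.slice(1);
  if (body.length < len) {
    throw new SyntaxError(`expected ${len} chars, got ${body.length}`);
  }
  // One array element per matched character, like the result of `.` repeated.
  return body.slice(0, len).split("");
}

console.log(parseMessage("4[__]"));  // [ '[', '_', '_', ']' ]
console.log(parseMessage("4[___]")); // [ '[', '_', '_', '_' ]
try {
  parseMessage("4[_]");
} catch (e) {
  console.log(e.message); // expected 4 chars, got 3
}
```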

@Mingun
Contributor

Mingun commented Jun 21, 2013

Maybe use this syntax:
expression |min,max|; then angle brackets can be used for template rules.

@otac0n otac0n mentioned this issue Jul 3, 2013
Mingun added a commit to Mingun/pegjs that referenced this issue Sep 18, 2013
Mingun added a commit to Mingun/pegjs that referenced this issue Sep 19, 2013
Mingun added a commit to Mingun/pegjs that referenced this issue Sep 20, 2013
@Mingun Mingun mentioned this issue Sep 20, 2013
Mingun added a commit to Mingun/pegjs that referenced this issue Sep 22, 2013
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 1, 2014
Mingun added a commit to Mingun/peggy that referenced this issue May 29, 2022
Mingun added a commit to Mingun/peggy that referenced this issue May 29, 2022
…s and regenerate parser and add documentation
Mingun added a commit to Mingun/peggy that referenced this issue Jun 11, 2022
```
expression|  exact |
expression|   ..   |
expression|min..   |
expression|   ..max|
expression|min..max|
```

Introduce two new opcodes:
* IF_LT <min>, <then part length>, <else part length>
* IF_GE <max>, <then part length>, <else part length>

Introduce a new AST node, `repeated`, that contains an expression and the minimum and maximum number of its repetitions.
If `node.min.value` is `null` or isn't positive, the minimum length is not checked.
If `node.max.value` is `null`, the maximum length is not checked.
If `node.min` is `null`, it is equal to `node.max` (the exact repetitions case).
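The boundary rules in this commit message can be sketched as a small interpreter. This is not the generated parser code. `resolveBounds`, `matchRepeated`, and the `matchOnce(input, pos)` callback (returning the new position, or -1 on failure) are hypothetical names for illustration, and the sketch assumes `matchOnce` always consumes at least one character. Only the `min`/`max` node shape is taken from the message.

```javascript
// Resolve the effective bounds of a `repeated` node using the rules above.
function resolveBounds(node) {
  // node.min === null is the exact-repetition case: min equals max.
  const minBoundary = node.min === null ? node.max : node.min;
  const value = minBoundary.value;
  return {
    min: value !== null && value > 0 ? value : 0, // 0 means "no minimum check"
    max: node.max.value,                          // null means "no maximum check"
  };
}

// Repeat matchOnce from `pos`, honoring the node's bounds.
// Returns { results, end } on success or null on failure.
function matchRepeated(node, matchOnce, input, pos) {
  const { min, max } = resolveBounds(node);
  const results = [];
  let current = pos;
  // Stop once the maximum is reached (the upper-bound check).
  while (max === null || results.length < max) {
    const next = matchOnce(input, current);
    if (next < 0) break;
    results.push(input.slice(current, next));
    current = next;
  }
  // Fail if too few repetitions matched (the lower-bound check).
  if (results.length < min) return null;
  return { results, end: current };
}

// Usage: "a"|2..3| against "aaaa" takes three a's and leaves one.
const matchA = (input, pos) => (input[pos] === "a" ? pos + 1 : -1);
const twoToThree = { min: { value: 2 }, max: { value: 3 } };
console.log(matchRepeated(twoToThree, matchA, "aaaa", 0));
// { results: [ 'a', 'a', 'a' ], end: 3 }
```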
Mingun added a commit to Mingun/peggy that referenced this issue Jun 11, 2022
Added two new opcodes:
- IF_LT_DYNAMIC: same as IF_LT, but the argument is a reference to the stack variable instead of constant
- IF_GE_DYNAMIC: same as IF_GE, but the argument is a reference to the stack variable instead of constant
Mingun added a commit to Mingun/peggy that referenced this issue Feb 19, 2023
Mingun added a commit to Mingun/peggy that referenced this issue Feb 21, 2023
hildjj added a commit to hildjj/peggy that referenced this issue Feb 21, 2023
* main: (104 commits)
  Audit CHANGELOG.md
  Release prep
  Update dependencies
  Ranges (pegjs/pegjs#30): Add documentation, examples and changelog entry
  Ranges (pegjs/pegjs#30): Add testcases for delimiter support in ranges and regenerate parser
  Ranges (pegjs/pegjs#30): Add support for delimiters in ranges
  Ranges (pegjs/pegjs#30): Add testcases for ranges with function boundaries and regenerate parser
  Ranges (pegjs/pegjs#30): Add ability to use code blocks as range boundaries
  Ranges (pegjs/pegjs#30): Add testcases for ranges with dynamic boundaries and regenerate parser
  Ranges (pegjs/pegjs#30): Add ability for use labels as range boundaries
  Ranges (pegjs/pegjs#30): Add testcases for ranges and regenerate parser
  Ranges (pegjs/pegjs#30): Implement ranges support. Range syntax: ``` expression|  exact | expression|   ..   | expression|min..   | expression|   ..max| expression|min..max| ```
  Typo
  Update the testTimeout, so Windows doesn't fail on slow-ass CI hardware
  Fix rollup issues with web tests
  Update deps in dependent projects as well
  Add changelog entry for updating node version
  BREAKING: update min node version to 14, because of jest.
  Update package-lock, using npm install --legacy-peer-deps to get around @rollup/plugin-node-resolve issue
  Update dependencies, make small changes to accomodate, re-build.
  ...