Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying names for capture groups #30

Open
adam-beneschan opened this issue Jul 27, 2013 · 10 comments
Open

Specifying names for capture groups #30

adam-beneschan opened this issue Jul 27, 2013 · 10 comments

Comments

@adam-beneschan
Copy link

Following up on go-oleg's comment on issue #17: I tried adding his suggestion to add names to capture groups. I've got something working but wanted to run my ideas past others first. Please note that this would be the first time I've ever contributed anything on github. Here's what I came up with:
beginCapture(group): The group parameter is optional. If it's an integer (or a string of all digits), the value must match the actual group number that would be assigned to the group (in JavaScript this would be the index into the array returned by regexp.exec()) or else it's an error. If it's some other string, it's a group name that's assigned to the capture group.
endCapture(group): The group parameter is optional; if present it must match the parameter of the corresponding beginCapture().
groupNumberFor(group): would be used after the expression is built to return the index number. E.g.
var tester = VerEx()
.startOfLine()
.beginCapture( "one" )
.then( "http" )
.maybe( "s" )
.endCapture( "one" )
.then( "://" )
.maybe( "www." )
.anythingBut( " " )
.endOfLine();
var testMe = "https://www.google.com";
var a = tester.exec(testMe);
a [tester.groupNumberFor( "one" )]) -- returns "https"

I'm not sure how to handle the errors, though (mismatched group name or number). I've made them throw exceptions, but if this is the wrong way to go I can change it. Again, I'm really new at this so any suggestions are appreciated.

@crawfordcomeaux
Copy link
Contributor

I started working on this, too. I'll take a look at your code in a second, but I recommend digging into the source for XRegExp (https://github.com/slevithan/xregexp). It augments js regexes with a lot of functionality found in other languages, including named captures.

@crawfordcomeaux
Copy link
Contributor

I think the way xregexp handles named captures makes sense, specifically:

  • bare integers aren't allowed, though they can be used as string names
  • named captures can be back-referenced
  • named captures can be referenced via results.namedGroup

The way it's implemented is by replacing the native replace & exec functions. I was going to start adapting that code for use here tomorrow night, but you're on the right track, so feel free to keep at it! I'll fetch any new commits made to your fork & pick it up where ever you leave off, unless you knock it all out.

@crawfordcomeaux
Copy link
Contributor

Also, I'm not sure if requiring a group name in endCapture() is necessary or worth doing. It feels like it might add a little more cognitive load to anyone using the library, since it requires them to track which group is being closed, though I can also see it being helpful seeing the group names when examining the code.

It might be worth creating a separate pull request just for that aspect of the named captures that can be held on to while verex gains more traction/usage. If the need/desire for supporting names in endCapture emerges, the PR can be merged in at that point.

@adam-beneschan
Copy link
Author

@crawfordcomeaux I wasn't planning to require a group name in endCapture()--just allow it if someone wants the additional sanity check in their code. (I've found that in more complex regexes with nested groups, it's easy to get something wrong.) I'll look at xregexp as you suggested--thanks! GitHub newbie question: when you said "I'll take a look at your code", do you mean you already found it in my fork? Or do I need to do something else first?

@crawfordcomeaux
Copy link
Contributor

I already cloned your fork & took a peek at things, though I haven't tried using it yet. In the future, as a courtesy, you could link to what you want feedback on. I'm not sure if there's a magical autocomplete/auto-link syntax for doing that, but check out the links I added in #32 for git/Github protips.

As for optionally passing group names to endCapture, you've convinced me 👍

@barneycarroll
Copy link

I think having distinct chainable methods for 'start' and 'end' is a bad idea — there's a potential for missing the 'end', or introducing one too many, that is the kind of problem that RegEx suffers from in its illegibility and VerEx supposedly solves.

How about group('name', VerEx.find('stuff')) (where the first argument is optional)?

This is more flexible (you can predefine VerEx / RegExs for groups), more legible (the group is cleanly separated), and makes invalid RegEx construction impossible (throws a JS syntax error on parse instead of on execution).

@crawfordcomeaux
Copy link
Contributor

Makes a lot more sense to me. +1

On Wed, Aug 7, 2013 at 3:15 AM, Barney Carroll notifications@github.com
wrote:

I think having distinct chainable methods for 'start' and 'end' is a bad idea — there's a potential for missing the 'end', or introducing one too many, that is the kind of problem that RegEx suffers from in its illegibility and VerEx supposedly solves.
How about group('name', VerEx.find('stuff')) (where the first argument is optional)?

This is more flexible (you can predefine VerEx / RegExs for groups), more legible (the group is cleanly separated), and makes invalid RegEx construction impossible (throws a JS syntax error on parse instead of on execution).

Reply to this email directly or view it on GitHub:
#30 (comment)

@adam-beneschan
Copy link
Author

I agree, and that's something that occurred to me also. The same thinking could apply to other functions, particularly "or". Writing a regex for something like /a.*|b[012]c|de/ looks something like

VerEx().find("a")
           .anything()
           .or()
           .find("b")
           .anyOf("012")
           .then("c")
           .or()
           .find("de");

which I guess is OK, but I think something like this might be clearer, since the different alternatives are grouped together clearly:

VerEx().oneOf (VerEx().find("a").anything(),
VerEx().find("b").anyOf("012").then("c"),
VerEx().find("de"));

Or not--I don't know. It's annoying that every argument would have to be prefixed with VerEx(). I tried to think of a way to set up a scope (perhaps in a nested function) in which the functions like "find", "anyOf", etc., could be used at the top level without a prefix, but I couldn't come up with one. (Maybe it's possible with eval()...)

Anyway, doing things without chaining may be necessary to write certain kinds of regex's. I'm not sure it's currently possible with VerbalExpressions to write something like /(?:[A-Za-z][A-Za-z0-9]+, *)+/, i.e. putting a quantifier on a group, even with a non-capture group. Maybe there is a way, and I just haven't figured it out.

@zakdances
Copy link

Instead of assigning string names to capture groups, how about just defining them as separate VerEx objects and allow them to be passed around as arguments? Any update on this issue?

@shreyasminocha
Copy link
Member

This is 6 years later, but support for ES2018 named capture groups is coming in 2.0.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants