Enable access to matched parts of a regular expression #229

Open
vincentbernat opened this issue Mar 18, 2022 · 9 comments

@vincentbernat
Contributor

Hey!

It would be nice to be able to access the parts matched by a regex when using the matches operator. The captured parts could be assigned to variables ($1, $2, etc.) or to a special matched map indexed by the position and name of each captured group. All of this could be done in a user-provided function, but putting it in the language gives better performance, as the regular expressions can be compiled at compile time instead of at execution time.

@antonmedv
Member

regular expressions can be compiled at compile time instead of at execution time.

This is also really easy to do on the user side: see ConstExpr

@antonmedv
Member

antonmedv commented Nov 5, 2022

What about adding a regexp() builtin? It could be used like this:

regexp("f.+").FindAllString(str, -1)

@vincentbernat
Contributor Author

As an example, I have a function ClassifySite("something") and a variant ClassifySiteRegex(Exporter.Name, "^([^-]+)-", "$1"). To leverage ConstExpr, I would need the user to wrap the regex in a function, which would be inconvenient.

As for the builtin you propose, I suppose it returns an array. I was hoping for something a little more magic, as it would mean writing something like this:

ClassifySite(regexp("^([^-]+)-").FindAllString(Exporter.Name)[0])

And what happens if there is no match? I would prefer something like:

Exporter.Name matches "^([^-]+)-" && ClassifySite($1)

But I would understand that you don't like such magic variables as they make the language non-pure.

@antonmedv
Member

Exporter.Name matches "^([^-]+)-" && ClassifySite($1)

I actually like this idea! Neat! I think it’s understandable what is going on here.

@vincentbernat
Contributor Author

Oh, great! It's how it is in Perl (I think, I don't remember exactly).

@meharo

meharo commented Feb 23, 2024

Sorry to bump this old thread. Here is a way to do this by extending expr.

// Declare a global cache for compiled regex (optional).
// Protect it with a lock, as concurrent reads/writes may happen.
var (
        compiledRegex = make(map[string]*regexp.Regexp)
        mutex         sync.RWMutex
)

// This function may be called concurrently, but its local vars are safe.
func myFunc() {
        // Build the env map
        env := make(map[string]interface{})

        // reMatch holds the captured groups by regex if any.
        reMatch := make([]string, 0)
        // reFind holds the closure with access to var "env". This access is needed, as you can see below.
        // reFind() returns true if the pattern matched (captured groups are then stored). Else, false.
        reFind := func(input string, pattern string) bool {
                mutex.RLock()
                regex, exists := compiledRegex[pattern]
                mutex.RUnlock()

                if !exists {
                        var err error
                        regex, err = regexp.Compile(pattern)
                        if err != nil {
                                log.Error("Regex compile error:", err)
                                return false
                        }

                        mutex.Lock()
                        compiledRegex[pattern] = regex
                        mutex.Unlock()
                }

                // we store the captured groups if any.
                matches := regex.FindStringSubmatch(input)

                // we overwrite our captured strings slice to env["reMatch"] so that we can access the matches like reMatch[0] inside the expression.
                env["reMatch"] = &matches

                if matches == nil {
                        return false
                }

                return true
        }

        // This is where we set the initial empty slice to env["reMatch"]. This can be overwritten by the reFind() later though.
        env["reMatch"] = &reMatch
        // we map the closure function reFind() to env["reFind"] so that it is accessible as reFind(input, 'regex_pattern') in the expression.
        env["reFind"] = reFind

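        // Compile (and optionally cache) the program, then run it.
        // exprString (the expression source) is assumed to be defined elsewhere.
        compiled, err := expr.Compile(exprString, expr.Env(env))
        if err != nil {
                log.Error("Expr compile error:", err)
                return
        }
        result, err := vm.Run(compiled, env)
        if err != nil {
                log.Error("Expr run error:", err)
                return
        }
        _ = result // use the result as needed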
}

Now the expression can be written like:

"reFind(input_string, '^(..)') ? reMatch[0] : 'unknown'"

reMatch is overwritten each time reFind() is called. reFind() may also be called multiple times in the same expression; read what you need from reMatch right after each call.

@antonmedv
Member

Expr supports variables inside expressions now. They can also be used here:

let matches = reFind("…"); matches[0]
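For the let form to work, the helper would have to return the matches rather than a boolean. A minimal sketch of such a variant in plain Go follows. It is not the reFind posted earlier in the thread: the two-argument signature is carried over from that comment, and the sync.Map cache is a swapped-in alternative to the RWMutex-protected map. Since nothing is written back into the env, concurrent evaluations would not share mutable state.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache maps pattern strings to compiled *regexp.Regexp values.
var regexCache sync.Map

// reFind (variant, for illustration) returns the whole match followed by the
// captured groups, or nil when the pattern is invalid or does not match.
func reFind(input, pattern string) []string {
	cached, ok := regexCache.Load(pattern)
	if !ok {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil
		}
		// LoadOrStore keeps the first stored value if another goroutine won the race.
		cached, _ = regexCache.LoadOrStore(pattern, re)
	}
	return cached.(*regexp.Regexp).FindStringSubmatch(input)
}

func main() {
	m := reFind("paris-edge1", `^([^-]+)-`)
	fmt.Println(len(m), m[1]) // 2 paris
}
```

Registered in the env map the same way as the closure in the earlier comment, this should allow expressions along the lines of let m = reFind(input_string, '^(..)'); len(m) > 0 ? m[1] : 'unknown', though that usage is untested here.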

@PranavPeshwe
Contributor

TFS, @antonmedv. Where could I have learned about this? Is there any non-obvious document or code sample that I should keep an eye on to hear about such updates?
Thanks.

@antonmedv
Member

I post all changes to https://github.com/expr-lang/expr/releases
But I guess a dedicated blog post for release changes would be nice to have: https://expr-lang.org/blog

PS https://expr-lang.org/docs/language-definition#variables
