Functional rewrite #196

Open
jehna opened this issue Jun 27, 2019 · 9 comments · May be fixed by #198

Comments

@jehna
Contributor

jehna commented Jun 27, 2019

I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.

The README.md currently gives this simple example of using VerbalExpressions:

const tester = VerEx()
    .startOfLine()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anythingBut(' ')
    .endOfLine();

This can be described as a builder-like extension for the native RegExp object; you can chain the expression and add more stuff to "build" a complete regular expression.

This is a very clear approach for building simple, "one-dimensional" regular expressions. The problems with the current implementation start to surface when we do more complicated things such as capture groups, lookaheads/lookbehinds, or the "or" pipe: the expression quickly grows beyond what is maintainable and readable.

For example, I think something like this is impossible to implement with VerbalExpressions at the moment:

/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
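To make the target concrete, here is a small sketch of what that regular expression captures. The helper name parse and the property names protocol and host are made up for illustration; only the pattern itself comes from the comment above.

```javascript
// The pattern quoted above: group 1 is the protocol (possibly empty),
// group 2 is everything up to the end that is not a space or a slash.
const urlPattern = /^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/;

// Hypothetical helper, for illustration only.
const parse = (text) => {
  const match = urlPattern.exec(text);
  return match ? { protocol: match[1], host: match[2] } : null;
};

console.log(parse("https://example.com"));
console.log(parse("ftp://files.example.com"));
console.log(parse("example.com"));          // protocol is the empty string
console.log(parse("http://exa mple.com"));  // null: spaces are not allowed
```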

To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:

VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)
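One way such a call could work under the hood is sketched below. This is only an assumption about a possible implementation, not the library's code: every helper body here (escapeLiteral, toSource, and the definitions of maybe, anythingBut, startOfLine, endOfLine and VerEx) is hypothetical. Plain strings are escaped and matched literally, helpers return RegExp fragments, and VerEx joins the fragment sources.

```javascript
// Hypothetical sketch; none of these bodies are taken from the library.

// Escape regex metacharacters so a plain string matches literally.
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// A part is either a plain string (literal) or a RegExp fragment.
const toSource = (part) =>
  part instanceof RegExp ? part.source : escapeLiteral(part);

// Optional fragment, wrapped in a non-capturing group.
const maybe = (part) => new RegExp(`(?:${toSource(part)})?`);

// Zero or more characters that are not in `chars` (assumed semantics).
const anythingBut = (chars) =>
  new RegExp(`[^${chars.replace(/[\\\]^-]/g, "\\$&")}]*`);

const startOfLine = /^/;
const endOfLine = /$/;

// Join all fragment sources into one regular expression.
const VerEx = (...parts) => new RegExp(parts.map(toSource).join(""));

const tester = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

console.log(tester.test("https://www.example.com"));
console.log(tester.test("https://exa mple.com"));
```

Because every helper returns an ordinary RegExp, the result needs no custom wrapper class; whether the real rewrite should do the same is an open design question.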

Motivation for this approach would be:

  • We can split regular expressions into multiple variables
    • Naming "sub-expressions" gives each part a clear name and allows different abstraction levels in regular expressions
    • Each small part is testable with unit tests
  • Makes grouping explicit (enforce closing an opened capture group)

So the simplest example could be something like this:

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

And the complex example could be written e.g. like this:

VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);

While this looks a bit more complex, we can more easily split it up and name things:

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

This way we could test all of those "sub-expressions" (variables) in isolation.

@jehna
Contributor Author

jehna commented Jun 27, 2019

Some examples where compositional/functional patterns have been used:

@shreyasminocha
Member

Huh. Interesting.

@shreyasminocha
Member

So for something like:

VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)

… would the import statement look like one of the following:

import { VerEx, startOfLine, maybe, anythingBut, endOfLine } from 'verbal-expressions';
import * from 'verbal-expressions';

A bit concerned about global scope pollution…

@jehna
Contributor Author

jehna commented Jun 27, 2019

ES module/TypeScript imports would look like this:

import { VerEx, startOfLine, maybe, anythingBut, endOfLine } from 'verbal-expressions'

With Node.js require, you can use:

const { VerEx, startOfLine, maybe, anythingBut, endOfLine } = require('verbal-expressions')

If we want to still support global browser scripts, then a common practice with this kind of libraries (e.g. Ramda, lodash) is to use a short single-character namespace. We could namespace with V or ve. In that case you would use the library as:

V.VerEx(
  V.startOfLine,
  "http",
  V.maybe("s"),
  "://",
  V.maybe("www."),
  V.anythingBut(" "),
  V.endOfLine
)
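A browser-style bundle could expose that single namespace roughly as follows. This is a sketch under assumptions: the helper bodies are hypothetical, and the IIFE attaching a frozen object to globalThis is just one common way to avoid leaking the individual function names into the global scope.

```javascript
// Hypothetical bundle: only the name V becomes global.
(function (root) {
  const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const toSource = (part) =>
    part instanceof RegExp ? part.source : escapeLiteral(part);

  const VerEx = (...parts) => new RegExp(parts.map(toSource).join(""));
  const maybe = (part) => new RegExp(`(?:${toSource(part)})?`);
  const anythingBut = (chars) =>
    new RegExp(`[^${chars.replace(/[\\\]^-]/g, "\\$&")}]*`);

  // Freeze the namespace so scripts cannot overwrite its members.
  root.V = Object.freeze({
    VerEx,
    maybe,
    anythingBut,
    startOfLine: /^/,
    endOfLine: /$/,
  });
})(globalThis);

const namespaced = V.VerEx(
  V.startOfLine,
  "http",
  V.maybe("s"),
  "://",
  V.maybe("www."),
  V.anythingBut(" "),
  V.endOfLine
);

console.log(namespaced.test("https://www.example.com"));
console.log(typeof maybe); // the helper itself is not a global
```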

@shreyasminocha
Member

Sounds good.

I'd like to help out with this. How do we work this out?

@jehna
Contributor Author

jehna commented Jun 28, 2019

I can create a POC draft pull request to show a couple of ideas, and we can iterate from that. Does that sound good?

@shreyasminocha
Member

Sure.

@shreyasminocha
Member

shreyasminocha commented Jun 29, 2019

@jehna How about I create a 2.0.0 branch and write some failing tests while you build your proof of concept?

@jehna
Contributor Author

jehna commented Jul 20, 2019

Ok, so I did some work that I'd like to show you:
#197

@shreyasminocha shreyasminocha pinned this issue Oct 25, 2019
@shreyasminocha shreyasminocha linked a pull request Oct 25, 2019 that will close this issue
@shreyasminocha shreyasminocha linked a pull request Feb 10, 2020 that will close this issue