Skip to content

xysun/regex

Repository files navigation

A regex engine in Python following Thompson's Algorithm. This will perform significantly better than the backtracking approach implemented in Python's re module on some pathological patterns.

I wrote a blog post about this project here

It has the same interface as Python's re module:

import regex
n = 20
p = 'a?' * n + 'a' * n
nfa = regex.compile(p)
input_string = 'a' * n
matched = nfa.match(input_string)
print(matched) # True

Currently it supports the following:

  • Repetition operators: * + ?
  • Parenthesis
  • Characters (no character sets)

regex.py is the interface, parse.py holds main implementation logic, sample.py shows some samples.

You can run python3 testing.py -v to ensure it passes all test cases in test_suite.dat

Performance

This regex engine underperforms Python's re module on normal inputs (using Glenn Fowler's test suite -- see below), however it outperforms significantly on pathological inputs.

normal pathological

Credits

  • Test suite is based on Glenn Fowler's regex test suites.

  • Russ Cox has an excellent collection of articles on regex.

About

Regular expression engine in Python using Thompson's algorithm.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages