Ruler

Ruler is a lightweight wrapper around regular expressions that aims to make regex definitions more modular, intuitive and readable, and to make mismatch reporting more informative.

Installation

pip install ruler

Quick start

Let's implement the following grammar, given in EBNF:

grammar = who, ' likes to drink ', what, '.';
who = 'John' | 'Peter' | 'Ann' | 'Paul' | 'Rachel';
what = tea | juice;
juice = 'juice';
tea = 'tea', [milk];
milk = ' with milk';
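
For comparison, here is a sketch of the same grammar written as a single pattern for the standard re module. It is not part of ruler: the name MORNING_RE and the group names are chosen here for illustration, and the trailing period is included to mirror the ruler grammar that follows. It shows how quickly alternatives and optional parts pile up in one expression, and how little re says when a match fails.

```python
import re

# Hand-written equivalent of the EBNF above (illustrative, not ruler's API).
MORNING_RE = re.compile(
    r"(?P<who>John|Peter|Ann|Paul|Rachel)"
    r" likes to drink "
    r"(?P<what>juice|tea(?: with milk)?)"
    r"\."
)

m = MORNING_RE.match("Ann likes to drink tea with milk.")
print(m.group("who"))   # Ann
print(m.group("what"))  # tea with milk

# On a mismatch, re only reports failure, not the reason or the position.
print(MORNING_RE.match("John likes to drink coffee"))  # None
```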

Using ruler it looks almost identical to EBNF:

>>> class Morning(Grammar):
...     who = OneOf('John', 'Peter', 'Ann', 'Paul', 'Rachel')
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' likes to drink ', what, '.')
...
>>> morning = Morning.create()

A member named grammar must always be present; it acts as the start rule. Let's begin with a mismatch:

>>> morning.match('John likes to drink coffee')
False

match() returns True if the match was successful and False otherwise. One of the major advantages of ruler, as opposed to working directly with regular expressions, is the ability to know exactly what went wrong:

>>> print(morning.error.long_description)
Mismatch at 20:
  John likes to drink coffee
                      ^
"coffee" does not match "juice"
"coffee" does not match "tea"
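
To see why this matters, consider what it takes to get even a position out of plain re, which returns only None on failure. The sketch below is a hypothetical helper, not how ruler works internally: it matches a list of sub-patterns one after another and returns the offset and the sub-pattern where matching first fails.

```python
import re


def first_mismatch(parts, text):
    """Return (offset, pattern) of the first failing part, or None on success."""
    pos = 0
    for part in parts:
        m = re.compile(part).match(text, pos)
        if m is None:
            return pos, part
        pos = m.end()
    return None


# The grammar split by hand into consecutive pieces (illustration only).
parts = [
    r"John|Peter|Ann|Paul|Rachel",
    r" likes to drink ",
    r"juice|tea(?: with milk)?",
    r"\.",
]

print(first_mismatch(parts, "John likes to drink coffee"))
# (20, 'juice|tea(?: with milk)?')
print(first_mismatch(parts, "John likes to drink tea."))
# None
```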

Let's fix our text:

>>> morning.match('John likes to drink tea.')
True

Any rule declared as a member variable of your grammar class acts as a named capture group, and these groups are arranged hierarchically. Use the matched attribute to retrieve the text matched by a specific rule:

>>> morning.matched
'John likes to drink tea.'
>>> morning.who.matched
'John'
>>> morning.what.matched
'tea'

Branches of OneOf rules that didn't match, and optional rules that didn't match, have None as their matched value, which makes it easy to ask whether they matched:

>>> morning.what.juice.matched is None
True
>>> morning.what.tea.matched is None
False
>>> morning.what.tea.milk.matched is None
True
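
The standard re module behaves in a similar way for groups that did not take part in a match. A small sketch with plain re, using illustrative group names in place of the juice, tea and milk rules:

```python
import re

# Named groups stand in for the juice, tea and milk rules (illustration only).
drink = re.compile(r"(?P<juice>juice)|(?P<tea>tea(?P<milk> with milk)?)")

m = drink.match("tea")
print(m.group("juice") is None)  # True: this branch was not taken
print(m.group("tea") is None)    # False: this branch matched
print(m.group("milk") is None)   # True: the optional part was absent
```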

Rules can be reused multiple times. If the same rule appears multiple times under the same parent, these rules are collected into a list:

>>> class Morning(Grammar):
...     person = OneOf('John', 'Peter', 'Ann', 'Paul', 'Rachel')
...     who = Rule(person, Optional(', ', person), Optional(' and ', person))
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' like', Optional('s'), ' to drink ', what, '.')
...
>>> morning = Morning.create()
>>> morning.match('Peter, Rachel and Ann like to drink juice.')
True
>>> morning.who.matched
'Peter, Rachel and Ann'
>>> morning.who.person[0].matched
'Peter'
>>> morning.who.person[1].matched
'Rachel'
>>> morning.who.person[2].matched
'Ann'

Notice that, in the grammar above, two of the three person rules are nested inside Optional rather than being direct children of who, yet all of them are accessed as if they were. That is because, when the rule hierarchy is built, each rule is placed under its closest named ancestor.
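
The placement rule can be sketched in a few lines. This is not ruler's implementation; Node and named_children are hypothetical names used only to illustrate the idea that anonymous rules are looked through, and that repeated names under the same parent are collected into a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a rule; name is None for anonymous rules."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def named_children(node: Node) -> Dict[str, Union[Node, List[Node]]]:
    """Map each name to the rule(s) whose closest named ancestor is `node`."""
    found: Dict[str, List[Node]] = {}
    pending = list(reversed(node.children))
    while pending:
        child = pending.pop()
        if child.name is None:
            # Look through anonymous rules, preserving left-to-right order.
            pending.extend(reversed(child.children))
        else:
            found.setdefault(child.name, []).append(child)
    # A single occurrence maps to the node itself, repeats map to a list.
    return {name: nodes[0] if len(nodes) == 1 else nodes
            for name, nodes in found.items()}


# Shape of: who = Rule(person, Optional(', ', person), Optional(' and ', person))
who = Node("who", [
    Node("person"),
    Node(None, [Node("person")]),  # stands for Optional(', ', person)
    Node(None, [Node("person")]),  # stands for Optional(' and ', person)
])

print(len(named_children(who)["person"]))  # 3
```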

A rule's string arguments may be any valid regular expression, so we could rewrite our grammar like this:

>>> class Morning(Grammar):
...     who = OneOf(r'\w+')
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' likes to drink ', what, '.')
...
>>> morning = Morning.create()
>>> morning.match('R2D2 likes to drink juice. And nothing else matters.')
True
>>> morning.matched
'R2D2 likes to drink juice.'
>>> morning.who.matched
'R2D2'
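
Two details of regular expressions in Python are worth keeping in mind here, illustrated below with the standard re module rather than with ruler itself. A character class such as \w needs its backslash, which is easiest to preserve with a raw string, and a match anchored at the start of the text may stop before the end of it, as in the example above.

```python
import re

text = "R2D2 likes to drink juice. And nothing else matters."

# Without the backslash the pattern means "one or more letters w".
print(re.match("w+", text))            # None
# The raw string keeps the backslash, so \w+ matches word characters.
print(re.match(r"\w+", text).group())  # R2D2

# re.match anchors only at the start, so trailing text is left unmatched.
m = re.match(r"\w+ likes to drink juice\.", text)
print(m.group())  # R2D2 likes to drink juice.
```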

Performance

The library is optimized for fast matching. Nevertheless, it is important to remember that ruler is a Python wrapper around the regular expression library and, as such, can never outperform matching with that library directly. Currently ruler measures approximately ten times slower than re.
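
A rough way to check such a ratio on your own input is to time both matchers with timeit. The sketch below times only a compiled re pattern, since it assumes nothing about your ruler grammar; the helper name seconds_per_match is made up for this example, and any callable, for instance a grammar's match method, could be passed in the same way.

```python
import re
import timeit


def seconds_per_match(match, text, number=10_000):
    """Average wall-clock seconds for one call of match(text)."""
    return timeit.timeit(lambda: match(text), number=number) / number


pattern = re.compile(r"\w+ likes to drink (?:juice|tea(?: with milk)?)\.")
text = "Ann likes to drink tea with milk."

baseline = seconds_per_match(pattern.match, text)
print(f"re: {baseline * 1e6:.2f} microseconds per match")
# With a ruler grammar object, pass its match method instead and compare:
# ratio = seconds_per_match(morning.match, text) / baseline
```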

Development

To run the tests:
```
pytest tests
```
To compare the performance to the re library:
```
python performance/re_compare.py
```
To run performance profiling of a specific method, Rule.match for example:
```
python performance/profile.py Rule.match
```
More than one method can be specified in the same command.

Tox

Tox takes care of everything, so no dependencies need to be installed manually. There are two groups of tox environments: py*-test and py*-profile. The test environments run the unit tests, while the profile environments run the performance profiling scripts. If tox is not enough, a development environment can be set up by creating a new virtualenv and then running pip install -r requirements_develop.txt in it.

Dependency management

For development, there are three requirements files in the project's root directory:

- requirements_test.txt contains all the dependencies needed to run the unit tests,
- requirements_profile.txt contains all the dependencies needed to run the performance profiling,
- requirements_develop.txt contains the testing dependencies, the profiling dependencies and some additional dependencies used in development.

The requirements files mentioned above are not intended for manual editing. Instead they are managed using pip-tools. The process of updating the requirements is as follows:

1. Add, remove or update a dependency in one of the reqs_*.dep files:
   - Update reqs_install.dep if the dependency is needed for the regular installation by the end user,
   - Update reqs_test.dep if the dependency is needed to run the unit tests but is not necessary for the regular installation,
   - Update reqs_profile.dep if the dependency is needed to run the performance profiling but is not necessary for the regular installation,
   - Update reqs_develop.dep if the dependency is not in one of the previous categories.
2. Generate the requirements file by running pip-compile. The exact command is documented at the beginning of each requirements file.
3. Consider running pip-sync requirements_develop.txt.

Notice that there is no need to edit setup.py - it will pull the dependencies by itself from reqs_install.dep.
