Documentation/regex #18

vityok · 2015-06-15T14:39:11Z

Hi,

I am collecting information for my pet project and would like to know more about the regex implementation in this library.

Could you please provide some background/explanations if this regex engine is intended to be used to process strings/sequences/vectors, etc. how is it used (API) what are its features?

Thanks,
Victor

Tarrasch · 2015-06-15T14:46:57Z

Where's the regex engine used in this project? (I have kind of forgotten, been literally years since I poked around here)

vityok · 2015-06-16T07:41:36Z

Arrash,

I thought that regular expressions are used in this project because of the regex.lisp file in the sources. A brief look at its contents shows that it deals with converting regular expressions (in some form) into DFA and NFA using McNaughton-Yamada-Thompson algorithm.

However, documentation is really rigid and in order to better understand what is going on, one must look a bit deeper.

If you still can recall, could you please provide some more information what is this code about?

Thanks,
Victor

Tarrasch · 2015-06-16T21:25:08Z

We do regex stuff in the parser generator implementations. Doing sorts of "Dragoon bookie" stuff. To generate a LL(1) or LL(*) parser, it's useful to go back and forth between finite [non]deterministic state machines and the equivalent regex representation.

Does this give you a slightly better picture? I have no idea what @ndantam does nowadays though.

ndantam · 2015-06-16T21:50:27Z

The regex and FA implementations here are not really aimed at text processing or pattern matching, but at general symbolic analysis -- i.e., we do not assume that the terminal symbol alphabet consists of characters, and we implement various set operations on the regular languages.

You could build a text pattern matcher on top of this work using the resulting minimized DFA. Compared to CL-PPCRE -- which I believe is not doing DFA minimization -- this may produce a more compact and efficient matcher. However, some features of typical regex/pattern engines such as capture groups may be more difficult to implement.

If you are interested in improving the state of Common Lisp text pattern matching, I personally would like to see a pattern matcher that operates on ropes (https://en.wikipedia.org/wiki/Rope_(data_structure)) instead of just array-based strings.

vityok · 2015-06-17T13:24:05Z

Hi Neil, @Tarrasch,

thanks for your explanation. My goal is not so bold as to improve the state of regular expressions engines in Common Lisp, just to learn a bit of different algorithms and programming techniques.

Though I do have a very practical interest in Boyer-Moore-Horspool and Commentz-Walter implementations.

BTW, I've created a ticket for creating a BMH matcher for ropes and it looks like it will be not that difficult (for the ropes implementation by @Ramarren).

Thanks,
Victor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Documentation/regex #18

Documentation/regex #18

vityok commented Jun 15, 2015

Tarrasch commented Jun 15, 2015

vityok commented Jun 16, 2015

Tarrasch commented Jun 16, 2015

ndantam commented Jun 16, 2015

vityok commented Jun 17, 2015

Documentation/regex #18

Documentation/regex #18

Comments

vityok commented Jun 15, 2015

Tarrasch commented Jun 15, 2015

vityok commented Jun 16, 2015

Tarrasch commented Jun 16, 2015

ndantam commented Jun 16, 2015

vityok commented Jun 17, 2015