Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation/regex #18

Open
vityok opened this issue Jun 15, 2015 · 5 comments
Open

Documentation/regex #18

vityok opened this issue Jun 15, 2015 · 5 comments

Comments

@vityok
Copy link

vityok commented Jun 15, 2015

Hi,

I am collecting information for my pet project and would like to know more about the regex implementation in this library.

Could you please provide some background/explanations if this regex engine is intended to be used to process strings/sequences/vectors, etc. how is it used (API) what are its features?

Thanks,
Victor

@Tarrasch
Copy link
Contributor

Where's the regex engine used in this project? (I have kind of forgotten, been literally years since I poked around here)

@vityok
Copy link
Author

vityok commented Jun 16, 2015

Arrash,

I thought that regular expressions are used in this project because of the regex.lisp file in the sources. A brief look at its contents shows that it deals with converting regular expressions (in some form) into DFA and NFA using McNaughton-Yamada-Thompson algorithm.

However, documentation is really rigid and in order to better understand what is going on, one must look a bit deeper.

If you still can recall, could you please provide some more information what is this code about?

Thanks,
Victor

@Tarrasch
Copy link
Contributor

We do regex stuff in the parser generator implementations. Doing sorts of "Dragoon bookie" stuff. To generate a LL(1) or LL(*) parser, it's useful to go back and forth between finite [non]deterministic state machines and the equivalent regex representation.

Does this give you a slightly better picture? I have no idea what @ndantam does nowadays though.

@ndantam
Copy link
Member

ndantam commented Jun 16, 2015

The regex and FA implementations here are not really aimed at text processing or pattern matching, but at general symbolic analysis -- i.e., we do not assume that the terminal symbol alphabet consists of characters, and we implement various set operations on the regular languages.

You could build a text pattern matcher on top of this work using the resulting minimized DFA. Compared to CL-PPCRE -- which I believe is not doing DFA minimization -- this may produce a more compact and efficient matcher. However, some features of typical regex/pattern engines such as capture groups may be more difficult to implement.

If you are interested in improving the state of Common Lisp text pattern matching, I personally would like to see a pattern matcher that operates on ropes (https://en.wikipedia.org/wiki/Rope_(data_structure)) instead of just array-based strings.

@vityok
Copy link
Author

vityok commented Jun 17, 2015

Hi Neil, @Tarrasch,

thanks for your explanation. My goal is not so bold as to improve the state of regular expressions engines in Common Lisp, just to learn a bit of different algorithms and programming techniques.

Though I do have a very practical interest in Boyer-Moore-Horspool and Commentz-Walter implementations.

BTW, I've created a ticket for creating a BMH matcher for ropes and it looks like it will be not that difficult (for the ropes implementation by @Ramarren).

Thanks,
Victor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants