Use Aho-Corasick for alts of literals during regexp AST compilation #221

Open: katef wants to merge 23 commits into main

katef (Owner) commented Jul 19, 2020

I'll have more to say about this, but I'm too tired to write about it at the moment.

katef (Owner, Author) commented Jul 19, 2020

It's difficult to see the structure for unanchored strings (because we have no way to inform graphviz to weight the backwards edges differently).

But here's the simpler case for anchored strings, showing 100 alts of {3,20} characters in length. Both graphs use the same (randomly generated) data. Hopefully you can see the difference in the structure, with the trie shape visible.

Figures (images not preserved): Recursive Thompson construction; Aho-Corasick.
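
To make the trie shape concrete without the images, here is a minimal Python sketch. It is not the C code in this PR, and every name in it (`build_trie`, `failure_links`, `goto`, `ends`) is hypothetical. It builds a trie for a set of literals, which is all the anchored case needs, and then computes the standard Aho-Corasick failure links, which are the backward edges that make the unanchored graphs hard to read.

```python
from collections import deque


def build_trie(words):
    """Return (goto, ends).

    goto[s] maps a character to the next state; state 0 is the root.
    ends is the set of states at which some word is complete.
    """
    goto = [{}]
    ends = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        ends.add(state)
    return goto, ends


def failure_links(goto):
    """Compute Aho-Corasick failure links breadth-first.

    fail[s] is the state for the longest proper suffix of s's path
    that is also a path from the root.
    """
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            # Walk back until some suffix state has an edge on ch.
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            queue.append(nxt)
    return fail


goto, ends = build_trie(["he", "she", "his", "hers"])
fail = failure_links(goto)
```

With these four words the trie has 10 states including the root, where one chain per alt would need 12 states besides the start, because "he", "his" and "hers" share the edge for `h`. Every nonzero entry in `fail` is an edge pointing back toward the root.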

katef force-pushed the kate/re-into branch 2 times, most recently from 1d4a02b to 5778d25 (July 25, 2020 21:14)
katef added 23 commits August 4, 2020 22:25
…`re_strings()`, which is already a convenience.
…e()`.

This was neglected for the previous commit (of passing in an fsm for `ast_compile()`). Here I am attempting to rectify that by passing along the start states explicitly.
This avoids needing to construct an intermediate string, and allocating storage for it.
When a single end state is provided, we cannot mark the trie's leaf nodes as accepting themselves. So here I'm hooking them up to that end state with epsilon transitions instead. Yes, this means we don't always produce a DFA.

To do this, we need to identify leaf nodes, so I've introduced `has_child()` for that.
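
The leaf-to-end-state hookup described above can be sketched roughly as follows. This is an illustrative Python sketch, not the C code in this PR: the `Node` class, `insert()`, `leaves()` and `link_to_end()` are hypothetical, and the `has_child()` method only mirrors the name of the helper mentioned in the commit message. It assumes no word is a prefix of another, so every word ends on a leaf.

```python
class Node:
    """A trie node; children maps a character to the next Node."""

    def __init__(self):
        self.children = {}

    def has_child(self):
        # Hypothetical analogue of the has_child() helper named above.
        return bool(self.children)


def insert(root, word):
    """Add word to the trie and return the node where it ends."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    return node


def leaves(root):
    """Yield every node that has no outgoing edges."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.has_child():
            stack.extend(node.children.values())
        else:
            yield node


def link_to_end(root, end):
    """Return epsilon edges (leaf, end) rather than marking leaves accepting."""
    return [(leaf, end) for leaf in leaves(root)]


root = Node()
for w in ["cat", "car", "dog"]:
    insert(root, w)
end = Node()  # stands in for the single end state supplied by the caller
epsilons = link_to_end(root, end)
```

Here `epsilons` holds one edge per leaf, all targeting the same end state, which is why the result is no longer guaranteed to be a DFA.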
…ode.

This allows for accepting states in the middle of branches in the trie, rather than hooking them up using epsilons.
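
As an illustration of accepting states in the middle of a branch, here is a small Python sketch, again not the C code in this PR; `build()`, `matches()` and the parallel `accepting` list are assumptions made for the example. Each state carries its own accepting flag, so a word that is a prefix of another ("he" inside "hers") accepts at an interior state with no epsilon edge involved.

```python
def build(words):
    """Return (edges, accepting) for a trie over words.

    edges[s] maps a character to the next state; state 0 is the root.
    accepting[s] is True when some word ends exactly at state s.
    """
    edges, accepting = [{}], [False]
    for word in words:
        state = 0
        for ch in word:
            if ch not in edges[state]:
                edges.append({})
                accepting.append(False)
                edges[state][ch] = len(edges) - 1
            state = edges[state][ch]
        accepting[state] = True  # may be an interior state, not only a leaf
    return edges, accepting


def matches(edges, accepting, text):
    """Anchored match: walk the trie and accept only on a flagged state."""
    state = 0
    for ch in text:
        state = edges[state].get(ch)
        if state is None:
            return False
    return accepting[state]


edges, accepting = build(["he", "hers"])
```

With this data, `matches(edges, accepting, "he")` and `matches(edges, accepting, "hers")` both succeed while "her" is rejected, and the automaton stays deterministic.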