Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex gym and fsm simulator gets stuck #15

Open
Mahesha999 opened this issue Oct 2, 2017 · 1 comment
Open

Regex gym and fsm simulator gets stuck #15

Mahesha999 opened this issue Oct 2, 2017 · 1 comment

Comments

@Mahesha999
Copy link

I was trying to find the dfa accepting strings containing at least three occurrences of three consecutive 1's on alphabet Σ={0,1} with overlapping permitted.

I come up with following long regex:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
+(0+1)*11111(0+1)*
)

First line for strings containing non overlapping three occurrences of three consecutive 1's
Second line for strings containing first two occurrences overlapping (that is four consecutive 1's)
Third line for strings containing last two occurrences overlapping (that is four consecutive 1's)
Last line for all occurrences overlapping (that is five consecutive 1's)

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers. So I tried to feed the regex to regex gym. It also showed same behaviour.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
)

So whats going on here? Is it that the original regex is excessively complex?

@izuzak
Copy link
Owner

izuzak commented Oct 9, 2017

Hi @Mahesha999 👋

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers.

Just to make sure I understand -- did you try to generate a DFA or an eNFA? I see the "blocking" processing when I try to generate a DFA, but an eNFA is generated just fine.

So I tried to feed the regex to regex gym. It also showed same behaviour.

Strange -- the regex gym starts reducing the regex just fine for me.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

Yeah, strange again -- I see no difference in the behavior I described above when I remove the last part.

So whats going on here? Is it that the original regex is excessively complex?

Yeah, it's likely that converting the eNFA into a DFA and then minimizing the DFA is expensive. It's possible the might be some performance optimizations that can be made to the library, but I don't have time to look into that right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants