
the regex .{32768} ended up running pretty slow. how can I make it go faster? #2530

Answered by BurntSushi
robinwhittleton asked this question in Q&A

The problem is that when you write something like `.{5}`, it is quite literally translated as `.....`. So when you write `.{32768}`, it turns into a giant regex. `.` is further complicated by the fact that it is itself a small state machine that matches the UTF-8 encoding of any Unicode scalar value (sans `\n`). It's small in the sense that it's only about 12x bigger than `(?-u:.)` (which matches any byte value except for `\n`), but when you repeat it a large number of times, that small increase adds up. So you could try using `(?-u:.)` instead if your data set is mostly ASCII, or if you can abide a single codepoint matching multiple `.`.
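To make the two points above concrete, here is a rough sketch in Python. It is only an analogy: Python's `re` module is a different engine from the one ripgrep uses, and the helper names `expand_counted_repetition` and `bytes_needed` are made up for illustration. It shows what "literally translated" means for a counted repetition, and why a byte-oriented `.` (the rough equivalent of `(?-u:.)`) needs several dots to cover one non-ASCII codepoint.

```python
import re


def expand_counted_repetition(atom: str, count: int) -> str:
    """Spell out atom{count} as `count` literal copies of atom (hypothetical helper)."""
    return atom * count


def bytes_needed(text: str) -> int:
    """How many byte-level dots it takes to cover `text` once it is UTF-8 encoded."""
    return len(text.encode("utf-8"))


# `.{5}` and `.....` describe the same thing; the second is just spelled out.
expanded = expand_counted_repetition(".", 5)
sample = "héllo wörld"
assert re.match(r".{5}", sample).group() == re.match(expanded, sample).group()

# The same expansion for the pattern in the question is 32768 atoms long.
print(len(expand_counted_repetition(".", 32768)))

# A Unicode-aware dot consumes one codepoint; a byte-level dot consumes one byte.
for text in ["a", "é", "€", "😀"]:
    print(repr(text), "codepoints:", len(text), "bytes:", bytes_needed(text))

# Python bytes patterns behave like a byte-level dot: "é" is two bytes in UTF-8,
# so a single dot cannot cover it, but two dots can.
encoded = "é".encode("utf-8")
assert re.fullmatch(rb".", encoded) is None
assert re.fullmatch(rb".{2}", encoded) is not None
```

The practical consequence is the trade-off described above: with a byte-level dot, `.{32768}` counts 32768 bytes rather than 32768 codepoints, which is the same thing only when the input is ASCII.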

Otherwise, the main thing you can probably do is use --dfa-…
