Regarding huge performance boost from --no-unicode flag #2584

Answered by BurntSushi
taoxinyi asked this question in Q&A

Yes, the Unicode word boundary is what's killing you. It's easily confirmed by running this under a profiler:

$ perfr rg-13.0.0 --json -auuu "Apple[\s_-]*Banana[\s_-]*[0-9]*|\bAB[\s_-]*[0-9]\b" ./

(With perfr defined here.)

The profile shows that the vast majority of the time is being spent in the PikeVM (the slowest engine).

But if I profile with --no-unicode:

$ perfr --callgraph rg-13.0.0 --json -auuu --no-unicode "Apple[\s_-]*Banana[\s_-]*[0-9]*|\bAB[\s_-]*[0-9]\b" ./

Then most of the time is being spent in the lazy DFA, which is much, much faster.

You'll get similar results if you selectively use ASCII word boundaries instead of Unicode word boundaries:

$ time rg-13.0.0 --json -auuu "Appl…
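
To make the trade-off concrete, here is a minimal sketch of how a Unicode word boundary and an ASCII word boundary can disagree. It uses Python's `re` module purely as a stand-in, which is an assumption for illustration: ripgrep uses Rust's regex engine, where an ASCII word boundary is written `(?-u:\b)`. The sample text is hypothetical, and the pattern is just the second alternative from the command above.

```python
import re

# Hypothetical sample text: "é" is a word character under Unicode rules,
# but not under ASCII rules.
haystack = "éAB1"

# Second alternative of the pattern from the discussion.
pattern = r"\bAB[\s_-]*[0-9]\b"

# Default for str patterns in Python: \b is Unicode-aware.
unicode_match = re.search(pattern, haystack)

# re.ASCII restricts \w, \b, \s and \d to ASCII-only semantics.
ascii_match = re.search(pattern, haystack, re.ASCII)

print("unicode:", unicode_match)
print("ascii:  ", ascii_match)
```

With Unicode semantics, "é" counts as a word character, so there is no boundary before "AB" and nothing matches. With ASCII semantics the boundary exists and "AB1" matches. So switching to ASCII word boundaries can change which lines match when non-ASCII letters sit right next to the match. Whether that matters depends on your data.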

Answer selected by BurntSushi