Speed up the branchless UTF-8 decoder by removing !len #7

danielthegray · 2021-08-21T23:01:35Z

In your post, you say: "Adding that !len is actually somewhat costly,
though I couldn’t figure out why."

My suspicion was that the "!" operator essentially behaves like a
branch, since it returns 1 if the input is 0 and 0 otherwise.

So my idea was to copy your table of lengths and create a second table
of "error lengths" that gives the same effect: 0 when the length is
valid and 1 when there is an error, so that the decoder always moves
forward at least one byte, as mentioned.

The throughput went up from 504 MB/s to 557 MB/s on my machine.
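
To make the idea concrete, here is a minimal sketch of the two ways of computing how far to advance. The `lengths` table is assumed to match the one in the decoder; the `err_lengths` name, the helper functions, and the self-check are hypothetical and only illustrate the approach, not necessarily the exact patch.

```c
#include <assert.h>

/* Sequence length indexed by the top five bits of the lead byte.
   0 marks an invalid lead byte (assumed to match the decoder's table). */
static const unsigned char lengths[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 0
};

/* Hypothetical companion table: 1 wherever lengths[] is 0, else 0. */
static const unsigned char err_lengths[32] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1
};

/* Original form: add !len so an invalid byte still advances by one. */
static int advance_not(unsigned char lead)
{
    int len = lengths[lead >> 3];
    return len + !len;
}

/* Table form: a second lookup takes the place of the "!" operator. */
static int advance_table(unsigned char lead)
{
    unsigned idx = lead >> 3;
    return lengths[idx] + err_lengths[idx];
}

/* Both forms should agree for every possible lead byte. */
static void check_equivalence(void)
{
    for (int b = 0; b < 256; b++)
        assert(advance_not((unsigned char)b) ==
               advance_table((unsigned char)b));
}
```

Whether the second lookup actually beats `!len` likely depends on the compiler and CPU, so it is worth benchmarking both forms on the target machine.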

N-R-K · 2022-06-23T06:21:53Z

For what it's worth, I actually see the speed drop from ~647 MB/s to ~611 MB/s with this patch applied on my system (3700x).
