Optimized `Lt128` operator for RVV #2079

lsrcz · 2024-04-11T03:26:34Z

This pull request adds an optimized implementation of the `Lt128` operator for RVV targets. The new implementation was found by a program synthesizer.

The main computations use LMUL = 1/8, which is usually more efficient than vector groups (LMUL > 1) and can outperform full vector registers (LMUL = 1) on some microarchitectures.
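For readers unfamiliar with the operator, here is a minimal scalar sketch of what `Lt128` computes, assuming 64-bit unsigned lanes where each adjacent pair (low half first) forms one 128-bit number. The function name `Lt128Reference` and the use of `std::vector` are illustrative assumptions, not part of Highway or of this pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for the Lt128 semantics (illustrative only, not Highway code).
// Lane 2*i holds the low 64 bits and lane 2*i+1 the high 64 bits of the
// i-th unsigned 128-bit number.
std::vector<bool> Lt128Reference(const std::vector<uint64_t>& a,
                                 const std::vector<uint64_t>& b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  std::vector<bool> mask(a.size());
  for (size_t i = 0; i < a.size(); i += 2) {
    // The high halves decide unless they are equal; then the low halves decide.
    const bool lt = a[i + 1] < b[i + 1] ||
                    (a[i + 1] == b[i + 1] && a[i] < b[i]);
    // Both lanes of the 128-bit block receive the same result.
    mask[i] = lt;
    mask[i + 1] = lt;
  }
  return mask;
}
```

The result is a per-lane mask in which both lanes of a block agree, which is why a vector implementation has to combine and then broadcast the per-lane comparison results within each pair.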

lsrcz · 2024-04-11T03:52:11Z

The compilation result: https://lt128.godbolt.org/z/xEK6v4f6f.

jan-wassenberg

I'm sorry it has taken so long; we have now recovered from the crashing RVV toolchain and reactivated the tests.

This is a great idea to use the mask bits directly, congrats on finding it via the synthesizer!
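One plausible reading of "using the mask bits directly" can be sketched in scalar C++ as follows. It assumes the per-lane "less than" and "equal" comparison results are available as packed bit masks (one bit per 64-bit lane, which is how RVV mask registers are laid out) and combines them with integer operations. This is not the code from the pull request; the name `Lt128FromMaskBits` and the 64-lane limit are assumptions made for illustration.

```cpp
#include <cstdint>

// lt: bit i is set iff a[i] < b[i]; eq: bit i is set iff a[i] == b[i].
// Even bits belong to low halves, odd bits to high halves of each 128-bit
// block. Handles up to 64 lanes (hypothetical helper, not the PR's code).
uint64_t Lt128FromMaskBits(uint64_t lt, uint64_t eq) {
  const uint64_t kOddBits = 0xAAAAAAAAAAAAAAAAull;
  // Move each low-half "less than" bit up to its high-half position.
  const uint64_t lt_lo_at_hi = lt << 1;
  // At odd positions: hi < hi, or (hi == hi and lo < lo).
  const uint64_t hi = (lt | (eq & lt_lo_at_hi)) & kOddBits;
  // Copy the result to the low-half position so both lanes agree.
  return hi | (hi >> 1);
}
```

For example, with four blocks where block 0 has a smaller high half, block 1 has equal high halves and a smaller low half, block 2 is equal, and block 3 has a larger high half, `lt = 0x46` and `eq = 0x38` yield `0x0F`: only the lanes of blocks 0 and 1 are set.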

Add the synthesized, efficient Lt128 implementation.

b574568

jan-wassenberg approved these changes May 22, 2024

jan-wassenberg added the ready to pull label May 22, 2024

lsrcz closed this May 23, 2024

copybara-service bot merged commit 281d856 into google:master May 23, 2024
32 of 33 checks passed
