Optimize `range()` to enable more auto-vectorization #9428

sklam · 2024-02-08T22:05:31Z

It is known that Numba's `range()` implementation is not optimal for LLVM's loop optimizers. Even when one manually unrolls a loop with a constant loop bound, the way `range()` is implemented prevents the loop vectorizer from computing the loop bounds, so auto-vectorization fails in many cases.

This patch adjusts the induction-variable (ind-var) computation to avoid computing remainders. Instead, it relies mainly on additions, multiplications and floordiv, which are common in low-level address computations.

A remainder is still computed only if the user asks for `length_of_iterator(iter(range()))`, but that is a rare use-case. LLVM is able to optimize away that remainder (and the storage of `range_iterator_type.count`).
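To illustrate the arithmetic, here is a minimal pure-Python sketch of how a trip count and induction variable can be computed without a remainder. It is not Numba's actual lowering code: `trip_count` and `range_values` are hypothetical names, and the rounding trick shown is one standard way to get a ceiling division from additions and a single division. In the guarded branches the quotient is never negative, so Python's `//` gives the same result as a truncating signed division.

```python
def trip_count(start: int, stop: int, step: int) -> int:
    """Number of iterations of range(start, stop, step), without using %.

    Hypothetical helper for illustration only; not Numba's implementation.
    """
    if step == 0:
        raise ValueError("range() step must not be zero")
    delta = stop - start
    if step > 0:
        # ceil(delta / step) via add + div; only valid when delta > 0
        return (delta + step - 1) // step if delta > 0 else 0
    # Mirror image for a negative step; only valid when delta < 0
    return (delta + step + 1) // step if delta < 0 else 0


def range_values(start: int, stop: int, step: int) -> list:
    """Materialize the values using only a counter, an add and a multiply."""
    n = trip_count(start, stop, step)
    values = []
    i = 0
    while i < n:
        # Induction variable expressed as start + i * step
        values.append(start + i * step)
        i += 1
    return values
```

With the loop expressed as a counter running from 0 up to a trip count computed once before the loop, the bound is a plain expression of `start`, `stop` and `step`, which is the form this patch aims to expose to the loop vectorizer.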

Here's a notebook with a use-case I used to optimize range(): https://gist.github.com/sklam/6beddb2041580ceea4f25e0e496f50a0/420bc173da75996a307465f134c23bb3f4bb6562.

Avoid using srem in common path. Only use sdiv, add and smul for computing the indvar.

sklam added 2 commits February 8, 2024 15:56

Rework range to allow LLVM loop-vectorizer.

282596e

Avoid using srem in common path. Only use sdiv, add and smul for computing the indvar.

Make sure range_iterator.count is updated.

e204022

kc611 added enhancement 2 - In Progress labels Feb 9, 2024

sklam mentioned this pull request Feb 9, 2024

Numba not vectorizing 2d copy loop #8314

Closed

sklam mentioned this pull request Feb 23, 2024

Add performance suite #9460

Draft

1 task
