
Reshaping vs core_ndim #218

Open
max-sixty opened this issue Dec 7, 2023 · 3 comments
@max-sixty (Collaborator)

Hi @shoyer!

When you originally wrote the library, you added logic to allow an arbitrary tuple of axes, with the gufunc recompiled whenever the number of axes changes:

numba_sig = []
for input_sig in self.signature:
    new_sig = (
        (input_sig.args[0][(slice(None),) * max(core_ndim, 1)],)
        + input_sig.args[1:]
        + (input_sig.return_type[:],)
    )
    numba_sig.append(new_sig)

first_sig = self.signature[0]
gufunc_sig = gufunc_string_signature(
    (
        first_sig.args[0][(slice(None),) * core_ndim]
        if core_ndim
        else first_sig.args[0],
    )
    + first_sig.args[1:]
    + (first_sig.return_type,)
)
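For intuition, here's a minimal sketch (my illustration, not numbagg's actual code) of what one compiled variant per core_ndim amounts to with numba.guvectorize: a separate string signature and kernel for each number of core axes.

import numpy as np
from numba import guvectorize

# Illustrative only: one variant per number of core dimensions, roughly
# what the signature-manipulation code above generates.
@guvectorize(["void(float64[:], float64[:])"], "(n)->()", nopython=True)
def gu_nansum_1d(a, out):
    total = 0.0
    for v in a:
        if not np.isnan(v):
            total += v
    out[0] = total

@guvectorize(["void(float64[:, :], float64[:])"], "(m,n)->()", nopython=True)
def gu_nansum_2d(a, out):
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if not np.isnan(a[i, j]):
                total += a[i, j]
    out[0] = total

Each additional tuple length needs another compiled variant, which is what the loop over signatures above generates.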

More recently, we've instead been reshaping, calling the gufunc, and reshaping back if needed. The code is arguably simpler:

import numpy as np


def move_axes(arr: np.ndarray, axes: tuple[int, ...]) -> np.ndarray:
    """
    Move & reshape a tuple of axes to an array's final axis.
    """
    moved_arr = np.moveaxis(arr, axes, range(arr.ndim - len(axes), arr.ndim))
    new_shape = moved_arr.shape[: -len(axes)] + (-1,)
    return moved_arr.reshape(new_shape)
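For example (a usage sketch of the move_axes above, not code from the repo), reducing over axes (0, 2) of a 3-D array collapses them into a single trailing axis, so a plain last-axis reduction suffices:

arr = np.arange(24.0).reshape(2, 3, 4)

# Collapse axes (0, 2) into one trailing axis of length 2 * 4 = 8.
collapsed = move_axes(arr, (0, 2))  # shape (3, 8)

# A last-axis reduction over the collapsed array matches the multi-axis one.
np.testing.assert_allclose(
    np.nansum(collapsed, axis=-1),
    np.nansum(arr, axis=(0, 2)),
)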

What was the rationale for compiling a new gufunc rather than reshaping? IIUC the reshape is quite cheap, and it's possibly easier to inspect when debugging. It's less elegant, but to the extent this would be upstreamed into Numba, it's plausibly more general.

Do you have a view of how we should be doing this going forward?

@shoyer (Collaborator) commented Dec 7, 2023

Here was my original reasoning: compiling a new gufunc helps because, unlike reshaping, you can be sure you'll never need to copy the data. Then, as long as you use NumPy iterators like .flat, Numba will still generate code that iterates in the most efficient possible order. This can considerably speed up functions like nansum, because a reduction is fastest when you iterate over the data exactly once, in the same order it is laid out in memory.
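To make the .flat point concrete, a hedged sketch (my example, not numbagg's code): a single Numba kernel that makes one flat pass whatever the input's dimensionality, with no reshape at all.

import numpy as np
from numba import njit

@njit
def nansum_flat(a):
    # A single flat pass; for contiguous inputs this walks memory in order.
    total = 0.0
    for v in a.flat:
        if not np.isnan(v):
            total += v
    return total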

My thinking has shifted a bit since then. This really only matters in a best-case scenario, and in practice I'm not sure these advantages are usually realized. Non-trivial Python programs usually involve at least a handful of operations that do need full memory copies, so you barely notice the extra ones. Given that Numbagg is used to replace specific bottlenecks in NumPy code, which generally is not compiled end-to-end, I suspect this is usually the case.

The other consideration is that more complex functions may require iterating over data in a more specific order, in which case a copy to speed up iteration may be a good idea anyway. matmul is the classic example of this sort of thing, and it's why deep learning frameworks generally don't worry too much about using views instead of copies; e.g., instead of supporting a memory model with views, JAX/XLA uses whole-program JIT to fuse computation and avoid unnecessary copies.

@max-sixty (Collaborator, Author)

Interesting, and thanks for the quick response.

I had been assuming that .reshape returns a view rather than a copy. But as you point out, that's not always the case. I wonder how often, in practice, people pass in non-contiguous arrays that would require a copy. I'm guessing it's a minority of cases, but still a significant one.
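For reference, a quick way to check when that reshape degrades to a copy (my snippet, not from the repo): flattening a contiguous array returns a view, while flattening a transposed one must copy.

import numpy as np

a = np.zeros((3, 4))

# Contiguous input: reshape returns a view of the same buffer.
print(np.shares_memory(a, a.reshape(-1)))      # True

# Transposed (non-contiguous) input: reshape has to copy.
print(np.shares_memory(a.T, a.T.reshape(-1)))  # False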

For the moment, we can leave things as they are — no need to shift either approach to the other proactively...

@max-sixty (Collaborator, Author)

We use the "new" form in #268, which is required to add additional scalar args such as ddof.
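As an illustration of how a scalar argument appears in a gufunc signature (a hypothetical sketch, not the code from #268), the scalar shows up as a dimensionless "()" input:

import numpy as np
from numba import guvectorize

@guvectorize(["void(float64[:], int64, float64[:])"], "(n),()->()", nopython=True)
def gu_var(a, ddof, out):
    # Variance over the core axis, with ddof passed as a scalar.
    mean = a.sum() / a.shape[0]
    acc = 0.0
    for v in a:
        acc += (v - mean) ** 2
    out[0] = acc / (a.shape[0] - ddof)

out = gu_var(np.arange(10.0).reshape(2, 5), 1)  # shape (2,)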
