Binary operations with Arrays of different memory layout #681

geggo · 2023-04-14T10:12:56Z

With pyopencl.array.Array, binary operations such as addition give wrong results when the two operands have different memory layouts. For example, if "A" is C-contiguous and "AF" is F-contiguous, then A+AF gives unexpected results. See this notebook for an example (tested with PyOpenCL 2022.3.1):

import numpy as np
import pyopencl as cl
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
import pyopencl.array as cla

# create simple numpy array 'a'
a = np.zeros((2,2))
a[0,1] = 1
a

array([[0., 1.],
       [0., 0.]])

# with different memory layout
af = np.asarray(a, order='F')
af

array([[0., 1.],
       [0., 0.]])

a + af

array([[0., 2.],
       [0., 0.]])

A = cla.to_device(queue, a)
AF = cla.to_device(queue, af)

A, AF

(cl.Array([[0., 1.],
        [0., 0.]]),
 cl.Array([[0., 1.],
        [0., 0.]]))

# bug: A + AF gives a different result than a + af
B = A + AF
B

cl.Array([[0., 1.],
       [1., 0.]])

Is this intended to work? Looking quickly at the source code, I could not see that the strides are taken into account. It seems that the underlying arrays are added in memory order.
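The observed result is consistent with that guess. As a minimal host-side sketch (NumPy only, no OpenCL involved, and not taken from the PyOpenCL source), adding the two buffers element by element in memory order and then interpreting the sum with the layout of the first operand reproduces the output above:

```python
import numpy as np

a = np.zeros((2, 2))
a[0, 1] = 1
af = np.asarray(a, order="F")  # same values, column-major storage

# Flatten each array in the order its elements sit in memory.
flat_a = a.ravel(order="K")    # C-order buffer of a
flat_af = af.ravel(order="K")  # F-order buffer of af

# Add the raw buffers, then view the sum with the (C-order) layout of a.
memory_order_sum = (flat_a + flat_af).reshape(a.shape)

print(memory_order_sum)  # matches the device result B, not a + af
print(a + af)            # the stride-aware NumPy result
```

This only illustrates the symptom; it assumes the result takes its layout from the first operand, which is an inference from the output shown above.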

Thanks
Gregor


inducer · 2023-04-14T18:12:54Z

Yep, this is a trade-off that I made at the outset, for two reasons:

First, figuring out an appropriate launch config for strided accesses would add considerable overhead to an already too-hot code path leading up to arithmetic. (cf. numarray)
Second, the existing code generation infrastructure won't easily stretch to true n-dimensional arrays. (https://github.com/inducer/loopy was intended, among other things, to address that use case)

It's obviously embarrassing that adding different-stride arrays silently gives wrong results, and I've been meaning to at least add a (C-based, to avoid a big impact on said hot code path) check to raise an error. PRs are definitely welcome along those lines.

That said, even if this issue were fixed, eagerly evaluating array calculations is still not a winning move, so I'm not super likely to devote much more attention than adding that check, cf. https://github.com/inducer/pytato/ for a possible remedy.
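In the meantime, one possible user-side workaround (a sketch, not something proposed in this thread) is to normalize the host arrays to a common layout before uploading them, so that both device arrays end up with identical strides. The part below runs with NumPy alone; the upload step is left as a comment and reuses the names from the report.

```python
import numpy as np

a = np.zeros((2, 2))
a[0, 1] = 1
af = np.asarray(a, order="F")

# Force both operands into C order on the host (copies only if needed).
a_c = np.ascontiguousarray(a)
af_c = np.ascontiguousarray(af)

# Both arrays now share shape and strides, so memory order and index order agree.
same_layout = a_c.strides == af_c.strides

# With a queue available, the upload would then proceed as in the report:
# A = cla.to_device(queue, a_c)
# AF = cla.to_device(queue, af_c)
```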

geggo · 2023-04-18T12:39:06Z

Thanks for the clarification. I understand that the array arithmetic is not optimal with regard to performance, but it is convenient to use. I am considering moving my code to PyTorch or JAX for this purpose.

To add a check that raises an error if the memory layouts differ, I ended up patching PyOpenCL:

# Original function, saved once so that repeated patching does not wrap the patch itself.
_pyopencl_array_get_broadcasted_binary_op_result_SAVED = None

def patch_pyopencl_array():
    """Monkey-patch pyopencl.array so binary ops reject operands with different strides."""
    import pyopencl.array
    global _pyopencl_array_get_broadcasted_binary_op_result_SAVED
    if _pyopencl_array_get_broadcasted_binary_op_result_SAVED is None:
        _pyopencl_array_get_broadcasted_binary_op_result_SAVED = pyopencl.array._get_broadcasted_binary_op_result

    def _get_broadcasted_binary_op_result_PATCHED(obj1, obj2, cq, dtype_getter=pyopencl.array._get_common_dtype):
        # Fail loudly instead of silently combining the buffers in memory order.
        if obj1.strides != obj2.strides:
            raise NotImplementedError('Memory layouts (strides) of arrays are not the same')
        return _pyopencl_array_get_broadcasted_binary_op_result_SAVED(
            obj1, obj2, cq, dtype_getter
        )

    pyopencl.array._get_broadcasted_binary_op_result = _get_broadcasted_binary_op_result_PATCHED

Essentially, it adds a check that the strides are equal. This check is performed for add, mul, and the other binary arithmetic operators, but not for the in-place operators. The in-place operators each have an explicit test for matching shape; a little refactoring would be required to move these into a common method that also incorporates the check for equal strides. Would such a PR be acceptable? Compared to the code already in place, I think the overhead is acceptable in order to avoid silent failures. I am also wondering whether it would be sensible to make the check more general, similar to strides_equal.
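To make the idea of a more general check concrete, here is a rough sketch in plain Python. The name layouts_match is hypothetical and not part of PyOpenCL; the assumption is that strides along axes of length 1 never affect addressing, so they can be ignored when comparing layouts.

```python
def layouts_match(shape1, strides1, shape2, strides2):
    """Return True if two arrays of equal shape address their elements identically.

    Hypothetical helper for illustration; strides are in bytes, as in NumPy.
    """
    if tuple(shape1) != tuple(shape2):
        return False
    if 0 in shape1:
        # Empty arrays have no elements whose addresses could differ.
        return True
    # Only axes with more than one entry contribute to element addresses.
    return all(
        s1 == s2
        for n, s1, s2 in zip(shape1, strides1, strides2)
        if n > 1
    )


# C-order vs. F-order 2x2 float64 arrays: the layouts differ.
print(layouts_match((2, 2), (16, 8), (2, 2), (8, 16)))
# A (1, 3) array: the stride of the length-1 axis is irrelevant.
print(layouts_match((1, 3), (24, 8), (1, 3), (8, 8)))
```

A strict `obj1.strides != obj2.strides` comparison would reject the second case even though both arrays are laid out identically in memory.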

Gregor

inducer mentioned this issue May 17, 2023

elwise: support more shapes for binary ops #687

Closed
