floor_divide returns different result for arrays of floating dtype on GPU and CPU devices #1652

Open
antonwolfy opened this issue Apr 23, 2024 · 2 comments


@antonwolfy
Collaborator

In the example below, the behavior differs between CPU and GPU devices:

import numpy, dpctl, dpctl.tensor as dpt

dpctl.__version__
# Out: '0.17.0dev0+300.g7757857466'

a = dpt.arange(1, 10, dtype='f', device='gpu')
b = dpt.arange(1, 10, dtype='f', device='gpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 0., 1., 1.], dtype=float32)

a = dpt.arange(1, 10, dtype='f', device='cpu')
b = dpt.arange(1, 10, dtype='f', device='cpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)

na = numpy.arange(1, 10, dtype='f')
nb = numpy.arange(1, 10, dtype='f')
numpy.floor_divide(na, nb)
# Out: array([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)

So the 7th element of the result array is 0 on the GPU device, but 1 on the CPU device and in NumPy.

If we look at the output of the divide function:

a = dpt.arange(1, 10, dtype='f')
b = dpt.arange(1, 10, dtype='f')

dpt.divide(a, b)
# Out:
# usm_ndarray([1.        , 1.        , 1.        , 1.        , 1.        ,
#             1.        , 0.99999994, 1.        , 1.        ], dtype=float32)

the 7th element is 0.99999994 < 1 on the GPU device.
Based on the code:

auto div = in1 / in2;
return (div == resT(0)) ? div : resT(sycl::floor(div));

dpctl uses the sycl::floor() function, which is specified to return:

The value x rounded to an integral value using the round to negative infinity rounding mode

I guess this is why 0.99999994 rounds to 0 here.
The Python array API, however, states:

Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result.

So I wonder whether this is expected dpctl behavior or a bug.

@ndgrigorian
Collaborator

The rounding mode is not exactly at fault here. Per the array API:

Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result

In this case, 1.0 > 0.99999994, so 0.0 is the appropriate result, and the behavior is consistent with the array API. The surprising result is caused by the division itself being inaccurate, possibly due to lower precision on GPU devices.
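To illustrate the point, here is a minimal NumPy sketch (NumPy on the host, not dpctl on a GPU). It assumes the GPU quotient for 7/7 is the float32 value immediately below 1.0, which is what the printed 0.99999994 suggests. The variable names are only for illustration.

```python
import numpy as np

one = np.float32(1.0)

# Largest float32 strictly below 1.0, i.e. 1 - 2**-24 (about 0.99999994).
# Assumption: this is the value the GPU division produced for 7 / 7.
just_below_one = np.nextafter(one, np.float32(0.0))

# floor() maps any value in [0, 1) to 0, however close it is to 1.
floor_of_inexact = np.floor(just_below_one)
floor_of_exact = np.floor(one)

print(just_below_one, floor_of_inexact, floor_of_exact)
```

So a quotient that is off by a single ULP is enough to change the floored result from 1 to 0, even though floor itself behaves as specified.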

@antonwolfy
Collaborator Author

antonwolfy commented Apr 23, 2024

@ndgrigorian, now I see, thank you for the clarification.
Would it be worth having special handling in the code, then? Something like:

if (sycl::fmod(in1, in2) == 0) {
    return resT(std::rint(in1 / in2));
}
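The idea above can be sketched in NumPy to see how it would behave. This is not the dpctl kernel: `lossy_divide` and `guarded_floor_divide` are hypothetical helpers, and `lossy_divide` only simulates an inexact division by nudging every quotient one ULP toward zero. The sketch also ignores zero divisors and the signed-zero handling in the existing code.

```python
import numpy as np


def lossy_divide(x1, x2):
    """Hypothetical stand-in for an inexact division: one ULP toward zero."""
    q = np.divide(x1, x2)
    return np.nextafter(q, np.zeros_like(q))


def guarded_floor_divide(x1, x2, divide=np.divide):
    """Floor division that rounds to nearest when x1 is an exact multiple of x2."""
    q = divide(x1, x2)
    # fmod is exact for floating-point inputs, so a zero remainder means
    # the true quotient is an integer and rint() recovers it.
    exact = np.fmod(x1, x2) == 0
    return np.where(exact, np.rint(q), np.floor(q))


a = np.arange(1, 10, dtype="f")
b = np.arange(1, 10, dtype="f")

naive = np.floor(lossy_divide(a, b))                       # plain floor of the inexact quotient
guarded = guarded_floor_divide(a, b, divide=lossy_divide)  # with the special case

print(naive)
print(guarded)
```

With the simulated inexact division, the plain floor collapses every x/x to 0, while the guarded version returns 1. The cost is an extra fmod per element, so whether this is worth it in the kernel is a separate question.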
