Fix gh-1279, implement tensor.allclose #1343

oleksandr-pavlyk · 2023-08-15T15:22:10Z

Cleans up operator implementations in "dpctl/tensor/_usmarray.pyx" and removes unused static method to improve coverage
Provides private methods implementing dpctl.tensor.abs and dpctl.tensor.sqrt for complex types. This closes dpctl.tensor.abs() does not work on Iris Xe on Windows #1279.
Implements, and adds tests to dpctl.tensor.allclose.

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you opening the PR as a draft?

oleksandr-pavlyk · 2023-08-15T15:23:25Z

@ndgrigorian This PR superseded gh-1340, as it also ensures that dpctl.tensor.abs(dpctl.tensor.asarray([-0.0])) gives "+0.0".

1. Removed unused usm_ndarray._clone static C-only method 2. Removed _dispatch* utilities 3. Used direct calls to unary/binary operators in implementation of special methods

Provide cabs private method implementating abs for complex types, paying attention to array-API mandated special values. To work-around gh-1279, use std::hypot to compute value for finite inputs. Compile with -DUSE_STD_ABS_FOR_COMPLEX_TYPES to use std::abs(z) instead of std::hypot(std::real(z), std::imag(z)).

This change provides private method csqrt to evaluate square-root for complex types. It handles special values as mandated by array API. The finite input, it provides its own implementation based on std::hypot and std::sqrt for real types instead of calling std::sqrt on finite input of complex type. Compile with -DUSE_STD_SQRT_FOR_COMPLEX_TYPES to use std::sqrt instead of custom implementation. Cursory performance study suggests that custom implementation is at least not worse than std::sqrt one.

This utility function is based on symmetric check, unlike numpy.allclose, and verifies that abs(x1-x2) < atol + rtol * max(abs(x1), abs(x2)) This way allclose(x1, x2) is symmetric, and allclose(x1,x2) implies allclose(x2, x1).

github-actions · 2023-08-15T16:01:10Z

View rendered docs @ https://intelpython.github.io/dpctl/pulls/1343/index.html

coveralls · 2023-08-15T16:19:02Z

coverage: 85.469% (+0.4%) from 85.041% when pulling c4312cb on fix-gh-1279 into 9c18949 on master.

github-actions · 2023-08-15T16:43:18Z

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_9 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

github-actions · 2023-08-15T16:45:07Z

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_8 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

github-actions · 2023-08-15T20:05:12Z

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_9 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

github-actions · 2023-08-15T23:17:29Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_8 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

oleksandr-pavlyk · 2023-08-16T11:53:14Z

Verified that after the changes dpctl is able to execute problematic inputs on machine with Iris Xe running Windows:

(fix-gh-1279) C:\Users\opavlyk\gh>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype=dpt.complex64); y = dpt.abs(x); print(y)"
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

(fix-gh-1279) C:\Users\opavlyk\gh>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype=dpt.complex64); y = dpt.sqrt(x); print(y)"
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]

dpctl/tensor/_testing.py

1. Aligned default values with those of np.allclose 2. Replaced less test with less_equal to align with NumPy.

Also added tests for early exits to improve coverage.

ndgrigorian · 2023-08-16T18:32:47Z

There is a compelling case for another change here:

numpy/numpy#10161

so the implementation would become

dpt.abs(x - y) <= dpt.maximum(rtol * dpt.maximum(dpt.abs(x), dpt.abs(y)), atol)

We aren't nearly as burdened by backwards compatibility, so it's interesting to consider, as the incorrect cases pointed out in this issue are incorrect here, too.

In [11]: rtol, atol = 1e-4, 2e-5

In [12]: a = dpt.asarray(0.142253)

In [13]: b = dpt.asarray(0.142219)

In [14]: dpt.allclose(a, b, atol=atol, rtol=rtol)
Out[14]: usm_ndarray(True)

In [15]: dpt.abs(a - b) <= dpt.maximum(rtol * dpt.maximum(dpt.abs(a), dpt.abs(b)), atol)
Out[15]: usm_ndarray(False)

Downside would be another call to maximum.

oleksandr-pavlyk · 2023-08-16T18:38:30Z

Downside would be another call to maximum.

It won't be a downside, we would just trade a call to dpt.add with a call to dpt.maximum.

oleksandr-pavlyk · 2023-08-16T18:44:49Z

@ndgrigorian I have pushed the change to use abs(a-b) <= max(atol, rtol*max(abs(a), abs(b)) formula. I have also reflected that in the tensor.allclose docstring.

ndgrigorian · 2023-08-16T19:00:58Z

@oleksandr-pavlyk Does it make sense to change other calls to std::abs in dpctl be changed to use this implementation?

And if so, maybe it should be in math_utils.hpp. It may be better off in another PR to not let this one get too bloated, though.

oleksandr-pavlyk · 2023-08-16T19:04:42Z

I do not think so, but even if we do, we should do it in a separate PR.

github-actions · 2023-08-16T19:32:33Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_16 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

github-actions · 2023-08-16T19:58:27Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_17 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

This changes builds up on gh-1265 and takes into account queue from the pre-allocated buffer, if provided.

ndgrigorian · 2023-08-16T21:21:24Z

@oleksandr-pavlyk
The sqrt implementation seems to have some issues with very large values and very small values.

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: val_max = dpt.finfo(dpt.complex128).max

In [3]: x = dpt.asarray(complex(val_max, val_max), dtype="c16")

In [4]: dpt.sqrt(x)
Out[4]: usm_ndarray(inf+0.j)

In [5]: x_np = np.asarray(complex(val_max, val_max), dtype="c16")

In [6]: np.sqrt(x_np)
Out[6]: (1.4730945569055655e+154+6.101757441282702e+153j)

In [7]: val_min = dpt.finfo(dpt.complex128).min

In [8]: x = dpt.asarray(complex(val_min, val_min), dtype="c16")

In [9]: dpt.sqrt(x)
Out[9]: usm_ndarray(0.-infj)

In [10]: x_np = np.asarray(complex(val_min, val_min), dtype="c16")

In [11]: np.sqrt(x_np)
Out[11]: (6.101757441282702e+153-1.4730945569055655e+154j)

github-actions · 2023-08-17T01:32:43Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_23 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

This improves accuracy at extremes of supported range. Use sycl:: namespace ldexp and ilogb to prevent problem with VS 2017 headers.

oleksandr-pavlyk · 2023-08-17T11:50:55Z

@ndgrigorian I pushed the change to normalize the scale of the arguments in csqrt and rescale the result appropriately, but use of these scaling functions causes failure to offload on Windows to recur.

Perhaps the solution is to use sych functions from sycl namespace rather than from std:: namespace.

ilogb would have to pay attention to correctly computing scale of denormal floats, while simpler code suffices. Also use unscaled version in most cases, and scaled version only for very large inputs.

We work around issues with these functions when their implementation is taken from VS 2017 headers on Windows though.

ndgrigorian · 2023-08-17T20:48:40Z

@oleksandr-pavlyk

Since you removed other superfluous iostream includes in this PR, can you also remove the one in positive.hpp?

github-actions · 2023-08-17T21:35:38Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_32 ran successfully.
Passed: 912
Failed: 88
Skipped: 119

github-actions · 2023-08-17T21:59:57Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_34 ran successfully.
Passed: 912
Failed: 88
Skipped: 119

github-actions · 2023-08-17T22:04:33Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_33 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

oleksandr-pavlyk · 2023-08-18T11:44:39Z

Reconfirmed that the latest build resolves gh-1279 on Windows machine with Iris Xe integrated GPU:

(gh-1279) C:\Users\user\repos>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype='c8'); y = dpt.sqrt(x); print(y)"
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]

(gh-1279) C:\Users\user\repos>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype='c8'); y = dpt.abs(x); print(y)"
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

(gh-1279) C:\Users\user\repos>python -m dpctl -f
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 WINDOWS
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
        Version             2023.16.6.0.28_042959
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) OpenCL HD Graphics
    Version     OpenCL 3.0
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Iris(R) Xe Graphics
        Version             31.0.101.4032
        Filter string       opencl:gpu:0
Platform  2 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Iris(R) Xe Graphics
        Version             1.3.24931
        Filter string       level_zero:gpu:0

(gh-1279) C:\Users\user\repos>conda list dpctl
# packages in environment at C:\Users\user\Miniconda3\envs\gh-1279:
#
# Name                    Version                   Build  Channel
dpctl                     0.14.6dev3      py310h81563b4_34    file:///C:/Users/user/repos/channel

ndgrigorian

This LGTM, thank you @oleksandr-pavlyk

github-actions · 2023-08-18T18:11:33Z

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

github-actions · 2023-08-18T19:25:37Z

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_34 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

oleksandr-pavlyk added 5 commits August 15, 2023 10:28

Clean up of operator special methods

48c2ad2

1. Removed unused usm_ndarray._clone static C-only method 2. Removed _dispatch* utilities 3. Used direct calls to unary/binary operators in implementation of special methods

Removed leftover include iostream

7d9974e

Implements dpctl.tensor.allclose

26862b4

This utility function is based on symmetric check, unlike numpy.allclose, and verifies that abs(x1-x2) < atol + rtol * max(abs(x1), abs(x2)) This way allclose(x1, x2) is symmetric, and allclose(x1,x2) implies allclose(x2, x1).

oleksandr-pavlyk force-pushed the fix-gh-1279 branch from 3f971ff to 26862b4 Compare August 15, 2023 15:29

oleksandr-pavlyk requested a review from ndgrigorian August 15, 2023 16:25

Adds tests for special FP values for dpt.abs and dpt.sqrt

b121d67

ndgrigorian mentioned this pull request Aug 15, 2023

dpctl.tensor.abs fixed for signed zeros #1340

Closed

6 tasks

Added tests for type promotion in tensor.allclose

d60d58e

ndgrigorian reviewed Aug 16, 2023

View reviewed changes

dpctl/tensor/_testing.py Outdated Show resolved Hide resolved

ndgrigorian reviewed Aug 16, 2023

View reviewed changes

dpctl/tensor/_testing.py Outdated Show resolved Hide resolved

oleksandr-pavlyk added 2 commits August 16, 2023 13:30

Fixes per PR review feedback

ea6dd27

1. Aligned default values with those of np.allclose 2. Replaced less test with less_equal to align with NumPy.

Adds tests for atol/rtol

f36af57

Also added tests for early exits to improve coverage.

tensor.allclose to use abs(a-b) < max(atol, rtol*max(abs(a), abs(b)))

a75fff8

Completion of fix for gh-1058

ff3c680

This changes builds up on gh-1265 and takes into account queue from the pre-allocated buffer, if provided.

Scale down arguments and scale back the result

c4312cb

This improves accuracy at extremes of supported range. Use sycl:: namespace ldexp and ilogb to prevent problem with VS 2017 headers.

oleksandr-pavlyk force-pushed the fix-gh-1279 branch from 59a7d11 to c4312cb Compare August 17, 2023 12:38

oleksandr-pavlyk added 3 commits August 17, 2023 11:49

Avoid using sycl::ilogb, but use own implementation

bb52bb1

ilogb would have to pay attention to correctly computing scale of denormal floats, while simpler code suffices. Also use unscaled version in most cases, and scaled version only for very large inputs.

Set defines to use std::abs and std::sqrt on Linux

ba9a595

We work around issues with these functions when their implementation is taken from VS 2017 headers on Windows though.

Removed stray include iostream

142190f

ndgrigorian approved these changes Aug 18, 2023

View reviewed changes

oleksandr-pavlyk merged commit 2f3be1f into master Aug 18, 2023
39 of 42 checks passed

oleksandr-pavlyk deleted the fix-gh-1279 branch August 18, 2023 18:11

oleksandr-pavlyk mentioned this pull request Aug 18, 2023

Development milestone 0.14.6dev4 #1354

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix gh-1279, implement tensor.allclose #1343

Fix gh-1279, implement tensor.allclose #1343

oleksandr-pavlyk commented Aug 15, 2023

oleksandr-pavlyk commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

coveralls commented Aug 15, 2023 •

edited

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

oleksandr-pavlyk commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023

oleksandr-pavlyk commented Aug 16, 2023

oleksandr-pavlyk commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023 •

edited

oleksandr-pavlyk commented Aug 16, 2023

github-actions bot commented Aug 16, 2023

github-actions bot commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023

github-actions bot commented Aug 17, 2023

oleksandr-pavlyk commented Aug 17, 2023

ndgrigorian commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

oleksandr-pavlyk commented Aug 18, 2023

ndgrigorian left a comment

github-actions bot commented Aug 18, 2023

github-actions bot commented Aug 18, 2023

Fix gh-1279, implement tensor.allclose #1343

Fix gh-1279, implement tensor.allclose #1343

Conversation

oleksandr-pavlyk commented Aug 15, 2023

oleksandr-pavlyk commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

coveralls commented Aug 15, 2023 • edited

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

github-actions bot commented Aug 15, 2023

oleksandr-pavlyk commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023

oleksandr-pavlyk commented Aug 16, 2023

oleksandr-pavlyk commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023 • edited

oleksandr-pavlyk commented Aug 16, 2023

github-actions bot commented Aug 16, 2023

github-actions bot commented Aug 16, 2023

ndgrigorian commented Aug 16, 2023

github-actions bot commented Aug 17, 2023

oleksandr-pavlyk commented Aug 17, 2023

ndgrigorian commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

github-actions bot commented Aug 17, 2023

oleksandr-pavlyk commented Aug 18, 2023

ndgrigorian left a comment

Choose a reason for hiding this comment

github-actions bot commented Aug 18, 2023

github-actions bot commented Aug 18, 2023

coveralls commented Aug 15, 2023 •

edited

ndgrigorian commented Aug 16, 2023 •

edited