Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gh-1279, implement tensor.allclose #1343

Merged
merged 15 commits into from Aug 18, 2023
Merged

Fix gh-1279, implement tensor.allclose #1343

merged 15 commits into from Aug 18, 2023

Conversation

oleksandr-pavlyk
Copy link
Collaborator

Fixes gh-1279.

  1. Cleans up operator implementations in "dpctl/tensor/_usmarray.pyx" and removes unused static method to improve coverage
  2. Provides private methods implementing dpctl.tensor.abs and dpctl.tensor.sqrt for complex types. This closes dpctl.tensor.abs() does not work on Iris Xe on Windows #1279.
  3. Implements, and adds tests to dpctl.tensor.allclose.
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

@oleksandr-pavlyk
Copy link
Collaborator Author

@ndgrigorian This PR superseded gh-1340, as it also ensures that dpctl.tensor.abs(dpctl.tensor.asarray([-0.0])) gives "+0.0".

1. Removed unused usm_ndarray._clone static C-only method
2. Removed _dispatch* utilities
3. Used direct calls to unary/binary operators in implementation
   of special methods
Provide cabs private method implementating abs for complex types, paying
attention to array-API mandated special values.

To work-around gh-1279, use std::hypot to compute value for finite inputs.
Compile with -DUSE_STD_ABS_FOR_COMPLEX_TYPES to use std::abs(z) instead of
std::hypot(std::real(z), std::imag(z)).
This change provides private method csqrt to evaluate square-root
for complex types. It handles special values as mandated by array API.

The finite input, it provides its own implementation based on std::hypot
and std::sqrt for real types instead of calling std::sqrt on finite
input of complex type.

Compile with -DUSE_STD_SQRT_FOR_COMPLEX_TYPES to use std::sqrt instead
of custom implementation.

Cursory performance study suggests that custom implementation is at least
not worse than std::sqrt one.
This utility function is based on symmetric check, unlike numpy.allclose,
and verifies that abs(x1-x2) < atol + rtol * max(abs(x1), abs(x2))

This way allclose(x1, x2) is symmetric, and allclose(x1,x2) implies
allclose(x2, x1).
@github-actions
Copy link

@coveralls
Copy link
Collaborator

coveralls commented Aug 15, 2023

Coverage Status

coverage: 85.469% (+0.4%) from 85.041% when pulling c4312cb on fix-gh-1279 into 9c18949 on master.

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_9 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_8 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev2=py310h7bf5fec_9 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_8 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@oleksandr-pavlyk
Copy link
Collaborator Author

Verified that after the changes dpctl is able to execute problematic inputs on machine with Iris Xe running Windows:

(fix-gh-1279) C:\Users\opavlyk\gh>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype=dpt.complex64); y = dpt.abs(x); print(y)"
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

(fix-gh-1279) C:\Users\opavlyk\gh>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype=dpt.complex64); y = dpt.sqrt(x); print(y)"
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]

1. Aligned default values with those of np.allclose
2. Replaced less test with less_equal to align with NumPy.
Also added tests for early exits to improve coverage.
@ndgrigorian
Copy link
Collaborator

There is a compelling case for another change here:

numpy/numpy#10161

so the implementation would become

dpt.abs(x - y) <= dpt.maximum(rtol * dpt.maximum(dpt.abs(x), dpt.abs(y)), atol)

We aren't nearly as burdened by backwards compatibility, so it's interesting to consider, as the incorrect cases pointed out in this issue are incorrect here, too.

In [11]: rtol, atol = 1e-4, 2e-5

In [12]: a = dpt.asarray(0.142253)

In [13]: b = dpt.asarray(0.142219)

In [14]: dpt.allclose(a, b, atol=atol, rtol=rtol)
Out[14]: usm_ndarray(True)

In [15]: dpt.abs(a - b) <= dpt.maximum(rtol * dpt.maximum(dpt.abs(a), dpt.abs(b)), atol)
Out[15]: usm_ndarray(False)

Downside would be another call to maximum.

@oleksandr-pavlyk
Copy link
Collaborator Author

Downside would be another call to maximum.

It won't be a downside, we would just trade a call to dpt.add with a call to dpt.maximum.

@oleksandr-pavlyk
Copy link
Collaborator Author

@ndgrigorian I have pushed the change to use abs(a-b) <= max(atol, rtol*max(abs(a), abs(b)) formula. I have also reflected that in the tensor.allclose docstring.

@ndgrigorian
Copy link
Collaborator

ndgrigorian commented Aug 16, 2023

@oleksandr-pavlyk Does it make sense to change other calls to std::abs in dpctl be changed to use this implementation?

And if so, maybe it should be in math_utils.hpp. It may be better off in another PR to not let this one get too bloated, though.

@oleksandr-pavlyk
Copy link
Collaborator Author

I do not think so, but even if we do, we should do it in a separate PR.

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_16 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310h7bf5fec_17 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

This changes builds up on gh-1265 and takes into account queue
from the pre-allocated buffer, if provided.
@ndgrigorian
Copy link
Collaborator

@oleksandr-pavlyk
The sqrt implementation seems to have some issues with very large values and very small values.

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: val_max = dpt.finfo(dpt.complex128).max

In [3]: x = dpt.asarray(complex(val_max, val_max), dtype="c16")

In [4]: dpt.sqrt(x)
Out[4]: usm_ndarray(inf+0.j)

In [5]: x_np = np.asarray(complex(val_max, val_max), dtype="c16")

In [6]: np.sqrt(x_np)
Out[6]: (1.4730945569055655e+154+6.101757441282702e+153j)

In [7]: val_min = dpt.finfo(dpt.complex128).min

In [8]: x = dpt.asarray(complex(val_min, val_min), dtype="c16")

In [9]: dpt.sqrt(x)
Out[9]: usm_ndarray(0.-infj)

In [10]: x_np = np.asarray(complex(val_min, val_min), dtype="c16")

In [11]: np.sqrt(x_np)
Out[11]: (6.101757441282702e+153-1.4730945569055655e+154j)

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_23 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

This improves accuracy at extremes of supported range.

Use sycl:: namespace ldexp and ilogb to prevent problem
with VS 2017 headers.
@oleksandr-pavlyk
Copy link
Collaborator Author

@ndgrigorian I pushed the change to normalize the scale of the arguments in csqrt and rescale the result appropriately, but use of these scaling functions causes failure to offload on Windows to recur.

Perhaps the solution is to use sych functions from sycl namespace rather than from std:: namespace.

ilogb would have to pay attention to correctly computing
scale of denormal floats, while simpler code suffices.

Also use unscaled version in most cases, and scaled version
only for very large inputs.
We work around issues with these functions when their implementation
is taken from VS 2017 headers on Windows though.
@ndgrigorian
Copy link
Collaborator

@oleksandr-pavlyk

Since you removed other superfluous iostream includes in this PR, can you also remove the one in positive.hpp?

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_32 ran successfully.
Passed: 912
Failed: 88
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_34 ran successfully.
Passed: 912
Failed: 88
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_33 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

@oleksandr-pavlyk
Copy link
Collaborator Author

Reconfirmed that the latest build resolves gh-1279 on Windows machine with Iris Xe integrated GPU:

(gh-1279) C:\Users\user\repos>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype='c8'); y = dpt.sqrt(x); print(y)"
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]

(gh-1279) C:\Users\user\repos>python -c "import dpctl.tensor as dpt; x = dpt.ones(10, dtype='c8'); y = dpt.abs(x); print(y)"
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

(gh-1279) C:\Users\user\repos>python -m dpctl -f
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 WINDOWS
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
        Version             2023.16.6.0.28_042959
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) OpenCL HD Graphics
    Version     OpenCL 3.0
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Iris(R) Xe Graphics
        Version             31.0.101.4032
        Filter string       opencl:gpu:0
Platform  2 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Iris(R) Xe Graphics
        Version             1.3.24931
        Filter string       level_zero:gpu:0

(gh-1279) C:\Users\user\repos>conda list dpctl
# packages in environment at C:\Users\user\Miniconda3\envs\gh-1279:
#
# Name                    Version                   Build  Channel
dpctl                     0.14.6dev3      py310h81563b4_34    file:///C:/Users/user/repos/channel

Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thank you @oleksandr-pavlyk

@oleksandr-pavlyk oleksandr-pavlyk merged commit 2f3be1f into master Aug 18, 2023
39 of 42 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the fix-gh-1279 branch August 18, 2023 18:11
@github-actions
Copy link

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.6dev3=py310ha25a700_34 ran successfully.
Passed: 913
Failed: 87
Skipped: 119

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dpctl.tensor.abs() does not work on Iris Xe on Windows
3 participants