Reshape improvements #1677

Merged
merged 4 commits into from
May 16, 2024

oleksandr-pavlyk
Collaborator

Closes gh-1664.

Improves performance of dpt.reshape(X, new_shape, order="F") when copy is needed.
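
To make the order="F" semantics concrete, here is a minimal pure-Python sketch. It is not dpctl's implementation, and `f_order_indices` and `reshape_f` are hypothetical helper names introduced only for illustration. Elements are read from the input with the first index varying fastest and written to the output in the same order. The sketch says nothing about performance, only about which element lands where.

```python
from itertools import product


def f_order_indices(shape):
    """Yield multi-indices of `shape` with the FIRST index varying fastest."""
    for rev in product(*(range(n) for n in reversed(shape))):
        yield tuple(reversed(rev))


def reshape_f(values, shape, new_shape):
    """Fortran-order reshape of a {multi_index: value} mapping (hypothetical helper)."""
    flat = [values[idx] for idx in f_order_indices(shape)]
    return dict(zip(f_order_indices(new_shape), flat))


# A 2x3 array filled in C order: x[i, j] = 3 * i + j
x = {(i, j): 3 * i + j for i in range(2) for j in range(3)}
y = reshape_f(x, (2, 3), (3, 2))
rows = [[y[i, j] for j in range(2)] for i in range(3)]
print(rows)  # [[0, 4], [3, 2], [1, 5]]
```

For this input, a C-contiguous buffer holds 0..5 in order, and no single set of strides over that buffer reproduces the result, so this is a case where a copy is needed.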

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

github-actions bot commented May 15, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_355 ran successfully.
Passed: 887
Failed: 18
Skipped: 91

@coveralls
Collaborator

Coverage Status

coverage: 87.951% (+0.003%) from 87.948%
when pulling 9d2633f on reshape-improvements
into 7bc3124 on master.

@antonwolfy
Collaborator

53 dpnp tests that exercise the gemm function fail with this PR.
However, I believe this may be due to incorrect internal logic in dpnp.

The first issue I found is the incorrect result below:

import dpnp
from dpnp.dpnp_utils.dpnp_utils_linearalgebra import _define_contig_flag

a = dpnp.arange(3).reshape((1, 3, 1))
_define_contig_flag(a)
# Out: False

but it is expected to return True, because a is both C-contiguous and F-contiguous along the last two dimensions.
Returning False causes an unnecessary memory allocation to copy array a into a temporary C-contiguous array.
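
The contiguity claim can be checked by hand with a small stride-based sketch. This is an illustration under stated assumptions, not the dpnp or dpctl implementation: `contiguity_flags` is a hypothetical helper, strides are assumed to be in elements, arrays are assumed non-empty, and the strides (3, 1, 1) for the (1, 3, 1) array are assumed.

```python
def contiguity_flags(shape, strides):
    """Return (c_contiguous, f_contiguous) for element strides (hypothetical helper).

    Dimensions of extent 1 are skipped: their stride is never used to step
    between two distinct elements, so it cannot break contiguity.
    """
    def check(dims):
        expected = 1
        for extent, stride in dims:
            if extent == 1:
                continue
            if stride != expected:
                return False
            expected *= extent
        return True

    dims = list(zip(shape, strides))
    # C order: last axis varies fastest; F order: first axis varies fastest.
    return check(reversed(dims)), check(dims)


print(contiguity_flags((1, 3, 1), (3, 1, 1)))   # (True, True)
print(contiguity_flags((5, 4, 3), (12, 3, 1)))  # (True, False)
```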

The second issue is somewhere in the _gemm_batch implementation:

import dpctl, dpctl.tensor as dpt
import dpnp.backend.extensions.blas._blas_impl as bi

a = dpt.reshape(dpt.arange(60, dtype='f4'), (5, 4, 3))
b = dpt.reshape(dpt.arange(3, dtype='f4'), (1, 3, 1))
b = dpt.copy(b)
c = dpt.zeros((5, 4, 1), dtype='f4')

a.strides, b.strides, c.strides
# Out: ((12, 3, 1), (3, 1, 1), (4, 1, 1))

ev,  _, _ = bi._gemm_batch(a.sycl_queue, a, b, c)
ev.wait()

c
# Out:
# usm_ndarray([[[ 5.],
#               [14.],
#               [23.],
#               [32.]],
# 
#              [[ 0.],
#               [ 0.],
#               [ 0.],
#               [ 0.]],
# 
#              [[ 0.],
#               [ 0.],
#               [ 0.],
#               [ 0.]],
# 
#              [[ 0.],
#               [ 0.],
#               [ 0.],
#               [ 0.]],
# 
#              [[ 0.],
#               [ 0.],
#               [ 0.],
#               [ 0.]]], dtype=float32)
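
For comparison, here is a pure-Python reference for the values one would expect in every batch. It assumes that b, which has a single batch entry, is meant to be broadcast along the batch axis; the thread does not state this explicitly. `batched_matmul` is a hypothetical helper operating on nested lists, not part of dpnp or dpctl.

```python
def batched_matmul(a, b):
    """Reference batched product of nested lists a (B, M, K) and b (1 or B, K, N)."""
    out = []
    for i, a_mat in enumerate(a):
        # Assumption: a single-batch b is broadcast across all batches of a.
        b_mat = b[0] if len(b) == 1 else b[i]
        out.append([
            [sum(a_mat[r][k] * b_mat[k][c] for k in range(len(b_mat)))
             for c in range(len(b_mat[0]))]
            for r in range(len(a_mat))
        ])
    return out


# Same values as the reproducer: a = arange(60) as (5, 4, 3), b = arange(3) as (1, 3, 1)
a = [[[12 * i + 3 * j + k for k in range(3)] for j in range(4)] for i in range(5)]
b = [[[float(k)] for k in range(3)]]
c = batched_matmul(a, b)
print([row[0] for row in c[0]])  # [5.0, 14.0, 23.0, 32.0]
print([row[0] for row in c[1]])  # [41.0, 50.0, 59.0, 68.0]
```

The first batch matches the output shown above; under the broadcasting assumption, the remaining batches should be non-zero as well.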

@vtavana, could you please look into that?

@vtavana
Collaborator

vtavana commented May 15, 2024

53 dpnp tests that exercise the gemm function fail with this PR. However, I believe this may be due to incorrect internal logic in dpnp.

The necessary changes are implemented in dpnp-gh-1828. The relevant dpnp tests now pass with both the master branch of dpctl and this branch.

@ndgrigorian
Collaborator

ndgrigorian commented May 15, 2024

The first issue I found is the incorrect result below:

import dpnp
from dpnp.dpnp_utils.dpnp_utils_linearalgebra import _define_contig_flag

a = dpnp.arange(3).reshape((1, 3, 1))
_define_contig_flag(a)
# Out: False

but it is expected to return True, because a is both C-contiguous and F-contiguous along the last two dimensions. Returning False causes an unnecessary memory allocation to copy array a into a temporary C-contiguous array.

The first case works correctly in dpctl:

In [1]: import dpctl.tensor as dpt, dpctl, numpy as np

In [2]: x = dpt.reshape(dpt.arange(10), (1, 10, 1))

In [3]: x.flags.contiguous
Out[3]: True

In [4]: x.flags
Out[4]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  WRITABLE : True

@vtavana does dpnp-gh-1828 address this case too?

@vtavana
Collaborator

vtavana commented May 15, 2024

The first case works correctly in dpctl:

In [1]: import dpctl.tensor as dpt, dpctl, numpy as np

In [2]: x = dpt.reshape(dpt.arange(10), (1, 10, 1))

In [3]: x.flags.contiguous
Out[3]: True

In [4]: x.flags
Out[4]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  WRITABLE : True

@vtavana does dpnp-gh-1828 address this case too?

Yes, both examples provided by @antonwolfy also work fine in dpnp-gh-1828.

As a side note, _define_contig_flag in dpnp is used in the batched calculation. Its goal is to check whether each 2D array that makes up the N-D array is F-contiguous or C-contiguous, so we do not use the built-in flags there.
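
A per-slice check of this kind might look like the sketch below. It is not the actual `_define_contig_flag` code: `trailing_2d_flags` is a hypothetical name, strides are assumed to be in elements, and only the last two axes are inspected, so the batch strides do not matter.

```python
def trailing_2d_flags(shape, strides):
    """Return (c_ok, f_ok) for the 2D slices formed by the last two axes.

    Hypothetical helper; an axis of extent 1 places no constraint on its stride.
    """
    (m, n), (sm, sn) = shape[-2:], strides[-2:]
    # Row-major slice: unit stride along columns, row stride equal to n.
    c_ok = (n == 1 or sn == 1) and (m == 1 or sm == n)
    # Column-major slice: unit stride along rows, column stride equal to m.
    f_ok = (m == 1 or sm == 1) and (n == 1 or sn == m)
    return c_ok, f_ok


print(trailing_2d_flags((1, 3, 1), (3, 1, 1)))   # (True, True)
print(trailing_2d_flags((5, 4, 3), (12, 3, 1)))  # (True, False)
print(trailing_2d_flags((5, 3, 4), (12, 1, 3)))  # (False, True)
print(trailing_2d_flags((5, 4, 3), (24, 6, 2)))  # (False, False)
```

Under these assumptions the (1, 3, 1) array from the first example qualifies as both C- and F-contiguous per slice, so no temporary copy would be required.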

Collaborator

@ndgrigorian ndgrigorian left a comment

Tested this out, this LGTM!

@oleksandr-pavlyk oleksandr-pavlyk merged commit c994666 into master May 16, 2024
60 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the reshape-improvements branch May 16, 2024 15:26
oleksandr-pavlyk added a commit that referenced this pull request May 16, 2024

Successfully merging this pull request may close these issues.

dpt.reshape changes strides of input array when reshaping to the same shape
5 participants