Improve behaviour of torch.linalg.lstsq on CUDA GPU for rank-deficient matrices #117122

Open
tvercaut opened this issue Jan 10, 2024 · 3 comments · May be fixed by #125110
Labels
actionable
module: cuda (Related to torch.cuda, and CUDA support in general)
module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tvercaut commented Jan 10, 2024

🚀 The feature, motivation and pitch

As per the documentation:

For CUDA input [torch.linalg.lstsq] assumes that A is full-rank.

While documented, this behaviour is counter-intuitive for end-users, especially as the function fails silently.

Interestingly, calling torch.linalg.lstsq on CUDA with rank-deficient input currently fails silently in non-batched mode but throws a _LinAlgError in batched mode.

It is also counter-intuitive that torch.linalg.lstsq on CUDA cannot fall back to a more stable SVD driver despite torch.linalg.svd being supported on CUDA.

It would be great if:

  1. The QR-based implementation always threw an error if the input is not full rank (a user-side version of such a check is sketched after this list)
  2. An SVD backend were available on CUDA as well
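
As a stop-gap, a guard along these lines could make the failure explicit before dispatching to the CUDA QR ('gels') driver. This is only a minimal sketch: lstsq_checked is a hypothetical helper, not an existing PyTorch API, and it pays for an extra rank computation.

import torch

def lstsq_checked(A, B):
    # Hypothetical wrapper: reject rank-deficient inputs explicitly instead of
    # relying on the CUDA QR ('gels') path, which assumes A is full-rank.
    ranks = torch.linalg.matrix_rank(A)
    full_rank = min(A.shape[-2], A.shape[-1])
    if not bool((ranks == full_rank).all()):
        raise torch.linalg.LinAlgError(
            "A is rank deficient; torch.linalg.lstsq on CUDA assumes full rank")
    return torch.linalg.lstsq(A, B).solution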

Alternatives

An alternative is to implement an SVD-based least-squares and use that instead of torch.linalg.lstsq. Here is a basic implementation (feel free to post refinements):

def svd_lstsq(AA, BB, tol=1e-5):
    # Solve AA @ X = BB in the least-squares sense via the SVD pseudo-inverse.
    U, S, Vh = torch.linalg.svd(AA, full_matrices=False)
    # Invert only the singular values above the tolerance, so rank-deficient
    # inputs are handled instead of producing infinities.
    Spinv = torch.zeros_like(S)
    Spinv[S > tol] = 1 / S[S > tol]
    UhBB = U.adjoint() @ BB
    # Broadcast Spinv over the right-hand-side columns when BB is a matrix.
    if Spinv.ndim != UhBB.ndim:
        Spinv = Spinv.unsqueeze(-1)
    SpinvUhBB = Spinv * UhBB
    return Vh.adjoint() @ SpinvUhBB
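
As a design note, the fixed absolute tol above differs from torch.linalg.pinv, which discards singular values relative to the largest one. A variant along those lines might look like the following sketch (svd_lstsq_rtol and its rtol default are illustrative choices, not a settled API):

def svd_lstsq_rtol(AA, BB, rtol=1e-5):
    # Same SVD-based solve, but with a cutoff relative to the largest singular
    # value, similar in spirit to the relative tolerance used by torch.linalg.pinv.
    U, S, Vh = torch.linalg.svd(AA, full_matrices=False)
    cutoff = rtol * S.amax(dim=-1, keepdim=True)
    Spinv = torch.where(S > cutoff, 1 / S, torch.zeros_like(S))
    UhBB = U.adjoint() @ BB
    if Spinv.ndim != UhBB.ndim:
        Spinv = Spinv.unsqueeze(-1)
    return Vh.adjoint() @ (Spinv * UhBB)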

Additional context

Simple script to replicate:

import torch
print(f'Running PyTorch version: {torch.__version__}')

torchdevice = torch.device('cpu')
if torch.cuda.is_available():
  torchdevice = torch.device('cuda')
  print('Default GPU is ' + torch.cuda.get_device_name(torch.device('cuda')))
print('Running on ' + str(torchdevice))

b = 2
r = 5
c = 3
k = 1

if b==1:
  A = torch.randn(r, c, device=torchdevice)
  if k==1:
    B = torch.randn(r, device=torchdevice)
  else:
    B = torch.randn(r, k, device=torchdevice)
else:
  A = torch.randn(b, r, c, device=torchdevice)
  B = torch.randn(b, r, k, device=torchdevice)

# degrade rank by zeroing the last column of A
A[...,-1] = 0
print("A",A)

try:
  X_lstsq = torch.linalg.lstsq(A, B).solution
  print("X_lstsq",X_lstsq)
except Exception as error:
  print("An error occurred:", type(error).__name__, "–", error)

X_pinv = torch.linalg.pinv(A) @ B
print("X_pinv",X_pinv)

def svd_lstsq(AA, BB, tol=1e-5):
    U, S, Vh = torch.linalg.svd(AA, full_matrices=False)
    Spinv = torch.zeros_like(S)
    Spinv[S>tol] = 1/S[S>tol]
    UhBB = U.adjoint() @ BB
    if Spinv.ndim!=UhBB.ndim:
      Spinv = Spinv.unsqueeze(-1)
    SpinvUhBB = Spinv * UhBB
    return Vh.adjoint() @ SpinvUhBB

X_svd = svd_lstsq(A, B)
print("X_svd",X_svd)

Related issues: #88101 #85021 #10454

cc @ptrblck @jianyuh @nikitaved @pearu @mruberry @walterddr @xwang233 @lezcano

bdhirsh added the module: cuda, triaged, and module: linear algebra labels on Jan 11, 2024
lezcano (Collaborator) commented Jan 29, 2024

Yep, this makes sense. I had this in mind when we were implementing torch.linalg but never got around to implementing it. Would you like to send a PR adding this behaviour?

tvercaut (Author) commented

Sorry I don't think I will be able to work on a PR.

ZelboK (Contributor) commented Apr 27, 2024

@lezcano I can work on a PR. Hopefully done by this weekend.
