Different tensor computation result between using CPU and GPU devices #255

xiangfuli · 2023-07-05T21:05:32Z

🐛 Describe the bug

While developing the skew2vec function (which converts a skew-symmetric matrix to a 3D vector), I found that the result of err_pose = (torch.mm(desired_pose.T, Rwb) - torch.mm(Rwb.T, desired_pose)) (all tensors are 3x3 matrices), which has the same form as torch.mm(A, B.T) - torch.mm(B, A.T), does not exactly satisfy the skew-symmetric property. The deviation is very small.
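For reference, the expression is skew-symmetric in exact arithmetic, which is why the check is expected to pass. Writing $A$ for desired_pose and $B$ for Rwb (shorthand introduced here, not from the code), and $M$ for err_pose:

```latex
M = A^{\top} B - B^{\top} A
\quad\Longrightarrow\quad
M^{\top} = (A^{\top} B)^{\top} - (B^{\top} A)^{\top}
         = B^{\top} A - A^{\top} B
         = -M
```

In floating point the two products are rounded independently, so $M^{\top} = -M$ is only guaranteed to hold up to rounding error.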

The implementation of the skew2vec function is:

import torch

def skew2vec(input: torch.Tensor) -> torch.Tensor:
    # Unwrap inputs that carry an 'ltype' attribute (e.g. a LieTensor) into a plain tensor.
    v = input.tensor() if hasattr(input, 'ltype') else input
    assert v.shape[-2:] == (3, 3), "Last 2 dim should be (3, 3)"
    # Exact comparison: this is the assertion that fails on the CUDA result below.
    assert torch.equal(v.permute(0, 2, 1), -v), "Each matrix must be a skew matrix"

    return torch.stack([torch.stack([-v[..., 1, 2]], dim=-1),
                        torch.stack([ v[..., 0, 2]], dim=-1),
                        torch.stack([-v[..., 0, 1]], dim=-1)], dim=-1)

The following is the result when the tensors are on a CUDA device. Entries (0, 1) and (1, 0), and likewise (1, 2) and (2, 1), differ in magnitude in the last few digits:

tensor([[[ 0.00000000000000000000, -0.00632718438454868588, -0.18133159863325493122],
[ 0.00632718438454868675,  0.00000000000000000000, -0.18298103584576658198],
[ 0.18133159863325493122,  0.18298103584576663749, 0.00000000000000000000]]], device='cuda:0', dtype=torch.float64)

This causes the second assertion in the skew2vec function to fail. The official PyTorch documentation describes this kind of numerical accuracy problem.
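The failure can be reproduced without a GPU by pasting the printed values into a float64 tensor. The snippet below is a minimal sketch using only the numbers reported above; the variable names are illustrative.

```python
import torch

# Values copied from the CUDA output reported above (float64).
m = torch.tensor(
    [[[0.00000000000000000000, -0.00632718438454868588, -0.18133159863325493122],
      [0.00632718438454868675, 0.00000000000000000000, -0.18298103584576658198],
      [0.18133159863325493122, 0.18298103584576663749, 0.00000000000000000000]]],
    dtype=torch.float64,
)

mt = m.transpose(-2, -1)

# Bit-for-bit comparison, as in the failing assertion.
exact = torch.equal(mt, -m)
# Tolerance-based comparison, using the same defaults as the rtol/atol mentioned in this issue.
close = torch.allclose(mt, -m, rtol=1e-5, atol=1e-5)
# Largest deviation from exact skew-symmetry, i.e. max |M^T + M|.
max_err = (mt + m).abs().max().item()

print(exact, close, max_err)
```

Here the exact check reports False while the tolerance-based check reports True, and the largest deviation is well below 1e-15, consistent with rounding error in float64.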

I also found that some functions already take rtol and atol parameters, e.g. this mat2SO3 function:

pypose/pypose/lietensor/convert.py

Line 258 in 3b31492

def mat2Sim3(mat, check=True, rtol=1e-5, atol=1e-5):

Maybe the best approach is to also add rtol and atol parameters to skew2vec and use torch.allclose for the check?
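A minimal sketch of what that proposal could look like is shown below. The name skew2vec_tol, the default tolerances (borrowed from the signature quoted above), and the (..., 3) output shape are assumptions for illustration, not the PyPose implementation. The original code stacks to a trailing (1, 3) instead.

```python
import torch


def skew2vec_tol(mat: torch.Tensor, rtol: float = 1e-5, atol: float = 1e-5) -> torch.Tensor:
    """Hypothetical tolerance-aware variant of skew2vec (not the PyPose API)."""
    assert mat.shape[-2:] == (3, 3), "Last 2 dim should be (3, 3)"
    # Compare M^T with -M up to a tolerance instead of bit-for-bit.
    assert torch.allclose(mat.transpose(-2, -1), -mat, rtol=rtol, atol=atol), \
        "Each matrix must be a skew matrix"
    # Same entries and signs as the original: (-m12, m02, -m01).
    return torch.stack([-mat[..., 1, 2], mat[..., 0, 2], -mat[..., 0, 1]], dim=-1)


# A matrix that is skew-symmetric only up to a tiny perturbation.
m = torch.tensor([[[0.0, -3.0, 2.0],
                   [3.0, 0.0, -1.0],
                   [-2.0, 1.0 + 1e-12, 0.0]]], dtype=torch.float64)
vec = skew2vec_tol(m)
print(vec)
```

Using transpose(-2, -1) rather than permute(0, 2, 1) lets the check work for any number of leading batch dimensions, which is a design choice of this sketch.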

Versions

Collecting environment information...
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1036-gcp-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 450.236.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] torch==2.0.1
[conda] blas                      1.0                         mkl
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0           py310h7f8727e_0
[conda] mkl_fft                   1.3.1           py310hd6ae3a3_0
[conda] mkl_random                1.2.2           py310h00e6091_0
[conda] numpy                     1.23.5          py310hd5efca6_0
[conda] numpy-base                1.23.5          py310h8e6c178_0
[conda] numpydoc                  1.5.0           py310h06a4308_0
[conda] torch                     2.0.1                    pypi_0    pypi

wang-chen · 2023-07-07T01:26:19Z

I agree with this change. @xiangfuli, please feel free to make it and create a PR. Thank you so much!

xiangfuli mentioned this issue Jul 8, 2023

Support controller parameters tuning pipeline #250

Open
