
[BUG] 'min_area_polygons' CUDA error: an illegal memory access was encountered #2407

Closed
Shiyang980713 opened this issue Nov 13, 2022 · 1 comment · May be fixed by #2788

Comments

@Shiyang980713

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The unexpected results still exist in the latest version.

Describe the Issue
This bug occurs when I use mmrotate, but it is caused by unexpected polys values, so I think it may be an mmcv bug.

For certain specific pts tensor values, the following code:

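import torch
from mmcv.ops import min_area_polygons
from mmrotate.core import poly2obb
pts = torch.load('pts.pt').detach().cpu().cuda()  # the attached point sets
polys = min_area_polygons(pts)
bboxes = poly2obb(polys, version='oc')  # same error for every angle version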

raises:

RuntimeError:CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The same error is raised for every angle version.
The pts.pt file is attached (pts.pt.zip).
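As the error message suggests, the failing call can be pinned down by making kernel launches synchronous. A minimal sketch: the CUDA_LAUNCH_BLOCKING variable is read when CUDA is initialized, so it has to be set before torch touches the GPU. Setting it at the very top of the script, before importing torch, is one way; exporting it in the shell before launching Python works too.

```python
import os

# Make CUDA kernel launches synchronous so the Python stack trace points
# at the op that actually failed (e.g. min_area_polygons). This must run
# before CUDA is initialized, i.e. before importing torch in this script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ...then import torch / mmcv and re-run the reproduction snippet.
```

Synchronous launches slow everything down, so this is only meant for debugging runs.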

Environment
{'sys.platform': 'linux', 'Python': '3.9.0 (default, Nov 15 2020, 14:28:56) [GCC 7.3.0]', 'CUDA available': True, 'GPU 0,1,2,3,4,5,6,7': 'GeForce RTX 3090', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 11.3, V11.3.58', 'GCC': 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0', 'PyTorch': '1.11.0+cu113', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n - CuDNN 8.2\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized 
-fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'TorchVision': '0.12.0+cu113', 'OpenCV': '4.6.0', 'MMCV': '1.7.0', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '11.3'}

BTW, I have tested this code on mmcv 1.6.1, 1.6.2, and 1.7.0 and get the same error on all of them.

Error traceback
[Screenshot: error traceback]

Bug fix
I guess this bug is caused by certain specific pts values, so I add a small random noise to the points to avoid it.

# workaround: add tiny uniform noise in [0, 1e-4) to break exact point overlaps
pts = pts + torch.rand_like(pts) * 1e-4
polys = min_area_polygons(pts)
bboxes = poly2obb(polys, version='oc')

Because the noise is very small, the evaluation results are almost unchanged.
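The workaround above draws fresh random noise on every call, so two runs can produce slightly different boxes. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable, draws the noise from a seeded torch.Generator on the CPU and then moves it to the points' device. The helper name jitter_points and its scale/seed defaults are hypothetical, not part of mmcv or mmrotate.

```python
import torch


def jitter_points(pts: torch.Tensor, scale: float = 1e-4, seed: int = 0) -> torch.Tensor:
    """Add small uniform noise in [0, scale) to every coordinate.

    Hypothetical helper: the noise is drawn on the CPU from a seeded
    generator, so the same input always receives the same perturbation.
    """
    gen = torch.Generator().manual_seed(seed)
    noise = torch.rand(pts.shape, generator=gen, dtype=pts.dtype) * scale
    return pts + noise.to(pts.device)
```

Calling pts = jitter_points(pts) before min_area_polygons(pts) gives the same perturbation for the same input on every run. Generating on the CPU keeps the noise independent of the GPU's RNG state, at the cost of one small host-to-device copy.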
pts.pt.zip

@zhouzaida zhouzaida removed their assignment Nov 13, 2022
@zytx121
Contributor

zytx121 commented Nov 14, 2022

Hi @Shiyang980713
Due to numerical instability, min_area_polygons cannot handle some extreme cases: when the points are very close to each other or almost overlap, this error is raised. Your workaround is a good one. We can add a check that limits the input to prevent this error.
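The idea of limiting the input can be sketched as a pre-check that flags point sets in which two points nearly coincide. This is an illustration under assumptions, not mmcv's actual fix: it assumes pts has shape (N, 2K) with interleaved x/y coordinates (mmcv documents the min_area_polygons input as (N, 18), i.e. 9 points), and the helper name degenerate_mask and the eps threshold are hypothetical.

```python
import torch


def degenerate_mask(pts: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Flag point sets in which at least two points nearly coincide.

    Hypothetical pre-check. Assumes pts has shape (N, 2*K) with
    interleaved coordinates [x0, y0, x1, y1, ...].
    Returns a bool tensor of shape (N,).
    """
    n, k = pts.shape[0], pts.shape[1] // 2
    xy = pts.reshape(n, k, 2)               # (N, K, 2) points per set
    dist = torch.cdist(xy, xy)              # (N, K, K) pairwise distances
    # Ignore each point's zero distance to itself on the diagonal.
    eye = torch.eye(k, dtype=torch.bool, device=pts.device)
    dist = dist.masked_fill(eye, float("inf"))
    return dist.amin(dim=(1, 2)) < eps
```

Flagged rows could then be jittered as in the workaround above, or skipped before calling min_area_polygons. Checking the minimum pairwise distance only catches overlapping points; other near-degenerate layouts (for example, all points almost collinear) would need a separate test.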
