
OpenCL errors for larger matrices w/ NVIDIA implementation #183

Open
jeffhammond opened this issue Jun 16, 2017 · 2 comments

@jeffhammond (Member) commented:

The OpenCL transpose breaks with matrices of order 1296 or greater when using the NVIDIA OpenCL implementation. The problem is NVIDIA-specific: the Intel OpenCL implementation is fine for much larger matrices.

It is possible that there is something I can query to know in advance whether this problem will appear. CL_DEVICE_ADDRESS_BITS exists, but if the problem were 32-bit indexing, it should not manifest at order 1296 (a matrix of that order is only 12.8 MiB).
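For reference, a minimal standalone probe of those limits against the first GPU device might look like the sketch below (hypothetical, not part of the PRK sources). It only confirms that neither CL_DEVICE_ADDRESS_BITS nor CL_DEVICE_MAX_MEM_ALLOC_SIZE is anywhere near exhausted at order 1296, so neither query obviously predicts the failure.

```cpp
// Hypothetical probe, not part of the PRK sources: report the device limits
// that could, in principle, explain a size-dependent failure.
#include <cstdio>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) return 1;

    cl_uint address_bits = 0;
    cl_ulong max_alloc = 0, global_mem = 0;
    clGetDeviceInfo(device, CL_DEVICE_ADDRESS_BITS, sizeof(address_bits), &address_bits, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_alloc), &max_alloc, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(global_mem), &global_mem, NULL);

    const size_t order = 1296;                           // matrix order where the failure starts
    const size_t bytes = order * order * sizeof(double); // one matrix, ~12.8 MiB

    std::printf("CL_DEVICE_ADDRESS_BITS       = %u\n", address_bits);
    std::printf("CL_DEVICE_MAX_MEM_ALLOC_SIZE = %lu\n", (unsigned long)max_alloc);
    std::printf("CL_DEVICE_GLOBAL_MEM_SIZE    = %lu\n", (unsigned long)global_mem);
    std::printf("bytes per matrix at order %zu = %zu\n", order, bytes);
    // On the GTX 960 below, the per-allocation limit is ~498 MiB and the
    // address width is well above 32 bits, so order 1296 is nowhere near either bound.
    return 0;
}
```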

jrhammon@klondike:~/Work/PRK/github-official/Cxx11$ ./transpose-opencl 10 1295
Parallel Research Kernels version 2.16
C++11/OpenCL Matrix transpose: B = A^T
Available OpenCL platform: NVIDIA CUDA
Available OpenCL platform: Intel(R) OpenCL
Matrix order          = 1295
Number of iterations  = 10
Solution validates
Rate (MB/s): 12611.9 Avg time (s): 0.00106378
jrhammon@klondike:~/Work/PRK/github-official/Cxx11$ ./transpose-opencl 10 1296
Parallel Research Kernels version 2.16
C++11/OpenCL Matrix transpose: B = A^T
Available OpenCL platform: NVIDIA CUDA
Available OpenCL platform: Intel(R) OpenCL
Matrix order          = 1296
Number of iterations  = 10
ERROR: Aggregate squared error 1896 exceeds threshold 1e-08
jeffhammond self-assigned this Jun 16, 2017
@jeffhammond (Member, Author) commented:

I see the same thing in https://github.com/jeffhammond/PRK/blob/9fdcc953e8a962a9d13508e3a3a092c07c05fd45/Cxx11/transpose-cuda.cu, so it is presumably a problem with the low-level implementation.

@jeffhammond (Member, Author) commented:

With CUDA 8.0, I no longer see these issues, at least with OpenCL.

jrhammon@klondike:~/Work/PRK/github-official/Cxx11$ ./transpose-opencl 10 1296
./transpose-opencl: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by ./transpose-opencl)
./transpose-opencl: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by ./transpose-opencl)
Parallel Research Kernels version 2.16
C++11/OpenCL Matrix transpose: B = A^T
Available OpenCL platforms: 
CL_PLATFORM_NAME=NVIDIA CUDA, CL_PLATFORM_VENDOR=NVIDIA Corporation (DEFAULT)
   CL_DEVICE_NAME=GeForce GTX 960
   CL_DEVICE_VENDOR=NVIDIA Corporation
   CL_DEVICE_AVAILABLE=1
   CL_DEVICE_TYPE=GPU
   CL_DEVICE_MAX_COMPUTE_UNITS=8
   CL_DEVICE_GLOBAL_MEM_SIZE=2090270720
   CL_DEVICE_MAX_CLOCK_FREQUENCY=1228
   CL_DEVICE_MAX_MEM_ALLOC_SIZE=522567680
   CL_DEVICE_LOCAL_MEM_SIZE=49152
   CL_DEVICE_EXTENSIONS contains cl_khr_fp64

CL_PLATFORM_NAME=Intel(R) OpenCL, CL_PLATFORM_VENDOR=Intel(R) Corporation
   CL_DEVICE_NAME=Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
   CL_DEVICE_VENDOR=Intel(R) Corporation
   CL_DEVICE_AVAILABLE=1
   CL_DEVICE_TYPE=CPU
   CL_DEVICE_MAX_COMPUTE_UNITS=16
   CL_DEVICE_GLOBAL_MEM_SIZE=16645246976
   CL_DEVICE_MAX_CLOCK_FREQUENCY=3000
   CL_DEVICE_MAX_MEM_ALLOC_SIZE=4161311744
   CL_DEVICE_LOCAL_MEM_SIZE=32768
   CL_DEVICE_EXTENSIONS contains cl_khr_fp64

Matrix order          = 1296
Number of iterations  = 10
CPU Precision         = 64-bit
Solution validates
Rate (MB/s): 15035.8 Avg time (s): 0.00178733
GPU Precision         = 64-bit
Solution validates
Rate (MB/s): 20127.7 Avg time (s): 0.00133517

jeffhammond added and removed the OpenCL label Jan 20, 2018