Skip to content

artuppp/EllipseFitCUDA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EllipseFitCUDA

Ellipse Fit Implementation in CUDA

This repository contains an CUDA GPU-accelerated implementation of a fitting ellipse based on least-squares approach. This is completely based on Ohad Gal (2023). fit_ellipse (https://www.mathworks.com/matlabcentral/fileexchange/3215-fit_ellipse), MATLAB Central File Exchange.

It uses CUBLAS and CUSOLVE libraries which are included into the NVIDIA CUDA TOOLKIT

Note: for small 5x5 system solve I recomend using this code block instead of cuSolve call. This is because for last matrix-vector multiplication (which is also performed on the cuSolve call) there is some auto-tunning overhead that is slowing our execution (avoided down using simple 5x5 matrix multiplcation and picking just first column of the result). Since it also depends on your hardware capabilities, you should try both versions and use the faster one for your device.

        // Solving through PSEUDOINVERSE (A^T*A)-1 * A^T * b (with A = A^T*A)
        cudaDeviceSynchronize();
        t.start();
        // Calculate A^T*A
        cublasSgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_T, 5, 5, size, &alpha, d_a, 5, d_a, 5, &beta, d_a2inv,
                    5);
        // Inverse (A^T*A)-1
        cublasSmatinvBatched(cublas_handle, 5, d_array, 5, d_array, 5, d_info, 1);

        // Multiply (A^T*A)-1 * A^T
        cublasSgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_T, 5, 5, 5, &alpha, d_a2inv, 5, d_a, 5, &beta, d_a,
                    5);
        // Multiply (A^T*A)-1 * A^T
        // I dont know why it is quite fast to do the 5x5 * 5x5 matrix multiplication instead of 5x5 * 5x1 matrix multiplication, so I do it this way and take the first column of the result
        cublasSgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, 5, 5, 5, &alpha, d_a, 5, d_b, 5, &beta, d_x,
                    5);

        cudaDeviceSynchronize();
        time = t.elapsed();
        cout << "Time solve system: " << time / 10e6 << " ms" << endl;
        //        printBMatrix<<<1, 1>>>(d_x);