[cuSolver] Avoid repeated ctxCreate/Destroy for all Lapack API calls. #298

Open · wants to merge 1 commit into base: develop
Conversation


@HaoweiZhangIntel commented Mar 30, 2023

Description

Mainly improves the performance of LAPACK for the CUDA backend by avoiding repeated cuCtxCreate/cuCtxDestroy calls.

  • Apply the same logic as cuBLAS to cuSolver via placedContext_.
    This avoids calling cuCtxCreate & cuCtxDestroy on every entry when multiple LAPACK APIs are used.
    For example, solving Ax=b with a Cholesky factorization requires both the lapack::potrf and lapack::potrs APIs.
    cuCtxCreate/Destroy takes much longer than most GPU LAPACK kernels; see the nvvp diagnostics below and the sketch after this list:
    Before modification:
    [nvvp timeline screenshot]
    After modification:
    [nvvp timeline screenshot]

  • Fix deprecation warnings from cuda.hpp for cuSolver ([BLAS] fix deprecation warnings from cuda.hpp #295).

  • Fix the bug in dft (mklgpu => mklcpu).
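
A minimal sketch of the context-reuse pattern referenced in the first bullet, assuming a hypothetical `scoped_context` class name (the real backend uses its own scoped handler around `placedContext_`); this illustrates the idea rather than the exact implementation:

```cpp
// Hypothetical sketch: reuse the current or primary CUDA context instead of
// calling cuCtxCreate/cuCtxDestroy on every LAPACK entry point.
#include <cuda.h>
#include <stdexcept>

class scoped_context {
    CUdevice dev_;
    CUcontext ctx_ = nullptr;
    bool retained_primary_ = false;

public:
    explicit scoped_context(CUdevice dev) : dev_(dev) {
        // Reuse whatever context is already current on this thread, if any.
        if (cuCtxGetCurrent(&ctx_) != CUDA_SUCCESS || ctx_ == nullptr) {
            // Otherwise retain the device's primary context; unlike
            // cuCtxCreate, this is reference-counted and does not build a
            // fresh context when one already exists.
            if (cuDevicePrimaryCtxRetain(&ctx_, dev_) != CUDA_SUCCESS)
                throw std::runtime_error("failed to acquire CUDA context");
            retained_primary_ = true;
            cuCtxSetCurrent(ctx_);
        }
    }

    ~scoped_context() {
        // Drop only the reference we took (the primary context is
        // reference-counted), rather than destroying a context we created.
        if (retained_primary_)
            cuDevicePrimaryCtxRelease(dev_);
    }

    CUcontext get() const { return ctx_; }
};
```

With this pattern, a lapack::potrf followed by a lapack::potrs no longer pays two full cuCtxCreate/cuCtxDestroy round trips, which is what the nvvp timelines above illustrate.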

Checklist

All Submissions

* Apply the same logic as cuBLAS to cuSolver at placedContext_.
  Avoid calling cuCtxCreate every time when using multiple LAPACK APIs.

* Fix deprecation warnings from cuda.hpp for cuSolver (oneapi-src#295).

* Fix the bug in dft (mklgpu => mklcpu).