New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration of BLAS and LAPACK for Optimized ndarray Operations: A Proof of Concept with GEMM #1647
base: devel
Are you sure you want to change the base?
Conversation
Hello! Welcome to Pyccel! Thank you very much for your contribution ❤️. I am the GitHub bot. I will help guide you through the different stages necessary to validate a review in Pyccel. If you haven't yet seen our developer docs make sure you check them out here. Amongst other things they describe the review process that we have just started. You can also get in touch with our other developers on our Pyccel Discord Server. To begin with I will give you a short checklist to make sure your pull request is complete. Please tick items off when you have completed them or determined that they are not necessary for this pull request. If you want me to run any specific tests to check out corner cases that you can't easily check on your computer, you can request this using the command Please begin by requesting your checklist using the command |
@@ -9,6 +9,7 @@ | |||
# include <complex.h> | |||
# include <stdbool.h> | |||
# include <stdint.h> | |||
# include <cblas.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to import this here rather than adding it to the imports of the generated file?
@@ -1736,6 +1736,11 @@ def _print_NumpySum(self, expr): | |||
return f'numpy_sum_bool({name})' | |||
raise NotImplementedError('Sum not implemented for argument') | |||
|
|||
def _print_NumpyMatmul(self, expr, lhs): | |||
if all(x.dtype == NativeFloat() for x in expr.args) and all(len(x.shape) == 2 for x in expr.args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to also beware of the precision
I think it is a good idea to use GEMM to implement
I am not sure I understand your argument here. I agree that the existing naive implementation of ndarray functions results in suboptimal performance. This is evident from our benchmarks. However I am not sure why you feel that BLAS/LAPACK is the thing that will give the most benefit in Pyccel. My opinion is that this is an enhancement of the current implementation, adding functionalities that were previously missing, not an improvement which will affect the performance of code written using the currently supported subset of Python. Is there something I am missing? It is probably also worth mentioning that there has been a lot of discussion recently about replacing our current implementation of Please send me a message on discord so we can discuss this in more detail. |
@jalalium Do you want to leave this PR open or can we close it? |
Overview
This pull request represents a significant step forward in enhancing Pyccel's performance capabilities. Currently, Pyccel's ndarray functions rely on naive implementations, which, when functional, do not leverage the computational efficiency and speed optimizations provided by advanced libraries like BLAS and LAPACK. These libraries are cornerstones in scientific computing for their unparalleled optimizations in linear algebra operations.
Motivation
The motivation for this pull request is twofold:
Performance Boost: The existing naive implementation of ndarray functions in Pyccel results in suboptimal performance, particularly evident in large-scale computations. Integrating BLAS and LAPACK will enable Pyccel to achieve a substantial speed-up, aligning it with the performance of native implementations in more lower-level languages.
Application of the DRY concept: The integration of widely recognized and utilized libraries like BLAS and LAPACK brings Pyccel in line with numpy methods, enhancing its appeal and utility in the scientific computing community.
Proof of Concept: GEMM for Float Arrays
As a proof of concept, this pull request introduces the integration of the GEMM (General Matrix Multiply) operation from BLAS for float arrays. This implementation showcases the potential performance improvements Pyccel can achieve by utilizing these optimized libraries.
Input:
Command:
Result:
Discussion: Roadmap for Full Integration
This proof of concept is just the beginning of fully harnessing BLAS and LAPACK in Pyccel. The complete integration of these libraries necessitates a comprehensive discussion on several key points:
Scope and Extent: Identifying the most essential functions from BLAS and LAPACK for early integration, based on Pyccel's common use-cases.
Integration Strategy: Developing a systematic approach for incorporating these functions, with considerations for ease of use, maintainability, and compatibility with the existing Pyccel codebase.
Discrepency mitigation: For gemm for example, there is no implementation for int types, however numpy does support this. Discussion is needed in such scenario or when there is no equivalent function in blas/lapack.
Performance Benchmarks: Establishing a benchmarking framework to quantitatively evaluate the performance gains from this integration.
Conclusion
This pull request not only aims to improve the performance of Pyccel but also seeks to align it with the best practices in scientific computing. Your feedback and contributions are crucial as we set out to enhance Pyccel's efficiency and utility through the integration of BLAS and LAPACK.