Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration of BLAS and LAPACK for Optimized ndarray Operations: A Proof of Concept with GEMM #1647

Draft
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

jalalium
Copy link
Member

@jalalium jalalium commented Jan 4, 2024

Overview

This pull request represents a significant step forward in enhancing Pyccel's performance capabilities. Currently, Pyccel's ndarray functions rely on naive implementations, which, when functional, do not leverage the computational efficiency and speed optimizations provided by advanced libraries like BLAS and LAPACK. These libraries are cornerstones in scientific computing for their unparalleled optimizations in linear algebra operations.

Motivation

The motivation for this pull request is twofold:

  1. Performance Boost: The existing naive implementation of ndarray functions in Pyccel results in suboptimal performance, particularly evident in large-scale computations. Integrating BLAS and LAPACK will enable Pyccel to achieve a substantial speed-up, aligning it with the performance of native implementations in more lower-level languages.

  2. Application of the DRY concept: The integration of widely recognized and utilized libraries like BLAS and LAPACK brings Pyccel in line with numpy methods, enhancing its appeal and utility in the scientific computing community.

Proof of Concept: GEMM for Float Arrays

As a proof of concept, this pull request introduces the integration of the GEMM (General Matrix Multiply) operation from BLAS for float arrays. This implementation showcases the potential performance improvements Pyccel can achieve by utilizing these optimized libraries.

Input:

import numpy as np

if __name__ == "__main__":
    a = np.array([[1,2,4],[4,5,3]], dtype=np.float32)
    b = np.array([[1,2],[4,5], [3,4]], dtype=np.float32)
    c = np.matmul(a,b)
    print(c)
    

Command:

pyccel --language c test.py --libs=openblas

Result:

#include "test.h"
#include <stdlib.h>
#include "ndarrays.h"
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <inttypes.h>
int main()
{
    t_ndarray a = {.shape = NULL};
    t_ndarray b = {.shape = NULL};
    t_ndarray c = {.shape = NULL};
    int64_t i;
    int64_t i_0001;
    int64_t i_0002;
    a = array_create(2, (int64_t[]){INT64_C(2), INT64_C(3)}, nd_float, false, order_c);
    float Dummy_0000[] = {INT64_C(1), INT64_C(2), INT64_C(4), INT64_C(4), INT64_C(5), INT64_C(3)};
    memcpy(&a.nd_float[INT64_C(0)], Dummy_0000, 6 * a.type_size);
    b = array_create(2, (int64_t[]){INT64_C(3), INT64_C(2)}, nd_float, false, order_c);
    float Dummy_0001[] = {INT64_C(1), INT64_C(2), INT64_C(4), INT64_C(5), INT64_C(3), INT64_C(4)};
    memcpy(&b.nd_float[INT64_C(0)], Dummy_0001, 6 * b.type_size);
    c = array_create(2, (int64_t[]){INT64_C(2), INT64_C(2)}, nd_float, false, order_c);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,2, 2, 3, 1.0f, a.nd_float, 3, b.nd_float, 2, 0.0f, c.nd_float, 2);
    printf("%s", "[");
    for (i = INT64_C(0); i < INT64_C(1); i += INT64_C(1))
    {
        printf("%s", "[");
        for (i_0001 = INT64_C(0); i_0001 < INT64_C(1); i_0001 += INT64_C(1))
        {
            printf("%.12f ", GET_ELEMENT(c, nd_float, (int64_t)i, (int64_t)i_0001));
        }
        printf("%.12f]\n", GET_ELEMENT(c, nd_float, (int64_t)i, (int64_t)INT64_C(1)));
    }
    printf("%s", "[");
    for (i_0002 = INT64_C(0); i_0002 < INT64_C(1); i_0002 += INT64_C(1))
    {
        printf("%.12f ", GET_ELEMENT(c, nd_float, (int64_t)INT64_C(1), (int64_t)i_0002));
    }
    printf("%.12f]]\n", GET_ELEMENT(c, nd_float, (int64_t)INT64_C(1), (int64_t)INT64_C(1)));
    free_array(&a);
    free_array(&b);
    free_array(&c);
    return 0;
}

Discussion: Roadmap for Full Integration

This proof of concept is just the beginning of fully harnessing BLAS and LAPACK in Pyccel. The complete integration of these libraries necessitates a comprehensive discussion on several key points:

  • Scope and Extent: Identifying the most essential functions from BLAS and LAPACK for early integration, based on Pyccel's common use-cases.

  • Integration Strategy: Developing a systematic approach for incorporating these functions, with considerations for ease of use, maintainability, and compatibility with the existing Pyccel codebase.

  • Discrepency mitigation: For gemm for example, there is no implementation for int types, however numpy does support this. Discussion is needed in such scenario or when there is no equivalent function in blas/lapack.

  • Performance Benchmarks: Establishing a benchmarking framework to quantitatively evaluate the performance gains from this integration.

Conclusion

This pull request not only aims to improve the performance of Pyccel but also seeks to align it with the best practices in scientific computing. Your feedback and contributions are crucial as we set out to enhance Pyccel's efficiency and utility through the integration of BLAS and LAPACK.

@pyccel-bot
Copy link

pyccel-bot bot commented Jan 4, 2024

Hello! Welcome to Pyccel! Thank you very much for your contribution ❤️.

I am the GitHub bot. I will help guide you through the different stages necessary to validate a review in Pyccel. If you haven't yet seen our developer docs make sure you check them out here. Amongst other things they describe the review process that we have just started. You can also get in touch with our other developers on our Pyccel Discord Server.

To begin with I will give you a short checklist to make sure your pull request is complete. Please tick items off when you have completed them or determined that they are not necessary for this pull request. If you want me to run any specific tests to check out corner cases that you can't easily check on your computer, you can request this using the command /bot run X. Use the command /bot show tests to see a full list of the tests I can run. Once you have finished preparing your pull request and are ready to request reviews just take your PR out of draft, or let me know with the command /bot mark as ready. I will then run the full suite of tests to check that everything is as neat as you think before asking other contributors for reviews. Tests will not run automatically before this point to avoid wasting resources. You can get a full list of commands that I understand using /bot commands.

Please begin by requesting your checklist using the command /bot checklist

@github-actions github-actions bot marked this pull request as draft January 4, 2024 17:03
@@ -9,6 +9,7 @@
# include <complex.h>
# include <stdbool.h>
# include <stdint.h>
# include <cblas.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to import this here rather than adding it to the imports of the generated file?

@@ -1736,6 +1736,11 @@ def _print_NumpySum(self, expr):
return f'numpy_sum_bool({name})'
raise NotImplementedError('Sum not implemented for argument')

def _print_NumpyMatmul(self, expr, lhs):
if all(x.dtype == NativeFloat() for x in expr.args) and all(len(x.shape) == 2 for x in expr.args):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to also beware of the precision

@EmilyBourne
Copy link
Member

EmilyBourne commented Jan 4, 2024

I think it is a good idea to use GEMM to implement matmul in C. However, given that these functions are exposed to Python in scipy I think it would be better to attack this from that angle first. This allows us to then easily map matmul to those functions once we have a functioning implementation following the DRY principle. This work was started in #1022 however unfortunately no-one has worked on the PR for 2 years. When this was begun we had several discussions about some of the other points that you raise regarding the roadmap. Maybe it would be helpful if we organise a meeting to discuss this?

Performance Boost: The existing naive implementation of ndarray functions in Pyccel results in suboptimal performance, particularly evident in large-scale computations. Integrating BLAS and LAPACK will enable Pyccel to achieve a substantial speed-up, aligning it with the performance of native implementations in more lower-level languages.

I am not sure I understand your argument here. I agree that the existing naive implementation of ndarray functions results in suboptimal performance. This is evident from our benchmarks. However I am not sure why you feel that BLAS/LAPACK is the thing that will give the most benefit in Pyccel. My opinion is that this is an enhancement of the current implementation, adding functionalities that were previously missing, not an improvement which will affect the performance of code written using the currently supported subset of Python. Is there something I am missing?

It is probably also worth mentioning that there has been a lot of discussion recently about replacing our current implementation of ndarrays_t in order to improve the poor performance of the low-level functionalities (as shown in the benchmarks). Lists and sets will be implemented in the coming months and the library chosen has an implementation of arrays with good performance (see discussion here for more detail: #1053 (comment) ). A proof of concept needs to be written to test whether this implementation indeed improves the results of our benchmarks. However if this is the case then we may switch to this implementation. As a result I think this is a bad time to begin major developments involving ndarray. Again this is something that we can probably discuss more easily in a meeting.

Please send me a message on discord so we can discuss this in more detail.

@EmilyBourne
Copy link
Member

@jalalium Do you want to leave this PR open or can we close it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants