Low performance on PortBLAS vs DPC++ #1417
Comments
Rebuilt with dpcpp for the sake of sanity.
Run benchmark:
Output:
Summary:
Which is quite similar to the above. Something is slowing down acpp.
It's an Intel/Codeplay library. Obviously the focus of optimization and validation was on DPC++. To their credit, at least they tried to make it work, which cannot be said of all of the oneAPI SYCL libraries. Those don't even try to support anything but DPC++.
There seem to be a couple of AdaptiveCpp-specific code paths in portBLAS, so the executed code won't be the same. I don't know why exactly they are needed. Some seem to be there to work around the fact that the generic SSCP compiler does not yet implement the SYCL 2020 group algorithms library.
If there are no other issues and it turns out that performance is bound by the group algorithms, we could close this issue, as it's a known limitation and on the todo list.
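To illustrate why a fallback path can behave differently from a native group algorithm, here is a minimal host-side sketch of the kind of manual tree reduction a kernel might use when something like `sycl::reduce_over_group` is not available. This is not portBLAS or AdaptiveCpp code; the function name `tree_reduce_emulated` is hypothetical, and the inner loop only emulates sequentially what the work-items of one work-group would do in parallel between barriers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from portBLAS): emulates on the host the tree
// reduction a work-group would perform through local memory when a native
// group reduction is unavailable.
float tree_reduce_emulated(std::vector<float> local) {
    if (local.empty()) return 0.0f;

    // Pad with zeros up to a power of two so the halving scheme is valid.
    std::size_t n = 1;
    while (n < local.size()) n *= 2;
    local.resize(n, 0.0f);

    // Each value of `stride` corresponds to one step of the reduction.
    // In a real kernel, a group barrier would separate consecutive steps.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // In a real kernel, work-items with id < stride run this in parallel.
        for (std::size_t id = 0; id < stride; ++id) {
            local[id] += local[id + stride];
        }
    }
    return local[0];
}
```

A work-group of size n needs about log2(n) barrier-separated steps with this scheme, whereas a native group algorithm is free to use hardware features such as sub-group shuffles, which is one plausible source of a performance gap.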
I didn't realize this and assumed it's the same code.
Is there a checklist of the things that are implemented and the things that are not?
Compared to the older compilation flows, it's really only the SYCL 2020 group algorithm library and SYCL 2020 reductions, the latter of which are also not fully implemented in the old SMCP compilers. Plus some less important features: the scoped parallelism extension, and the hierarchical parallelism model, which was explicitly discouraged in the SYCL 2020 spec and is likely to be removed in future SYCL versions. On the other hand, the SSCP compiler supports functionality that the old compilers do not implement such as
Compared to DPC++... this is actually a contentious issue. There is no consensus between implementations about which features from SYCL 2020 are actually portable and implementable across implementations. DPC++ implements some functionality from SYCL 2020 that was merged without any prior implementation experience and only makes sense for DPC++.
Hi,
I've just built AdaptiveCpp on an Nvidia GPU and then built portBLAS with it. I compared the benchmarks to dpcpp.
Install dependencies for PortBLAS:
Build portBLAS with acpp:
To benchmark, I ran:
This yields the following results:
To summarize this:
Data from my previous run with dpc++:
References: https://chsasank.com/portblas-portable-blas-across-gpus.html