SVE Backend #842

Open: wants to merge 1 commit into main

jeremylt
Member

No description provided.

@jeremylt
Member Author

jeremylt commented Nov 12, 2021

This now compiles and passes the t3 tests on Ookami.

ToDo:

  • Performance comparisons
  • Makefile flag fix (not sure how the AVX one even works)
  • Improved vectorization instructions?
  • Add SVE backend to README
  • Do we want manual unrolling for opt? The unrolling is pretty straightforward.
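For the "improved vectorization" and "manual unrolling" items, here is a minimal sketch of the two styles on a hypothetical dot-product kernel (`dot_product` is illustrative only, not code from this PR or from libCEED). When the compiler defines the ACLE macro `__ARM_FEATURE_SVE`, it uses a vector-length-agnostic loop built from `arm_sve.h` intrinsics. Otherwise it falls back to a scalar loop unrolled by four with independent accumulators.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

/* Hypothetical kernel for illustration only; not code from this PR. */
double dot_product(const double *x, const double *y, size_t n) {
  assert(n == 0 || (x != NULL && y != NULL));
#ifdef __ARM_FEATURE_SVE
  /* Vector-length-agnostic loop: svcntd() is the number of doubles per SVE
     vector on this CPU, and the predicate disables lanes at or past n, so no
     scalar remainder loop is needed. */
  svfloat64_t acc = svdup_n_f64(0.0);
  for (size_t i = 0; i < n; i += svcntd()) {
    svbool_t pg = svwhilelt_b64_u64(i, n);
    svfloat64_t xv = svld1_f64(pg, &x[i]);
    svfloat64_t yv = svld1_f64(pg, &y[i]);
    acc = svmla_f64_m(pg, acc, xv, yv); /* acc += xv * yv on active lanes */
  }
  return svaddv_f64(svptrue_b64(), acc); /* horizontal sum of all lanes */
#else
  /* Scalar fallback, manually unrolled by 4 with independent accumulators
     so consecutive multiply-adds do not wait on one another. */
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += x[i] * y[i];
    s1 += x[i + 1] * y[i + 1];
    s2 += x[i + 2] * y[i + 2];
    s3 += x[i + 3] * y[i + 3];
  }
  for (; i < n; i++) s0 += x[i] * y[i]; /* remainder when n % 4 != 0 */
  return (s0 + s1) + (s2 + s3);
#endif
}
```

The predicate from `svwhilelt_b64_u64` handles the tail, so the SVE path needs no remainder loop. The unrolled path does need one. Splitting the sum across four accumulators changes the floating-point summation order, so results can differ from a naive loop in the last bits.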

@LeilaGhaffari
Member

I remember from Ookami's talk that performance with GCC was the worst among all compilers. I did a brief experiment before losing access to Ookami.
sve/blocked with armclang: DoFs/Sec in CG: 1.43289 (1.43289) million
opt/blocked with gcc: DoFs/Sec in CG: 0.729637 (0.729637) million

I'm not sure why these numbers are so much smaller than what we had on Friday, but I think the poor performance with SVE might be partly due to the compiler. I'll need to apply for an account to run more experiments, though.
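Each result line above reports two figures in the form "X (Y) million". Here is a minimal sketch of how such a number could be computed. It assumes, without confirmation in this thread, that the metric is global DoFs times CG iterations divided by CG solve time, and that the first figure uses the slowest rank's time while the parenthesized one uses the fastest rank's. `mdofs_per_sec` and `print_cg_throughput` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (assumed formula, not confirmed in this thread):
   millions of DoFs processed per second of CG solve time. */
double mdofs_per_sec(long long global_dofs, int cg_iterations, double cg_seconds) {
  assert(cg_seconds > 0.0);
  return 1e-6 * (double)global_dofs * (double)cg_iterations / cg_seconds;
}

/* Prints "DoFs/Sec in CG: X (Y) million", where X uses the slowest rank's
   solve time and Y the fastest rank's (again an assumption). */
void print_cg_throughput(long long global_dofs, int cg_iterations,
                         double t_slowest, double t_fastest) {
  printf("DoFs/Sec in CG: %g (%g) million\n",
         mdofs_per_sec(global_dofs, cg_iterations, t_slowest),
         mdofs_per_sec(global_dofs, cg_iterations, t_fastest));
}
```

For example, under this formula a hypothetical run with 1,000,000 global DoFs and 50 CG iterations taking 25 s would report 2 million. The same value in both positions would then mean that the slowest and fastest ranks took equal time, as with a single process. The two runs quoted above differ in both backend and compiler, so their ratio (1.43289 / 0.729637 ≈ 1.96) does not isolate the effect of either one.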

@jedbrown
Member

jedbrown commented Sep 6, 2022

Is this ready for review? Should we include it in v0.11?

@jeremylt
Member Author

jeremylt commented Sep 6, 2022

The two big todos are fixing the Makefile magic and checking whether this actually performs any differently from opt.

@jeremylt force-pushed the jeremy/sve branch 3 times, most recently from d3cd77e to e57bab0, on September 12, 2022 15:53
@jeremylt marked this pull request as ready for review on September 12, 2022 15:54
@jedbrown
Member

I noticed that libxsmm contains aarch64/SVE code and it's announced as supported for the next release.

@jedbrown
Member

Do we have a place where we can measure performance? There is a machine at Sandia that you can access if you put in a Sarape request, and AWS c7g instances also have SVE. JLSE also has a system that I could try requesting.
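When picking among candidate machines, here is a minimal sketch of a runtime SVE check on Linux/aarch64. It assumes glibc's `getauxval` and the kernel's `HWCAP_SVE` bit from `<asm/hwcap.h>`. `cpu_has_sve` is a hypothetical helper, and it returns 0 on any other platform.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  /* getauxval, AT_HWCAP */
#include <asm/hwcap.h> /* HWCAP_SVE */
#endif

/* Hypothetical helper: returns 1 if the kernel reports SVE for this CPU,
   0 otherwise (including on any non-Linux or non-aarch64 build). */
int cpu_has_sve(void) {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
  return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
  return 0;
#endif
}
```

A benchmark script could use a check like this to skip SVE runs on hosts that lack it. For example, c6g instances (Graviton2) do not implement SVE.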

@sebastiangrimberg
Collaborator

Hi @jedbrown @jeremylt, if there is any way I can help here, let me know. I'm happy to take the time to test on AWS's c6g/c7g for you if there's a specific case you're interested in, and also to compare against whatever Arm optimizations libxsmm has.

@jeremylt
Member Author

jeremylt commented Mar 3, 2023

I haven't had time to close the loop on this one. The two big todos are fixing the Makefile detection of SVE support and running some basic performance comparisons against the /cpu/self/opt backends. I'm not sure of the best way to do the first, but we have a script to run our PETSc examples for the second.
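For the Makefile detection, one possible approach (an assumption, not what the PR currently does) is to compile a small probe with a candidate flag such as `-march=armv8-a+sve`, which GCC and Clang accept, and check whether the compiler then defines the ACLE macro `__ARM_FEATURE_SVE`. Grepping the output of `$(CC) -march=armv8-a+sve -dM -E - </dev/null` for that macro is an equivalent shortcut. Here is a sketch of such a probe, using a hypothetical `sve_vector_bits` helper:

```c
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

/* Hypothetical probe: returns the SVE vector length in bits when this file
   was compiled with SVE enabled, or 0 when __ARM_FEATURE_SVE is undefined. */
unsigned sve_vector_bits(void) {
#ifdef __ARM_FEATURE_SVE
  /* svcntb() reads the hardware vector length in bytes at run time, so this
     branch must only execute on an SVE-capable CPU. */
  return (unsigned)svcntb() * 8u;
#else
  return 0u;
#endif
}
```

The macro only shows that the compiler is targeting SVE. Calling `svcntb()` still requires SVE hardware at run time, so a cross-compiling build should rely on the compile step alone.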

@sebastiangrimberg
Collaborator

Sounds good, no rush. I'd be very happy to run the PETSc benchmarks on whatever instance types you want whenever it is ready, if that would be helpful to you.

4 participants