
Feature/raja vec #83

Open · wants to merge 15 commits into develop

Conversation

vsrana01

Added vectorization in stream/add

vsrana01 requested a review from rhornung67 on May 21, 2020 19:56
@@ -53,26 +53,25 @@ void POLYBENCH_2MM::runOpenMPVariant(VariantID vid)

POLYBENCH_2MM_VIEWS_RAJA;

- auto poly_2mm_lam1 = [=](Index_type /*i*/, Index_type /*j*/, Index_type /*k*/, Real_type &dot) {
+ auto poly_2mm_lam1 = [=](Real_type &dot) {
rhornung67 (Member)

@vsrana01 have you checked whether these changes adversely affect performance across a range of compilers?

I don't think we want to make changes like this in this PR, since they are orthogonal. We can think about adding additional variants like this later to further stress compilers.


@rhornung67 These changes may be needed with the latest Lambda changes: if not all segments are in active loops, the Lambda form will static_assert out
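For context, a minimal sketch of the newer Lambda statement form being discussed (not taken from this PR; the 2MM-like kernel shape, array names, and extents are illustrative assumptions). Each Lambda statement lists the segments (Segs) and parameters (Params) it actually consumes, so a lambda can drop unused index arguments without tripping the static_assert:

```cpp
#include "RAJA/RAJA.hpp"

// Sketch only: policy tells RAJA exactly which arguments each lambda receives.
using POL = RAJA::KernelPolicy<
  RAJA::statement::For<0, RAJA::loop_exec,                            // i
    RAJA::statement::For<1, RAJA::loop_exec,                          // j
      RAJA::statement::Lambda<0, RAJA::Params<0>>,                    // init dot; no indices needed
      RAJA::statement::For<2, RAJA::loop_exec,                        // k
        RAJA::statement::Lambda<1, RAJA::Segs<0,1,2>, RAJA::Params<0>>
      >,
      RAJA::statement::Lambda<2, RAJA::Segs<0,1>, RAJA::Params<0>>    // write result
    >
  >
>;

void mm_like(const double* A, const double* B, double* C,
             RAJA::Index_type ni, RAJA::Index_type nj, RAJA::Index_type nk)
{
  RAJA::kernel_param<POL>(
    RAJA::make_tuple(RAJA::RangeSegment(0, ni),
                     RAJA::RangeSegment(0, nj),
                     RAJA::RangeSegment(0, nk)),
    RAJA::make_tuple(0.0),                                   // param 0: dot
    [=](double& dot) { dot = 0.0; },                         // Lambda<0>: Params<0> only
    [=](RAJA::Index_type i, RAJA::Index_type j, RAJA::Index_type k, double& dot) {
      dot += A[i*nk + k] * B[k*nj + j];                      // Lambda<1>: Segs<0,1,2> + Params<0>
    },
    [=](RAJA::Index_type i, RAJA::Index_type j, double& dot) {
      C[i*nj + j] = dot;                                     // Lambda<2>: Segs<0,1> + Params<0>
    });
}
```

This mirrors the direction of the diff above: the two-argument form of poly_2mm_lam1 only works when the kernel policy names which segments and parameters that lambda should be passed.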

@@ -43,27 +43,24 @@ void POLYBENCH_2MM::runSeqVariant(VariantID vid)

POLYBENCH_2MM_VIEWS_RAJA;

- auto poly_2mm_lam1 = [=](Index_type /*i*/, Index_type /*j*/, Index_type /*k*/, Real_type &dot) {
+ auto poly_2mm_lam1 = [=](Real_type &dot) {
rhornung67 (Member)

Same comment as previous one.

rhornung67 (Member) left a comment

@vsrana01 the addition of vector variants for the ADD and DAXPY kernels looks good.

It's good that you are looking at eliminating the unused lambda expression arguments with the newer RAJA 'Segs', etc. stuff. But that has to be assessed carefully for performance with as many compilers as you can try. If it's all good, that stuff can come in a separate PR.

@@ -21,6 +21,7 @@ if (PERFSUITE_ENABLE_WARNINGS)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Werror")
endif()

+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2")

This flag should go in the scripts/lc-builds/XXX files. Also, I think there is an architecture-agnostic flag (for at least gnu and clang) like -march=native or something... in case the machine has SSE, AVX, AVX2 or AVX512, it will pick the best one.
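A sketch of what that might look like in the CMake setup (the compiler-ID guard and its placement are assumptions for illustration, not what the suite actually does):

```cmake
# Sketch only: prefer the host's best SIMD ISA instead of hard-coding -mavx2.
# -march=native is understood by GNU and Clang; other compilers need their own flag.
if (CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
endif()
```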

vsrana01 (Author)

Yes, I saw the lc-builds file and that is where I have it now. Not sure how this slipped in... but thanks for the feedback!

ajkunen commented May 21, 2020

Perhaps we should take a step back and do a PR for the Lambda changes first? What do you think @vsrana01?

vsrana01 (Author)

I agree with:

> Perhaps we should take a step back and do a PR for the Lambda changes first? What do you think @vsrana01?

vsrana01 closed this May 21, 2020
vsrana01 reopened this May 21, 2020
vsrana01 (Author)

@rhornung67 those changes were needed. @ajkunen I agree with you; I can do another PR for just the Lambda changes and then a second one for the vec stuff. I can test across the different compilers with the new lambda changes and see if there is a performance difference.

rhornung67 (Member)

@vsrana01 and @ajkunen if the lambda argument changes are needed for the vectorization work, then it may be a good idea to figure out a way to have both forms (with and without the 'Segs' business) for the non-vector variants. My main concern is that we want to make sure both versions of each kernel perform the same for each compiler. If not, then this is a good place for vendors to mine for why they do not. But let's not do that now.

I suggest only making additions you need to support the vector variants and leave all non-vector variants as is for now. Does that make sense?

ajkunen commented May 21, 2020

@rhornung67 I think the current RAJA develop branch imposes the Lambda requirements, which means the use of the new Lambda notation is necessary. I think if @vsrana01 does a PR for RAJAPerf with just the Lambda changes (and updated RAJA) we can see if there is a performance difference there. After that's complete, the vector_exec work will be more narrowly scoped, and we can test that performance separately.

But we can't do the vector_exec stuff now without the Lambda stuff.

vsrana01 (Author)

@ajkunen and @rhornung67 when I build with Adam's vectorization branch of RAJA I get compiler errors due to the new Lambda requirements. I will start looking at how the kernels' performance changes with the new lambda requirements and create a new PR.

rhornung67 (Member)

@vsrana01 and @ajkunen OK. I misunderstood the constraints.

I think it would be best to do a PR with a new variant added (RAJA_Seq_Args) so we can assess performance. Then, move on from there. Agree?

vsrana01 (Author) commented May 22, 2020

@rhornung67 @ajkunen agreed.

#define DAXPY_DATA_VEC_SETUP3 \
RAJA_INDEX_VALUE_T(I, Int_type, "I");\
using element_t = RAJA::StreamVector<Real_type,2>::element_type; \
element_t X[iend], Y[iend]; \

This really isn't the intended use. You are basically telling the compiler to hold the entire X and Y arrays in registers at the same time.
You want to keep the arrays as "Real_type X[iend], Y[iend]", and load vector-sized chunks of those arrays using element_t. You can do the loads/stores either with Views + VectorIndex, or by using the load() and store() functions in the vector or register classes.

Yview(i) += a*Xview(i);

#define DAXPY_VEC_BODY3 \
for(int i = 0;i < iend; ++i){ \

Going along with the last comment: the "i" index here should be over Real_types... so it should increment by the vector width.
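To make the pattern suggested in these two comments concrete, here is a plain-C++ sketch of a chunked DAXPY loop. It deliberately does not use the RAJA vector classes (whose exact API is not shown in this PR); the inner lane loops stand in for what an element_t load()/operation/store() would do, and the names and width are illustrative assumptions only:

```cpp
#include <cstddef>

using Real_type = double;
constexpr std::size_t vec_width = 2;   // lanes, matching StreamVector<Real_type,2>

// Keep X and Y as plain Real_type arrays and walk them one vector-width chunk
// at a time; with the RAJA vector classes each group of lane loops would
// collapse into a single vector load()/operation/store() per chunk.
void daxpy_chunked(const Real_type* X, Real_type* Y, Real_type a, std::size_t iend)
{
  std::size_t i = 0;
  for (; i + vec_width <= iend; i += vec_width) {                      // stride by the vector width
    Real_type x[vec_width], y[vec_width];                              // stands in for vector registers
    for (std::size_t l = 0; l < vec_width; ++l) { x[l] = X[i + l]; }   // "load" a chunk of X
    for (std::size_t l = 0; l < vec_width; ++l) { y[l] = Y[i + l]; }   // "load" a chunk of Y
    for (std::size_t l = 0; l < vec_width; ++l) { y[l] += a * x[l]; }  // DAXPY on the chunk
    for (std::size_t l = 0; l < vec_width; ++l) { Y[i + l] = y[l]; }   // "store" the chunk back
  }
  for (; i < iend; ++i) { Y[i] += a * X[i]; }                          // scalar remainder loop
}
```

The key points from the review comments are the scalar backing storage and the loop index advancing by the vector width rather than by one element.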
