Bitonic sort2.0 #83

chitalu · 2018-08-22T13:12:40Z

Summary

This pull request modifies SyclParallelSTL by adding an improved version of the sort algorithm.

Updates:

Update sorting algorithm implementation
Update cmake files to use FindComputeCpp.cmake from ComputeCpp version 0.9.1
Add sample file which tests the new sorting algorithm

Notes
Since SYCL does not currently have the shuffle builtins, an additional namespace is added in "sort.hpp" containing emulated copies. Currently these emulated builtins are hard-coded for vectors with 4 components.
The CMake files in the tests directory are yet to be updated.

cjdb

Overall, it looks great: well done!

There are a few things that need to be addressed, but this is some excellent work.

cjdb · 2018-08-22T13:15:08Z

CMakeLists.txt

@@ -23,21 +23,23 @@ if (USE_COMPUTECPP)
  add_definitions(-DSYCL_PSTL_USE_OLD_ALGO)
  set(COMPUTECPP_DEVICE_COMPILER_FLAGS "${COMPUTECPP_DEVICE_COMPILER_FLAGS} -DSYCL_PSTL_USE_OLD_ALGO")

-  include_directories("${COMPUTECPP_INCLUDE_DIRECTORY}")
+  include_directories("${ComputeCpp_DIR}/include")


We should probably change this to target_include_directories.

cjdb · 2018-08-22T13:15:29Z

CMakeLists.txt

+#
+# TODO: update CMakeLists.txt to work with updated FindComputeCpp.cmake
+#
+#add_subdirectory (tests)


No commented-out code in the MR please.

cjdb · 2018-08-22T13:16:42Z

cmake/Modules/FindComputeCpp.cmake

  if (targetCxxStandard MATCHES 17)
    set(device_compiler_cxx_standard "-std=c++1z")
  elseif (targetCxxStandard MATCHES 14)
    set(device_compiler_cxx_standard "-std=c++14")
  elseif (targetCxxStandard MATCHES 11)
    set(device_compiler_cxx_standard "-std=c++11")
  elseif (targetCxxStandard MATCHES 98)
-    message(FATAL_ERROR "SYCL implementations cannot be compiled using C++98")
+    message(FATAL_ERROR "SYCL applications cannot be compiled using C++98")


Can you please add a check for C++03 too?

cjdb · 2018-08-22T13:18:34Z

examples/sycl_example_02.cpp

+  bool sorted = true;
+
+  std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">"
+            << std::endl;


endl isn't necessary here, please replace with "\n".

Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood pushes me towards std::endl anyway... :-)
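
For reference, the two spellings discussed here write the same characters; the difference is that std::endl also flushes the stream. A minimal sketch, with hypothetical helper names that are not part of the PR:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a label followed by a newline, without flushing.
inline void print_label(std::ostream &os, const std::string &label) {
  os << label << '\n';  // newline only; the stream flushes when it needs to
}

// Hypothetical helper: same characters, but std::endl also calls os.flush().
inline void print_label_flushed(std::ostream &os, const std::string &label) {
  os << label << std::endl;
}
```

Both helpers leave identical text in the stream, so the choice mostly matters when output is written in a tight loop or when a flush is needed right away.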

cjdb · 2018-08-22T13:19:34Z

examples/sycl_example_02.cpp

+    std::cout << "out : ";
+    for (size_t j = 0; j < v.size(); j++) {
+      std::cout << (v[j]) << (j == v.size() - 1 ? "" : ", ");
+    }


cjdb · 2018-08-22T13:29:15Z

include/sycl/algorithm/sort.hpp

+    for (size = 2; size < item.get_local_range(0); size <<= 1) {
+      dir = (item.get_local_id(0) / size & 1) * -1;
+
+      for (stride = size; stride > 1; stride >>= 1) {


This is interesting. Why are we halving the stride each time?
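
As background for the question, a textbook bitonic merge compares elements that are `stride` apart and then halves the stride, so each pass sorts within progressively smaller halves of a bitonic sequence. The following is a host-side sketch of that textbook scheme, not the kernel from this PR; the function name and loop bounds are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side sketch of a textbook bitonic sort (ascending result).
// Requires v.size() to be a power of two.
template <typename T>
void bitonic_sort_host(std::vector<T> &v) {
  const std::size_t n = v.size();
  assert(n != 0 && (n & (n - 1)) == 0);
  // size: length of the sorted runs produced by this stage.
  for (std::size_t size = 2; size <= n; size <<= 1) {
    // stride: distance between compared elements; halved on every pass.
    for (std::size_t stride = size >> 1; stride > 0; stride >>= 1) {
      for (std::size_t i = 0; i < n; ++i) {
        const std::size_t partner = i ^ stride;
        if (partner > i) {
          // Alternate run direction so that neighbouring runs form a
          // bitonic sequence for the next stage.
          const bool ascending = (i & size) == 0;
          if ((v[i] > v[partner]) == ascending) {
            std::swap(v[i], v[partner]);
          }
        }
      }
    }
  }
}
```

After the pass with stride s, every element in the lower half of a 2s-wide block is no larger (or no smaller, for descending runs) than any element in the upper half, which is why the next pass only needs to look s/2 apart.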

cjdb · 2018-08-22T13:30:34Z

include/sycl/algorithm/sort.hpp

+  typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf;
+  typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf;
+  const unsigned int m_stage;
+  const int mDir;


Please choose a name that's more meaningful than "dir" (I don't know what that is).

Sorting direction. Zero means 'ascending', one means 'descending'.

cjdb · 2018-08-22T13:31:43Z

include/sycl/algorithm/sort.hpp

@@ -233,6 +1014,7 @@ void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf,
    }  // passStage
  }    // stage
 }  // bitonic_sort
+#endif


Please provide a comment to indicate which #if you're ending.

cjdb · 2018-08-22T13:33:25Z

examples/sycl_example_02.cpp

+#include <experimental/algorithm>
+#include <sycl/execution_policy>
+
+using namespace std::experimental::parallel;


I don't think this is used at all.

cjdb · 2018-08-22T13:33:46Z

examples/sycl_example_02.cpp

+#include <vector>
+
+#include <experimental/algorithm>
+#include <sycl/execution_policy>


Please place these two above the other three includes.

Ruyk

Many thanks for the contribution. It seems to me there should be some additional clarification of the algorithm, especially all the magic values, so that it is clearer how it works. An overall explanation may be useful.

Ruyk · 2018-08-22T14:40:04Z

benchmarks/CMakeLists.txt


-  add_sycl_to_target(${SOURCE_NAME} ${CMAKE_CURRENT_BINARY_DIR}
+  add_sycl_to_target( TARGET ${SOURCE_NAME} SOURCES 


Where are all these CMake changes coming from? Did you have to make them, or did you just update the CMake files from some other project?

Ruyk · 2018-08-22T14:44:33Z

examples/sycl_example_02.cpp

+
+sycl::sycl_execution_policy<> sycl_policy;
+
+/* This sample tests the updated multi-kernel bitonic sort implementation.


This is either a sample or a test; it cannot be both! :-)
If this is the test, move it to the test directory and create a simpler sample.
Otherwise, create a simpler sample!

Ruyk · 2018-08-22T14:46:04Z

examples/sycl_example_02.cpp

+  sorted = sorted && test<float>(); 
+  sorted = sorted && test<double>(); 
+
+  return !sorted;


this looks like a test to me!

Ruyk · 2018-08-22T14:46:44Z

include/sycl/algorithm/sort.hpp

+namespace emulated_shuffle_builtins {
+
+template <typename vec_type>
+typename vec_type::element_type get_vector_component(vec_type &x,


Why do you need the vector passed by reference here if it's not modified?

At least const vec_type &x
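
A small sketch of the suggested signature. The `vec4` type is a hypothetical stand-in for cl::sycl::vec so that the snippet compiles without a SYCL implementation, and the `index` parameter is an assumption, since the quoted diff cuts off before the second argument:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4-component vector type.
template <typename T>
struct vec4 {
  using element_type = T;
  std::array<T, 4> data;
};

// Read-only access: the const reference documents that x is not modified
// and allows callers to pass const objects and temporaries.
template <typename vec_type>
typename vec_type::element_type get_vector_component(const vec_type &x,
                                                     std::size_t index) {
  return x.data[index];
}
```

With a non-const reference, a call such as `get_vector_component(some_const_vector, 0)` would not compile, which is the practical cost of the original signature.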

Ruyk · 2018-08-22T14:48:02Z

include/sycl/algorithm/sort.hpp

+  }
+}
+
+static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src,


can you add some documentation explaining what this method is trying to achieve?

Ruyk · 2018-08-22T14:51:53Z

include/sycl/algorithm/sort.hpp

+  static_assert(std::is_arithmetic<T>::value,
+                "Bitonic sort implementation only works with arithmetic types");
+  static_assert(
+      U == 4,


the 4 is in several places, worth promoting it to a constant

Ruyk · 2018-08-22T15:29:59Z

include/sycl/algorithm/sort.hpp

+
+    relational_op_vec_type_ add1(1, 1, 3, 3);
+    relational_op_vec_type_ add2(2, 3, 2, 3);
+    relational_op_vec_type_ add3(4, 5, 6, 7);


need some explanation for the numbers

See book: OpenCL in Action by Matthew Scarpino (Manning Publication)

Ruyk · 2018-08-22T15:31:10Z

include/sycl/algorithm/sort.hpp

+void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf,
+                  size_t vectorSize) {
+  using namespace cl::sycl;
+  int direction = 0 /*0 = ascending, -1 = descending*/;


this is a configuration parameter, should be an enum class
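
One possible shape for this, as a sketch only: the enum and helper names are hypothetical, and the integer encoding is taken from the comment in the snippet above (0 = ascending, -1 = descending).

```cpp
// Hypothetical names; not part of the PR.
enum class sort_direction { ascending, descending };

// Maps the enum to the integer flag described in the quoted comment,
// so kernels that expect an int can stay unchanged.
constexpr int to_kernel_flag(sort_direction d) {
  return d == sort_direction::ascending ? 0 : -1;
}
```

Call sites would then read `to_kernel_flag(sort_direction::ascending)` instead of a bare `0`, and an invalid value can no longer be passed by accident.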

Ruyk · 2018-08-22T15:32:08Z

include/sycl/algorithm/sort.hpp

+      .wait();
+
+  q.wait_and_throw();
+  return;


why is this return here? there is code afterwards

debugging artifact.

Ruyk · 2018-08-22T15:32:32Z

include/sycl/algorithm/sort.hpp

+       cgh.parallel_for<kernel_bitonic_sort_merge<T, Alloc>>(
+           ndrange, bitonic_sort_merge<T, 4>(g, l, stage, direction));
+     })
+        .wait();


why do you have to wait immediately after every submit?

cjdb

Anything in quote format is from @chitalu.

cjdb · 2018-08-23T14:02:58Z

include/sycl/algorithm/sort.hpp

+  }
+}
+
+static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src,


You're setting a range of bits at a given offset.
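
To illustrate what "setting a range of bits at a given offset" usually means, here is a sketch under stated assumptions. It is not the function from the PR: the quoted signature is truncated, so the `offset` and `count` parameters and the `_sketch` name are hypothetical, and std::uint32_t stands in for cl::sycl::cl_uint.

```cpp
#include <cassert>
#include <cstdint>

// Overwrites `count` bits of *dst, starting at bit `offset`, with the low
// `count` bits of src. All other bits of *dst are preserved.
inline void set_bits32_sketch(std::uint32_t *dst, std::uint32_t src,
                              unsigned offset, unsigned count) {
  assert(dst != nullptr && count >= 1 && offset + count <= 32);
  // A field of `count` ones; count == 32 is handled separately because
  // shifting a 32-bit value by 32 is undefined behaviour.
  const std::uint32_t field =
      (count == 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << count) - 1u);
  const std::uint32_t mask = field << offset;
  *dst = (*dst & ~mask) | ((src << offset) & mask);
}
```

A doc comment along these lines on the real function would answer the documentation request above.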

cjdb · 2018-08-23T14:03:45Z

include/sycl/algorithm/sort.hpp

+      read_bits32(mask.s0(), 0, k), read_bits32(mask.s1(), 0, k),
+      read_bits32(mask.s2(), 0, k), read_bits32(mask.s3(), 0, k));
+
+  for (int i = 0; i < 4; ++i) {


Emulating vector with four components

cjdb · 2018-08-23T14:04:40Z

include/sycl/algorithm/sort.hpp

+  using namespace cl::sycl;
+  vec<gentypem_type, gentypem_size> ret;
+
+  const unsigned int k = 3;


In the spec for the shuffle built-in, only the k least significant bits are used to determine the value used to select the component from the vector x (or y).
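
A host-side sketch of that rule for the two-vector case with 4 components, where k = 3 as in the quoted code. std::array stands in for cl::sycl::vec and the function name is hypothetical; this illustrates the masking rule and is not the emulation from the PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Only the k = 3 least significant bits of each mask element are used:
// values 0..3 select a component of x, values 4..7 select a component of y.
template <typename T>
std::array<T, 4> shuffle2_sketch(const std::array<T, 4> &x,
                                 const std::array<T, 4> &y,
                                 const std::array<std::uint32_t, 4> &mask) {
  constexpr unsigned k = 3;
  constexpr std::uint32_t low_bits = (1u << k) - 1u;  // binary 111
  std::array<T, 4> result{};
  for (std::size_t i = 0; i < 4; ++i) {
    const std::uint32_t index = mask[i] & low_bits;  // higher bits ignored
    result[i] = index < 4 ? x[index] : y[index - 4];
  }
  return result;
}
```

For example, a mask element of 10 behaves like 2 because only its three low bits are considered.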

cjdb · 2018-08-23T14:05:11Z

include/sycl/algorithm/sort.hpp

+  static_assert(std::is_arithmetic<T>::value,
+                "Bitonic sort implementation only works with arithmetic types");
+  static_assert(
+      U == 4,


cjdb · 2018-08-23T14:09:28Z

include/sycl/algorithm/sort.hpp

+
+    relational_op_vec_type_ add1(1, 1, 3, 3);
+    relational_op_vec_type_ add2(2, 3, 2, 3);
+    relational_op_vec_type_ add3(4, 5, 6, 7);


See book: OpenCL in Action by Matthew Scarpino (Manning Publication)

cjdb · 2018-08-23T14:10:10Z

include/sycl/algorithm/sort.hpp

+  typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf;
+  typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf;
+  const unsigned int m_stage;
+  const int mDir;


Sorting direction. Zero means 'ascending', one means 'descending'.

keryell

Great to have a new sorting algorithm.

Just a few stylistic comments.

keryell · 2018-08-24T00:11:22Z

examples/sycl_example_02.cpp

@@ -0,0 +1,152 @@
+/* Copyright (c) 2015 The Khronos Group Inc.


I am curious about why 2015 here...

keryell · 2018-08-24T00:12:41Z

examples/sycl_example_02.cpp

+ * the device.
+ * Note that for the moment the sycl variants of the algorithm
+ *   are on the sycl namespace and not in std::experimental.
+ */


Please use SYCL when you mean the standard :-)

keryell · 2018-08-24T00:15:36Z

examples/sycl_example_02.cpp

+  bool sorted = true;
+
+  std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">"
+            << std::endl;


Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood pushes me towards std::endl anyway... :-)

keryell · 2018-08-24T00:16:41Z

examples/sycl_example_02.cpp

+  for (int i = minInputSizeLog2; i <= maxInputSizeLog2; ++i) {
+
+    std::vector<T> v;
+    v.resize(1 << i);


Curious.
Why not

std::vector<T> v(1 << i);

?
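
The suggestion, spelled out as a sketch (the helper name is hypothetical): the size constructor creates the vector with 2^i value-initialized elements in one step, which is equivalent to default-constructing and then calling resize.

```cpp
#include <cstddef>
#include <vector>

// Builds a vector of 2^log2_size value-initialized elements directly.
// Assumes 0 <= log2_size < number of bits in std::size_t.
template <typename T>
std::vector<T> make_input(int log2_size) {
  return std::vector<T>(std::size_t{1} << log2_size);
}
```

Shifting a std::size_t rather than an int also avoids overflow for large exponents, which is a side benefit and not something the review asked for.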

keryell · 2018-08-24T00:21:18Z

examples/sycl_example_02.cpp

+
+    std::cout << "in : ";
+    for (int j = 0; j < v.size(); ++j) {
+      v[j] = init_num(static_cast<T>((v.size() - 1) - j), v.size());


It could be a v.emplace_back(...) and then we could remove v pre-allocation...
But not that important for a test anyway...

keryell · 2018-08-24T01:21:00Z

include/sycl/algorithm/sort.hpp

+      const typename bitonic_sort_base<T, U>::local_buffer_accessor_t &localBuf,
+      const unsigned int stage, const int dir)
+      : m_globalBuf(globalBuf), m_localBuf(localBuf), m_stage(stage),
+        mDir(dir) {}


keryell · 2018-08-24T01:22:04Z

include/sycl/algorithm/sort.hpp

+    typedef typename bitonic_sort_base<T, U>::data_vec_type data_vec_type_;
+    typedef
+        typename bitonic_sort_base<T, U>::mask_op_vec_type mask_op_vec_type_;
+    typedef typename bitonic_sort_base<T, U>::relational_op_vec_type


keryell · 2018-08-24T01:22:14Z

include/sycl/algorithm/sort.hpp

+    relational_op_vec_type_ comp, add;
+    unsigned int global_start, global_offset;
+
+    add = relational_op_vec_type_(4, 5, 6, 7);


keryell · 2018-08-24T01:24:07Z

include/sycl/algorithm/sort.hpp

+         cgh.parallel_for<kernel_bitonic_sort_phase_stage_n<T, Alloc>>(
+             ndrange, bitonic_sort_stage_n<T, 4>(g, l, stage, high_stage));
+       })
+          .wait();


Curious to have a wait()

keryell · 2018-08-24T01:27:35Z

include/sycl/algorithm/sort.hpp

+    input1 = shuffle2(input1, input2, (comp).template as<mask_op_vec_type_>());
+    input2 = shuffle2(input2, temp, (comp).template as<mask_op_vec_type_>());
+    VECTOR_SORT(input1, mDir);
+    VECTOR_SORT(input2, mDir);


Browsing code that looks similar at different levels across the different kernels, I wonder whether it would be possible to have some even more generic code and to do some refactoring...

CLAassistant · 2019-09-02T16:52:05Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Floyd seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

Floyd added 2 commits August 22, 2018 12:16

update Cmake files to use recent version of FindComputeCpp.cmake

236d5fc

add working example of new bitonic sort

b4b6f21

cjdb suggested changes Aug 22, 2018

Ruyk suggested changes Aug 22, 2018

remove unnecesary wait after CG calls

d774e33

cjdb reviewed Aug 23, 2018

keryell added the enhancement label Aug 24, 2018

keryell reviewed Aug 24, 2018

