
Dev #114

Open · wants to merge 24 commits into base: dev
Conversation

ThomasRetornaz

First pull request around issue #107

  • Add all_of, any_of, copy, copy_n, count, count_if, equal, fill, find, find_if, find_if_not, lexicographical_compare, max, max_element, min, min_element, none_of, reduce, replace, replace_if, transform, transform_reduce "STL"-like algorithms

  • Provide non-regression tests (validated on Visual Studio 2017 and GCC) and documentation

  • Please pay attention to the workaround I made around mask types; I may have missed something and a better approach may exist

  • I think a preliminary refactoring could be to move some useful unary/binary predicates into a dedicated header (see the sketch after this list)

Other fixes and/or proposals

  • fix the TestData& operator=(const TestData& other) assignment operator

  • reduce warnings (of course, the last commit could be dropped)
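
A rough sketch of what such a predicate header could contain (names, location and the exact SIMD overload are not final; `load_splat` and `cmp_eq` are existing simdpp operations, and the C++14 `auto` return is just for brevity):

```cpp
// simdpp/algorithm/predicates.h -- proposed location, sketch only
#include <simdpp/simd.h>

namespace simdpp {
namespace SIMDPP_ARCH_NAMESPACE {

// Equality-with-constant predicate: the scalar overload serves the
// prologue/epilogue loops, the vector overload serves the SIMD body.
template<typename T>
struct UnaryPredicateEqualValue
{
    explicit UnaryPredicateEqualValue(T value) : value_(value) {}

    bool operator()(T x) const { return x == value_; }

    template<typename V>                 // V: a simdpp vector whose lanes hold T
    auto operator()(const V& x) const    // returns the lane-wise comparison mask
    {
        V v = load_splat(&value_);       // broadcast value_ to every lane
        return cmp_eq(x, v);
    }

private:
    T value_;
};

} // namespace SIMDPP_ARCH_NAMESPACE
} // namespace simdpp
```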

Owner

@p12tic p12tic left a comment

Many thanks for the PR! I really like it :-)

I raised a number of comments, but it seems that there's nothing serious.

For most of the algorithms I think we could rewrite them to not use non-SIMD operations in the prologue and epilogue at all. We could do a single unaligned load that overlaps with the main aligned SIMD body, do the computations and then do unaligned store that also overlaps with the main aligned SIMD body. This would be faster in most cases, as the scalar code is multiple times slower than SIMD.
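
For a unary transform the shape would be roughly this (an untested sketch, not code from this PR; `scale_by_two` is only an illustration, and the real body would use aligned load/store starting at the next alignment boundary):

```cpp
#include <simdpp/simd.h>
#include <cstddef>

// out[i] = 2 * in[i] with no scalar prologue/epilogue: the first and last
// vectors are unaligned and may overlap the body, so a few elements are
// simply computed and written twice, which is harmless for a transform.
inline void scale_by_two(const float* in, std::size_t n, float* out)
{
    using vec = simdpp::float32<4>;                  // 4 lanes, for illustration
    const std::size_t N = vec::length;

    if (n < N) {                                     // smaller than one vector:
        for (std::size_t i = 0; i < n; ++i)          // the scalar loop is still needed
            out[i] = in[i] * 2.0f;
        return;
    }

    vec two = simdpp::make_float(2.0f);

    // Head: one unaligned vector covering in[0..N).
    vec head = simdpp::load_u(in);
    vec r0 = simdpp::mul(head, two);
    simdpp::store_u(out, r0);

    // Body: full vectors (the real code would align this part).
    std::size_t i = N;
    for (; i + N <= n; i += N) {
        vec v = simdpp::load_u(in + i);
        vec r = simdpp::mul(v, two);
        simdpp::store_u(out + i, r);
    }

    // Tail: one last unaligned vector ending exactly at in[n); it may overlap
    // the body instead of falling back to a scalar epilogue.
    vec tail = simdpp::load_u(in + (n - N));
    vec r1 = simdpp::mul(tail, two);
    simdpp::store_u(out + (n - N), r1);
}
```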

@@ -0,0 +1,69 @@
/* Copyright (C) 2018 Povilas Kanapickas <povilas@radix.lt>
Owner

Don't be shy about using your name :-) You wrote the code, so the copyright belongs to you.

Author

Thanks! I think we could share the copyright. Anyway, it's your lib :) and I just added new capabilities.

#include <simdpp/algorithm/helper_input_range.h>

namespace simdpp {
namespace SIMDPP_ARCH_NAMESPACE {
Owner

Indentation: please don't indent the namespace blocks. Also, the library uses 4-space indents.

Author

Fixed. I switch between different computers (Linux, Windows) and the default indentation is not the same across my IDEs.

template<typename T, typename UnaryPredicate>
bool all_of(T const* first, T const* last, UnaryPredicate pred)
{
#ifndef NDEBUG //precondition debug mode
Owner

I think we could add something like SIMDPP_DEBUG and use it throughout the library.
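
Something along these lines, for example (hypothetical, just to show the shape; the real definition would live in a config header):

```cpp
#include <stdexcept>

// Default to the usual NDEBUG convention, but let the build configuration
// override SIMDPP_DEBUG explicitly.
#ifndef SIMDPP_DEBUG
    #ifdef NDEBUG
        #define SIMDPP_DEBUG 0
    #else
        #define SIMDPP_DEBUG 1
    #endif
#endif

// An algorithm's precondition check would then look like:
template<typename T>
void check_range(const T* first, const T* last)
{
#if SIMDPP_DEBUG
    if (!first || !last || first > last)
        throw std::runtime_error("invalid range");
#else
    (void)first; (void)last;   // checks compiled out in release builds
#endif
}
```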

Author

Fixed: switched from NDEBUG to SIMDPP_DEBUG. Now we have to decide when/where we activate this flag depending on the input configuration.

throw std::runtime_error("all_of - null ptr last.");
#endif

using simd_type_T = typename typetraits<T>::simd_type;
Owner

I'd prefer type_traits. Is there by chance any conflict that prevents using this name?

Author

OK, I could switch to type_traits, or to simd_type_traits to avoid any conflicts.
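
For example, the renamed traits could look like this (a sketch only; simd_type_traits is a placeholder name, SIMDPP_FAST_*_SIZE are the library's existing native-width macros, and the alignment value is a simplification):

```cpp
#include <cstdint>
#include <simdpp/simd.h>

namespace simdpp {
namespace SIMDPP_ARCH_NAMESPACE {

// Map a scalar element type onto the "fast" vector width for the current arch.
template<typename T> struct simd_type_traits;

template<> struct simd_type_traits<float>
{
    using simd_type = float32<SIMDPP_FAST_FLOAT32_SIZE>;
    static const unsigned alignment = SIMDPP_FAST_FLOAT32_SIZE * sizeof(float);
};

template<> struct simd_type_traits<std::int32_t>
{
    using simd_type = int32<SIMDPP_FAST_INT32_SIZE>;
    static const unsigned alignment = SIMDPP_FAST_INT32_SIZE * sizeof(std::int32_t);
};

// ... the remaining element types follow the same pattern.

} // namespace SIMDPP_ARCH_NAMESPACE
} // namespace simdpp
```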


//prologue
auto lastprologue = first + size_prologue_loop;
if(!std::all_of(first, lastprologue, pred)) return false;
Owner

Space after if, return on next line.

Author

Fixed, and also fixed for any_of.

const auto predEqualTen = UnaryPredicateEqualValue<T>((T)10);
const auto predEqualFive = UnaryPredicateEqualValue<T>((T)5);
{ //test prologue
vector_t ivect = { (T)10,(T)10 };
Owner

I think for higher level algorithms we need more exhaustive testing, e.g. all small array lengths, plus all combinations of alignments of both the begin and end pointers of the sequence. It probably makes sense to create some kind of generic generator and use it in all tests.
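
Something along these lines, for example (a sketch; the names and the commented-out usage are placeholders, not existing test code):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// For every small length and every begin offset inside an over-allocated
// buffer, hand the sub-range to a checker so each algorithm is exercised on
// all prologue/epilogue shapes.
template<typename T>
void for_each_small_range(std::size_t max_len, std::size_t max_offset,
                          const std::function<void(const T*, const T*)>& check)
{
    std::vector<T> buffer(max_offset + max_len);
    for (std::size_t i = 0; i < buffer.size(); ++i)
        buffer[i] = static_cast<T>(i);                  // deterministic fill

    for (std::size_t len = 0; len <= max_len; ++len)
        for (std::size_t off = 0; off < max_offset; ++off)
            check(buffer.data() + off, buffer.data() + off + len);
}

// Usage idea: run both the std:: algorithm and the SIMD one on every shape
// and compare the results, e.g.
// for_each_small_range<int>(64, 16, [](const int* f, const int* l) {
//     /* compare std::count(f, l, 3) against the SIMD count(f, l, 3) here */
// });
```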

Author

OK, I will think about this and provide something ASAP.

//---prologue
for (; i < size_prologue_loop; ++i)
{
init = binary_op(init,unary_op(*first++));
Owner

I think it makes sense to do possibly overlapping SIMD operations for the prologue and epilogue and then mask out the extra lanes.

Author

Sorry, I don't understand this point. Could you give an example?

auto i = 0u;

//---prologue
for (; i < size_prologue_loop; ++i)
Owner

I think it makes sense to do possibly overlapping SIMD operations for the prologue and epilogue. Some elements will be computed twice, but the code will be much faster this way.

Author

Same as above.

@@ -84,10 +84,10 @@ inline Arch get_arch_string_list(const char* const strings[], int count, const c
return res;
#endif

int prefixlen = std::strlen(prefix);
for (int i = 0; i < count; ++i) {
auto prefixlen = std::strlen(prefix);
Owner

I'd prefer the explicit type for the simple numeric types. Auto makes the code less readable in these cases.

Author

OK, I'll use size_t instead.

@@ -90,6 +90,7 @@ class TestData {
TestData& operator=(const TestData& other)
{
data_ = other.data_;
return (*this);
Owner

Could we drop the parentheses? I think they're not needed in this case.

Author

Fixed.

@ThomasRetornaz
Author

Many thanks for the PR! I really like it :-)

Thanks!

For most of the algorithms I think we could rewrite them to not use non-SIMD operations in the prologue and epilogue at all. We could do a single unaligned load that overlaps with the main aligned SIMD body, do the computations and then do unaligned store that also overlaps with the main aligned SIMD body. This would be faster in most cases, as the scalar code is multiple times slower than SIMD.

I may be missing something, but as I noted above, we still need a prologue if the data length is too small to fit in the SIMD registers (e.g. 7 uints); that way we can use the simdpp:: functions transparently everywhere.
For the epilogue I think I understand the overlap concept, but maybe you could give me some hints on how to achieve it. Anyway, I will try on my side.

@p12tic
Owner

p12tic commented Mar 31, 2018

I may be missing something, but as I noted above, we still need a prologue if the data length is too small to fit in the SIMD registers (e.g. 7 uints); that way we can use the simdpp:: functions transparently everywhere.
For the epilogue I think I understand the overlap concept, but maybe you could give me some hints on how to achieve it. Anyway, I will try on my side.

Yes, if the total length is less than the width of the register, then the scalar part is needed. My point was that if you have, say, a range of 15 uint16 elements to process, it's faster to just process two overlapping 8-element waves instead of one full wave plus a 7-element scalar prologue/epilogue.
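
For an all_of-style check over those 15 elements the shape would be roughly this (an untested sketch; for a reduction you would additionally mask the overlapped lane with the identity element so it isn't accumulated twice):

```cpp
#include <simdpp/simd.h>
#include <cstdint>

// data points to exactly 15 uint16 elements; returns true iff all equal 10.
// Two overlapping 8-lane waves, no scalar prologue/epilogue: element 7 is
// simply tested twice, which is harmless for a predicate.
inline bool all_equal_10_15(const std::uint16_t* data)
{
    using vec = simdpp::uint16<8>;
    vec ten = simdpp::make_uint(10);

    vec wave1 = simdpp::load_u(data);        // elements [0, 8)
    vec wave2 = simdpp::load_u(data + 7);    // elements [7, 15)

    vec d1 = simdpp::bit_xor(wave1, ten);    // a lane is zero iff it equals 10
    vec d2 = simdpp::bit_xor(wave2, ten);
    vec diff = simdpp::bit_or(d1, d2);
    return simdpp::reduce_or(diff) == 0;     // true iff no lane differs anywhere
}
```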

ThomasRetornaz and others added 3 commits April 9, 2018 17:53
Follow review
* fix indent
* add "fuzzing" tests for all algorithms
* add TEST_EQUAL_COLLECTIONS
* add NRT helpers for generating data (to be moved elsewhere?)
* try to fix Visual 2013/2015 compilation issues
* enforce const/inline and noexcept for predicates
@Cazadorro

What is the purpose of having SIMDPP_NOEXECPT and changing inline to a custom macro?

@ThomasRetornaz
Author

What is the purpose of having SIMDPP_NOEXECPT and changing inline to a custom macro?

Regards
TR

ThomasRetornaz and others added 2 commits July 16, 2018 03:41
* Add google benchmark as ExternalProject
* Add three bench suites: transform "unary", reduce "unary", load/store

Todo:
* strange behavior in the transform bench suite: std seems faster than SIMD on MSVC 2017 <--- to be checked on gcc > 5
* add other cases
* add a binary flavor of the transform-reduce bench
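
A sketch of what the std:: side of one such bench case could look like (hypothetical names, not the PR's actual bench code; the SIMD flavor would mirror the same body with the library's transform so the two series are directly comparable):

```cpp
#include <benchmark/benchmark.h>
#include <algorithm>
#include <vector>

// Baseline: std::transform doubling a float array, driven over a range of sizes.
static void BM_unary_transform_std(benchmark::State& state)
{
    std::vector<float> in(state.range(0), 1.5f), out(state.range(0));
    for (auto _ : state) {
        std::transform(in.begin(), in.end(), out.begin(),
                       [](float x) { return x * 2.0f; });
        benchmark::DoNotOptimize(out.data());
        benchmark::ClobberMemory();
    }
    state.SetItemsProcessed(state.iterations() * state.range(0));
}
BENCHMARK(BM_unary_transform_std)->Range(8, 1 << 20);

BENCHMARK_MAIN();
```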