Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Algorithms using explicit batch type #496

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

michaelbacci
Copy link

@michaelbacci michaelbacci commented Jun 23, 2021

New algorithms are added:

  • count
  • count_if
  • transform_batch
  • reduce_batch

Tests are updated.
README.md updated.

Using the added algorithms I've create a nanmean_fast function where the benchmark are the faster respect other implementation:

2021-06-24T00:59:51+02:00
Running ./bench_math
Run on (16 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 2.46, 2.32, 2.13
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_nanmean/250                             487 ns          486 ns      1313789
BM_nanmean/1024                           2036 ns         2034 ns       348920
BM_nanmean_with_xtensor/250               5533 ns         5529 ns       116447
BM_nanmean_with_xtensor/1024             20085 ns        20063 ns        35643
BM_nanmean_xt/250                         1461 ns         1460 ns       478433
BM_nanmean_xt/1024                        4973 ns         4969 ns       138529
BM_nanmean_fast/250                        273 ns          272 ns      2553337
BM_nanmean_fast/1024                      1014 ns         1014 ns       674205
  • BM_nanmean has a classic C style while
  • BM_nanmean_with_xtensor is essentially xt::mean(xt::filter(e, !xt::isnan(e))) where e is a tensor
  • BM_nanmean_xt is the xt::nanmean
  • BM_nanmean_fast use the added reduce_batch and count_if algorithms

@michaelbacci michaelbacci force-pushed the new_algorithms branch 3 times, most recently from a2cc837 to 2be89a0 Compare June 24, 2021 09:44
Copy link
Member

@JohanMabille JohanMabille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! The implementation is really neat. I have some questions regarding the names of the functions as you can see below.

Regarding the failure in the tests, I think you have to include xsimd_fallback.hpp in the cpp file, so that the compiler can find the default implementation when a type is not supported by the current instructions set.

template <class I1, class I2, class O1, class UF>
void transform(I1 first, I2 last, O1 out_first, UF&& f)
template <class I1, class I2, class O1, class UF, class UFB>
void transform_batch(I1 first, I2 last, O1 out_first, UF&& f, UFB&& fb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keeping transform as the function name?

include/xsimd/stl/algorithms.hpp Outdated Show resolved Hide resolved
@michaelbacci michaelbacci force-pushed the new_algorithms branch 3 times, most recently from 5459d0a to 983a7c4 Compare June 30, 2021 08:57
@@ -216,6 +275,102 @@ TEST_F(xsimd_reduce, using_custom_binary_function)
}
}

TEST(algorithms, reduce_batch)
{
const double nan = std::numeric_limits<double>::quiet_NaN();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ARM, vectorization for double is available only on 64bits arch. Therefore, this test should be guarded with something like
#if XSIMD_ARM_INSTR_SET >= XSIMD_ARM8_64_NEON_VERSION || XSIMD_X86_INSTR_SET >= XSIMD_X86_SSE2_VERSION

using enable_if_increment = typename std::enable_if<has_increment<T>::value>::type;

template <class T>
using enable_if_not_increment = typename std::enable_if<!has_increment<T>::value>::type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to move these metafunctions in some detail namespace, they're not supposed to be part of the API.

typename = enable_if_increment<I2>,
typename = enable_if_increment<O1>,
typename = enable_if_not_increment<UF>,
typename = enable_if_not_increment<UFB>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more readable to gather these conditions so that you can use a single enable_if condition. That could be something like:

template <class... Args>
struct have_increment : all_true<has_increment<Args>::value...> {};

template <class... Args>
struct not_have_increment : all_true<!has_increment<Args>::value...> {};

template <class I1, class I2, class I3, class UF, class UFB>
using enable_binary_algorithm_t = typename std::enable_if<have_increment<I1, I2, O1>::value && not_have_increment<UF, UFB>::value, int>::type;

Besides, default template parameters are not considered by the compiler for overload selection, so it's better to use the enable_if as the template parameter evaluating to an int when it's valid:

template <class I1, class I2, class O1, class UF, class UFB,
                 enable_binary_algorithm_t<I1, I2, O1, UF, UFB> = 0>
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants