[WIP]ENH: Convert loop unary fp into highway #26346

Open. luyahan wants to merge 13 commits into main.

@luyahan luyahan commented Apr 25, 2024

@luyahan luyahan changed the title rewrite loop rint into HIGHWAY ENH: Convert loop fp rint into highway Apr 25, 2024
@luyahan luyahan changed the title ENH: Convert loop fp rint into highway ENH: Convert loop unary fp rint into highway Apr 25, 2024
@luyahan luyahan changed the title ENH: Convert loop unary fp rint into highway WIP: ENH: Convert loop unary fp rint into highway Apr 25, 2024
@luyahan luyahan changed the title WIP: ENH: Convert loop unary fp rint into highway [WIP]ENH: Convert loop unary fp rint into highway Apr 25, 2024
@Mousius
Member

Mousius commented Apr 30, 2024

Hi @luyahan,

Have you measured if this changes/increases performance? Would be good to see some benchmarks 😸

Just a process point, I assume google/highway#2116 needs to be released before this can be merged?

@luyahan luyahan force-pushed the rint-hwy branch 2 times, most recently from c0e7346 to 86eddd1 Compare May 6, 2024 06:10
@Mousius Mousius added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label May 6, 2024
@luyahan
Author

luyahan commented May 7, 2024

> Hi @luyahan,
>
> Have you measured if this changes/increases performance? Would be good to see some benchmarks 😸
>
> Just a process point, I assume google/highway#2116 needs to be released before this can be merged?

Benchmark results: https://paste.ubuntu.com/p/4SYzpt7BXT/

@luyahan
Author

luyahan commented May 7, 2024

> Just a process point, I assume google/highway#2116 needs to be released before this can be merged?

Yes, google/highway#2116 has been merged.😁

@r-devulap
Member

namespace hn = hwy::HWY_NAMESPACE;

// Alternative to per-function HWY_ATTR: see HWY_BEFORE_NAMESPACE
#define SUPER(NAME, FUNC, IS_RECIP) \
Member

@r-devulap r-devulap May 7, 2024

Please refrain from using macros for functions this large. They are hard to read and will be a pain to debug. Could we make use of templates here?
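
As a rough illustration of the template approach suggested here (not the PR's code), a single templated loop can take the operation as a functor instead of expanding a macro body once per function. Everything in this sketch is hypothetical: the names `unary_loop`, `RintOp`, `AbsOp`, `double_rint` and `float_absolute` are made up, and scalar `std::` calls stand in for the Highway vector operations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical operation functors. In the PR these would wrap Highway ops;
// here scalar <cmath> calls are used so the sketch stays self-contained.
struct RintOp {
    template <typename T>
    T operator()(T x) const { return std::nearbyint(x); }
};

struct AbsOp {
    template <typename T>
    T operator()(T x) const { return std::fabs(x); }
};

// One templated loop replaces a macro that is expanded per (name, function).
// The compiler instantiates it per element type and per operation, and the
// body can be stepped through in a debugger like any other function.
template <typename T, typename Op>
void unary_loop(const T* src, T* dst, std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = op(src[i]);
    }
}

// Thin, named entry points (assumed names) that only pick type and operation.
inline void double_rint(const double* src, double* dst, std::size_t n) {
    unary_loop<double>(src, dst, n, RintOp{});
}

inline void float_absolute(const float* src, float* dst, std::size_t n) {
    unary_loop<float>(src, dst, n, AbsOp{});
}

// Small self-check, assuming the default round-to-nearest-even rounding mode.
inline bool unary_loop_self_check() {
    const double in[3] = {0.5, 1.5, -2.5};
    double out[3] = {0.0, 0.0, 0.0};
    double_rint(in, out, 3);
    return out[0] == 0.0 && out[1] == 2.0 && out[2] == -2.0;
}
```

Each wrapper stays a one-liner, so adding another unary operation means adding one small functor rather than another macro expansion.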

Author

@luyahan luyahan May 10, 2024

> Please refrain from using macros for functions this large. They are hard to read and will be a pain to debug. Could we make use of templates here?

I tried using a lambda function, but the build fails with an error:

(num) luyahan@plct-c7:~/source/numpy$ spin build 
$ /home/luyahan/source/num/bin/python vendored-meson/meson/meson.py compile -C build
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /home/luyahan/source/num/bin/ninja -C /home/luyahan/source/numpy/build
ninja: Entering directory `/home/luyahan/source/numpy/build'
[8/35] Generating numpy/generate-version with a custom command
Saving version to numpy/version.py
[12/35] Compiling C++ object numpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p/src_umath_loops_unary_fp.dispatch.cpp.o
FAILED: numpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p/src_umath_loops_unary_fp.dispatch.cpp.o 
c++ -Inumpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/usr/include/python3.10 -I/usr/include/x86_64-linux-gnu/python3.10 -I/home/luyahan/source/numpy/build/meson_cpu -fdiagnostics-color=always -Wall -Winvalid-pch -std=c++17 -O2 -g -msse -msse2 -msse3 -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_SSE2 -DNPY_HAVE_SSE -DNPY_HAVE_SSE3 -DNPY_HAVE_SSSE3 -DNPY_HAVE_SSE41 -DNPY_HAVE_POPCNT -DNPY_HAVE_SSE42 -msse -msse2 -msse3 -mssse3 -DHWY_WANT_SSSE3=1 -msse4.1 -mpopcnt -msse4.2 -DHWY_WANT_SSE4=1 -DNPY_MTARGETS_CURRENT=SSE42 -MD -MQ numpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p/src_umath_loops_unary_fp.dispatch.cpp.o -MF numpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p/src_umath_loops_unary_fp.dispatch.cpp.o.d -o numpy/_core/libloops_unary_fp.dispatch.h_SSE42.a.p/src_umath_loops_unary_fp.dispatch.cpp.o -c ../numpy/_core/src/umath/loops_unary_fp.dispatch.cpp
In file included from ../numpy/_core/src/highway/hwy/highway.h:482,
                 from ../numpy/_core/src/umath/loops_unary_fp.dispatch.cpp:14:
../numpy/_core/src/highway/hwy/ops/x86_128-inl.h: In function ‘vec_f64 Round(vec_f64)’:
../numpy/_core/src/highway/hwy/ops/x86_128-inl.h:11224:27: error: inlining failed in call to ‘always_inline’ ‘hwy::N_SSE4::Vec128<double, N> hwy::N_SSE4::Round(hwy::N_SSE4::Vec128<double, N>) [with long unsigned int N = 2]’: target specific option mismatch
11224 | HWY_API Vec128<double, N> Round(const Vec128<double, N> v) {
      |                           ^~~~~
../numpy/_core/src/umath/loops_unary_fp.dispatch.cpp:115:51: note: called from here
  115 | static vec_f64 Round(vec_f64 x) { return hn::Round(x); }

NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(DOUBLE_rint)
(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(func))
{
  auto rint_hwy = [](vec_f64 x){ return hn::Round(x); };
  return simd_loop<npy_double>(args, dimensions, steps, rint_hwy, false);
}
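
The "target specific option mismatch" error above is what GCC reports when an `always_inline` function compiled for one set of ISA options is called from a function that lacks some of them. A likely cause here is that the calling function (or the lambda's call operator) does not carry the per-function target attribute that Highway applies through `HWY_ATTR` or `HWY_BEFORE_NAMESPACE`. The sketch below shows the general mechanism without Highway, under stated assumptions: `SKETCH_ATTR`, `RoundPairOp` and `round_pairs` are hypothetical names, and a raw SSE4.1 intrinsic stands in for `hn::Round`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86 1
// Hypothetical stand-in for a per-function target attribute such as HWY_ATTR.
#define SKETCH_ATTR __attribute__((target("sse4.1")))
#else
#define SKETCH_X86 0
#define SKETCH_ATTR
#endif

// A functor whose call operator carries the target attribute, so the
// always_inline SSE4.1 intrinsic can be inlined into it even when the
// translation unit is built without -msse4.1.
struct RoundPairOp {
    SKETCH_ATTR void operator()(const double* src, double* dst) const {
#if SKETCH_X86
        __m128d v = _mm_loadu_pd(src);
        v = _mm_round_pd(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        _mm_storeu_pd(dst, v);
#else
        // Portable fallback for non-x86 builds.
        dst[0] = std::nearbyint(src[0]);
        dst[1] = std::nearbyint(src[1]);
#endif
    }
};

// The templated loop is marked too, so caller and callee agree on target
// options and the functor remains eligible for inlining into the loop.
template <typename Op>
SKETCH_ATTR void round_pairs(const double* src, double* dst, std::size_t n, Op op) {
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        op(src + i, dst + i);              // two doubles per step
    }
    for (; i < n; ++i) {
        dst[i] = std::nearbyint(src[i]);   // scalar tail
    }
}

// Small self-check, assuming the default round-to-nearest-even rounding mode.
inline bool round_pairs_self_check() {
    const double in[5] = {0.5, 1.5, 2.5, 3.5, -2.5};
    double out[5] = {9.0, 9.0, 9.0, 9.0, 9.0};
    round_pairs(in, out, 5, RoundPairOp{});
    return out[0] == 0.0 && out[1] == 2.0 && out[2] == 2.0 &&
           out[3] == 4.0 && out[4] == -2.0;
}
```

A named functor makes it straightforward to attach the attribute to the call operator. Whether this is the right fix for the PR depends on how the Highway target attributes are applied in the file.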

@r-devulap r-devulap self-assigned this May 8, 2024
@luyahan
Author

luyahan commented May 9, 2024

> 121 tests fail https://dev.azure.com/numpy/numpy/_build/results?buildId=36327&view=logs&j=bb985aa7-6f2e-5862-34d1-fe760a3f4424&t=fedaa2b4-fa4d-5ee0-669f-9fb1714eeeb2 Have you looked into these yet?

Maybe Highway dispatches the function to the SSE2 target.
Where can I find the CPU info for the Windows CI? I only have a Windows machine with an i5-13500H.

@luyahan luyahan changed the title [WIP]ENH: Convert loop unary fp rint into highway [WIP]ENH: Convert loop unary fp into highway May 9, 2024
3 participants