Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution #1515

Open
antonfirsov opened this issue Jan 21, 2021 · 0 comments

Comments

@antonfirsov
Member

As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS instructions (emitted for Avx2.PermuteVar8x32) in the following code:

result256_0 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
    result256_0);
result256_1 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
    result256_1);

If FMA is detected, we should allocate a 4x larger buffer and do the duplication in ResizeKernelMap.Calculate, which should be much cheaper than doing it in every convolution:

public static ResizeKernelMap Calculate<TResampler>(
    in TResampler sampler,
    int destinationSize,
    int sourceSize,
    MemoryAllocator memoryAllocator)
    where TResampler : struct, IResampler
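
To make the proposal concrete, here is a rough sketch of what the duplication step could look like. It assumes the permute mask in the convolution code broadcasts each kernel weight across four lanes, one per Vector4 channel (the mask value is not shown above). KernelDuplication and DuplicateWeights are hypothetical names for illustration, not existing ImageSharp APIs, and a real implementation would write into memory rented from the MemoryAllocator rather than a new array.

```csharp
using System;

public static class KernelDuplication
{
    // Hypothetical helper (not part of ImageSharp): expands every kernel
    // weight w into four consecutive copies [w, w, w, w], so that two
    // adjacent weights form one 8-float block that can be loaded directly
    // as a Vector256<float> without a per-iteration permute.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        // This is the "4x buffer" from the proposal.
        var duplicated = new float[weights.Length * 4];

        for (int i = 0; i < weights.Length; i++)
        {
            duplicated.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return duplicated;
    }
}
```

With the weights laid out this way, the second operand of each Fma.MultiplyAdd call could presumably be read straight from the kernel buffer, at the cost of four times the kernel memory.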

@antonfirsov antonfirsov added this to the Future milestone Jan 21, 2021
@antonfirsov antonfirsov changed the title Pre-duplicate kernels in ResizeKernelMap for faster FMA convolution Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution Jan 21, 2021