-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add complex.hlsl #592
base: master
Are you sure you want to change the base?
add complex.hlsl #592
Conversation
[CI]: Can one of the admins verify this patch? |
// Fast Fourier Transform | ||
template<typename T, uint32_t N> | ||
void fft_radix2(complex::complex_t<T> data[N], bool is_inverse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the is_inverse
a template parameter
assert(N<gl_SubgroupSize)
btw the data
need not be data[N]
, each invocation comes with its own register value and you can cooperatively exchange via the use of subgroupShuffleXor
(add it to https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/include/nbl/builtin/hlsl/glsl_compat/subgroup_shuffle.hlsl please).
I mean the original GLSL code is here
https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/include/nbl/builtin/glsl/subgroup/fft.glsl
// Cooley-Tukey | ||
for (uint32_t stride = 2; stride <= N; stride *= 2) | ||
{ | ||
complex::complex_t<T> twiddle_factor = { cos((is_inverse ? 2.0 : -2.0) * 3.14 / stride), sin((is_inverse ? 2.0 : -2.0) * 3.14 / stride) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the proper full FP64 value of PI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btwa nice piece of trivia @Fletterio , only GLSL defines fp64 transcendentals, HLSL doesn't (the sin,cos etc. functions always return float32, and if you try to use the GL.450.std extended instruction set to compute cos(double)
this will end up emitting Float64 capability which is a definite DO NOT WANT)
So these definitely need to be tabulated as very precise constants and C-casted into the correct nbl::hlsl::scalar_type_t<T>
so we don't accidentally emit Float64 capability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really following - we want to use the full PI value in f64, but not emit f64 capability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you want to cast the constant into a type you'd be using, see the Gauss Legendre / Quadrature header stuff that Erfan is using for example 62 CAD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm about the tabulation... its not a perf win compared to just calling sin and cos
for now just assert that T
for the FFT complex numbers is NOT float64_t
because that opens a whole can of worms of std.glsl spir-v intrinsic header
or if you feel like adding sin
and cos
in doubles, do it here:
[[vk::ext_instruction(GLSLstd450MatrixInverse)]] |
a07e79f
to
ffbd843
Compare
I see a certain ladder of features, and each should have their own branch (so I can merge in order without having all comments addressed):
They can branch off each other. P.S. Let me know if you need "runtime" transcendental/special functions for |
Missing compound operators and template specializations for integer types. Arithmetic logic is off, different from the STL
template<typename Scalar> | ||
struct complex_t | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont indent extra because of namespace
#define COMPOUND_ASSIGN(NAME,OP) template<typename T> struct assign_ ## NAME ## _t { \ | ||
void operator()(NBL_REF_ARG(T) lhs, NBL_CONST_REF_ARG(T) rhs) \ | ||
{ \ | ||
lhs = lhs OP rhs; \ | ||
} \ | ||
} | ||
COMPOUND_ASSIGN(add,+); | ||
COMPOUND_ASSIGN(subtract,-); | ||
COMPOUND_ASSIGN(mul,*); | ||
COMPOUND_ASSIGN(div,/); | ||
|
||
#undef COMPOUND_ASSIGN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add them here
and make sure they have names plus_assign
, multiplies_assign
to be inline with the rest.
Also because its just 4 struct defs and not specialization, IMHO you shouldn't use macros because that just makes the code iffy to read. Or if you are to use a macro make sure it has the same form as the ALIAS_STD
so you have to put th e };
afterwards to be able to add extra fields (i.e. identity)
static complex_t create(const complex_t other) | ||
{ | ||
return other; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no needed, regular assignment will do
return !(uint64_t(m_real) ^ uint64_t(rhs.m_real)) && !(uint64_t(m_imag) ^ uint64_t(rhs.m_imag)); | ||
} | ||
bool operator!=(const complex_t rhs) | ||
{ | ||
return uint64_t(m_real) ^ uint64_t(rhs.m_real) || uint64_t(m_imag) ^ uint64_t(rhs.m_imag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't cast to uint64_t
:
- you're not comparing bitpatterns, you're casting to uint64_T which truncatest the floats, etc.
Scalar=float32_t
would have been 32bit!=
and==
work on floats, so use them
// ---------------------- Compound assignment functors ---------------------- | ||
template <typename Scalar> | ||
struct ComplexCompoundHelper{ | ||
using add = assign_add_t<complex_t<Scalar> >; | ||
using subtract = assign_subtract_t<complex_t<Scalar> >; | ||
using mul = assign_mul_t<complex_t<Scalar> >; | ||
using div = assign_div_t<complex_t<Scalar> >; | ||
}; | ||
|
||
template <typename Scalar> | ||
using assign_add_complex = typename ComplexCompoundHelper<Scalar>::add; | ||
template <typename Scalar> | ||
using assign_subtract_complex = typename ComplexCompoundHelper<Scalar>::subtract; | ||
template <typename Scalar> | ||
using assign_mul_complex = typename ComplexCompoundHelper<Scalar>::mul; | ||
template <typename Scalar> | ||
using assign_div_complex = typename ComplexCompoundHelper<Scalar>::div; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this, we don't have them for regular plus<T>
{ | ||
namespace hlsl | ||
{ | ||
namespace fft::common { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- HLSL doesn't support nested namespace syntax
- use
detail
orimpl
for things you don't want to make public to others
template<typename Scalar, bool inverse> | ||
struct DIX { | ||
//Assumes I have precomputed twiddles, so just pass the index. | ||
static void radix2Butterfly(NBL_CONST_REF_ARG(complex_t<Scalar>) twiddle, NBL_REF_ARG(complex_t<Scalar>) lo, NBL_REF_ARG(complex_t<Scalar>) hi){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop Butterfly
from the name, just radix2
is fine
template<typename Scalar> | ||
complex_t<Scalar> getHigh(uint32_t highIdx); | ||
|
||
template<typename Scalar> | ||
complex_t<Scalar> getMid(uint32_t midIdx); | ||
|
||
template<typename Scalar> | ||
complex_t<Scalar> getLow(uint32_t lowIdx); | ||
|
||
// Define twiddles and getters for each Scalar template specialization | ||
#define float_t float32_t | ||
#define TYPED_NUMBER(N) NBL_CONCATENATE(N, f) // to add f after floating point numbers and avoid casting warnings and emitting ShaderFloat64 Caps | ||
#include <nbl/builtin/hlsl/fft/common/twiddles_impl.hlsl> | ||
#undef TYPED_NUMBER | ||
#undef float_t | ||
|
||
#define float_t float64_t | ||
#define TYPED_NUMBER(N) N | ||
#include <nbl/builtin/hlsl/fft/common/twiddles_impl.hlsl> | ||
#undef TYPED_NUMBER | ||
#undef float_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't bother precomputing twiddles, did this benchmark in shadertoy
const float _cos[32] = float[32](
1.0,0.980785,0.92388,0.83147,0.707107,0.55557,0.382683,0.19509,-4.37114e-08,-0.19509,-0.382684,-0.55557,-0.707107,-0.83147,-0.92388,-0.980785,-1.0,-0.980785,-0.92388,-0.83147,-0.707107,-0.55557,-0.382683,-0.19509,1.19249e-08,0.19509,0.382684,0.55557,0.707107,0.83147,0.92388,0.980785
);
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
// Normalized pixel coordinates (from 0 to 1)
vec2 uv = fragCoord/iResolution.xy;
// Time varying pixel color
vec3 col = vec3(0.0);
const int spins = 2000;
for (int i=0; i<spins; i++)
{
int iarg = i;//+int(fragCoord.x)+int(fragCoord.y);
float arg = float(iarg)/float(spins);
float cs = cos(arg*6.2831853);
col.rg += vec2(_cos[iarg]);
}
// Output to screen
fragColor = vec4(col,1.0);
}
perf dropped 50%, it gets worse the bigger the array gets
namespace hlsl | ||
{ | ||
|
||
const float64_t PI = 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make a numbers.hlsl
a counterpart to https://en.cppreference.com/w/cpp/numeric/constants
// Fast Fourier Transform | ||
template<typename T, uint32_t N, bool is_inverse> | ||
void fft_radix2(complex_t<T> data[N]) | ||
{ | ||
const uint32_t log2N = uint32_t(log2(N)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you make this function?
I explicitly left behind a comment #592 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this supposed to be a workgroup FFT? a subgroup FFT or what?
// Bit reversal permutation | ||
for (uint32_t i = 0; i < N; ++i) | ||
{ | ||
uint32_t j = 0; | ||
for (uint32_t bit = 0; bit < log2N; ++bit) | ||
j |= ((i >> bit) & 1) << (log2N - 1 - bit); | ||
|
||
if (j > i) | ||
{ | ||
complex_t<T> temp = data[i]; | ||
data[i] = data[j]; | ||
data[j] = temp; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't do the bit reversal for me, its often not needed
if(is_inverse) | ||
{ | ||
twiddle_factor = { cos(2.0 * PI / stride), sin(2.0 * PI / stride) }; | ||
} | ||
else | ||
{ | ||
twiddle_factor = { cos(-2.0 * PI / stride), sin(-2.0 * PI / stride) }; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the cosine is the same, because its an even function
- sine is an odd function
TL;DR its enough for you to do
const float arg = pi_v<T>*2.f;
twiddle_factor = {cos(arg),sin(arg)};
if (is_inverse)
twiddle_factor.imag = -twiddle_factor.imag; // compiler should optimize to a bitflip
// Cooley-Tukey | ||
assign_mul_complex<T> mul_assign; | ||
assign_div_complex<T> div_assign; | ||
for (uint32_t stride = 2; stride <= N; stride *= 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do a loop from stride=1
to stride=N/2
instead of 2->N
because that's the actual stride between your butterfly elements, what you have right now is a width
@@ -30,6 +30,14 @@ T subgroupShuffleDown(T value, uint32_t delta) | |||
return spirv::groupShuffleDown<T>(spv::ScopeSubgroup, value, delta); | |||
} | |||
|
|||
|
|||
template<typename T> | |||
T subgroupShuffleXor(T value, uint32_t delta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the spec call it a delta? I'd call it a mask
Description
Start drafting complex.hlsl