constants that consist of a run of 1s on one side and 0s on the other, such as 0xFFFFFFF8 or 0x000000FF, can be computed directly (vpcmpeqd to get all-ones, then a single shift) rather than being broadcast from memory, which should be faster. such constants are common: 1, -8, 255, ...
if -1 is already present in a register, then vpcmpeqd is not needed and this will be just one instruction.
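as a rough sketch of the arithmetic, here is one 32-bit lane modeled in plain C. the helper names are made up for illustration, and mapping the shifts to vpslld / vpsrld is an assumption (the text above only says "shift"). every lane of the vector holds the same value, so one lane is enough to check the math.

```c
#include <stdint.h>

/* One 32-bit lane of the all-ones register produced by
 * vpcmpeqd reg, reg, reg (every lane compares equal to itself). */
static inline uint32_t lane_all_ones(void)
{
    return UINT32_MAX;
}

/* 1s on the high side, 0s on the low side: a single logical left
 * shift of all-ones (assumed to map to vpslld). zeros must be 0..31. */
static inline uint32_t high_ones_mask(unsigned zeros)
{
    return lane_all_ones() << zeros;
}

/* 1s on the low side, 0s on the high side: a single logical right
 * shift of all-ones (assumed to map to vpsrld). ones must be 1..32. */
static inline uint32_t low_ones_mask(unsigned ones)
{
    return lane_all_ones() >> (32u - ones);
}
```

for example, `high_ones_mask(3)` gives 0xFFFFFFF8 (-8 as a signed value), `low_ones_mask(8)` gives 0x000000FF (255), and `low_ones_mask(1)` gives 1.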
constants with a run of 1s in the middle can be computed in a similar way with two shifts. this is perhaps faster than a broadcast, and should be faster if -1 is already present. these are also common (2, 4, -2.0, 0.5, ...)
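a minimal sketch of the two-shift version, again modeling one 32-bit lane in plain C. `middle_ones_mask` and `bits_to_float` are hypothetical helpers, and the right-then-left shift order is one possible choice, not necessarily what a compiler would pick.

```c
#include <stdint.h>
#include <string.h>

/* A run of `width` 1s whose lowest set bit is bit `low`:
 * shift all-ones right to keep `width` bits, then left into place.
 * Requires 1 <= width <= 32 and low + width <= 32. */
static inline uint32_t middle_ones_mask(unsigned width, unsigned low)
{
    uint32_t all_ones = UINT32_MAX;           /* the -1 register */
    return (all_ones >> (32u - width)) << low;
}

/* Reinterpret a 32-bit pattern as an IEEE-754 single, to check the
 * floating-point examples. */
static inline float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

`middle_ones_mask(1, 1)` is 2, `middle_ones_mask(1, 2)` is 4, `middle_ones_mask(6, 24)` is 0x3F000000 (the bit pattern of 0.5f), and `middle_ones_mask(2, 30)` is 0xC0000000 (the bit pattern of -2.0f).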
if the negative of a constant is already present, vpabsd can be used to get the positive value.
the double of a number that is already present can be computed via vpaddd (x + x).
if -1 is present, the complement of a constant can be computed using vpxor.
adjacent numbers can be computed by adding or subtracting -1.
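the four derivations above (absolute value, doubling, complement, adjacent value) can be sketched per lane like this. the function names are made up; each one models the single instruction named in its comment, with `all_ones` standing in for the register that already holds -1.

```c
#include <stdint.h>

/* vpabsd: absolute value of a signed lane (0x80000000 maps to itself). */
static inline uint32_t lane_abs(uint32_t x)
{
    return (x & 0x80000000u) ? 0u - x : x;
}

/* vpaddd x, x: double a value that is already in a register. */
static inline uint32_t lane_double(uint32_t x)
{
    return x + x;
}

/* vpxor x, all_ones: bitwise complement. */
static inline uint32_t lane_complement(uint32_t x, uint32_t all_ones)
{
    return x ^ all_ones;
}

/* vpsubd x, all_ones: x - (-1) == x + 1. */
static inline uint32_t lane_next(uint32_t x, uint32_t all_ones)
{
    return x - all_ones;
}

/* vpaddd x, all_ones: x + (-1) == x - 1. */
static inline uint32_t lane_prev(uint32_t x, uint32_t all_ones)
{
    return x + all_ones;
}
```

so with 255 and -1 in registers, 510, 0xFFFFFF00, 256 and 254 are each one instruction away, and 8 is one vpabsd away from -8.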
for some numbers, vpsubd or vpaddd can be used instead of double shifts to reduce port contention. for example, to get the number 2, compute (~0 >> 30) + ~0 instead of (~0 >> 31 << 1). (provided that -1 is present)
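the two ways of getting 2 from the example, written out for one 32-bit lane in plain C (function names are hypothetical). both start from a register that already holds -1; the second replaces one of the shifts with an add.

```c
#include <stdint.h>

/* Two shifts: (~0 >> 31) << 1. */
static inline uint32_t two_via_shifts(uint32_t all_ones)
{
    return (all_ones >> 31) << 1;    /* 1 << 1 == 2 */
}

/* One shift plus one add: (~0 >> 30) + ~0. */
static inline uint32_t two_via_shift_add(uint32_t all_ones)
{
    return (all_ones >> 30) + all_ones;    /* 3 + (-1) == 2 */
}
```

both return 2 when called with `UINT32_MAX`.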
caveats:
these methods rely on having a register set to -1, which may induce register spilling in certain cases. However, in smaller code sections or where -1 is already available, these methods may be beneficial.
using too many shifts can lead to contention on execution ports. so depending on the surrounding instruction mix, it might be better to use a broadcast to make better use of the ports.
Tricks like these are implemented in the LLVM backend (codegen). ISPC could handle it, but preferably it should be done in LLVM. I suggest verifying that LLVM doesn't already do this for C/C++ code (using vector extensions), then filing it in the LLVM project and linking this issue, so we can make sure it happens in ISPC once it's implemented.
It's important to note in the LLVM issue that this is about vector constants, as they would otherwise assume it's about scalars by default.
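A possible starting point for that check, as a sketch only: a small C function using the GCC/Clang vector extension that applies a splat vector constant. The type name, the function name, the choice of 0xFFFFFFF8 and the 4-lane width are all assumptions for illustration; an 8-lane `vector_size(32)` type would be the closer match for AVX2 code.

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit unsigned lanes. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical test case: AND every lane with the same vector
 * constant, so the compiler has to materialize 0xFFFFFFF8 in
 * all lanes somehow. */
v4u32 clear_low3(v4u32 x)
{
    const v4u32 mask = {0xFFFFFFF8u, 0xFFFFFFF8u, 0xFFFFFFF8u, 0xFFFFFFF8u};
    return x & mask;
}
```

Compiling this with optimization for an AVX2 or AVX-512 target (for example `clang -O2 -mavx2 -S`) and reading the generated assembly would show whether the constant is loaded from memory or computed in registers.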
a similar trick can be used to compute constants with 0s in the middle using a shift and a rotate. (AVX512 only)