ISPC generates suspicious code for the `rcp_fast` function for the avx512spr-x32 and avx512spr-x64 targets.
The following example

```c
uniform double foo(uniform double x) { return rcp_fast(x); }
```
compiled with

```
ispc --target=avx512spr-x32 test.c ...
```
generates

```asm
.LCPI1_0:
        .quad   0x46c8a6e32246c99c      # double 9.9999999999999995E+32
.LCPI1_1:
        .quad   0x3914c4e977ba1f5c      # double 1.0000000000000001E-33
.LCPI1_3:
        .quad   0x4000000000000000      # double 2
.LCPI1_2:
        .long   0x3f800000              # float 1
foo___und:                              # @foo___und
        vmovsd  xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
        vucomisd        xmm1, xmm0
        jb      .LBB1_3
        vucomisd        xmm0, qword ptr [rip + .LCPI1_1]
        jb      .LBB1_3
        vcvtsd2ss       xmm1, xmm0, xmm0
        vmovss  xmm2, dword ptr [rip + .LCPI1_2] # xmm2 = mem[0],zero,zero,zero
        vdivss  xmm1, xmm2, xmm1
        vcvtss2sd       xmm1, xmm1, xmm1
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2 # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm3, xmm1
        vfnmadd213sd    xmm0, xmm1, xmm2 # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret
.LBB1_3:
        vmovq   rax, xmm0
        movabs  rcx, 9214364837600034816
        and     rcx, rax
        movabs  rax, 9209861237972664319
        sub     rax, rcx
        vmovq   xmm1, rax
        vmulsd  xmm2, xmm1, xmm0
        vcvtsd2ss       xmm2, xmm2, xmm2
        vmovss  xmm3, dword ptr [rip + .LCPI1_2] # xmm3 = mem[0],zero,zero,zero
        vdivss  xmm2, xmm3, xmm2
        vcvtss2sd       xmm2, xmm2, xmm2
        vmulsd  xmm1, xmm1, xmm2
        vmovsd  xmm2, qword ptr [rip + .LCPI1_3] # xmm2 = mem[0],zero
        vmovapd xmm3, xmm0
        vfnmadd213sd    xmm3, xmm1, xmm2 # xmm3 = -(xmm1 * xmm3) + xmm2
        vmulsd  xmm1, xmm1, xmm3
        vfnmadd213sd    xmm0, xmm1, xmm2 # xmm0 = -(xmm1 * xmm0) + xmm2
        vmulsd  xmm0, xmm1, xmm0
        ret
```
Whereas for avx512spr-x16, it is just a single instruction:

```asm
foo___und:                              # @foo___und
        vrcp14sd        xmm0, xmm0, xmm0
        ret
```
It's this TODO: `ispc/builtins/target-avx512spr-x32.ll`, line 536 at ea4617c.
Here's this code in the `-x16` version: `ispc/builtins/target-avx512spr-x16.ll`, line 22 at ea4617c.
`rcp_fast_*` should map to pure instructions/intrinsics without extra refinement steps.