Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass free variables as parameters when possible for tight loops #111

Open
melsman opened this issue Oct 11, 2022 · 0 comments
Open

Pass free variables as parameters when possible for tight loops #111

melsman opened this issue Oct 11, 2022 · 0 comments
Assignees

Comments

@melsman
Copy link
Owner

melsman commented Oct 11, 2022

It turns out that passing free variables that are not of function type as parameters, as opposed to fetching them from the closure may be beneficial for tight loops. One example appears in the benchmark mlkit-bench/benchmarks/mandelbrot.sml. Here the two variables c_re and c_im are free in the function loop3. Two optimisations will almost double the performance of the code:

  1. passing the variables c_re and c_im as parameters (they will then be unboxed and passed in floating point registers.)
  2. passing the constant 4.0 as a parameter to the function.

It is straightforward to verify the effect of the two optimisations; here is how the function loop3 can be hand-optimised by the programmer:

fun loop3 (count, z_re : real, z_im : real, c_re, c_im, four) = 
  if (count < maxCount)
  then let val z_re_sq = z_re * z_re
           val z_im_sq = z_im * z_im
       in if (z_re_sq + z_im_sq) > four
          then count
          else let val z_re_im = z_re * z_im
               in loop3 (count+0w1,
                         z_re_sq - z_im_sq + c_re,
                         z_re_im + z_re_im + c_im,
                         c_re,
                         c_im,
                         four)
               end
       end
  else count

Notice that this optimisation is not in conflict with the function specialisation optimisation that specializes recursive functions that, invariantly, take other functions as parameters.

Here is the resulting assembler code - labels are slightly shortened:

FLab.loop3:
	leaq -8(%rsp),%rsp
	movq $0x400,%r11
	cmpq %r11,%rax
	jae .LLab.k33
	movsd %xmm0,%xmm6
	mulsd %xmm0,%xmm6
	movsd %xmm1,%xmm10
	mulsd %xmm1,%xmm10
	movsd %xmm6,%xmm8
	addsd %xmm10,%xmm8
	ucomisd %xmm4,%xmm8
	ja .LLab.k33
	mulsd %xmm1,%xmm0
	addq $0x1,%rax
	subsd %xmm10,%xmm6
	addsd %xmm2,%xmm6
	addsd %xmm0,%xmm0
	addsd %xmm3,%xmm0
	movsd %xmm0,%xmm1
	movsd %xmm6,%xmm0
	leaq 8(%rsp),%rsp
	jmp FLab.loop3
	.p2align 0x4
.LLab.k33:
	movq %rax,%rdi
	leaq 8(%rsp),%rsp
	popq %r11
	jmp *%r11

Compared to the original version of mandelbrot, we have also changed the parameter count to be of type word, which turns out not to have a huge effect (only one jo __overflow instruction is saved, after the addq instruction).

@melsman melsman self-assigned this Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant