Returning multiple arguments via struct #495

hugomg · 2021-10-15T02:28:53Z

I wonder whether it would be faster to return multiple values via a struct, instead of passing one pointer for each value.

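#include "lua.h" /* for lua_Number and lua_Integer */

/* Holds both return values of function_02. */
typedef struct {
    lua_Number out1;
    lua_Integer out2;
} R1;

R1 function_02(void)
{
    R1 ret;
    ret.out1 = 3.14;
    ret.out2 = 42;
    return ret; /* both values come back in a single return */
}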
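For comparison, here is a minimal sketch of the pointer-based style next to the struct version. The names function_02_ptr, function_02_struct and check_same_results are hypothetical, and the two typedefs are stand-ins for the ones in lua.h (Lua 5.4's default configuration uses double and long long), so the sketch compiles without the Lua headers.

```c
#include <assert.h>

/* Assumed stand-ins for the lua.h typedefs (Lua 5.4 defaults). */
typedef double lua_Number;
typedef long long lua_Integer;

typedef struct {
    lua_Number out1;
    lua_Integer out2;
} R1;

/* Pointer style: one out-pointer per returned value. */
static void function_02_ptr(lua_Number *out1, lua_Integer *out2)
{
    *out1 = 3.14;
    *out2 = 42;
}

/* Struct style: all returned values packed into one struct. */
static R1 function_02_struct(void)
{
    R1 ret;
    ret.out1 = 3.14;
    ret.out2 = 42;
    return ret;
}

/* Caller side: both styles deliver the same values, but only the
   pointer style forces the caller to take the address of its locals. */
static void check_same_results(void)
{
    lua_Number n;
    lua_Integer i;
    function_02_ptr(&n, &i);      /* &n and &i escape into the callee */

    R1 r = function_02_struct();  /* no caller addresses are taken */
    assert(r.out1 == n);
    assert(r.out2 == i);
}
```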

Possible advantages:

On some architectures, if the struct is small enough, it is returned in CPU registers.
If the struct is instead returned by reference, that is still a single pointer, instead of one pointer per returned value.
If we take the address of one of the x1 variables, that may prevent the C compiler from keeping it in a register.
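To make "small enough" concrete, assume the x86-64 System V ABI (Linux, macOS): an aggregate of at most 16 bytes can come back in registers, while a larger one is classified MEMORY and returned through a hidden pointer that the caller supplies. With the default Lua typedefs, two return values fit and three do not. This sketch only checks sizes; the 16-byte limit is a necessary condition, not the full classification algorithm, and other ABIs differ (Windows x64, for example, returns only aggregates of 1, 2, 4 or 8 bytes in RAX). The typedefs, struct names and helper are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for the lua.h typedefs (Lua 5.4 defaults). */
typedef double lua_Number;
typedef long long lua_Integer;

/* Two return values: 16 bytes. On x86-64 System V, out1 is returned
   in XMM0 and out2 in RAX. */
typedef struct {
    lua_Number out1;
    lua_Integer out2;
} R2;

/* Three return values: 24 bytes, over the limit, so the caller passes
   a hidden pointer to space for the result. */
typedef struct {
    lua_Number out1;
    lua_Integer out2;
    lua_Integer out3;
} R3;

/* Hypothetical helper: the System V size limit for register returns.
   Necessary but not sufficient; the field types also matter. */
static int within_register_return_limit(size_t size)
{
    return size <= 16;
}

/* Checks the layout assumptions above on a typical 64-bit target. */
static void check_layout(void)
{
    assert(sizeof(R2) == 16 && within_register_return_limit(sizeof(R2)));
    assert(sizeof(R3) == 24 && !within_register_return_limit(sizeof(R3)));
}
```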

Possible disadvantages:

The code would be more complicated
We need to test it to see if it is actually faster


srijan-paul · 2021-10-15T11:05:45Z

Would it be useful to benchmark something like this before making any changes?
If the implementation turns out to be simple enough, we could even run the benchmarks after it is already working, instead of hand-editing the generated C code as we did when testing upvalue box merging.

hugomg · 2021-10-15T12:19:56Z

I agree that this needs to be benchmarked before we merge it.

I expect that the implementation will be complex enough that editing the generated code by hand may still be worth it. That said, if someone wants to try implementing the full thing straight away then I won't stop them.

hugomg · 2021-10-16T04:05:40Z

I tested the N = 2 case (two return values) on some artificial microbenchmarks, comparing three variants:

Option 1: returning a struct

size_t s = 0;
for (size_t i = 0; i < ITERATIONS; i++) {  /* ITERATIONS: loop count */
    Ret ret = foo(x);  /* both values come back in one struct */
    s += ret.a;
    s += ret.b;
    bar();
    s += ret.a;
    s += ret.b;
}

Option 2: returning via pointers, writing directly into the "x1" variables

size_t s = 0;
size_t a, b;
for (size_t i = 0; i < ITERATIONS; i++) {  /* ITERATIONS: loop count */
    foo(x, &a, &b);  /* the addresses of a and b are taken */
    s += a;
    s += b;
    bar();
    s += a;
    s += b;
}

Option 3: returning via pointers into temporary variables, then copying them into the "x1" variables

size_t s = 0;
size_t a, b;
for (size_t i = 0; i < ITERATIONS; i++) {  /* ITERATIONS: loop count */
    {
        size_t c, d;  /* only the temporaries have their addresses taken */
        foo(x, &c, &d);
        a = c; b = d;
    }
    s += a;
    s += b;
    bar();
    s += a;
    s += b;
}
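To try the comparison locally, the three snippets can be placed in one file with stand-in definitions. This is a sketch: foo, foo_ptr, bar, the run_* functions and time_variant are hypothetical, and bar is given a volatile side effect so it cannot be optimized away. With everything in one file the compiler may inline foo and foo_ptr and compile all three variants to the same code, so for timings like the ones described here, foo, foo_ptr and bar would need to live in a separate translation unit without link-time optimization. The likely reason option 2 is slower: once &a and &b are passed to a function the compiler cannot see, it must assume those pointers may have been stored somewhere, so bar() could modify a and b and both must be reloaded from memory after the call. In option 3 only c and d escape, so a and b can stay in registers across bar().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for the trivial foo and bar of the benchmark. */
typedef struct {
    size_t a;
    size_t b;
} Ret;

static volatile size_t sink;  /* gives bar() a side effect it cannot drop */

static Ret foo(size_t x) { Ret r = { x, x }; return r; }
static void foo_ptr(size_t x, size_t *a, size_t *b) { *a = x; *b = x; }
static void bar(void) { sink = sink + 1; }

/* Option 1: return a struct. */
static size_t run_struct(size_t iterations, size_t x)
{
    size_t s = 0;
    for (size_t i = 0; i < iterations; i++) {
        Ret ret = foo(x);
        s += ret.a + ret.b;
        bar();
        s += ret.a + ret.b;
    }
    return s;
}

/* Option 2: write through pointers straight into a and b. */
static size_t run_ptr_direct(size_t iterations, size_t x)
{
    size_t s = 0;
    size_t a, b;
    for (size_t i = 0; i < iterations; i++) {
        foo_ptr(x, &a, &b);
        s += a + b;
        bar();
        s += a + b;
    }
    return s;
}

/* Option 3: write into block-local temporaries, then copy into a and b. */
static size_t run_ptr_temp(size_t iterations, size_t x)
{
    size_t s = 0;
    size_t a, b;
    for (size_t i = 0; i < iterations; i++) {
        {
            size_t c, d;
            foo_ptr(x, &c, &d);
            a = c;
            b = d;
        }
        s += a + b;
        bar();
        s += a + b;
    }
    return s;
}

/* Runs one variant and returns the elapsed CPU time in seconds. */
static double time_variant(size_t (*variant)(size_t, size_t),
                           size_t iterations, size_t x, size_t *result)
{
    clock_t start = clock();
    *result = variant(iterations, x);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Every variant adds 4 * x per iteration, so the sums must agree. */
static void check_variants(size_t iterations, size_t x)
{
    assert(run_struct(iterations, x) == 4 * x * iterations);
    assert(run_ptr_direct(iterations, x) == 4 * x * iterations);
    assert(run_ptr_temp(iterations, x) == 4 * x * iterations);
}
```

Calling time_variant once per variant with a large iteration count gives a rough CPU-time comparison; check_variants guards against a variant silently computing something different.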

In this extremely artificial microbenchmark, options 1 and 3 took about 6% less time than option 2. This is an extreme example: the bodies of the foo and bar functions are extremely simple (they just return the argument they receive). I would expect the improvement to be smaller if foo and bar had larger bodies.

Based on this, the performance gain for N = 2 does not seem very impressive. However, we may want to at least return into temporary variables, to avoid taking the address of an x1 variable.

I still haven't tested what happens with N >= 3.

hugomg added the "enhancement" (New feature or request) label on Oct 15, 2021.
