Odd performance characteristics of using inputs vs. not using inputs #282

PragTob opened this issue Mar 23, 2019 · 6 comments

PragTob commented Mar 23, 2019

While reworking the README, I came upon an interesting performance difference. Let's take our standard example first:

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
  %{
    "flat_map" => fn -> Enum.flat_map(list, map_fun) end,
    "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
  },
  # inputs: %{"lol" => Enum.to_list(1..10_000)},
  formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
)

results in:

(...)

Name                  ips        average  deviation         median         99th %
flat_map           2.37 K      421.36 μs    ±14.80%      406.32 μs      725.85 μs
map.flatten        1.23 K      812.20 μs    ±18.43%      770.32 μs     1244.07 μs

Comparison: 
flat_map           2.37 K
map.flatten        1.23 K - 1.93x slower +390.84 μs

Extended statistics: 

Name                minimum        maximum    sample size                     mode
flat_map          345.36 μs     1199.50 μs        11.84 K                406.20 μs
map.flatten       520.54 μs     1552.89 μs         6.14 K     765.82 μs, 765.75 μs

Now let's switch it up slightly and hand the list in through inputs:

# list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
  %{
    "flat_map" => fn list -> Enum.flat_map(list, map_fun) end,
    "map.flatten" => fn list -> list |> Enum.map(map_fun) |> List.flatten() end
  },
  inputs: %{"lol" => Enum.to_list(1..10_000)},
  formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
)

results in:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available memory: 15.61 GB
Elixir 1.8.1
Erlang 21.2.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: lol
Estimated total run time: 14 s

Benchmarking flat_map with input lol...
Benchmarking map.flatten with input lol...

##### With input lol #####
Name                  ips        average  deviation         median         99th %
flat_map           2.37 K      421.27 μs     ±9.58%      410.08 μs      663.05 μs
map.flatten        1.63 K      612.90 μs    ±12.73%      626.79 μs      900.98 μs

Comparison: 
flat_map           2.37 K
map.flatten        1.63 K - 1.45x slower +191.63 μs

Extended statistics: 

Name                minimum        maximum    sample size                     mode
flat_map          353.47 μs     1068.95 μs        11.84 K                409.76 μs
map.flatten       519.13 μs     1235.24 μs         8.13 K                519.95 μs

All of a sudden, average and median for map.flatten are a lot better (by ~200 μs), while the performance of flat_map stays about the same 🤷‍♀️

I tried this multiple times; it's no coincidence. The question is why: if anything, I'd have thought that the version without inputs would be faster, not the other way around.

Trying to summon @michalmuskala, who maybe has some compiler input 🤷‍♂️

PragTob commented Mar 23, 2019

[links to HTML report samples]


PragTob commented Mar 23, 2019

To summarize really quickly: without inputs, performance seems a lot less stable, frequently jumping between top-notch and bad performance:

[plot: run time samples without inputs]

whereas with input it still seems jumpy, but at a much smaller/consistent scale:

[plot: run time samples with inputs]


PragTob commented Mar 23, 2019

If anyone looks at this and says "SHOW ME THE CODE": it's really not that much different; you can check https://github.com/bencheeorg/benchee/blob/master/lib/benchee/benchmark/runner.ex

The relevant parts that take the run time measurement:

  def collect(
        scenario = %Scenario{function: function},
        scenario_context = %ScenarioContext{
          num_iterations: 1
        },
        collector
      ) do
    new_input = Hooks.run_before_each(scenario, scenario_context)
    # wrap the benchmarking function with its input (if there is one)
    function = main_function(function, new_input)

    # take the actual measurement of a single invocation
    {measurement, return_value} = collector.collect(function)

    Hooks.run_after_each(return_value, scenario, scenario_context)
    measurement
  end

  # without an input, the function is measured as-is;
  # with an input, it is wrapped in a 0-arity closure that applies the input
  def main_function(function, @no_input), do: function
  def main_function(function, input), do: fn -> function.(input) end

So code-wise, the only difference I can think of is main_function, i.e. what the function that is actually measured looks like. Somewhat ironically, with an input it is wrapped in another function, which should make it slower... or so one would guess.

Maybe it's that the input isn't local to the function, and hence other optimizations can kick in? @michalmuskala
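
A hypothetical standalone repro (not Benchee internals; the names are made up) to isolate just that wrapping difference:

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

# takes the list as an argument, like a benchmark with inputs does
takes_input = fn l -> l |> Enum.map(map_fun) |> List.flatten() end

Benchee.run(%{
  # closure captures `list` from the surrounding scope (the no-inputs case)
  "captured" => fn -> list |> Enum.map(map_fun) |> List.flatten() end,
  # extra indirection, mimicking main_function(function, input)
  "wrapped" => fn -> takes_input.(list) end
})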


PragTob commented Mar 23, 2019

I know, I have my own kind of fun talking to myself here, but something just came to my mind: with inputs, the input/data is already "inside" the process we use for benchmarking; no idea how access is handled in the other case? 🤷‍♂️
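
For what it's worth, a minimal sketch of the mechanism in question (assuming the benchmark runs in a spawned process): any term a closure captures is copied along with it, so the list should end up on the benchmarking process's heap either way:

list = Enum.to_list(1..10_000)

# the closure captures `list`; handing it to a Task copies the captured
# term onto the Task process's heap
work = fn -> Enum.sum(list) end

task = Task.async(work)
IO.inspect(Task.await(task), label: "computed in another process")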

devonestes commented

Based on those performance characteristics, it looks like maybe garbage collection is triggered with the inputs version?

Want to measure memory usage as well and see what the difference is? It should be exactly the same, but if it's not, then that's a good clue.
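
For reference, a sketch of how that could look, reusing the earlier snippet; memory_time: 2 turns on memory measurement for 2 seconds per scenario:

map_fun = fn i -> [i, i * i] end

Benchee.run(
  %{
    "flat_map" => fn list -> Enum.flat_map(list, map_fun) end,
    "map.flatten" => fn list -> list |> Enum.map(map_fun) |> List.flatten() end
  },
  inputs: %{"lol" => Enum.to_list(1..10_000)},
  # measure memory consumption for 2 seconds per scenario
  memory_time: 2,
  formatters: [{Benchee.Formatters.Console, extended_statistics: true}]
)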


PragTob commented Mar 24, 2019

Memory consumption is measured in the HTML samples I posted, and it's exactly the same. That was also my hope, that this would show something. Maybe the number of GC runs could still be different? Dunno.
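
A rough way to check that outside of Benchee, using the VM's cumulative GC counters (:erlang.statistics(:garbage_collection) returns {number_of_gcs, words_reclaimed, 0}):

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

{gcs_before, _words, _} = :erlang.statistics(:garbage_collection)

# run the workload a fixed number of times
Enum.each(1..1_000, fn _ ->
  list |> Enum.map(map_fun) |> List.flatten()
end)

{gcs_after, _words, _} = :erlang.statistics(:garbage_collection)
IO.puts("GC runs during workload: #{gcs_after - gcs_before}")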
