
Eprof measures show gen_statem:call via gen:do_call slower #8341

Closed
ThomasArts opened this issue Apr 4, 2024 · 2 comments
Assignees: jhogberg
Labels: bug, duplicate

@ThomasArts (Contributor)

Describe the bug
Related to Martin's report in #8322

It seems that gen:do_call got about 10% slower on OTP 27. It is very hard to deduce what could cause this.
I extracted a super simple gen_statem to show the behaviour. The total run is not much slower, but the calls to gen:do_call are, according to eprof.
I avoid mixing in the binary_to_term and other operations that happen in the Riak state machine this example is extracted from. The production state machine slows down far more than this one, but that might be due to other parts as well.

To Reproduce
This little program shows a difference:

-module(lll).
-compile([export_all, nowarn_export_all]).


test(N) ->
  erlang:garbage_collect(),
  Term = [ {I, crypto:strong_rand_bytes(256)} || I <- lists:seq(1, 1_000)],
  %% N copies of the same binary
  Bin = term_to_binary(Term),
  {ok, Pid} = start(Bin),
  eprof:start(),
  eprof:start_profiling([Pid, self()]),

  %% make a lot of calls
  run(N, Pid),

  eprof:stop_profiling(),
  gen_statem:stop(Pid),
  eprof:analyze(total, [{filter, [{calls, 10}]}]),
  eprof:stop().

run(0, _) ->
  ok;
run(N, Pid) ->
  gen_statem:call(Pid, N),
  run(N-1, Pid).

%% -- gen_statem
callback_mode() ->
    state_functions.

start(Binary) ->
  gen_statem:start_link(?MODULE, Binary, []).  %% possibly need start options

init(Binary) ->
  {ok, started, Binary}.

started({call, From}, Msg, State) ->
  P = 8 * (Msg rem size(State)),
  <<_:P, Bin/binary>> = State,
  {keep_state_and_data, [{reply, From, Bin}]}.
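To rule out eprof's tracing overhead as a factor, a wall-clock variant of the benchmark can be added to the same module (a sketch, not part of the original report; timed_test/1 is a hypothetical helper that reuses the start/1 and run/2 functions above):

```erlang
%% Hypothetical addition to the lll module above: time the same call
%% loop with timer:tc instead of eprof, so trace overhead does not
%% inflate the per-call cost when comparing OTP versions.
timed_test(N) ->
  erlang:garbage_collect(),
  Term = [{I, crypto:strong_rand_bytes(256)} || I <- lists:seq(1, 1_000)],
  Bin = term_to_binary(Term),
  {ok, Pid} = start(Bin),
  %% timer:tc/1 returns {ElapsedMicroseconds, Result}
  {Micros, ok} = timer:tc(fun() -> run(N, Pid) end),
  gen_statem:stop(Pid),
  io:format("~p calls in ~p us (~.3f us/call)~n",
            [N, Micros, Micros / N]).
```

Running lll:timed_test(1_000_000) on both OTP releases would show whether the slowdown also appears without tracing enabled.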

On OTP-26.2.1 it takes about 425945 us for 1 million calls

Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.1 (press Ctrl+G to abort, type help(). for help)
1> c(lll).
{ok,lll}
2> lll:test(1_000_000).
FUNCTION                                     CALLS        %     TIME  [uS / CALLS]
--------                                     -----  -------     ----  [----------]
...
lll:started/3                              1000000     5.08    87949  [      0.09]
gen:reply/2                                1000000     7.33   126941  [      0.13]
erlang:demonitor/2                         1000001     7.39   127939  [      0.13]
gen_statem:loop_receive/3                  1000000    10.00   173259  [      0.17]
gen:do_call/4                              1000001    24.59   425945  [      0.43]
----------------------------------------  --------  -------  -------  [----------]
Total:                                    24000009  100.00%  1732254  [      0.07]

whereas on master (i.e. commit 8504d0e0b8):

Erlang/OTP 27 [RELEASE CANDIDATE 2] [erts-14.2.3] [source-8504d0e0b8] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.3 (press Ctrl+G to abort, type help(). for help)
1> c(lll).
{ok,lll}
2> lll:test(1_000_000).
FUNCTION                                     CALLS        %     TIME  [uS / CALLS]
--------                                     -----  -------     ----  [----------]
...
lll:started/3                              1000000     5.14    97797  [      0.10]
erlang:demonitor/2                         1000001     7.05   134059  [      0.13]
gen:reply/2                                1000000     7.09   134868  [      0.13]
gen_statem:loop_receive/3                  1000000    10.15   192990  [      0.19]
gen:do_call/4                              1000001    27.65   525900  [      0.53]
----------------------------------------  --------  -------  -------  [----------]
Total:                                    24000009  100.00%  1902220  [      0.08]

it takes roughly 100000 us more in gen:do_call (525900 vs 425945 us, about 23% slower).

Expected behavior
Not as large a performance difference as this.

Affected versions
This is OTP 27. Most likely introduced after commit 49024e8: at that commit the code still runs equally fast, the next commit makes it slower, and later commits make it slower still.

ThomasArts added the bug label on Apr 4, 2024
@ThomasArts (Contributor, Author)

Turns out that this might be linked to my setup: macOS Ventura 13.6 with an M2 Pro.

jhogberg self-assigned this on Apr 4, 2024
@jhogberg (Contributor)

jhogberg commented Apr 4, 2024

Thanks, this is a duplicate of #8322 as I can reproduce it on earlier OTP versions by fixing the virtual heap accounting bug.

jhogberg closed this as completed on Apr 4, 2024