Inform queue leaders of cluster node status changes #356

SimonUnge · 2023-02-24T17:48:13Z

Add a new optional callback, or similar, that gets called when a node joins or leaves the Erlang cluster.
The callback can take decisions on what to do with this information, such as adding or removing the node
as one of its members.

Suggestion:
Update the ra_server_proc:leader state, which already handles nodeup/nodedown, to call a new optional callback.
To introduce a randomized delay, perhaps add an erlang:send_after with a new info message, something like
(erlang:send_after(SOMERANDOMNUMBER, self(), {delayed_node_status_update, Node, Status}))

and a new clause to leader, something like

leader(info, {delayed_node_status_update, Node, Status}, State0) ->
    Effects = ra_server:NEW_OPT_CALLBACK(State0#state.server_state, Node, Status),
    {State, Actions} = ?HANDLE_EFFECTS(Effects,
                                        cast, State0),
    {keep_state, State, Actions};

It would perhaps also be good to pass along the members of the Raft cluster, so that the user code does not have to call ra:members().
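
As a minimal sketch of the randomized delay suggested above: the module name, the function names and the MaxMs upper bound are all hypothetical and not part of Ra. It only shows one way to pick a random delay with rand:uniform/1 and schedule the info message with erlang:send_after/3.

```erlang
-module(node_status_delay).
-export([schedule/3, jitter_ms/1]).

%% Hypothetical helper: pick a random delay between 1 and MaxMs milliseconds.
%% rand:uniform(N) returns an integer in the range 1..N.
jitter_ms(MaxMs) when is_integer(MaxMs), MaxMs > 0 ->
    rand:uniform(MaxMs).

%% Schedule the delayed info message to the calling process.
%% Returns the timer reference so the caller could cancel it with
%% erlang:cancel_timer/1, for example if it stops being the leader.
schedule(Node, Status, MaxMs) ->
    erlang:send_after(jitter_ms(MaxMs), self(),
                      {delayed_node_status_update, Node, Status}).
```

Keeping the timer reference around is one possible way to cancel a pending update on a state change; whether that is needed depends on how the callback ends up being designed.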


kjnilsson · 2023-02-27T12:04:56Z

Yes, something like that. I think, however, we can perhaps be somewhat more ambitious in the API.

You are right, we should pass the current member configuration along with the call. In fact, we may even want to pass the replication state (i.e. last confirmed index) as well, so that we have some kind of "freshness" indicator. For example, we may not want to auto-grow if one of the members is substantially behind the others.

The call should return a list of modifications and the Ra leader will spawn a transient process to perform these changes in turn (start a new ra server for example and join it to the cluster, then wait for replication to catch up before continuing). Whilst it is performing the modifications this callback will not be called, unless another node change is detected.

The Ra leader can then ensure that any shared configuration is properly consistent across members (something we have to ensure manually ourselves atm).
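
To make the shape of such a callback concrete, here is a rough sketch under several assumptions. The module, the function name handle_node_status/4, the {add_member, ServerId} and {remove_member, ServerId} modification terms, the MAX_LAG threshold, and the representation of members as a map from server id to last confirmed index are all hypothetical; none of them exist in Ra today.

```erlang
-module(node_status_policy).
-export([handle_node_status/4]).

%% Hypothetical threshold: do not auto-grow if the slowest member is more
%% than this many log entries behind the fastest one.
-define(MAX_LAG, 1000).

%% Members is assumed to be #{ServerId => LastConfirmedIndex}, where
%% ServerId is the usual {Name, Node} tuple.
%% Returns a list of modifications for the leader to carry out in turn.
handle_node_status(Node, nodeup, ClusterName, Members) ->
    ServerId = {ClusterName, Node},
    case maps:is_key(ServerId, Members) of
        true ->
            [];                          % already a member, nothing to do
        false ->
            case all_fresh(maps:values(Members)) of
                true -> [{add_member, ServerId}];
                false -> []              % a member is lagging, do not grow
            end
    end;
handle_node_status(Node, nodedown, ClusterName, Members) ->
    ServerId = {ClusterName, Node},
    case maps:is_key(ServerId, Members) of
        true -> [{remove_member, ServerId}];
        false -> []
    end.

%% The "freshness" check: the spread of confirmed indexes is within MAX_LAG.
all_fresh([]) ->
    true;
all_fresh(Indexes) ->
    lists:max(Indexes) - lists:min(Indexes) =< ?MAX_LAG.
```

Whether a nodedown should really shrink the cluster is a policy decision for the callback implementer; the sketch only illustrates that the return value is a plain list the leader can hand to a transient worker.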

SimonUnge · 2023-02-28T19:21:47Z

Got it. So, similar to handle_aux, but more specific. We could re-use the logic of PID monitoring too perhaps, to run one change at a time, in sequential order.

So, perhaps add something like init_nodes_status/1 and handle_node_status/N, with a new monitor, [{monitor, process, node_status, Pid}] - I realize there are already handle_node_status funs, so it would need some similar name.

SimonUnge · 2023-03-01T21:56:00Z

@kjnilsson I have a simple prototype working that uses a gen_statem timeout to trigger the handle_status actions. But I am wondering what a good way is to make sure timeouts happen on the right node, i.e. if a leader triggers a delayed timeout and then for some reason becomes a follower before the timeout triggers...
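
One option that might help here, sketched as a toy gen_statem rather than actual ra_server_proc code: a state_timeout is cancelled automatically by gen_statem when the state changes, so a timeout started in leader never fires after a move to follower. The module, its API functions and the fixed 50 ms delay are hypothetical. Note that only one state_timeout can be pending at a time, so a second status change would replace the first.

```erlang
-module(leader_timeout_sketch).
-behaviour(gen_statem).
-export([start_link/0, node_status/3, step_down/1, fired/1]).
-export([init/1, callback_mode/0, leader/3, follower/3]).

start_link() -> gen_statem:start_link(?MODULE, [], []).
node_status(Pid, Node, Status) -> gen_statem:cast(Pid, {node_status, Node, Status}).
step_down(Pid) -> gen_statem:cast(Pid, step_down).
fired(Pid) -> gen_statem:call(Pid, fired).

callback_mode() -> state_functions.

%% Data is simply the list of delayed updates that actually fired.
init([]) -> {ok, leader, []}.

leader(cast, {node_status, Node, Status}, Fired) ->
    %% Start the delayed update as a state_timeout tied to the leader state.
    {keep_state, Fired,
     [{state_timeout, 50, {delayed_node_status_update, Node, Status}}]};
leader(state_timeout, {delayed_node_status_update, Node, Status}, Fired) ->
    %% This is where the optional callback would be invoked.
    {keep_state, [{Node, Status} | Fired]};
leader(cast, step_down, Fired) ->
    %% The state change cancels any pending state_timeout.
    {next_state, follower, Fired};
leader({call, From}, fired, Fired) ->
    {keep_state_and_data, [{reply, From, Fired}]}.

follower({call, From}, fired, Fired) ->
    {keep_state_and_data, [{reply, From, Fired}]};
follower(_Type, _Event, _Fired) ->
    keep_state_and_data.
```

By contrast, a generic timeout ({{timeout, Name}, Time, Content}) survives state changes and would have to be handled or cancelled explicitly in every state.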

SimonUnge · 2023-03-02T18:40:45Z

I am currently handling this timeout trigger in all states, and moving the headache to the callback implementation.
