Add hook to manually control redistribution and a cast message to manually trigger redistribution. #253

Open
wants to merge 1 commit into
base: master

veedo commented Aug 14, 2022

The allow_handoff function determines whether a handoff is allowed to proceed when the node wants to eject a process.
This gives maximum flexibility to the super users who need it, while the function signatures still feel simple.
An added benefit is that the user can manually trigger redistribution with a different function if the situation calls for it.
It's implemented as an option list in the handoff so that it can be extended further in the future.

I finally got around to working on this as we discussed in #198 and #197.

The customization seems to work really well in my project. I can look at the source/destination nodes and the child spec and make more complex decisions about whether the redistribution should proceed.

Add cast to manually trigger redistribution.

The allow_handoff function determines if the hand off is allowed to
proceed. This has the benefit of providing maximum flexibility for the
super users that need it.
An added benefit is that the user can manually trigger redistribution
with a different function if the situation calls for it.

veedo commented Aug 14, 2022

Tests pass when running locally. Seems like a timing issue in the test that failed on that build.


veedo commented Feb 25, 2023

@derekkraan Any issues/feedback? I can try a different style if it smells bad 😛

@derekkraan (Owner)

Hi @veedo,

Thanks for the PR. Sorry it took me so long to get to it.

I'm not so sure about the redistribute_children/2 function. Can you give me an example of when you would use this in your code?


veedo commented Mar 4, 2023

The simplest example is just to delay redistribution to a quiet time on the network/device.

Task.start(fn ->
  Process.sleep(60_000)
  for sup <- supervisors, do: DynamicSupervisor.redistribute_children(sup)
end)

This still allows new processes to be added when the node starts up, and those will get load balanced.
Because those processes are already balanced, less re-balancing has to occur later.

A less trivial example is closer to our application.
I am using, or plan to use, the allow_handoff function for a few purposes:

  • Node affinity
    • Keep some processes together
    • Only re-distribute to a subset of nodes
  • Smooth out redistribution
    • Call redistribute_children multiple times with different handoff functions
    • Can be as simple as preventing a move 50% of the time
    • Can use a function that only redistributes a specific group of processes.
  • Don't move certain processes unless it is to recover functionality
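
A minimal sketch of two of the bullets above (redistributing only to a subset of nodes, and preventing a move 50% of the time). It assumes the handoff function receives a `{current_node, chosen_node, child_spec, child_pid}` tuple and returns a boolean, as in the example in this comment. The module and function names are hypothetical and not part of the PR.

```elixir
defmodule HandoffSketch do
  # Hypothetical helpers. Each handoff function takes the 4-tuple
  # {current_node, chosen_node, child_spec, child_pid} and returns a boolean.

  # Builds a handoff function that only allows moves to `allowed_nodes`.
  def only_to_subset(allowed_nodes) do
    fn
      # Staying on the same node is always allowed.
      {same, same, _spec, _pid} -> true
      {_current, chosen, _spec, _pid} -> chosen in allowed_nodes
    end
  end

  # Allows roughly half of the proposed moves, so that repeated
  # redistribution passes spread the work out over time.
  def coin_flip({same, same, _spec, _pid}), do: true
  def coin_flip({_current, _chosen, _spec, _pid}), do: :rand.uniform() < 0.5
end
```

Either function could then be passed as the second argument to redistribute_children, and calling it several times with `coin_flip/1` would move processes in smaller batches.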

Example of keeping some processes together.
The node affinity is baked into the child_spec:

# A "move" that stays on the same node is always allowed.
def keep_process_on_best_nodes({current_node, chosen_node, _, _}) when current_node == chosen_node do
  true
end
# Otherwise only move to a preferred node, unless none of the preferred nodes is alive.
def keep_process_on_best_nodes({_current_node, chosen_node, child_spec, _child_pid}) do
  best_nodes = Locality.get_nodes(child_spec)
  best_nodes_all_dead = Enum.all?(best_nodes, &(&1 not in [node() | Node.list()]))
  (chosen_node in best_nodes) or best_nodes_all_dead
end

...

DynamicSupervisor.redistribute_children(sup, &keep_process_on_best_nodes/1)
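
The "only redistribute a specific group of processes" case could look similar. This is a sketch under the assumption that each child_spec carries an `id` of the form `{group, name}`. That convention, and the module name, are hypothetical and only serve the illustration.

```elixir
defmodule GroupHandoffSketch do
  # Hypothetical convention: child_spec.id is a {group, name} tuple.
  # Returns a handoff function that only lets members of `group` move.
  def only_group(group) do
    fn
      # Staying on the same node is always allowed.
      {same, same, _spec, _pid} -> true
      # Members of the requested group may move to the chosen node.
      {_current, _chosen, %{id: {^group, _name}}, _pid} -> true
      # Everything else stays where it is for this pass.
      _other -> false
    end
  end
end
```

Passing `GroupHandoffSketch.only_group(:cache)` as the handoff function, in the same position as `&keep_process_on_best_nodes/1` above, would then rebalance one group per call.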
