Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify leadership election code #4908

Closed
timcharper opened this issue Dec 22, 2016 · 7 comments
Closed

Simplify leadership election code #4908

timcharper opened this issue Dec 22, 2016 · 7 comments
Assignees

Comments

@timcharper
Copy link
Contributor

timcharper commented Dec 22, 2016

Now that we've migrated to curator's LeaderLatch for leader election, it seems that our leader election code is doing WAY more than necessary. Some things I would like to see fixed:

  • remove all locks. As an example, LeaderLatch blocks and this led to a dead lock. With the amount of thread blocking we're doing, it should be no surprise.
  • AnythingBase is almost always an anti-pattern. Get rid of ElectionServiceBase. If we need a common interface, define it. Prefer to put common behavior in helper methods rather than inherit them,e tc.
  • Perform blocking IO in a thread pool designed for blocking IO. Don't block the global fork join pool.
  • Rip out state-machine-ish logic. When you get offer leadership, great, you're the leader. If you lose, then stand-by. If you transition from having leadership to not having leadership, CRASH. Curator's leader latch handles perpetually trying to obtain leadership in the event that the leader goes missing. It does not need to be "restarted" periodically.

The falling scala script illustrates how concise our leader election module should be:

https://gist.github.com/timcharper/22a1bca65e9a8268225dcfb97420cdf7

@timcharper timcharper added this to the Marathon 1.5 milestone Dec 22, 2016
@aquamatthias
Copy link
Contributor

I agree with most of your points. Some additions:

  • ElectionServiceBase: the common interface is already defined: ElectionService.
  • Perform blocking IO in a thread pool designed for blocking. This is a bigger topic for all places where we do IO. Do you suggest to use multiple different thread pools for each case?

@jasongilanfarr
Copy link
Contributor

Scala's own recommendation is to have a fixed size threadpool for blocking IO.

@jeschkies
Copy link
Contributor

The election code is using a different client instance. Should it use the same as the storage?

@jeschkies
Copy link
Contributor

I'd also like stopLeadership and startLeadership to be lock-free / non-blocking. See my comment in D379.

@timcharper timcharper changed the title Simply leadership election code Simplify leadership election code Jan 4, 2017
@timcharper
Copy link
Contributor Author

@jeschkies why even have stopLeadership ? Why not just crash?

@timcharper
Copy link
Contributor Author

timcharper commented Jan 4, 2017

@jeschkies it probably should use the same client instance but I'm not sure if Curator does any zookeeper connection pool aggregation behind the scenes.

On second thought, we should research more. What happens if there is a high amount of read/write traffic on the storage and it gets backlogged. Could we lose leadership? I'm not sure what the consequences are.

@aquamatthias aquamatthias modified the milestones: Next, Marathon 1.5 Jan 18, 2017
@marcomonaco marcomonaco modified the milestones: Marathon 1.5, Next Mar 1, 2017
@meichstedt
Copy link
Contributor

Note: This issue has been migrated to https://jira.mesosphere.com/browse/MARATHON-2008. For more information see https://groups.google.com/forum/#!topic/marathon-framework/khtvf-ifnp8.

@mesosphere mesosphere locked and limited conversation to collaborators Mar 27, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants