Simplify leadership election code #4908

timcharper · 2016-12-22T20:24:11Z

Now that we've migrated to Curator's LeaderLatch for leader election, our leader election code seems to be doing WAY more than necessary. Some things I would like to see fixed:

Remove all locks. As an example, LeaderLatch blocks, and this led to a deadlock. With the amount of thread blocking we're doing, that should be no surprise.
AnythingBase is almost always an anti-pattern. Get rid of ElectionServiceBase. If we need a common interface, define it. Prefer to put common behavior in helper methods rather than inheriting it, etc.
Perform blocking IO in a thread pool designed for blocking IO. Don't block the global fork join pool.
Rip out the state-machine-ish logic. When you are offered leadership, great, you're the leader. If you lose the election, stand by. If you transition from having leadership to not having leadership, CRASH. Curator's LeaderLatch handles perpetually trying to obtain leadership in the event that the leader goes missing. It does not need to be "restarted" periodically.

The following Scala script illustrates how concise our leader election module should be:

https://gist.github.com/timcharper/22a1bca65e9a8268225dcfb97420cdf7
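The gist is not reproduced here. As a rough sketch of the behavior described in the list above (lead once elected, crash on losing leadership, no restart logic), something like the following could work. `LatchListener` is a stand-in that mirrors the two callbacks of Curator's `LeaderLatchListener` so the example runs without Curator on the classpath; `CrashOnLossListener`, `onElected`, and `crash` are hypothetical names, not taken from the gist or from Marathon.

```scala
// Stand-in for Curator's LeaderLatchListener, which declares the same two callbacks.
trait LatchListener {
  def isLeader(): Unit
  def notLeader(): Unit
}

// Hypothetical helper: run `onElected` when leadership is offered and call
// `crash` if leadership is lost afterwards. No locks, no restart logic.
final class CrashOnLossListener(onElected: () => Unit, crash: () => Unit)
    extends LatchListener {

  // Written and read from listener callbacks only; volatile for visibility.
  @volatile private var wasLeader = false

  override def isLeader(): Unit = {
    wasLeader = true
    onElected()
  }

  // Only a transition from leader to non-leader is fatal; a standby
  // instance that never led simply keeps waiting.
  override def notLeader(): Unit =
    if (wasLeader) crash()
}
```

With Curator available, an adapter implementing `LeaderLatchListener` could be registered through `LeaderLatch.addListener` before calling `LeaderLatch.start()`. In a real process `crash` would presumably terminate the JVM so that a supervisor restarts it; that choice is an assumption here.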


aquamatthias · 2016-12-23T07:42:21Z

I agree with most of your points. Some additions:

ElectionServiceBase: the common interface is already defined: ElectionService.
Perform blocking IO in a thread pool designed for blocking: this is a bigger topic that applies to every place where we do IO. Do you suggest using a separate thread pool for each case?

jasongilanfarr · 2016-12-23T19:56:48Z

Scala's own recommendation is to have a fixed-size thread pool for blocking IO.
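A minimal sketch of that idea, assuming a single shared pool for blocking calls: `BlockingIo`, the pool size of 4, and the thread name prefix are hypothetical choices made for illustration, not anything taken from Marathon.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object BlockingIo {
  // Hypothetical size; tune to the number of concurrent blocking calls expected.
  private val poolSize = 4

  // Name the threads so blocked calls are easy to spot in a thread dump.
  private val threadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"blocking-io-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }

  // Dedicated fixed-size pool, kept separate from the global fork-join pool.
  val context: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(poolSize, threadFactory))

  // Run a blocking call on the dedicated pool and return its result as a Future.
  def apply[T](body: => T): Future[T] = Future(body)(context)
}
```

A blocking call would then be wrapped as `BlockingIo { someBlockingCall() }`, where `someBlockingCall` is a placeholder. Whether one shared pool or one pool per use case is the better fit is exactly the open question raised above.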

jeschkies · 2017-01-02T16:44:03Z

The election code is using a different client instance. Should it use the same one as the storage?

jeschkies · 2017-01-04T10:14:20Z

I'd also like stopLeadership and startLeadership to be lock-free / non-blocking. See my comment in D379.
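One way such transitions could be made lock-free is a compare-and-set on an atomic flag, sketched below. This is only an illustration of the general technique, not the approach discussed in D379; `LeadershipFlag`, `onStart`, and `onStop` are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical coordinator: the method names mirror the thread, the body is illustrative.
final class LeadershipFlag(onStart: () => Unit, onStop: () => Unit) {
  private val leading = new AtomicBoolean(false)

  // Returns true only for the caller that actually performed the transition,
  // so repeated or concurrent calls never run the callback twice.
  def startLeadership(): Boolean =
    if (leading.compareAndSet(false, true)) { onStart(); true } else false

  def stopLeadership(): Boolean =
    if (leading.compareAndSet(true, false)) { onStop(); true } else false

  def isLeading: Boolean = leading.get()
}
```

The compare-and-set only guards the transition itself. The callbacks still run on the calling thread, so anything blocking inside them would need to be handed off to another executor, and the relative ordering of `onStart` and `onStop` under concurrent calls is not addressed by this sketch.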

timcharper · 2017-01-04T22:40:26Z

@jeschkies why even have stopLeadership? Why not just crash?

timcharper · 2017-01-04T22:44:47Z

@jeschkies it probably should use the same client instance, but I'm not sure whether Curator does any ZooKeeper connection pool aggregation behind the scenes.

On second thought, we should research this more. What happens if there is heavy read/write traffic on the storage and it gets backlogged? Could we lose leadership? I'm not sure what the consequences are.

meichstedt · 2017-03-07T10:47:11Z

Note: This issue has been migrated to https://jira.mesosphere.com/browse/MARATHON-2008. For more information see https://groups.google.com/forum/#!topic/marathon-framework/khtvf-ifnp8.

timcharper added this to the Marathon 1.5 milestone Dec 22, 2016

timcharper changed the title ~~Simply leadership election code~~ Simplify leadership election code Jan 4, 2017

aquamatthias modified the milestones: Next, Marathon 1.5 Jan 18, 2017

jeschkies mentioned this issue Feb 9, 2017

Jenkins Pipeline timeout does not work for deadlocked jobs #5036

Closed

marcomonaco modified the milestones: Marathon 1.5, Next Mar 1, 2017

marcomonaco added debt labels Mar 1, 2017

marcomonaco assigned meichstedt Mar 1, 2017

marcomonaco added the ready label Mar 1, 2017

meichstedt added in progress and removed ready labels Mar 6, 2017

aquamatthias closed this as completed Mar 27, 2017

aquamatthias removed the ready label Mar 27, 2017

mesosphere locked and limited conversation to collaborators Mar 27, 2017

