This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

better kafka protocol: evenly distribute, oversubscribe partitions, and minimize rebalance #80

Open · supershabam (Contributor) opened this issue Nov 19, 2016 · 0 comments
The vulcan cachers need to keep a configurable window of metrics in memory for the partitions they are responsible for (e.g. 4 hours). Backfilling this window takes time, so when a vulcan cacher comes online (or goes offline) and the group membership changes, partitions are reshuffled and reassigned according to the Kafka group protocol.

We have tried a simple HashRing protocol so that reassignments are minimal when cachers come and go. But the HashRing does little to ensure that each cacher carries an even share of partitions.
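A minimal sketch of the trade-off (the cacher names, one ring point per cacher, and MD5 as the ring hash are illustrative assumptions, not vulcan's actual HashRing implementation): adding a cacher only moves the partitions the newcomer takes over, but nothing forces the ring arcs, and therefore the per-cacher load, to be even.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Place a key on the ring via a stable hash (MD5 chosen for illustration).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def ring_assign(partitions, cachers):
    # Assign each partition to the first cacher clockwise from its hash point.
    ring = sorted((_point(c), c) for c in cachers)
    points = [pt for pt, _ in ring]
    return {
        p: ring[bisect.bisect(points, _point(f"partition-{p}")) % len(ring)][1]
        for p in partitions
    }

partitions = list(range(16))
before = ring_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
after = ring_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])

# The ring's virtue: the only partitions that move are the ones the new
# cacher takes over; nothing shuffles between the existing cachers.
moved = [p for p in partitions if before[p] != after[p]]
```

With a single point per cacher, though, arc lengths vary wildly, so one cacher can end up owning far more partitions than another; virtual nodes soften this but do not guarantee balance.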

RoundRobin is currently better than HashRing: partitions are distributed evenly among online cachers, so each cacher operates with similar performance. However, when a cacher comes online (or goes offline), all topic partitions are reassigned with no regard for minimizing ownership changes.
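The inverse trade-off can be sketched the same way (again with illustrative names, not vulcan's actual RoundRobin implementation): dealing partitions out modulo the member count is perfectly balanced, but one membership change shifts every index, so most partitions change hands.

```python
from collections import Counter

def round_robin_assign(partitions, cachers):
    # Deal sorted partitions across the sorted member list, one each in turn.
    members = sorted(cachers)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}

partitions = list(range(16))
before = round_robin_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
after = round_robin_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])

# Ownership counts stay even (within one partition of each other) ...
counts = Counter(after.values())
# ... but a single membership change moves most partitions, because the
# index shift cascades through the modulo.
moved = [p for p in partitions if before[p] != after[p]]
```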

With both RoundRobin and HashRing, we have no redundancy. If a cacher goes away, the partitions it owned are reassigned to the remaining live cachers, but it takes a while for those cachers to backfill the window of data they need before they can actually serve queries for those partitions.

Ideally, we can have a Kafka protocol that ensures each partition is handled by more than one cacher (oversubscription), and that minimizes cacher-partition assignment changes when a cacher comes online or goes offline.
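One technique that gets all three properties at once (a sketch, not a proposed implementation; the function names and replica count are assumptions) is rendezvous/highest-random-weight hashing with top-N selection: each partition is owned by its N highest-scoring cachers, load evens out statistically, and removing a cacher only touches the partitions it co-owned, since the scores of the surviving cachers never change.

```python
import hashlib

def _score(cacher: str, partition: int) -> int:
    # Deterministic per-(cacher, partition) weight.
    return int(hashlib.md5(f"{cacher}:{partition}".encode()).hexdigest(), 16)

def hrw_assign(partitions, cachers, replicas=2):
    # Give each partition to its `replicas` highest-scoring cachers.
    return {
        p: sorted(cachers, key=lambda c: _score(c, p), reverse=True)[:replicas]
        for p in partitions
    }

partitions = list(range(32))
before = hrw_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])
# cacher-d goes offline: only partitions it co-owned pick a replacement;
# every other partition keeps exactly the same owner set, and the surviving
# co-owner of d's partitions already has the window backfilled.
after = hrw_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
```

With `replicas=2`, a departing cacher's partitions remain immediately queryable on their surviving co-owner while the replacement backfills, which addresses the backfill gap described above.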

Projects: v0.1.0 · Icebox