This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

better kafka protocol: evenly distribute, oversubscribe partitions, and minimize rebalance #80

Open · supershabam (Contributor) opened this issue Nov 19, 2016 · 0 comments
The vulcan cachers need to keep a configurable window of metrics in memory for the partitions they are responsible for (e.g. 4 hours). Backfilling this window takes time, so when a vulcan cacher comes online (or goes offline) and the group membership changes, partitions are reshuffled and reassigned according to the Kafka group protocol.

We have tried a simple HashRing protocol so that reassignments are minimal when cachers come and go. But the HashRing does little to ensure that each cacher carries an even share of partitions.
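A minimal sketch of the trade-off (the cacher names, one ring point per cacher, and MD5 as the ring hash are illustrative assumptions, not vulcan's actual HashRing implementation): adding a cacher only moves the partitions the newcomer takes over, but nothing forces the ring arcs, and therefore the per-cacher load, to be even.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Place a key on the ring via a stable hash (MD5 chosen for illustration).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def ring_assign(partitions, cachers):
    # Assign each partition to the first cacher clockwise from its hash point.
    ring = sorted((_point(c), c) for c in cachers)
    points = [pt for pt, _ in ring]
    return {
        p: ring[bisect.bisect(points, _point(f"partition-{p}")) % len(ring)][1]
        for p in partitions
    }

partitions = list(range(16))
before = ring_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
after = ring_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])

# The ring's virtue: the only partitions that move are the ones the new
# cacher takes over; nothing shuffles between the existing cachers.
moved = [p for p in partitions if before[p] != after[p]]
```

With a single point per cacher, though, arc lengths vary wildly, so one cacher can end up owning far more partitions than another; virtual nodes soften this but do not guarantee balance.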

RoundRobin is currently better than HashRing: partitions are distributed evenly among online cachers, so each cacher operates with similar performance. However, when a cacher comes online (or goes offline), all topic partitions are reassigned with no regard for minimizing ownership changes.
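The inverse trade-off can be sketched the same way (again with illustrative names, not vulcan's actual RoundRobin implementation): dealing partitions out modulo the member count is perfectly balanced, but one membership change shifts every index, so most partitions change hands.

```python
from collections import Counter

def round_robin_assign(partitions, cachers):
    # Deal sorted partitions across the sorted member list, one each in turn.
    members = sorted(cachers)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}

partitions = list(range(16))
before = round_robin_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
after = round_robin_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])

# Ownership counts stay even (within one partition of each other) ...
counts = Counter(after.values())
# ... but a single membership change moves most partitions, because the
# index shift cascades through the modulo.
moved = [p for p in partitions if before[p] != after[p]]
```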

With both RoundRobin and HashRing, we have no redundancy. If a cacher goes away, the partitions it owned are reassigned to the remaining live cachers, but it takes a while for those cachers to backfill the window of data they need before they can actually serve queries for those partitions.

Ideally, we can have a Kafka protocol that ensures each partition is handled by more than one cacher (oversubscription), and that minimizes cacher-partition assignment changes when a cacher comes online or goes offline.
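One technique that gets all three properties at once (a sketch, not a proposed implementation; the function names and replica count are assumptions) is rendezvous/highest-random-weight hashing with top-N selection: each partition is owned by its N highest-scoring cachers, load evens out statistically, and removing a cacher only touches the partitions it co-owned, since the scores of the surviving cachers never change.

```python
import hashlib

def _score(cacher: str, partition: int) -> int:
    # Deterministic per-(cacher, partition) weight.
    return int(hashlib.md5(f"{cacher}:{partition}".encode()).hexdigest(), 16)

def hrw_assign(partitions, cachers, replicas=2):
    # Give each partition to its `replicas` highest-scoring cachers.
    return {
        p: sorted(cachers, key=lambda c: _score(c, p), reverse=True)[:replicas]
        for p in partitions
    }

partitions = list(range(32))
before = hrw_assign(partitions, ["cacher-a", "cacher-b", "cacher-c", "cacher-d"])
# cacher-d goes offline: only partitions it co-owned pick a replacement;
# every other partition keeps exactly the same owner set, and the surviving
# co-owner of d's partitions already has the window backfilled.
after = hrw_assign(partitions, ["cacher-a", "cacher-b", "cacher-c"])
```

With `replicas=2`, a departing cacher's partitions remain immediately queryable on their surviving co-owner while the replacement backfills, which addresses the backfill gap described above.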

Projects: v0.1.0 · Icebox