Pre-warmed cluster upgrades #82

Open
ecordell opened this issue Sep 7, 2022 · 0 comments
Labels
priority/3 low This would be nice to have state/needs discussion This can't be worked on yet

Comments


ecordell commented Sep 7, 2022

Right now, the operator rolls out new versions of SpiceDB by updating a deployment.

New pods become available as soon as they connect to the datastore and the dispatch ring, which means the in-memory cache is lost during an upgrade. Depending on the queries SpiceDB is serving, this can cause a significant increase in latency.

Some options worth exploring:

  1. Slowly introducing new pods so that only some percentage of a cluster loses its cache at a time. This is probably the simplest option, but it requires all dispatch API changes to be fully backwards-compatible.
  2. Traffic mirroring via external routing (similar to how flagger provides generic blue/green mirroring). Currently, spicedb-operator operates "below" the level that most of these tools work, so the scope would need to increase dramatically to include more networking/ingress concerns.
  3. Traffic mirroring via SpiceDB itself. We could introduce mirroring flags into SpiceDB itself, so that incoming traffic can be forwarded to a parallel set of nodes to fill their cache. This would require old and new clusters to be exposed under different service objects so that their hashrings don't collide.
  4. Saving and restoring the cache. Currently, SpiceDB caches exist only in memory. We could switch to a cache that syncs to the filesystem, or provide APIs for dumping the cache (either over the network or to disk), and the operator could ensure the caches are restored in the new pods. We would likely want to switch to a StatefulSet if we try this.
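For illustration, option 1 could be approximated with the Deployment's built-in rolling-update knobs. This is just a sketch with illustrative values, not what the operator currently sets:

```yaml
# Hypothetical rollout settings for a SpiceDB Deployment (values are examples).
# maxSurge/maxUnavailable bound how much of the cluster is replaced at once,
# so only a fraction of the dispatch cache is cold at any given moment.
spec:
  minReadySeconds: 60        # give each new pod time to warm up before the next swap
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # add at most one new (cold-cache) pod at a time
      maxUnavailable: 0      # never remove a warm pod before its replacement is ready
```

Note this only limits how much cache is lost at once; it doesn't address the backwards-compatibility requirement on the dispatch API, since old and new pods share the hashring during the rollout.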