
[Enhancement] Analyse the active snapshot upload when the corresponding etcd shuts down/crashes #572

Open
ishan16696 opened this issue Jan 9, 2023 · 0 comments
Labels
kind/enhancement Enhancement, improvement, extension lifecycle/rotten Nobody worked on this for 12 months (final aging stage) priority/4 Priority (lower number equals higher priority)

Comments

@ishan16696
Member

ishan16696 commented Jan 9, 2023

Feature (What you would like to be added):
Analyse the behaviour of etcd-backup-restore when its corresponding etcd process shuts down or crashes during an active snapshot upload.

Motivation (Why is this needed?):
Currently, when the corresponding etcd shuts down or crashes while backup-restore is uploading a snapshot, the snapshotter keeps waiting for the current upload to finish, bounded only by the context timeout set by the flag --etcd-snapshot-timeout (15 minutes in our production setup).

  • In case of a single-node etcd:
    • If etcd restarts, backup-restore starts its initialisation phase while the full snapshot is still being uploaded.
  • In case of a multi-node etcd:
    • If the etcd leader restarts, it may come back as a follower (having lost leadership) while another member becomes the cluster leader. The new backup-leader will then also try to take a snapshot while the previous backup-leader is still uploading its full snapshot.
@ishan16696 ishan16696 added the kind/enhancement Enhancement, improvement, extension label Jan 9, 2023
@ishan16696 ishan16696 added the priority/4 Priority (lower number equals higher priority) label Jan 9, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Sep 19, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels May 28, 2024