Vault with raft storage failed to join after applying changes in manifest #1627
Comments
@revathyr13 - did you ever figure out how to stop vault-1 from crashlooping? We are seeing something similar in one of our k8s clusters, where vault-1 is crashlooping even though the raft cluster appears healthy (from vault-2):
Hello Team,
We are facing auto-join issues with a Vault HA raft cluster in Kubernetes.
vault.txt
Issue
We are deploying the Vault cluster in Kubernetes with raft storage. The unseal keys are stored in our master Vault.
The manifest we use to deploy the Vault cluster is attached. On the first deployment everything looks good: all Vault pods come up without any issues.
NAME READY STATUS RESTARTS AGE
vault-0 3/3 Running 0 9m44s
vault-1 3/3 Running 0 5m32s
vault-2 3/3 Running 0 49s
However, if we make any change in the manifest (say, changing veleroEnabled: true to veleroEnabled: false) and re-apply it using kubectl apply -f vault.yaml, the pod vault-2 comes up with the change applied and shows as ready, but vault-1 won't come up.
vault-0 3/3 Running 0 44m
vault-1 1/2 CrashLoopBackOff 12 40m
vault-2 2/2 Running 0 40m
While digging, we noticed that vault-2 only shows as ready; it has not actually joined any cluster:
$ kubectl exec -ti vault-2 sh
/ # export VAULT_ADDR='https://vault-2:8200'
/ # vault operator raft list-peers
No raft cluster configuration found
Because vault-2 came up in the ready state, the operator starts applying the changes to vault-1, which breaks the whole raft cluster.
This shouldn't happen. vault-2 should become ready only once it has joined the existing cluster, which is not happening here.
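To illustrate the behaviour we expect, here is a minimal sketch of a readiness check that refuses to report ready until the node is part of a raft configuration. It is not what the operator does today. The function name raft_joined is hypothetical, and the check simply keys off the "No raft cluster configuration found" message shown above. It also assumes VAULT_ADDR and a valid VAULT_TOKEN are available wherever the check runs, since list-peers is an authenticated call.

```shell
#!/bin/sh
# Sketch only: decide whether a Vault pod has really joined a raft cluster,
# based on the text printed by `vault operator raft list-peers`.

# raft_joined OUTPUT  (hypothetical helper, not part of Vault or the operator)
# Returns 0 if OUTPUT is non-empty and does not contain the
# "no configuration" message quoted in this report; returns 1 otherwise.
raft_joined() {
  case "$1" in
    "") return 1 ;;
    *"No raft cluster configuration found"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Possible probe entry point (assumes VAULT_ADDR and VAULT_TOKEN are set):
#   peers=$(vault operator raft list-peers 2>&1) && raft_joined "$peers"
```

With something like this behind the readiness probe, vault-2 in the state shown above would stay not ready, so the rollout would not move on to vault-1.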
Logs from vault-1
2022-05-23T06:28:40.757Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2022-05-23T06:28:40.780Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="could not retrieve raft bootstrap package"
2022-05-23T06:28:40.780Z [ERROR] core: failed to join raft cluster: error="timed out on raft join: %!w()"
Logs from vault-2
2022-05-23T06:31:20.243Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2022-05-23T06:31:25.243Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing remote error: tls: internal error"
We believe increasing the readiness limits or adding initialDelaySeconds in the operator would help. As it's an operator-side issue, could someone please take a look and assist?
Thank you