"TTL event watch failed to get start revision" #289

Open
tamalsaha opened this issue Mar 17, 2024 · 3 comments

@tamalsaha

I have been running a k8s extended apiserver whose data is stored in kine (SQLite backend) running as a sidecar. After running nonstop for 8 hours, kine failed, all data is gone, and the only error I see in the kine logs is "TTL event watch failed to get start revision". What might be the cause of this issue? I have generally seen kine result in these types of stability issues before. Any help will be appreciated.

@brandond
Contributor

brandond commented Mar 17, 2024

all data is gone

It sounds like you forgot to put the database file for the kine sidecar on a real volume, and it was instead stored on tmpfs or simply within the container filesystem and was discarded when the sidecar restarted. You've not provided any logs so I can't say why it failed, but I can recommend that you keep the database file on a volume next time, if you want it to persist across container restarts.
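One way to check this from inside the sidecar is to look up which filesystem backs the database path. The following is a minimal sketch, assuming a Linux container where `/proc/mounts` is readable; the helper names `fs_type_for` and `current_fs_type` are made up for illustration and are not part of kine.

```python
import os


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    "<device> <mount_point> <fs_type> <options> ...", one mount per line.
    Returns None if no entry matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def current_fs_type(path):
    """Look up path against the live mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return fs_type_for(path, f.read())
```

Calling `current_fs_type()` with the path of the kine database file and getting back `tmpfs` or `overlay` as the filesystem type would suggest the file lives in memory or in the container's writable layer rather than on a mounted volume.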

I have generally seen kine result in these types of stability issues before.

We have not seen stability issues like this. Kine is used daily by thousands of k3s users without issue.

@tamalsaha
Author

@brandond, thanks for the quick response.

I think I am keeping the data inside a PVC (/var/data). You can see my helm chart here: https://github.com/kubeops/installer/blob/master/charts/scanner/templates/statefulset.yaml#L192-L211

When this happened, I tried to recover by restarting the kine pod. That did not fix the error. I had to stop the kine pod, delete the PVC, get a fresh PVC, and restart the kine pod to get everything back online. Obviously the data was lost. I am using DigitalOcean's Kubernetes service, so the PVC was on their cloud. Not sure if that helps. From the looks of it, the sqlite.db file got corrupted in some way, and a fresh start was the only solution.
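A corruption hypothesis like this can be tested before the PVC is deleted by running SQLite's built-in `PRAGMA integrity_check` against the file. The following is a minimal sketch using Python's standard `sqlite3` module; the helper name `integrity_check` is made up for illustration, and it is safest to run it against a copy of the file (together with any `-wal` file, if the database uses write-ahead logging).

```python
import sqlite3


def integrity_check(db_path):
    """Run PRAGMA integrity_check and return the reported rows as strings.

    A healthy database yields ["ok"]. A file that SQLite cannot read at all
    yields a single "error: ..." entry instead of raising.
    """
    conn = None
    try:
        # Open read-only via a URI so the check itself cannot modify the file.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        return [f"error: {exc}"]
    finally:
        if conn is not None:
            conn.close()
```

Any result other than `["ok"]` would support the corruption theory, and the returned messages would be useful to attach to a report like this one.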

@brandond
Contributor

brandond commented Mar 18, 2024

That yaml would have been useful information to include in the original report.

Without actual logs from kine, it's hard to say what might have been going on. You haven't even included the full "TTL event watch failed to get start revision" error message; it should have included an error cause as part of the message.

If you can provide full logs, or steps to reproduce, please do so. Otherwise I'm liable to close this out due to insufficient information.


2 participants