Support point-in-time backups #52
You are right regarding the restore procedure. Restoring is a one-off activity.
#56 🎉
This assumes a continuous stream of data 24x7x365, which does not apply to all cases. In our case the stream runs for only X hours per day; the backup happens only after that and is actually intended as a daily backup/snapshot of the data. I think there should be a way to (internally) detect that there have not been any new messages for a certain amount of time (possibly a configurable interval), after which the backup process would gracefully exit, thus terminating the process. Another (possibly simpler) alternative would be to only back up messages up to the timestamp at which the backup was started. I am not sure how this would play together with backing up offsets. Maybe first back up the offsets; then we know the timestamp at which we backed them up, and we can back up messages up to that timestamp.
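The first idea above (gracefully exit after a configurable idle interval) could be sketched roughly like this. This is a hypothetical helper, not part of kafka-backup; the class and method names are made up for illustration, and the caller would be the consumer poll loop:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: decide whether a backup consumer should shut down
// because no new messages have arrived for a configurable idle interval.
public class IdleShutdown {
    private final Duration idleTimeout;
    private Instant lastMessageAt;

    public IdleShutdown(Duration idleTimeout, Instant start) {
        this.idleTimeout = idleTimeout;
        this.lastMessageAt = start;
    }

    // Call this whenever a record is received from the poll loop.
    public void recordSeen(Instant now) {
        lastMessageAt = now;
    }

    // True once the idle interval has fully elapsed with no new records.
    public boolean shouldStop(Instant now) {
        return Duration.between(lastMessageAt, now).compareTo(idleTimeout) >= 0;
    }
}
```

The poll loop would call `recordSeen` for each consumed record and check `shouldStop` after every (possibly empty) poll, exiting cleanly once it returns true.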
I see your point. Yeah, probably it would be nice to have a way to do point-in-time backups 🤔 What you can do in your case:
I understand that this is quite a common use case and I will provide more documentation for it with #2. For v0.1, documentation is the last big issue, so hopefully this should happen soonish ;) I see the following approach:
How to detect when a backup is "finished" (only applicable if the
What do you think?
The issue is exactly with this step. We cannot keep it running in the background. We only have a specific window in which we can take the snapshot. It is not up to us to decide when we can do the backup; it is an external regulatory requirement.
Yes, that is exactly what I meant, and I think this would remove the requirement of having it run in the background (and of trying to catch the moment when all producers are done).
I think this option is mutually exclusive with the other one. And I think the first one is better, as it gives a specific reference point and does not rely on finding a window in which there are no messages.
Actually I wanted to write that this is nearly impossible with Kafka, but while writing I got an idea for a solution. So now there is a clear path to do a (more-or-less) point-in-time backup:
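One way such a point-in-time cutoff could work is to snapshot the end offset of every partition when the backup starts (the Kafka consumer API exposes this via `KafkaConsumer#endOffsets`) and stop consuming each partition once that offset is reached. The sketch below shows only the cutoff bookkeeping, decoupled from any broker; the class and its methods are hypothetical names, not kafka-backup APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track the per-partition end offsets recorded at backup
// start and decide which records to keep and when the backup is complete.
public class PointInTimeCutoff {
    // partition name -> exclusive end offset captured at backup start
    private final Map<String, Long> endOffsets;

    public PointInTimeCutoff(Map<String, Long> endOffsetsAtStart) {
        this.endOffsets = new HashMap<>(endOffsetsAtStart);
    }

    // Should a record at this offset still be written to the backup?
    public boolean include(String partition, long offset) {
        Long end = endOffsets.get(partition);
        return end != null && offset < end;
    }

    // The backup is complete once every partition has reached its recorded end.
    public boolean complete(Map<String, Long> currentPositions) {
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            Long pos = currentPositions.get(e.getKey());
            if (pos == null || pos < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

With this in place, the backup process can exit on its own as soon as `complete` returns true, which fits the "specific window" requirement above instead of running as a daemon.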
You see that this is really not that trivial. My current focus is to improve the test suite and stabilize Kafka Backup for a first release (see https://github.com/itadventurer/kafka-backup/milestone/1). I cannot give you an ETA for that feature. I would be more than happy to review a PR for it (and I am also searching for additional maintainers ;) ). I am happy to help if there are any questions.
I am more on the operations side of things (setting up and monitoring Kafka clusters, etc.), so I trust you on this part. My point is that, from my side of the work, this is something that I (and surely many others) do need.
I am not that great with Java/Scala, so I would not be of much help here. If it were Python, C/C++, or at the very least Go, I could help :P
Hello!
Hey,
I am about to publish a completely separate implementation, written in Go, that does not rely on the Connect API. Just FYI. We are already using it in our production environment.
@akamensky could you share your solution? As long as you have tested it, that would be fine.
Thank you @WesselVS for your PR #99! I have just merged it to master and will do a release with this enhancement and some other fixes soonish. @akamensky Cool! Great to see some more work regarding Kafka backups ;)
This is a one-off tool (meaning it does not need to keep running in the background after the backup is done), so the reliance on a background daemon process is odd. There is no need to run Kafka Connect as a daemon at all.