Reduce snapshot disk I/O #346

Open
suizman opened this issue Jul 4, 2019 · 5 comments


@suizman

suizman commented Jul 4, 2019

In our project, QED, the FSM persists its data to disk. Under high load this is a very disk-intensive task. It would be great to be able to take snapshots on demand instead of at recurring intervals.

We'd also like to stream the snapshot directly to the other nodes, instead of waiting for it to be written to disk first and then sending it over the network.

Are there any plans to add this functionality?

@schristoff
Contributor

Hey @suizman,
For snapshots on demand, there is this function in the API, which allows for user-triggered snapshots. When it is called, a new random timeout is set for the configured snapshot interval.

I'm a little confused by the second question. Could you give me more insight into your use case and the problem you're encountering?

@suizman
Author

suizman commented Oct 14, 2019

@s-christoff sure, but what we needed to do was skip the periodic snapshots and only take them on demand. We have already implemented a custom snapshot strategy in our project on top of HashiCorp's raft.

What I mean by streaming the snapshots is that right now a snapshot must be written to the leader's disk before it is replayed to the follower nodes. It would be great to be able to stream snapshots directly to the followers instead of waiting for them to be written to disk first.

For now, we're fine with our implementation for our use case with RocksDB, but it would be great to see this functionality in this library.

@stale stale bot removed the waiting-reply label Oct 14, 2019
@travisjeffery

Streaming snapshots would be great. Here's a use case: let's say the data managed by the FSM is already compacted. If you snapshot to a file, then you need double the storage capacity of what you actually store, whereas if you stream the snapshot, then theoretically you don't need any more disk than what you store.

@travisjeffery

Though I suppose you'd get many of the benefits by using an S3 snapshot store or something.

@ncabatoff
Contributor

I'm reluctant to provide another option to opt out of periodic snapshots, since their side effect (compaction) is needed for a healthy raft cluster. It is already possible to configure SnapshotInterval and SnapshotThreshold high enough that periodic snapshots don't happen.
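
A configuration sketch of that workaround might look like the following. The specific values are illustrative assumptions, not recommendations, and the helper names `newQuietSnapshotConfig` and `snapshotNow` are hypothetical. It also assumes that on-demand snapshots are requested through the `Snapshot` method on `*raft.Raft`. Since compaction only happens as a side effect of snapshotting, the application would still need to call something like `snapshotNow` regularly.

```go
package snapshotcfg

import (
	"math"
	"time"

	"github.com/hashicorp/raft"
)

// newQuietSnapshotConfig returns a config whose automatic snapshot triggers
// are set very high, so periodic snapshots are unlikely to fire.
// Both values are illustrative assumptions.
func newQuietSnapshotConfig() *raft.Config {
	cfg := raft.DefaultConfig()
	cfg.SnapshotInterval = 24 * time.Hour  // how often raft checks whether to snapshot
	cfg.SnapshotThreshold = math.MaxUint64 // outstanding log entries required before a snapshot
	return cfg
}

// snapshotNow requests a user-triggered snapshot and blocks until it
// completes, returning any error reported by the snapshot future.
func snapshotNow(r *raft.Raft) error {
	return r.Snapshot().Error()
}
```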

We're open to the idea of reducing disk I/O for snapshots. Vault does take some steps in this direction, but there's more work to be done. I'm going to recast this issue to focus on that side of things.

@ncabatoff ncabatoff changed the title Support OnDemand Snapshots Reduce snapshot disk I/O Sep 13, 2021