Incremental checkpoint serialisation #111

Open
adamgundry opened this issue Nov 12, 2018 · 2 comments

Comments

@adamgundry
Contributor

At the moment, writing a checkpoint causes acid-state to realise the entire serialised representation in memory before it is written to disk. This can be a significant memory cost for server applications with a large state. It seems unavoidable with the existing archive backend, because the format consists of (length, CRC, bytestring), where the first two fields are not known until the bytestring is fully evaluated. However, we should be able to do better with an alternative backend (given #96) that stores chunk lengths rather than a single overall length and moves the CRC to the end.
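To make the idea concrete, here is a minimal sketch of such a chunked layout. It is not acid-state's archive format or API: the framing (a big-endian Word32 length before each chunk, a zero length as terminator, then a trailing CRC-32) and the names `encodeChunked`, `decodeChunked`, `crcStep` and `crcChunk` are all assumptions made for illustration. The CRC is a plain bitwise CRC-32 so that the example only needs `base` and `bytestring`.

```haskell
import Data.Bits (complement, shiftR, testBit, xor)
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32, Word8)

-- One byte of a bitwise CRC-32 (reflected polynomial 0xEDB88320).
-- The argument is the running (not yet complemented) CRC state.
crcStep :: Word32 -> Word8 -> Word32
crcStep crc byte = iterate shift1 (crc `xor` fromIntegral byte) !! 8
  where
    shift1 c
      | testBit c 0 = (c `shiftR` 1) `xor` 0xEDB88320
      | otherwise = c `shiftR` 1

-- Fold a whole strict chunk into the running CRC state.
crcChunk :: Word32 -> B.ByteString -> Word32
crcChunk = B.foldl' crcStep

-- Hypothetical framing: (length, bytes)* , zero length, CRC-32.
-- Only the running CRC is carried between chunks, so no overall
-- length has to be known before the first byte is emitted.
-- Assumes every chunk is shorter than 2^32 bytes.
encodeChunked :: L.ByteString -> L.ByteString
encodeChunked payload = BB.toLazyByteString (go 0xFFFFFFFF (L.toChunks payload))
  where
    go crc [] = BB.word32BE 0 <> BB.word32BE (complement crc)
    go crc (c : cs)
      | B.null c = go crc cs -- a zero length is reserved for the terminator
      | otherwise =
          let crc' = crcChunk crc c
           in crc' `seq` -- force the CRC so earlier chunks are not retained
                (BB.word32BE (fromIntegral (B.length c)) <> BB.byteString c <> go crc' cs)

-- Reader for the same framing. It collects the payload in memory,
-- which is fine for checking the round trip but is not incremental.
decodeChunked :: L.ByteString -> Either String L.ByteString
decodeChunked = go 0xFFFFFFFF []
  where
    go crc acc input = do
      (len, rest) <- word32 input
      if len == 0
        then do
          (expected, _) <- word32 rest
          if expected == complement crc
            then Right (L.fromChunks (reverse acc))
            else Left "CRC mismatch"
        else do
          let (chunk, rest') = L.splitAt (fromIntegral len) rest
              strict = L.toStrict chunk
          if B.length strict /= fromIntegral len
            then Left "truncated chunk"
            else go (crcChunk crc strict) (strict : acc) rest'

    word32 :: L.ByteString -> Either String (Word32, L.ByteString)
    word32 bs
      | L.length hdr == 4 = Right (L.foldl' (\n b -> n * 256 + fromIntegral b) 0 hdr, rest)
      | otherwise = Left "truncated header"
      where
        (hdr, rest) = L.splitAt 4 bs

main :: IO ()
main = do
  let payload = L.fromChunks [B.pack [1 .. 100], B.pack [101 .. 200]]
  print (decodeChunked (encodeChunked payload) == Right payload)
```

Because the encoder only threads the running CRC from one chunk to the next, writing its output with a lazy `hPut` should let each chunk be released once it has been written, rather than holding the whole serialised state. A reader can detect a truncated checkpoint from a missing terminator or CRC, but it only learns that after reading to the end.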

This raises a question: what should we do if an exception is thrown during checkpoint serialisation? In particular, this can happen if user code stores an unevaluated error thunk in the state. We already fail to handle this case gracefully (see #38). Should we simply document that the state must never contain pure exceptions? Given the possibility of multiple checkpoints per file, it seems hard to recover from a partially-written checkpoint.

@stepcut
Member

stepcut commented Nov 12, 2018

For what it's worth, I think some people consider 'multiple checkpoints per file' to be a misfeature. Multiple events per file makes sense -- it saves a lot of overhead that would otherwise make acid-state way too slow.

But I am not sure there is any advantage to multiple checkpoints per file -- it is just something that can happen due to the current implementation. In fact, people use a combination of createCheckpoint and createArchive to try to ensure that they do not get multiple checkpoints in the same file.

@adamgundry
Contributor Author

Right, I was wondering if it might be worth moving to one-checkpoint-per-file. I haven't thought about implementation / backwards compatibility, but it seems like it would make things simpler.

Perhaps we can ensure that an exception during checkpoint serialisation merely aborts the current checkpoint (perhaps leaving a half-written file on disk) and throws the exception from createCheckpoint? Though we'd somehow have to make sure that the partial checkpoint was ignored by createArchive...
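One possible shape for this, assuming one checkpoint per file, is to serialise into a temporary file and rename it into place only once the write has finished. The sketch below is not how acid-state works today; `writeCheckpointFile`, `partialPath` and the `.partial` suffix are hypothetical names. Anything that lists checkpoint files (including an archiving step) would need to skip the suffix for a leftover partial file to be ignored.

```haskell
import Control.Exception (IOException, SomeException, onException, try)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import System.Directory (doesFileExist, removeFile, renameFile)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Hypothetical naming scheme for a checkpoint that is still being written.
partialPath :: FilePath -> FilePath
partialPath final = final ++ ".partial"

-- Write the (lazily produced) payload to the partial file, then publish
-- it under its final name. If forcing the payload throws, the exception
-- propagates to the caller and the final name is never created.
writeCheckpointFile :: FilePath -> L.ByteString -> IO ()
writeCheckpointFile final payload = do
  let partial = partialPath final
  withBinaryFile partial WriteMode (\h -> L.hPut h payload)
    `onException` cleanup partial
  renameFile partial final
  where
    -- Best-effort removal; a failure here must not mask the original error.
    cleanup path = do
      _ <- try (removeFile path) :: IO (Either IOException ())
      return ()

main :: IO ()
main = do
  -- The second chunk stands in for an error thunk stored in the state.
  let bad = L.fromChunks [B.pack [1, 2, 3], error "pure exception hidden in the state"]
  result <- try (writeCheckpointFile "demo.checkpoint" bad) :: IO (Either SomeException ())
  published <- doesFileExist "demo.checkpoint"
  putStrLn (either (const "write aborted") (const "write succeeded") result)
  putStrLn ("final file exists: " ++ show published)
```

Whether the rename is atomic depends on the platform and file system, and this does nothing for the existing multiple-checkpoints-per-file layout, where a partial checkpoint would sit after valid entries in the same file.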
