Incremental checkpoint serialisation #111

adamgundry · 2018-11-12T10:57:56Z

At the moment, writing a checkpoint causes acid-state to realise the entire serialised representation in memory before it gets written to disk. This can be a significant memory cost for server applications with a large state. It seems unavoidable with the existing archive backend, because the format consists of (length, CRC, bytestring), where the first two fields are not known until the bytestring is fully evaluated. However, we should be able to do better with an alternate backend (given #96) that stores chunk lengths rather than a single overall length and moves the CRC to the end.
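To make the chunked idea concrete, here is a minimal Haskell sketch of such an encoding. The layout is an assumption for illustration, not acid-state's actual format: each chunk of the lazily produced serialisation is written as a 64-bit big-endian length followed by its bytes, a zero length marks the end, and a CRC-32 of all payload bytes follows as a trailer. `encodeChunked`, `crc32Update` and `crc32` are hypothetical names, and the checksum is a plain bitwise CRC-32 (IEEE polynomial), not necessarily the one acid-state uses.

```haskell
import Data.Bits (complement, shiftR, xor, (.&.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- | Fold one strict chunk into a running (pre-inverted) CRC-32 register,
-- using the reflected IEEE polynomial 0xEDB88320, one bit at a time.
crc32Update :: Word32 -> BS.ByteString -> Word32
crc32Update = BS.foldl' step
  where
    step reg byte = go (8 :: Int) (reg `xor` fromIntegral byte)
    go 0 r = r
    go n r
      | r .&. 1 == 1 = go (n - 1) ((r `shiftR` 1) `xor` 0xEDB88320)
      | otherwise = go (n - 1) (r `shiftR` 1)

-- | Standard CRC-32 of a single strict ByteString.
crc32 :: BS.ByteString -> Word32
crc32 = complement . crc32Update 0xFFFFFFFF

-- | Hypothetical chunked checkpoint encoding:
--   (length :: Word64 BE, bytes)*   0 :: Word64 BE   crc :: Word32 BE
-- Each chunk is emitted as soon as it is produced, and the CRC is only
-- needed after the last chunk, so no overall length or checksum has to be
-- known before output starts.
encodeChunked :: BL.ByteString -> BL.ByteString
encodeChunked payload = B.toLazyByteString (go 0xFFFFFFFF (BL.toChunks payload))
  where
    go reg [] = B.word64BE 0 <> B.word32BE (complement reg)
    go reg (c : cs)
      | BS.null c = go reg cs -- toChunks should not yield empty chunks; skip defensively
      | otherwise =
          B.word64BE (fromIntegral (BS.length c))
            <> B.byteString c
            <> go (crc32Update reg c) cs
```

A reader would loop over length-prefixed chunks until it sees the zero-length terminator, feeding each chunk into the same running CRC, and then compare against the trailer. Whether memory actually stays bounded still depends on the caller streaming the result to disk and on nothing else retaining the payload.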

This raises a question: what should we do if an exception is thrown during checkpoint serialisation? In particular, this can happen if user code stores an unevaluated error thunk in the state. We already fail to handle this case gracefully (see #38). Should we simply document that the state must never contain pure exceptions? Given the possibility of multiple checkpoints per file, it seems hard to recover from a partially-written checkpoint.
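The concern can be reproduced in miniature. This is an illustration under stated assumptions, not acid-state code: `writeIncrementally` and `attemptCheckpoint` are hypothetical names, the sink is an `IORef` standing in for the checkpoint file, and `BL.fromChunks [..., error ...]` stands in for a serialiser that reaches an error thunk partway through the state. Once writing is incremental, the exception only surfaces after earlier chunks have already gone to the sink.

```haskell
import Control.Exception (ErrorCall, try)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Stand-in for an incremental checkpoint writer: consumes the lazy
-- serialisation one strict chunk at a time, appending each chunk to a sink
-- (an IORef here, a file handle in practice) as soon as it is forced.
writeIncrementally :: IORef [BS.ByteString] -> BL.ByteString -> IO ()
writeIncrementally sink = mapM_ (\c -> modifyIORef' sink (++ [c])) . BL.toChunks

-- | Attempt a "checkpoint" and report both the outcome and whatever had
-- already reached the sink when it stopped.
attemptCheckpoint :: BL.ByteString -> IO (Either ErrorCall (), [BS.ByteString])
attemptCheckpoint bytes = do
  sink <- newIORef []
  result <- try (writeIncrementally sink bytes)
  written <- readIORef sink
  pure (result, written)
```

On a payload whose second chunk is an `error` thunk, `attemptCheckpoint` returns a `Left` together with the first chunk: a partial checkpoint that something would have to clean up or skip.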


stepcut · 2018-11-12T17:07:07Z

For what it's worth, I think some people consider 'multiple checkpoints per file' to be a misfeature. Multiple events per file makes sense: it saves a lot of overhead that would otherwise make acid-state way too slow.

But I am not sure there is any advantage to multiple checkpoints per file; it is just something that can happen due to the current implementation. In fact, people use a combination of createCheckpoint and createArchive to try to ensure that they do not get multiple checkpoints in the same file.

adamgundry · 2018-11-13T08:36:07Z

Right, I was wondering if it might be worth moving to one-checkpoint-per-file. I haven't thought about implementation / backwards compatibility, but it seems like it would make things simpler.

Perhaps we can ensure that an exception during checkpoint serialisation merely aborts the current checkpoint (possibly leaving a half-written file on disk) and is rethrown from createCheckpoint? Though we'd somehow have to make sure that the partial checkpoint was ignored by createArchive...
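One way to get that behaviour, sketched under the assumption of one checkpoint per file: write to a temporary `<path>.partial` file, rename it to its final name only after the last byte is written, and delete it (rethrowing the exception) if anything fails. Anything that scans for checkpoints, such as the archiving step, then only has to skip `.partial` names. `writeCheckpointAtomically`, `listCompletedCheckpoints` and the `.partial` suffix are hypothetical, built on the `directory` package rather than on acid-state's API; `renameFile` uses `rename(2)` on POSIX, which is atomic when source and target are on the same file system.

```haskell
import Control.Exception (onException)
import qualified Data.ByteString.Lazy as BL
import Data.List (isSuffixOf)
import System.Directory (doesFileExist, listDirectory, removeFile, renameFile)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- | Hypothetical marker for checkpoints that are still being written.
partialSuffix :: String
partialSuffix = ".partial"

-- | Hypothetical: stream a checkpoint into "<path>.partial" and rename it to
-- <path> only once every chunk has been written. If anything throws while
-- writing (e.g. an error thunk forced during serialisation), the partial file
-- is removed and the exception propagates to the caller. Durability (fsync)
-- is deliberately ignored in this sketch.
writeCheckpointAtomically :: FilePath -> BL.ByteString -> IO ()
writeCheckpointAtomically path bytes = do
  let tmp = path ++ partialSuffix
  withBinaryFile tmp WriteMode (\h -> BL.hPut h bytes)
    `onException` removeIfPresent tmp
  renameFile tmp path
  where
    removeIfPresent f = do
      exists <- doesFileExist f
      if exists then removeFile f else pure ()

-- | Hypothetical: completed checkpoints in a directory, skipping leftovers
-- from aborted writes.
listCompletedCheckpoints :: FilePath -> IO [FilePath]
listCompletedCheckpoints dir =
  filter (not . (partialSuffix `isSuffixOf`)) <$> listDirectory dir
```

Because the handle is closed by `withBinaryFile` before the cleanup handler runs, the partial file can be removed even on platforms that refuse to delete open files.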
