Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: modify auto flush callback behavior #2088

Merged
merged 1 commit into from Sep 19, 2022

Conversation

metanivek
Copy link
Member

@metanivek metanivek commented Sep 15, 2022

Instead of expecting the caller to flush, it now sends the results of a flush to the callback. Part of #2039

This PR updates the behavior of flushing for append only files by letting an initializer to choose between automatic flushing and manually controlled flushing.

@metanivek metanivek added the no-changelog-needed No changelog is needed here label Sep 15, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2022

Codecov Report

Merging #2088 (bc78499) into main (b72a868) will increase coverage by 0.03%.
The diff coverage is 68.18%.

❗ Current head bc78499 differs from pull request most recent head fef11e7. Consider uploading reports for the commit fef11e7 to get more accurate results

@@            Coverage Diff             @@
##             main    #2088      +/-   ##
==========================================
+ Coverage   64.73%   64.76%   +0.03%     
==========================================
  Files         131      132       +1     
  Lines       15658    15658              
==========================================
+ Hits        10136    10141       +5     
+ Misses       5522     5517       -5     
Impacted Files Coverage Δ
src/irmin-pack/unix/gc.ml 40.54% <0.00%> (+0.54%) ⬆️
src/irmin-pack/unix/import.ml 88.88% <ø> (ø)
src/irmin-pack/unix/stats.ml 58.20% <0.00%> (+0.87%) ⬆️
src/irmin-pack/unix/stats_intf.ml 50.00% <50.00%> (ø)
src/irmin-pack/unix/append_only_file.ml 85.00% <87.50%> (-0.97%) ⬇️
src/irmin-pack/unix/file_manager.ml 83.61% <100.00%> (ø)
src/irmin-pack/unix/mapping_file.ml 94.24% <100.00%> (+1.43%) ⬆️
src/irmin-fs/unix/irmin_fs_unix.ml 68.38% <0.00%> (+0.64%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Ngoguey42
Copy link
Contributor

Your PR changes the expected behavior which is that the dict should be flushed before the suffix. https://github.com/mirage/irmin/blob/main/src/irmin-pack/unix/file_manager.ml#L179

All these file synchronisation procedures are very sensitive. This is the reason why I've gathered them in the file_manager so that the reader of the code doesn't have to guess which callback mutates what.

A way forward could be to rename Append_only's auto_flush_callback to something like flush_procedure or global_flush or write_cache_is_full or demand_a_flush.

@metanivek
Copy link
Member Author

@Ngoguey42 ah, thanks for that insight. It gave me an idea for how to refine the flushing behavior to (I think) clarify intentions of different ways append only files are used.

Comment on lines 32 to 42
type flush_callback =
[ `Auto of (unit, Io.write_error) result -> unit | `Manual of t -> unit ]
(** [flush_callback] is the callback when a flush threshold has been reached.

- Use [`Auto] callbacks if you only care about the result of a flush. The
buffer is flushed automatically before calling the callback with the
result.
- Use [`Manual] callbacks if you need more control over when the flush
occurs, for instance if you want to coordinate with flushing other
files. The buffer is not automatically flushed before calling the
callback but flush is expected to be called by the callback. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a callback for Auto? Couldn't the error just be raised by append_only.ml?

Copy link
Member Author

@metanivek metanivek Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original motivation for this PR was my previous comment to not require flush to be called in "auto" flush callbacks. The naming doesn't make sense to me but perhaps it is just a historical remnant (looking at Io_legacy and config) that we deal with. I guess we can keep everything manually automatic (automatically manual?).

This is an attempt to clarify the two ways the flush threshold callback (maybe needs a better name) are used in the code, but happy to close this PR and keep code as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that solving this conundrum requires reserving the word auto to the auto/explicit distinction.

What about this?

type auto_flush_procedure = [ `Internal | `External of t -> unit ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

type auto_flush_procedure = [ `Default | `Custom of t -> unit ]

Copy link
Member Author

@metanivek metanivek Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I lean towards Internal/External. Internal still needs a function associated if we want to raise exceptions on error.

The auto prefix here is still weird/confusing to me, but I think I understand more where you are coming from now. It is more of a threshold_flush than an auto_flush in my mind but threshold_flush_procedure is quite the name 🙃.

(as an aside, and of no real importance, the typical english distinction is auto/manual or implicit/explicit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a misunderstanding somewhere 🤔

On main, auto flush refers to flushes automatically triggered by the file stack code because a write buffer is full while flush (no prefix) is a call to File_manager.flush at the end of batch.

Also, wouldn't it be possible for append_only to call Errs.raise_if_error in the Auto aka Internal aka Default case? This was there is no need to pass a callback to raise the error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it completely auto/internal/default/magic if we add Errs to the functor.

Re: auto, this seems more of a concern of file manager than append only but I'm fine to leave it. ☑️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another pass at this based on our discussion.

src/irmin-pack/unix/file_manager.ml Outdated Show resolved Hide resolved
Add support for two types of flush behavior when the flush threshold is
reached: automatic internal flushes and manually controlled external
flushes.
Copy link
Contributor

@Ngoguey42 Ngoguey42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@metanivek metanivek merged commit 368f671 into mirage:main Sep 19, 2022
@metanivek metanivek deleted the refactor-auto-flush branch September 19, 2022 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog-needed No changelog is needed here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants