Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: Use temp control files upon writing #2206

Merged
merged 1 commit into from Mar 20, 2023

Conversation

clecat
Copy link
Contributor

@clecat clecat commented Mar 1, 2023

This PR changes the writing of control files: They were currently updated while making the assumption that the file itself being smaller than a page, the writing would be atomic. However it was deemed dubious that we could use such an assumption.
This is why this PR:

  • Adds the usage of temporary control file when writing a new payload (path of the cf + .tmp).
  • Moves the temporary file in stead of the old control file.
  • Then updates the Io file stored in the Control_file.t with the new
  • On the RO instances side, it simply re-opens a new Io file, as the old file descriptor might point to the inode of a file replaced by the RW instance.
  • Updates the errors of the related functions.

This leaves however one question in the case of RO instances:
Because we open a new Io each time we want to reload the payload, storing said Io has currently purpose. Maybe we could do something about that (separating RO & RW control files ? Using an option ? Remove the reload function ?)

@clecat clecat requested review from metanivek and art-w March 1, 2023 17:32
@clecat clecat added the no-changelog-needed No changelog is needed here label Mar 1, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 1, 2023

Codecov Report

Merging #2206 (00b7f61) into main (eddf517) will decrease coverage by 0.01%.
The diff coverage is 85.71%.

❗ Current head 00b7f61 differs from pull request most recent head 9a33ae0. Consider uploading reports for the commit 9a33ae0 to get more accurate results

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #2206      +/-   ##
==========================================
- Coverage   68.04%   68.04%   -0.01%     
==========================================
  Files         135      135              
  Lines       16378    16390      +12     
==========================================
+ Hits        11144    11152       +8     
- Misses       5234     5238       +4     
Impacted Files Coverage Δ
src/irmin-pack/unix/control_file_intf.ml 82.14% <ø> (ø)
src/irmin-pack/unix/file_manager.ml 84.43% <50.00%> (ø)
src/irmin-pack/layout.ml 81.35% <60.00%> (-2.29%) ⬇️
src/irmin-pack/unix/pack_store.ml 79.12% <71.42%> (+0.10%) ⬆️
src/irmin-pack/unix/dispatcher.ml 57.69% <75.00%> (-2.31%) ⬇️
src/irmin-pack/unix/utils.ml 89.65% <92.85%> (-3.21%) ⬇️
src/irmin-pack/unix/control_file.ml 88.46% <100.00%> (+0.65%) ⬆️
src/irmin-pack/unix/lower.ml 70.00% <100.00%> (+0.30%) ⬆️
src/irmin-pack/unix/sparse_file.ml 76.85% <100.00%> (-1.86%) ⬇️
src/irmin-pack/unix/traverse_pack_file.ml 68.82% <100.00%> (ø)
... and 4 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@art-w
Copy link
Contributor

art-w commented Mar 1, 2023

Thanks for looking into this! current-bench doesn't report anything alarming but it would be nice to also run the replay trace, I'm curious if the posix way has any impact on performances in the long run :)

@clecat clecat force-pushed the temp-cf branch 3 times, most recently from 048cd9c to 9350821 Compare March 2, 2023 12:12
Copy link
Member

@metanivek metanivek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! Left some comments for consideration. I agree with @art-w that we should run a bigger benchmark to test these changes before merging.

src/irmin-pack/layout.ml Outdated Show resolved Hide resolved
src/irmin-pack/unix/control_file.ml Show resolved Hide resolved
src/irmin-pack/unix/control_file.ml Show resolved Hide resolved
@clecat
Copy link
Contributor Author

clecat commented Mar 14, 2023

Finally ran the replay trace thx to the help of @art-w & @adatario, here are the results:

</details>

 |                          | baseline  |       pr
 | --                       | --        | --
 | -- main metrics --       |           | 
 | CPU time elapsed         |   103m42s |   103m54s 100%
 | Wall time elapsed        |   103m43s |   103m55s 100%
 | TZ-transactions per sec  |   739.362 |   737.883 100%
 | TZ-operations per sec    |  4829.344 |  4819.686 100%
 | Context.add per sec      | 14179.475 | 14151.116 100%
 | tail latency (1)         |   1.416 s |   1.227 s  87%
 |                          |           | 
 | -- resource usage --     |           | 
 | disk IO (total)          |           | 
 |   IOPS (op/sec)          |   181_302 |   180_946 100%
 |   throughput (bytes/sec) |  18.099 M |  18.063 M 100%
 |   total (bytes)          | 112.608 G | 112.610 G 100%
 | disk IO (read)           |           | 
 |   IOPS (op/sec)          |   181_204 |   180_849 100%
 |   throughput (bytes/sec) |  10.648 M |  10.627 M 100%
 |   total (bytes)          |  66.249 G |  66.251 G 100%
 | disk IO (write)          |           | 
 |   IOPS (op/sec)          |        97 |        97 100%
 |   throughput (bytes/sec) |   7.451 M |   7.436 M 100%
 |   total (bytes)          |  46.359 G |  46.359 G 100%
 |                          |           | 
 | max memory usage (bytes) |   0.374 G |   0.391 G 104%
 | mean CPU usage           |      100% |      100%     ```

@adatario
Copy link
Contributor

I think there was a bug in the replay that caused a wrong version of Irmin to be used. This was a mistake on my side.

The stats above should not be considered accurate. I'm re-running the benchmarks.

@adatario
Copy link
Contributor

Here's the stats.txt for another benchmark run where the correct version (this PR commit) was used:

baseline pr
hostname posada
os type Unix
big endian false
runtime params (no gc) b=1,H=0,p=0,t=0,v=0,w=1,W=0
backend type native
ocaml version 4.14.0
word size 64 bits
Start Time 2023/03/06 10:49:35 (GMT) 2023/03/15 09:30:27 (GMT)
Store Type pack
irmin-pack GC count 0
gc.minor_heap_size 262144
gc.major_heap_increment 15
gc.space_overhead 120
gc.verbose 0
gc.max_overhead 500
gc.stack_limit 0
gc.allocation_policy 2
gc.window_size 1
gc.custom_major_ratio 44
gc.custom_minor_ratio 100
gc.custom_minor_max_size 8192
baseline pr
-- main metrics --
CPU time elapsed 106m05s 108m17s 102%
Wall time elapsed 106m05s 108m21s 102%
TZ-transactions per sec 722.768 708.047 98%
TZ-operations per sec 4720.957 4624.802 98%
Context.add per sec 13861.239 13578.917 98%
tail latency (1) 1.368 s 1.153 s 84%
-- resource usage --
disk IO (total)
IOPS (op/sec) 177_241 173_625 98%
throughput (bytes/sec) 17.693 M 17.332 M 98%
total (bytes) 112.611 G 112.609 G 100%
disk IO (read)
IOPS (op/sec) 177_146 173_531 98%
throughput (bytes/sec) 10.409 M 10.197 M 98%
total (bytes) 66.252 G 66.250 G 100%
disk IO (write)
IOPS (op/sec) 95 93 98%
throughput (bytes/sec) 7.284 M 7.135 M 98%
total (bytes) 46.359 G 46.359 G 100%
max memory usage (bytes) 0.386 G 0.388 G 100%
mean CPU usage 100% 100%

@clecat
Copy link
Contributor Author

clecat commented Mar 15, 2023

Nice, thx a lot, it should be good to merge now

@metanivek
Copy link
Member

@adatario thanks for running the benchmark! Do you have the full stats.txt? I'd like to see some of the more detailed stats around commit times.

src/irmin-pack/layout.ml Outdated Show resolved Hide resolved
@adatario
Copy link
Contributor

@adatario thanks for running the benchmark! Do you have the full stats.txt? I'd like to see some of the more detailed stats around commit times.

Yes, they are in the irmin-tezos-benchmarking repo: https://github.com/tarides/irmin-tezos-benchmarking/tree/main/benchmarks/2023-03-temp-control-file (currently not public)

@metanivek
Copy link
Member

Thanks @adatario!

These are the lines I wanted to look at:

 |                                                   |          |        cumu         | share |      min       |       max       |      avg

...

 | commit duration (s)                               | baseline |       36min25s      |   34% |  8.123 ms      |   1.368  s      | 21.855 ms     
 |                                                   | pr       |       36min57s 101% |   34% |  8.721 ms 107% |   1.153  s  84% | 22.176 ms 101%

There is a slight overhead on average for commit (we write the control file on every commit), but it is very small. I think this change is good to go! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog-needed No changelog is needed here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants