Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: asynchronous gc #1950

Merged
merged 2 commits into from Jul 5, 2022
Merged

irmin-pack: asynchronous gc #1950

merged 2 commits into from Jul 5, 2022

Conversation

icristescu
Copy link
Contributor

@icristescu icristescu commented Jul 4, 2022

Lets use a dedicated branch instead of #1944.

Blocking tasks:

  • Add a hashtable in Gc.iter to not revisit objects;
  • Run Irmin-bench a lot to search for bugs;

Not blocking:

@icristescu icristescu mentioned this pull request Jul 4, 2022
@samoht
Copy link
Member

samoht commented Jul 4, 2022

Run Irmin-bench a lot to search for bugs;

Could you point out which command to run exactly?

src/irmin-pack/unix/gc.ml Outdated Show resolved Hide resolved
@samoht
Copy link
Member

samoht commented Jul 4, 2022

There's definitely an fd leak while running the tests:

  23 XXX _build/test-upgrade/store.0.suffix 1:open
  23 XXX _build/test-upgrade/store.0.suffix 2:close
  29 XXX _build/test-upgrade/store.1.mapping 1:open
  16 XXX _build/test-upgrade/store.1.mapping 2:close
  16 XXX _build/test-upgrade/store.1.out 1:open
  16 XXX _build/test-upgrade/store.1.out 2:close
  29 XXX _build/test-upgrade/store.1.prefix 1:open
  16 XXX _build/test-upgrade/store.1.prefix 2:close
   8 XXX _build/test-upgrade/store.1.reachable 1:open
   8 XXX _build/test-upgrade/store.1.reachable 2:close
  29 XXX _build/test-upgrade/store.1.suffix 1:open
  27 XXX _build/test-upgrade/store.1.suffix 2:close
  24 XXX _build/test-upgrade/store.control 1:open
  22 XXX _build/test-upgrade/store.control 2:close
  24 XXX _build/test-upgrade/store.dict 1:open
  22 XXX _build/test-upgrade/store.dict 2:close

almost half of the mapping and prefix files are not closed correctly and a couple of prefix/dict files too.

@icristescu
Copy link
Contributor Author

Run Irmin-bench a lot to search for bugs;

Could you point out which command to run exactly?

dune exec -- ./bench/irmin-pack/tree.exe --mode trace --keep-stat-trace data4_10310commits.repr --gc-every=200
where we can change the --gc-every to other values and use a larger trace, to try to uncover bugs.

I have tried so far with --gc-every = 1000, 200, 2 and no bugs.

Some (very preliminary) benchmarks :

 |                          |   no_gc   |    gc_1000     |     gc_200
 | --                       | --        | --             | --
 | -- main metrics --       |           |                |
 | CPU time elapsed         |     0m24s |     0m33s 138% |     0m37s 150%
 | Wall time elapsed        |     0m24s |     0m34s 138% |     0m37s 152%
 | TZ-transactions per sec  |   381.165 |   276.184  72% |   253.386  66%
 | TZ-operations per sec    |  4282.292 |  3102.863  72% |  2846.731  66%
 | Context.set per sec      | 30241.979 | 21912.729  72% | 20103.902  66%
 | tail latency (1)         |   0.698 s |   0.716 s 103% |   0.731 s 105%
 |                          |           |                |
 | -- resource usage --     |           |                |
 | disk IO (total)          |           |                |
 |   IOPS (op/sec)          |   102_094 |   530_371 519% |   493_874 484%
 |   throughput (bytes/sec) |   8.796 M |  29.714 M 338% |  27.857 M 317%
 |   total (bytes)          |   0.214 G |   0.995 G 466% |   1.017 G 476%
 | disk IO (read)           |           |                |
 |   IOPS (op/sec)          |    99_924 |   528_798 529% |   492_430 493%
 |   throughput (bytes/sec) |   4.579 M |  26.627 M 582% |  24.966 M 545%
 |   total (bytes)          |   0.111 G |   0.892 G 803% |   0.912 G 820%
 | disk IO (write)          |           |                |
 |   IOPS (op/sec)          |     2_170 |     1_573  72% |     1_444  67%
 |   throughput (bytes/sec) |   4.217 M |   3.087 M  73% |   2.891 M  69%
 |   total (bytes)          |   0.102 G |   0.103 G 101% |   0.106 G 103%
 |                          |           |                |
 | max memory usage         |   0.975 G |   0.848 G  87% |   0.737 G  76%
 | mean CPU usage           |      100% |      100%      |       99%

and

❯ du -h no_gc/root
520K	no_gc/root/index
 99M	no_gc/root
❯ du -h gc_1000/root
520K	gc_1000/root/index
 70M	gc_1000/root
❯ du -h gc_200/root
520K	gc_200/root/index
 70M	gc_200/root

@samoht
Copy link
Member

samoht commented Jul 4, 2022

I've pushed a commit to fix the fd leak - now all the tests seem to pass 🎉

@samoht
Copy link
Member

samoht commented Jul 4, 2022

So the benchmarks show that the GC is IO expensive and so makes things go slower if used too often. That's not very surprising but that should be clearly documented (as it is somehow counter-intuitive); would be also nice to provide a good default.

@samoht
Copy link
Member

samoht commented Jul 4, 2022

So I got an interesting trace with --gc-every=1

❯ dune exec -- ./bench/irmin-pack/tree.exe --mode trace --keep-stat-trace ~/Downloads/data4_10310commits.repr --gc-every=1

+000us  application Will check commit hashes against reference.
+013us  application Starting gc on commit idx 1 with key {"Direct":["CoV8t66qHkNAfDNB7aUUqri5nqtzC72bE9Hx28u6q1C3WzMCHbkm",12801997,68]} while latest commit has idx 5001 with key {"Direct":["CoUmuqvh5jnmq83RKkwEDSJSx2LHrtGg6WsgjGdWJ7hAe9BW7iKr",47367370,103]}
+013us  application Starting gc on commit idx 2 with key {"Direct":["CoVp7R6eweKbe1TEPRYiksUfXz3ckhhgBBPrd6WK1FGHAwQZpNtw",12803169,95]} while latest commit has idx 5002 with key {"Direct":["CoUhj4y2cPh6PiU9vEYUxkhkJ22m4JFbBzvSpXZD4cLXvsQcDm12",47374445,103]}
Replaying trace  5003/13315 commit 00:13 (ETA: --:--) []  37%
+014us  application Starting gc on commit idx 3 with key {"Direct":["CoWF7GfRYL3PamMjG1JShGU5C2fKkZaEu9MJjnwW2Nz5E5dXitA4",12804404,95]} while latest commit has idx 5003 with key {"Direct":["CoWCaKQtVzpyJ6jar4VPt35F7nAMkrta93fToUpSvVNHE3u6yy5W",47381934,103]}
+015us  application Starting gc on commit idx 4 with key {"Direct":["CoVyzSKmKK9LoeupCxiZrNTZ8iqcHHXhbjdb6Chj2Y5sGoYxozAf",12808492,96]} while latest commit has idx 5004 with key {"Direct":["CoV9PXhG8JXWVrwoCSGRhp5LEMX9cLHZ7pA4AQGRjuyJKMkBqnVD",47386744,103]}
+016us  application Starting gc on commit idx 5 with key {"Direct":["CoUrnTvJaYccfEfnSE4YoCvmHYV7eRA9BnAhrFmd5Nbg7WPvkNDf",12809610,96]} while latest commit has idx 5005 with key {"Direct":["CoWas7LbKQM7KLxBdZpQj9g1D5MsnGkvpHGVQW37ejWvBZpB9gRR",47392159,103]}
+016us  application Starting gc on commit idx 6 with key {"Direct":["CoWTgRNMUc3aGc8brcSNXHZS1Kk3ttqdycETeYXu2BKBbaTyGSJL",12809706,97]} while latest commit has idx 5006 with key {"Direct":["CoVr7RUrdKUERakb28H9pHu8MKW2pwF6CWikSmgbGYBt8amrQWZS",47397574,103]}
+017us  application Starting gc on commit idx 7 with key {"Direct":["CoVjnd3Ybpr7AKw6EaR6RD5L8Wo8anTJH6AmvnavfxsnGbDM4Cfi",12809803,97]} while latest commit has idx 5007 with key {"Direct":["CoWMiPaBi37shdSfSxfEzEJzUEWnZ4aGENFeT1GhzAGUGq99gqgz",47402989,103]}
+018us [ERROR] irmin-pack.unix GC: Close prefix failed: "Pending_flush"
+018us  application Stat trace kept at /Users/thomas/git/irmin/_artefacts/d6f899d3-21a9-42c7-89a0-73902548ed2e/stat_trace.repr
tree: internal error, uncaught exception:
      Pack_error: {"Invalid_prefix_read":"entry (off=12808588, len=1118) is supposed to be contained in chunk (poff=12802390,len=1118) and starting at 96 but is larger than it can be\n contained in chunk"}
      Raised at Irmin_pack_unix__Errors.Base.raise_error in file "src/irmin-pack/unix/errors.ml", line 99, characters 26-46
      Called from Irmin_pack_unix__Ext.Maker.Make.X.Repo.Gc.start in file "src/irmin-pack/unix/ext.ml", line 361, characters 18-106
      Called from Irmin_pack_unix__Ext.Maker.Make.X.Repo.Gc.start_or_wait in file "src/irmin-pack/unix/ext.ml", line 565, characters 26-52
      Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.c in file "src/core/lwt.ml", line 1860, characters 23-26
      Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3068, characters 20-29
      Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
      Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
      Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
      Called from Dune__exe__Tree.main in file "bench/irmin-pack/tree.ml", line 390, characters 4-677
      Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
      Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44
+018us  application Stat trace kept at /Users/thomas/git/irmin/_artefacts/d6f899d3-21a9-42c7-89a0-73902548ed2e/stat_trace.repr
tree: internal error, uncaught exception:
      Invalid_argument("index out of bounds")
      Raised by primitive operation at Repr__Type_json.Decode.case1.(fun).aux in file "src/repr/type_json.ml", line 401, characters 16-28
      Called from Repr__Type_json.Decode.case1.(fun) in file "src/repr/type_json.ml", line 407, characters 8-13
      Called from Repr__Type_json.Decode.case1.(fun).aux in file "src/repr/type_json.ml", line 402, characters 55-67
      Called from Repr__Type_json.Decode.case1.(fun) in file "src/repr/type_json.ml", line 407, characters 8-13
      Called from Irmin_pack_unix__Io_errors.Make.of_json_string in file "src/irmin-pack/unix/io_errors.ml", line 74, characters 10-56
      Called from Irmin_pack_unix__File_manager.Make.read_gc_output in file "src/irmin-pack/unix/file_manager.ml", line 705, characters 8-29
      Called from Irmin_pack_unix__Ext.Maker.Make.X.Repo.Gc.finalise.go in file "src/irmin-pack/unix/ext.ml", line 510, characters 20-103
      Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.c in file "src/core/lwt.ml", line 1860, characters 23-26
      Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3068, characters 20-29
      Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
      Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
      Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
      Called from Dune__exe__Tree.main in file "bench/irmin-pack/tree.ml", line 390, characters 4-677
      Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
      Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44
+018us [WARNING] irmin-pack.unix Killing gc process with pid 72432 failed with error (No such process, kill, )

@icristescu
Copy link
Contributor Author

So I got an interesting trace with --gc-every=1

❯ dune exec -- ./bench/irmin-pack/tree.exe --mode trace --keep-stat-trace ~/Downloads/data4_10310commits.repr --gc-every=1

This is an interesting bug.
The dispatcher allows for reads either in the prefix of in the suffix, it does not support for a read that starts in the prefix and ends in suffix. It should never be the case that we do such reads - we usually only need to read an object, which is either entirely in the prefix or in the suffix.

There is one exception, in the mapping.iter, reads are grouped together.

In this example, the parent of the commit is read from the suffix file:

XXX register_parent off 12_809_610 len 96  (*suffix starts at offset 12_809_610*)
(* following reads are in the prefix *)
XXX  register_obj off 12_809_544 len 66 (*prefix ends after this read *)
...
XXX register_obj off 12_808_588 len 35

but then, these are all grouped into one read from 12808588 of length 1118, and here the read fails, because it cannot read the entire chunk from the prefix only.

@codecov-commenter
Copy link

codecov-commenter commented Jul 5, 2022

Codecov Report

Merging #1950 (8cdf180) into main (c8e5327) will decrease coverage by 0.34%.
The diff coverage is 49.53%.

❗ Current head 8cdf180 differs from pull request most recent head d60a1fd. Consider uploading reports for the commit d60a1fd to get more accurate results

@@            Coverage Diff             @@
##             main    #1950      +/-   ##
==========================================
- Coverage   64.30%   63.95%   -0.35%     
==========================================
  Files         128      130       +2     
  Lines       15132    15479     +347     
==========================================
+ Hits         9731     9900     +169     
- Misses       5401     5579     +178     
Impacted Files Coverage Δ
src/irmin-pack/unix/errors.ml 30.00% <0.00%> (ø)
src/irmin-pack/unix/import.ml 88.88% <ø> (+11.11%) ⬆️
src/irmin-pack/unix/inode.ml 33.33% <ø> (ø)
src/irmin-pack/unix/io_legacy.ml 71.11% <ø> (ø)
src/irmin-pack/unix/gc.ml 9.75% <6.52%> (+0.04%) ⬆️
src/irmin-pack/unix/mapping_file.ml 16.03% <16.03%> (ø)
src/irmin-pack/unix/io_errors.ml 35.00% <33.33%> (-1.37%) ⬇️
src/irmin-pack/layout.ml 73.33% <50.00%> (-8.49%) ⬇️
src/irmin-pack/unix/io.ml 58.99% <55.55%> (-1.65%) ⬇️
src/irmin-pack/unix/ext.ml 72.62% <68.49%> (+0.76%) ⬆️
... and 15 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@samoht
Copy link
Member

samoht commented Jul 5, 2022

Rebased - the tests pass locally and the following command seems also ok with various variations of X (1, 2, 200):

❯ dune exec -- ./bench/irmin-pack/tree.exe --mode trace --keep-stat-trace ~/Downloads/data4_10310commits.repr --gc-every=X

@samoht
Copy link
Member

samoht commented Jul 5, 2022

I've also opened #1960 to keep track of the CHANGELOG placeholder.

Ngoguey42 and others added 2 commits July 5, 2022 19:28
Co-authored-by: Nicolas Goguey <ngoguey42@free.fr>
Co-authored-by: Ioana Cristescu <ioana@tarides.com>
Co-authored-by: Tom Ridge <tom.j.ridge@googlemail.com>
@samoht samoht merged commit 46c79a8 into main Jul 5, 2022
@samoht samoht changed the title Good gc Asynchronous gc Jul 6, 2022
@metanivek metanivek changed the title Asynchronous gc irmin-pack: asynchronous gc Aug 4, 2022
@icristescu icristescu mentioned this pull request Oct 3, 2022
34 tasks
@metanivek metanivek deleted the good-gc branch March 7, 2023 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants