Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: max memory usage 3x, main vs 3.3.1 #1988

Closed
tomjridge opened this issue Jul 19, 2022 · 9 comments
Closed

irmin-pack: max memory usage 3x, main vs 3.3.1 #1988

tomjridge opened this issue Jul 19, 2022 · 9 comments
Assignees
Labels
feature/layered-store Related to the Layered Store

Comments

@tomjridge
Copy link
Contributor

This issue is based on #1959, but focusing solely on the increased memory usage. From the metrics listed in that issue:

 |                          |   3.2-a   |     3.2-b      |    3.3.1-a     |    3.3.1-b     |   good-gc-a    |   good-gc-b
 | --                       | --        | --             | --             | --             | --             | --
 | -- main metrics --       |           |                |                |                |                | 
 | CPU time elapsed         |   164m49s |   165m36s 100% |   162m50s  99% |   163m50s  99% |   201m20s 122% |   200m53s 122%
 | Wall time elapsed        |   227m49s |   228m19s 100% |   220m18s  97% |   221m17s  97% |   205m50s  90% |   204m02s  90%
                          |           |                |                |                |                | 
 | max memory usage (bytes) |   0.402 G |   0.395 G  98% |   0.441 G 110% |   0.443 G 110% |   1.413 G 352% |   1.457 G 363%
 | mean CPU usage           |       72% |       73%      |       74%      |       74%      |       98%      |       98%     

For good-gc, the "max memory usage (bytes)" is increased roughly 3x compared to 3.3.1. We should identify the cause.

@tomjridge tomjridge self-assigned this Jul 19, 2022
@tomjridge tomjridge added the feature/layered-store Related to the Layered Store label Jul 19, 2022
@tomjridge
Copy link
Contributor Author

Some initial candidates for increase in max memory usage

  1. Size of the sparse map file, which is held in memory (these benchmarks likely use gap_tolerance=0)
  2. (? not sure about this) When calculating reachability, a hashtable is used to avoid revisiting offsets; perhaps this is getting very large?

@tomjridge
Copy link
Contributor Author

A comparison using tree.exe benchmark, comparing main to 3.3.1, shows minor memory usage difference:

    |                          |   main   |      331
    | --                       | --       | --
    | -- main metrics --       |          | 
    | max memory usage         |  1.335 G |   1.518 G 114%
    | mean CPU usage           |     100% |      100%     

main-vs-3.3.1.txt

@tomjridge
Copy link
Contributor Author

The size of the mapping file, at least with this particular test run on main, is only 1.7MB; so this is not a source of significant memory usage.

@tomjridge
Copy link
Contributor Author

tomjridge commented Jul 19, 2022

The following gives the machine setups for the benchmarks from #1959. good-gc runs were using ocaml 4.12.1 whereas the others used 4.14. Also gc.space_overhead for good-gc was 80, whereas it was 120 for the other machines. So we can't just compare these stats directly.

 |                          | 3.2-a                              | 3.2-b                              | 3.3.1-a                               | 3.3.1-b                               | good-gc-a                 | good-gc-b
 | --                       | --                                 | --                                 | --                                    | --                                    | --                        | --
 | hostname                 | tezos-action-trace-recording-12-05 | tezos-action-trace-recording-12-05 | tezos-gc-action-trace-recording-12-05 | tezos-gc-action-trace-recording-12-05 | bench-gc                  | bench-gc
 | os type                  | Unix                               | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | big endian               | false                              | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | runtime params (no gc)   | b=1,H=0,p=0,t=0,v=0,w=1,W=0        | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | backend type             | native                             | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | ocaml version            | 4.14.0                             | 4.14.0                             | 4.14.0                                | 4.14.0                                | 4.12.1                    | 4.12.1
 | word size                | 64 bits                            | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | Start Time               | 2022/06/16 13:27:55 (GMT)          | 2022/06/16 19:47:03 (GMT)          | 2022/06/16 14:41:02 (GMT)             | 2022/06/16 20:32:41 (GMT)             | 2022/07/04 17:37:23 (GMT) | 2022/07/05 07:42:00 (GMT)
 | Store Type               | pack                               | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | gc.minor_heap_size       | 262144                             | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | gc.major_heap_increment  | 15                                 | <same>                             | <same>                                | <same>                                | <same>                    | <same>
 | gc.space_overhead        | 120                                | 120                                | 120                                   | 120                                   | 80                        | 80

The OCaml manual has this for gc.space_overhead: The major GC speed is computed from this parameter. This is the memory that will be "wasted" because the GC does not immediately collect unreachable blocks. It is expressed as a percentage of the memory used for live data. The GC will work more (use more CPU time and collect blocks more eagerly) if space_overhead is smaller. Default: 120

So if we just look at the number, it seems that good-gc should have a more aggressive GC, and so should ideally use less memory.

However, in OCaml 4.13.0, GC was changed to best-fit, and various other GC improvements were made. So, again, we can't just compare good-gc directly to the others, when considering max memory usage.

@tomjridge
Copy link
Contributor Author

Another comparison: tree benchmark, with gc, irmin main, but ocaml 4.12 vs 4.14. Max memory usage was 1.335G in 4.12, 1.212G in 4.14 (so not much difference).

Aside: Wall time was 13m28s vs 11m01s, so 4.14 seems about 20% quicker on this benchmark.

main412-main414.txt

@tomjridge
Copy link
Contributor Author

At this point, it seems that the tree benchmarks do not exhibit 3x memory usage (at least in the runs detailed above). So investigation should continue with the lib_context benchmarks instead.

@icristescu
Copy link
Contributor

Something else we should be careful at, is that I think the max memory usage reported by all our benchmarks are for the main process. I started a bench which record the maxrss (Mb) for the Gc child process at the beginning and at end of Gc.run:

 begin 131 end 1040 (gc 1)
 begin 235 end 1791
 begin 344 end 1945
 begin 399 end 2166
 begin 545 end 2130
 begin 619 end 2434
 begin 627 end 2442 (gc 7)

the begin value corresponds approx to the maxrss of the parent process.

@tomjridge
Copy link
Contributor Author

Are you saying that the main process uses 133MB maxrss at the point the child is forked, and that the child process then gets to 2442MB? The child process is supposed to use almost no memory at all (this was part of the design requirements, and was satisfied I believe in the previous version of the code). Do you think it is possible the new "reachable" calculation has changed the memory usage? (I think there is a hashtable there that perhaps grows very large?)

@metanivek
Copy link
Member

Memory in the main process is now in a good spot. See #1959 (comment)

Tracking GC stats for benchmarking here #2046

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/layered-store Related to the Layered Store
Projects
None yet
Development

No branches or pull requests

3 participants