Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Minor_heap_max and Max_domains as OCAMLRUNPARAM options #795

Open
kayceesrk opened this issue Dec 16, 2021 · 10 comments
Open

Make Minor_heap_max and Max_domains as OCAMLRUNPARAM options #795

kayceesrk opened this issue Dec 16, 2021 · 10 comments
Labels
post-MVP for 5.00 Items that need to be done before 5.00 release

Comments

@kayceesrk
Copy link
Contributor

kayceesrk commented Dec 16, 2021

Currently, both of these are defined in runtime/caml/config.h, with Minor_heap_max defined as 2GB and Max_domains as 128. This makes Multicore OCaml programs use at the minimum 256GB of virtual address space. This causes problems for tools like AFL and Valgrind, which cannot seem to deal with mmap reserve and expects to commit that much memory, and hence run out of memory on Multicore OCaml.

My proposal is to make both Minor_heap_max and Max_domains as OCAMLRUNPARAM options (whose default values can remain the same as now). With that change, we will be able to use AFL and Valgrind without having to have a separate compiler switch where these macros are defined to have much smaller values.

AFL would use Max_domains as 1, and Minor_heap_max to a small default such as 2 MB (non-determinism with multiple domains will affect AFL stability). Similarly, for valgrind, one you use a smaller Minor_heap_max. I have confirmed that both AFL and valgrind work with Max_domains as 1 and Minor_heap_max as 2 MB.

@sabine
Copy link

sabine commented Jan 17, 2022

Update: I'm tentatively putting the allocation of sampled_gc_stats into caml_init_gc.

--

When I change Max_domains to a parameter passed by OCAMLRUNPARAM, I need to modify the static array sampled_gc_stats, as it can no longer be a fixed-length array when the number of domains is a runtime parameter.

Where's the correct place to put the sampled GC stats?

I could move it into domain_state, but the way sampled_gc_stats is used, we need to iterate over all the domains. There is caml_try_run_on_all_domains which could probably be used to iterate over all the domains... But that's a locking operation, and I guess that the reason why sampled_gc_stats is a static array in the first place is that we want to avoid locks here and that we do not care about being 100% accurate.

It seems like a another option is to turn sampled_gc_stats into a pointer and allocate the array on the heap, but I don't see the right place to perform the allocation.

@kayceesrk
Copy link
Contributor Author

@sabine Thanks. Is there a branch where we can follow along this work?

It seems like a another option is to turn sampled_gc_stats into a pointer and allocate the array on the heap, but I don't see the right place to perform the allocation.

I would think that this is the right approach. caml_init_gc ingc_ctrl.c would be a good place to have it; sampled_gc_stats are related to the GC.

@sabine
Copy link

sabine commented Jan 18, 2022

Here's the branch compare: https://github.com/ocaml/ocaml/compare/trunk...sabine:minor_heap_max_domains_OCAMLRUNPARAM_795?expand=1

I placed the allocations for the max_domain-length arrays from runtime/domain.c into the caml_init_domains function.

I don't understand yet, which method of allocating memory should be used here. I can see caml_map_mem being used already in init_domains. Do we have a breakdown of the different memory allocation functions and their semantics somewhere?

In st_posix.h is an array tick_thread_stop: https://github.com/sabine/ocaml/blob/aac0479bc04a193668f9275c50c9a5cad0614042/otherlibs/systhreads/st_posix.h#L35
and in st_posix.c, we have the array thread_table:
https://github.com/sabine/ocaml/blob/aac0479bc04a193668f9275c50c9a5cad0614042/otherlibs/systhreads/st_stubs.c#L107

Not sure where these should go. Considering it's domains-related, they could possibly be initialized during init_domains, but maybe there's a better place somewhere else?

@kayceesrk
Copy link
Contributor Author

Don't use caml_mem_map. That goes through mmap on Linux and is really only to be used for the allocation of the minor heaps for the domains. For allocating blocks of memory, where you would typically use malloc, use caml_stat_alloc_noexc from caml/memory.h. It works the same as malloc, but has the ability to optionally track malloced blocks in the runtime so that we can detect memory leaks. See OCAMLRUNPARAM option c: https://ocaml.org/manual/runtime.html and ocaml/ocaml#10865.

caml_init_domains is also fine for allocating these arrays.

@abbysmal
Copy link
Collaborator

Not sure where these should go. Considering it's domains-related, they could possibly be initialized during init_domains, but maybe there's a better place somewhere else?

The problem with things in systhreads is that these sits in otherlibs, which means that the library may not be linked at all time.
It also makes the setup job for these independent from the "core runtime" setup job.
I guess one could argue about having systhreads intrinsics sits within the runtime itself (and this way we may manage them in a more uniform way, at least for the internal data structures.).

However, maybe another solution would be to just use the systhreads entrypoints:
If you looks at st_stubs.c, you can see caml_thread_initialize_domain called on domain spawn (and at the start of a program, to setup up the first domain).
This could do the trick. One thing that would be required though would be a sister hook to this to also handle tearing down memory on domain shutdown.

@kayceesrk
Copy link
Contributor Author

I really don't want more hooks if possible. Callbacks are terrible in general and doubly so in C.

@abbysmal
Copy link
Collaborator

Then we can move the handling of systhreads data structures within the runtime itself (would mean no hook, would be arguably cleaner.), but it would mean we now have references to systhreads in the runtime itself, sort of defeating the idea of having it sit in otherlibs.
I'm also wondering if there is not alternative path where we could let the GC handle these memory segments in some ways, since we have an assertion that the initial domain thread is always the last thread exiting on any given domain.

@kayceesrk
Copy link
Contributor Author

kayceesrk commented Jan 18, 2022

Hooks may be the simplest solution considering the alternatives. However, I don't know when they are supposed to free the memory? If the main domain terminating frees the memory, there may still be other threads (on other domains) that try to access tick_thread_stop and thread_table, and crash? We don't free all_domains for the same reason, because it is not clear when to free it.

Related to ocaml/ocaml#10865 (comment). Linux atexit to free this memory will also not work: https://bugzilla.redhat.com/show_bug.cgi?id=790837. I would suggest going ahead without the free functions for now (so no additional hooks!), but add a comment to the new allocations about them not being freed.

@sabine
Copy link

sabine commented Jan 25, 2022

I added the remaining allocations behind a NULL-check, since it looks like caml_thread_initialize_domain is probably called more than once - clearly we only want to allocate the arrays with length caml_params->max_domains on the first call.

I left comments on all the allocations that link back to this issue and state that they're not freed.

The code compiles and running the tests gives me

Summary:
  3098 tests passed
   66 tests skipped
    0 tests failed
  167 tests not started (parent test skipped or failed)
    0 unexpected errors
  3331 tests considered

How do I go about testing further? E.g. do we already have some code on which to test Valgrind or AFL?

Edit: do we need to use caml_stat_alloc_noexc or caml_stat_alloc?
So far, I've only been using caml_stat_alloc_noexc, but considering how some of these allocations must succeed in order for the program to run, it seems like caml_stat_alloc might be the appropriate allocation function?

@kayceesrk
Copy link
Contributor Author

Thanks. caml_stat_alloc raises an OCaml exception on failure, whereas the _noexc version returns a NULL value on failure. Some of these allocations occur before the program even gets into OCaml. So raising OCaml exceptions will not be the right thing to do. You should change your code such that it checks for NULL, and when it fails to allocate memory, you may do caml_fatal_error("not enough memory to startup").

Another point that we should discuss (upstream) is what the default values for Max_minor_heap_def and Max_domains_def should be. How much virtual memory do we want the default OCaml executable to consume? Now it consumes 256GB of virtual memory. My opinion is that the Max_minor_heap_def should be the same as the default minor heap size (2mb?), and possibly Max_domains_def should be 16?

If the program creates more domains than caml_params->max_domains, the error message here may point users at the Max_domains OCAMLRUNPARAM parameter.

For AFL, now there is a test that checks whether the program can run. @jmid introduced -m none option in the manual. Perhaps we may extend the manual section to say something about the new options. This depends on the default choices for Max_minor_heap_def and Max_domains_def. If the sizes are small enough (< 50mb default for AFL?), then we don't need -m none.

For Valgrind, confirm if $ Max_domains=1 Max_minor_heap=256k valgrind my_ocaml_prog.exe does not error out on start up. We may have to include this in the manual somewhere if the chosen defaults are large enough to cause valgrind failure.

It would be best to discuss this on ocaml/ocaml. Can you make a PR to OCaml with the changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
post-MVP for 5.00 Items that need to be done before 5.00 release
Projects
None yet
Development

No branches or pull requests

3 participants