More types of memory measurement (process heap size, retained memory) #262

Open
PragTob opened this issue Jan 14, 2019 · 2 comments

PragTob commented Jan 14, 2019

In a recent blog post by @alvises that uses benchee, I wondered why the author didn't use benchee for memory measurements, so I reached out. It turns out that what we measure isn't what mattered in this case:

I had an issue though that I actually still don't understand. Benchmarking this https://bit.ly/2ChIS1X code, benchee says 2.16GB memory usage, while the observer doesn't show any memory peak (around 32MBytes peak). Maybe I've put some wrong settings..

The thing is that we measure the total memory allocated, including memory that was garbage collected along the way, which is what we want :) However, it'd be interesting to see a couple of other measurements:

  • maximum heap size of the process (aka how far does this push the memory consumption of my Erlang process). Streaming is a great example here because we still deal with all the data, just not all at once (hence it is garbage collected in between). As such, the sample and the source for the large CSV seem helpful for testing (the source doesn't seem to allow downloads for non-users)
  • as @michalmuskala pointed out, retained memory might also be worth measuring: what long-term memory impact does this operation have on my system (for stateless functions this should hopefully be 0)
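
One way to approximate the first measurement is to trace garbage collection events of a fresh process and keep the largest value seen. The following is only a minimal sketch, not benchee's implementation: the `PeakSketch` module and its function names are hypothetical, and it assumes that `heap_size + old_heap_size` from the GC trace info (reported in words) is a reasonable proxy for the process's used memory. Because values are only sampled when a GC runs, the result is a lower bound on the true peak.

```elixir
defmodule PeakSketch do
  # Hypothetical helper: runs `fun` in a fresh process and returns the largest
  # heap_size + old_heap_size (in words) reported by any GC trace event.
  def peak_words(fun) do
    parent = self()

    worker =
      spawn(fn ->
        # Wait until tracing is set up before doing any real work.
        receive do
          :go -> :ok
        end

        fun.()
        send(parent, {:done, self()})
      end)

    # The calling process becomes the tracer and receives
    # {:trace, pid, event, info} messages for every GC start and end.
    :erlang.trace(worker, true, [:garbage_collection])
    send(worker, :go)

    receive do
      {:done, ^worker} -> :ok
    end

    drain(worker, 0)
  end

  # Consume the trace messages collected so far, tracking the maximum.
  defp drain(worker, peak) do
    receive do
      {:trace, ^worker, _event, info} ->
        drain(worker, max(peak, info[:heap_size] + info[:old_heap_size]))
    after
      100 -> peak
    end
  end
end

peak = PeakSketch.peak_words(fn -> Enum.to_list(1..100_000) end)
IO.puts("peak used memory: #{peak * :erlang.system_info(:wordsize)} bytes")
```

A streaming version of the same workload should report a much smaller peak than an eager one, even though the total allocated memory is similar.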

The question is of course how to implement this - as our memory measurer already has all the data the simplest solution might be to just let it return the 3 values in a tuple or so.

Another interesting thing is the output - for a lot of use cases (pure functions) I expect all of these values to not change so instead of printing 3 different sections we could just output "this value this, that value that and the other value that"

This is a post-1.0 thing. It's very nice to have, but we should really get 1.0 out first. It also shouldn't be a breaking change to the outside, as we'd only add fields/data, not remove or rename any.


alvises commented Jan 24, 2019

Hi Tobias!
I've started to dig into the Benchee.Benchmark.Measure.Memory module, writing some tests to see the different memory measurements. In listen_gc_end, printing the info keyword list, I see different keys like heap_block_size, bin_vheap_size etc. Does any of these keys represent the "total heap size" shown in the Erlang observer? Or should I aggregate the results to calculate the total?

PragTob added a commit that referenced this issue Jan 26, 2019
#262 reminded me of just how strange this code might be to people new to the code base and also to myself after a couple of months. This general information should help :)

PragTob commented Jan 26, 2019

@alvises 👋

So I just opened #263 to make this maybe a bit more discoverable.

What you're looking for (and what took me more googling than it should have) is http://erlang.org/doc/man/erlang.html#gc_minor_start


heap_size
    The size of the used part of the heap.
heap_block_size
    The size of the memory block used for storing the heap and the stack.
old_heap_size
    The size of the used part of the old heap.
old_heap_block_size
    The size of the memory block used for storing the old heap.
stack_size
    The size of the stack.
recent_size
    The size of the data that survived the previous garbage collection.
mbuf_size
    The combined size of message buffers associated with the process.
bin_vheap_size
    The total size of unique off-heap binaries referenced from the process heap.
bin_vheap_block_size
    The total size of binaries allowed in the virtual heap in the process before doing a garbage collection.
bin_old_vheap_size
    The total size of unique off-heap binaries referenced from the process old heap.
bin_old_vheap_block_size
    The total size of binaries allowed in the virtual old heap in the process before doing a garbage collection. 

Total memory used is, to the best of my understanding, heap_size + old_heap_size, hence our total_memory helper function.

So that size after running the function and triggering GC should be the retained memory.
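
As a rough illustration of that idea, the sketch below reads the same keys via `Process.info(pid, :garbage_collection_info)` and compares the used size before and after running a function, forcing a GC each time. The `RetainedSketch` module and its function names are hypothetical and this is not benchee's code. It assumes the sizes are in words and that the function's return value is not kept alive, so only state retained elsewhere in the process would show up.

```elixir
defmodule RetainedSketch do
  # Used part of the young and old heap, in words (assumed unit).
  def used_words(pid \\ self()) do
    {:garbage_collection_info, info} = Process.info(pid, :garbage_collection_info)
    info[:heap_size] + info[:old_heap_size]
  end

  # Runs `fun` in a fresh process and returns the difference in used words
  # after a forced GC. The result of `fun` is deliberately discarded.
  def retained_words(fun) do
    Task.async(fn ->
      :erlang.garbage_collect()
      before_words = used_words()
      fun.()
      :erlang.garbage_collect()
      used_words() - before_words
    end)
    |> Task.await()
  end
end

retained = RetainedSketch.retained_words(fn -> Enum.map(1..10_000, &(&1 * 2)) end)
IO.inspect(retained, label: "retained words")
```

For a function without side effects the number should be close to zero, though small differences from the measurement itself are possible.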

As for what is the total process size I'd have to half guess/dig deeper - heap_block_size + old_heap_block_size + stack_size?
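
To check that guess, one could print the candidate sums next to what the runtime itself reports. This is only a sketch for comparison, not an answer: it assumes the `:total_heap_size` item (in words) and the `:memory` item (in bytes) of `Process.info/2` are useful reference points, and whether the observer shows either of them is not verified here. Note that, per the description quoted above, heap_block_size already covers the stack, so adding stack_size on top may double count it.

```elixir
pid = self()
word = :erlang.system_info(:wordsize)

{:garbage_collection_info, gc} = Process.info(pid, :garbage_collection_info)
{:total_heap_size, total_heap_words} = Process.info(pid, :total_heap_size)
{:memory, memory_bytes} = Process.info(pid, :memory)

# Candidate 1: the allocated blocks for the young and old heap.
block_words = gc[:heap_block_size] + gc[:old_heap_block_size]

# Candidate 2: the guess from this thread, which also adds the stack.
guess_words = block_words + gc[:stack_size]

IO.puts("heap blocks:             #{block_words * word} bytes")
IO.puts("heap blocks + stack:     #{guess_words * word} bytes")
IO.puts("total_heap_size (info):  #{total_heap_words * word} bytes")
IO.puts("memory (info):           #{memory_bytes} bytes")
```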

I'm unsure but sadly don't have the time to read up on it.

@devonestes @michalmuskala do you have any further insight?

Also @alvises great that you look at it, not to deter you but just wanna mention again that I think we'd most likely get this in for realz after 1.0 (gathering the values should be okay but displaying it in all the formatters would take quite some time so I see it more as a 1.1 right now but yes that means I should take the time to do 1.0 😁 )
