This repository has been archived by the owner on Nov 27, 2022. It is now read-only.

Memory allocation overhead varies depending on platform #1

Open
kabergstrom opened this issue Aug 19, 2020 · 7 comments

@kabergstrom

When running the benchmarks, I got more than an 80% improvement (10 ms to 2 ms) in the serialize_binary benchmark when I switched from the platform-provided malloc to rpmalloc on Windows. After switching, some other cases also differed by up to 36% in runtime. I think a good avenue for extending the benches would be to measure the number and size of allocations in the test cases, and to warn people about platform-provided malloc. Maybe you should mandate a cross-platform custom allocator so people don't end up benchmarking the wrong thing.

Additionally, I would advise pre-allocating serialization buffers to ensure the benchmark isn't just measuring Vec::grow.

@TomGillen
Collaborator

TomGillen commented Aug 20, 2020

I just tried using rpmalloc, and while performance improves quite a bit, the shipyard allocate benchmark crashes with "memory allocation of 2424 bytes failed".

@kabergstrom
Author

Perhaps @leudz could look into why this happens?

@leudz
Collaborator

leudz commented Aug 22, 2020

I've narrowed it down to:

rayon::ThreadPoolBuilder::new().build().unwrap();

It triggers this assert sometimes.

@kabergstrom
Author

I suppose the issue is that the benchmark creates a new World in the bench function, and in shipyard's case, creating a new World creates a new thread pool that immediately spawns threads. rpmalloc also allocates heaps per thread, and I suspect this intense thread-creation pressure is causing an OOM condition, since Windows doesn't overcommit.

@kabergstrom
Author

@leudz Do you have any opinion on how to solve this?

@leudz
Collaborator

leudz commented Aug 24, 2020

For non-parallel benchmarks, removing the parallel feature would work.
Using a custom pool might solve the problem, but I'm not sure.

@leudz
Collaborator

leudz commented Sep 7, 2020

Shipyard now uses the global ThreadPool, problem solved =)
