Help with profiling OSX M1 #142
Comments
Neat, thanks! Do the tests and benchmarks compile using the makefiles in those directories? (May need to replace
Compiled with no issue, running says
Full output:
I modified the code so that it would run, but it crashes when built with -O3:
The specific crashing frame, up close:
This does not happen when compiling the benchmarks with -O0. Let me see if I can poke around and find anything satisfying.
Some output from ubsan and asan:
Don't waste your time, the spsc implementation depends specifically on x86 semantics and is not portable. I guess we'd have to disable it completely to be able to run the benchmarks in O3.
asan is also known to give false positives with lock-free code.
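For anyone wondering what "depends on x86 semantics" can look like in practice, here is a minimal sketch. It is not the 1024cores code and not code from this repo; the `Handoff` type and its members are hypothetical. The assumption is that the dependency is on x86's strong store ordering: on x86 a plain flag store happens to become visible after the payload store, while ARM may reorder the two unless the ordering is requested explicitly.

```cpp
#include <atomic>

// Hypothetical single-slot handoff between one producer and one consumer.
// Illustrative only: explicit release/acquire ordering instead of relying
// on the hardware's default store ordering.
struct Handoff {
    int payload = 0;
    std::atomic<bool> ready{false};

    // Producer: write the payload, then publish it with a release store so
    // the payload write cannot be reordered after the flag write.
    void publish(int value) {
        payload = value;
        ready.store(true, std::memory_order_release);
    }

    // Consumer: the acquire load pairs with the release store above, so a
    // consumer that sees ready == true also sees the payload write.
    bool try_consume(int& out) {
        if (!ready.load(std::memory_order_acquire)) {
            return false;
        }
        out = payload;
        return true;
    }
};
```

With `std::memory_order_release` and `std::memory_order_acquire` spelled out, the compiler emits whatever barriers the target needs, so the same source should behave the same way on x86 and on ARM.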
Curious where those ubsan overflows are from though.
The ubsan overflows are pretty easy to fix, just changing:
on line 270 of the bench file fixed it up (long long may be overkill, but this silenced it).
Does this mean MPMC (concurrentqueue) is portable? Should I try and run some benchmarks on that? Happy to help in any way I can.
Ah I see, it's a dummy variable just to disable dead code removal. Yeah, switching to any unsigned type should fix it.
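A rough sketch of why an unsigned type silences ubsan here: signed integer overflow is undefined behavior in C++, while unsigned arithmetic is defined to wrap modulo 2^N. The function below is a hypothetical stand-in for the benchmark's dummy accumulator, not the actual line 270 of the bench file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a benchmark "dummy" accumulator whose only job
// is to keep the optimizer from discarding the dequeued values.
// Using an unsigned type makes overflow well-defined (it wraps), so ubsan
// has nothing to report.
std::uint64_t accumulate_dummy(const int* items, std::size_t count) {
    assert(items != nullptr || count == 0);
    std::uint64_t dummy = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Converting a negative int to uint64_t is also well-defined
        // (the value is reduced modulo 2^64).
        dummy += static_cast<std::uint64_t>(items[i]);
    }
    return dummy;
}
```

The returned value still has to be used somewhere (printed, or stored to a `volatile`) for it to actually prevent dead code removal.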
"spsc" here in the benchmarks is not my code, but a different SPSC queue implementation from https://www.1024cores.net/home/lock-free-algorithms/queues/unbounded-spsc-queue, included just for comparison.
Would be worth getting the benchmarks to run in O3 (without spsc) to see the results.
Ah, got it! Sorry, I thought that was referring to the code in this repo (readerwriterqueue). Here are the aforementioned results, now that we know SPSC is dead and we don't care about it:
That's O3?
Yep, this is with no changes to the repo, just tip of main. The compile line being used is:
(to answer your previous concern about
Nice. Thanks for the results. Would have to dig into the assembly to see why the base operations with Folly are twice as fast.
Sounds good. Please don't hesitate to reach out if I can assist. This project has helped me a ton, so I'm happy to contribute in any way I can.
Hi!
I saw this in the readme:
"Note that it's only been tested on x86(-64); if someone has access to other processors I'd love to run some tests on anything that's not x86-based."
I have a Mac with an M1 (ARM) processor. If there is a suite of tests I can run for you, I'd be thrilled to help.
Let me know.