New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261
base: develop
Are you sure you want to change the base?
Conversation
bb5f359
to
3bb9821
Compare
The two force pushes were for the linter (which I can't run on my computer) and replacing 'or' with '||' because windows. |
Have you run the tests available in the project? I'll give it a try. |
When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others. This code cause a crash on some systems. ```supercollider b = Pwhite.new(0, 10, inf).asStream; Array.fill(1810000, {b.next});` ``` This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory. While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets 'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug. While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex. Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.
3bb9821
to
17ff2f1
Compare
While benchmarking this, I tried setting the threshold very low (500). This causes a crash because PyrGC's constructor calls its own member functions before it is initialised. |
Which tests are you referring to? |
In general, just to double check |
Yes, no crashes in the unit testing, nor with your original code
|
well done, my man. that was a sweet good job |
I think we need to test across a variety of use cases for performance and stability. Large sclang data arrays are not typical usage, and mostly we're prioritising realtime safety. Not that I see any obvious issues with this, but as it's at the core of everything... Would be good to add more GC specific stuff to the test suites. Should think about how best to do that... |
I was thinking about that, the unit tests in the project are not adequate at all. They use weak inference and hand-made tests. (I tried to raise this question, but it seems even with simple tests the CI already takes 20 minutes, and most devs don't like that, but the problem is real) The best options, without talking about working on another testing framework, would just push to the development branch and ask people to test it widely. EDIT: If one day we also adopt property-based tests or something along those lines, it could even run separately from the normal CI. Let it run and take as much time as necessary for the kind of test it is needed. That's how I see it. |
Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly? |
That's a good idea, but I believe the first bit you mentioned is pretty involved. You would need to introduce some "metadata", and some pattern of bytes before and after each allocation. 2) keep track of pointers to each allocation. 3) record each allocation and deallocation. In a way, there are already tools to help with some of that: real-time tracing isn't already done by toolings like Valgrind?? Property-based (and even "fuzz testing") would continue to be a tool to reveal problems, just like the test that triggered this fix. (Unfortunately, it is not even discussed right now in the community, maybe some CI trauma? Or something else?) There was an unfortunate myth that good tests can be simple and hand-made, and if the same developers could also come up with that, and if something works with "1", "10" and "20" it will work everywhere. Property-based tests can be very powerful in systems that take care not to mix too many pure computations and side effects. That's not the case with SC, for which additional care would be necessary. But it doesn't mean it is not also powerful and helpful, it just needs adequate design. I would make an analogy. In a purely functional system, you could test the keys and the embouchure of the saxophone. But in a system like SC, the interaction between embouchure and fingering will cause extra complexities, like Evan Parker improvisation techniques, and we need to be aware of that. At the same time, I see some unit tests in a project with those qualities that we use with very little help in the real world, testing simple hand-made cases in complete isolation. It almost seems that the test is designed to never fail nor inform anything)) (Hope it made sense) -- In short, with the tools we have presently, I believe the best test is to push the change to develop and announce a call for testing. |
Some discussion and unmerged work here: #3384 |
@muellmusik There is a history of ideas and documented thought behind this that I don't have the time to study now. IF THAT'S THE CASE HERE, sorry if I made an uninformed noise. Eventually, I will catch up and check that. |
What I linked to was an attempt to improve GC safety through an enhanced implementation. Not a test, so something a bit different. |
There is also the GC_SANITYCHECK flag. But it makes things very slow in tests, so probably not great for testing large things like this. You can trigger the sanity check ad hoc with the |
That's pretty cool! I was thinking about the same thing recently and was going to try having a small stack inside the allocator class that all allocations are added to and cleared from periodically. This way, at least within a primitive scope which doesn't call other sc code, you guarantee the object exists. It wouldn't have caught this bug though. I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc. That would have been useful for this bug. |
I thought about if RAII could supplement the more inadequate parts of the GC. I don't remember if I never wrote or deleted it. I started to read some of the messages (it's a long history to catch everything), and it seems Tim Blechmann mentioned "smart pointers", which is a similar feature of modern C++. It's more or less the same general idea. Maybe there is one way to fix but also introduce C++ modern features that will have more support and knowledge around to fine-tune it. All those talks about refactoring never resulted in a major refactor? Or? EDIT: RAII official reference: https://en.cppreference.com/w/cpp/language/raii Interesting to see: RAII is the favorite recent feature in C++11 and later according to Bjarne Stroustrup. @JordanHendersonMusic Do you think they would supplement the GC? How do you understand that? |
I think the GC in principle is pretty awesome, incremental tricolour — I can't quite see what flip does, is it a to-and-from space? — but it pretty much does what you'd do today, maybe there'd be a generation system so new objects are scanned more often than old ones? Its main issue is that lack of comments, pretty odd naming conventions, and some bad practices that only trigger under odd conditions — the bug I found with PyrGC calling its own methods in the constructor is an example of that. There are some interfaces that could be improved, e.g., passing the whole stack to each primitive is quite dangerous and would be better to wrap this in something like Smart pointers seem problematic because of reference cycles, you'd also probably want to roll your own because they are atomic counted in C++ and sc is single threaded. Anyway, this conversation is drifting away from the PR and might be better moved to a github discussion? This PR thread should either focus on the code itself, or on how best to test it — there is a lot to unpack just there. It would also be nice to gather all previous discussions on gc in one place. |
I'm behind on everything GC, I will need a few days to catch up to be able to engage with details. You're right that this last message opened up a door that steps back from using GC as it is. So, I'm off-topic here, sorry. But eventually, let's discuss all this when I get more time. |
[Some preliminary notes I did, JUST TO CONCLUDE THIS PARALLEL SPECULATION ON HERE] Reading some discussions about whether to choose RAII or GC in an interpreter like SC, just taking note of the technical advantages, it would make more sense to implement mostly RAII, including for soft real-time applications, predictability. Also, if in the future SCLang receives an update regarding concurrency and parallelism, RAII would also be a better choice, according to what I researched. GC still is more idiomatic for OOP, in any case. Some references:
|
Just to be clear, what do you mean by 'alive' here?
Did I miss that?
Yeah I did that in the PR. We need something very light weight and simple. The normal C++ smart pointers seemed like overkill.
In general the way the receiver and args are passed to the primitives makes it easy to get wrong.
No worries. I've probably spent as much time mucking around in the GC as most people currently active, but not much in recent years so need to spend some more time myself. Would be good to add some comments and documentation and look at this more generally. |
I mentioned it in a comment, just submitted an issue: #6268
Post allocation, but before the GC marks it as unreachable. It would be a way to check the GC has done its job. I don't know if this is covered in any of the sanity check — to be honest I haven't looked that closely at them. |
So you mean reachable or unexamined? Forgive me but I'm still a little unclear. What mechanism can we use to determine if an object is unreachable besides the GC itself? If you mean making sure that newly created objects aren't unused, then the pointer thing I did in #3384 does that. If you mean checking that created objects are made reachable between calls to object creation functions the count in that covers that. Or is this addressing something else? The issues we identified earlier were:
|
So with this PR's initial bug, an pyrobject pointer was being incremented (actually decremented) and dereferenced. The PR you linked wouldn't help with this, but would certainly be useful in other areas — although I think it could be called |
@muellmusik I think this could even become a new guideline. SC codebase is an extreme case of avoiding comments. Let's take a look at the other end of this spectrum, and see how many informative comments one can find in the GHC compiler source code: https://github.com/ghc/ghc/blob/master/compiler/GHC/Core.hs GC-related code could be a start in this respect. If agreed, those with more experience could contribute with comments as part of the next patches pushed into develop. Scott, while you're revisiting it, it would be a great contribution to write down something. |
Okay. So this tells us more than the segfault, in that it would catch cases that wouldn't crash in debug builds (only by luck)? That makes sense! |
This conversation seems to have dwindled. Was there any thing I should be doing? As far as I can see, the only real way to test this is just to put it into debug branch and see if anyone has any issues. Ideally, we'd have a proper bank of test pieces to run this on — has there ever been a conversation about community members 'donating' piece to be tests or benchmarks? |
I understand that's the only test available. I just wonder why so many users and developers are not even able to build their systems for so long, which will affect this method. I can build and run a few more hairy codes here on Linux, but many more people would be needed. Maybe release a test/debug binary, since it seems a few people are being able to build? That would work. |
Purpose and Motivation
When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others.
This code cause a crash on some systems.
This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory.
While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets 'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug.
While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex.
Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.
Fixes #6234, also see #6257 for history.
Benchmarks
Test involving numerous temporary functions (that doesn't crash with the old behaviour).
Old: 0.02174366042
New: 0.01934624595
Delta: +12%
A test without temporary functions that should be unaffected by this PR.
Old: 0.08627185895
New: 0.0867351175
Delta: -0.5%
Types of changes
To-do list