Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

JordanHendersonMusic · 2024-04-21T14:59:35Z

Purpose and Motivation

When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others.

This code cause a crash on some systems.

b = Pwhite.new(0, 10, inf).asStream;
Array.fill(1810000, {b.next});`

This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory.

While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets 'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug.

While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex.

Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.

Fixes #6234, also see #6257 for history.

Benchmarks

Test involving numerous temporary functions (that doesn't crash with the old behaviour).

b = Pwhite.new(0, 10, inf).asStream; 
100.collect{ 
   { 
      Array.fill(100000, {b.next})
   }.bench(false) 
}.mean

Old: 0.02174366042
New: 0.01934624595
Delta: +12%

A test without temporary functions that should be unaffected by this PR.

100.collect{ 
   {   
      var a = (\v: 0);   
      while { a[\v] < 100000 } {   
         a[\v] = a[\v].pow(1) + 0.1   
      }     
   }.bench(false)  
}.mean

Old: 0.08627185895
New: 0.0867351175
Delta: -0.5%

Types of changes

Bug fix
Change in GC strategy, observed performance gains, but possible performance regressions although I haven't observed any significant losses.

To-do list

Code is tested
All tests are passing
This PR is ready for review

JordanHendersonMusic · 2024-04-21T15:50:09Z

The two force pushes were for the linter (which I can't run on my computer) and replacing 'or' with '||' because windows.

smoge · 2024-04-22T00:53:47Z

Have you run the tests available in the project? I'll give it a try.

When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others. This code cause a crash on some systems. ```supercollider b = Pwhite.new(0, 10, inf).asStream; Array.fill(1810000, {b.next});` ``` This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory. While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets 'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug. While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex. Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.

JordanHendersonMusic · 2024-04-23T10:24:51Z

While benchmarking this, I tried setting the threshold very low (500). This causes a crash because PyrGC's constructor calls its own member functions before it is initialised.

JordanHendersonMusic · 2024-04-23T10:25:22Z

Have you run the tests available in the project? I'll give it a try.

Which tests are you referring to?

smoge · 2024-04-23T10:54:41Z

Have you run the tests available in the project? I'll give it a try.

Which tests are you referring to?

In general, just to double check

JordanHendersonMusic · 2024-04-23T11:03:56Z

In general, just to double check

Yes, no crashes in the unit testing, nor with your original code

a = Pwhite.new(1802903, 1924888, inf).asStream;
b = Pwhite.new(0, 10, inf).asStream;

10 do: {
  Array.fill(a.next.round.asInteger.postln, {b.next})
}

smoge · 2024-04-23T11:06:01Z

well done, my man. that was a sweet good job

muellmusik · 2024-04-23T13:51:04Z

I think we need to test across a variety of use cases for performance and stability. Large sclang data arrays are not typical usage, and mostly we're prioritising realtime safety. Not that I see any obvious issues with this, but as it's at the core of everything...

Would be good to add more GC specific stuff to the test suites. Should think about how best to do that...

smoge · 2024-04-23T13:56:42Z

I think we need to test across a variety of use cases for performance and stability. Large sclang data arrays are not typical usage, and mostly we're prioritising realtime safety. Not that I see any obvious issues with this, but as it's at the core of everything...

Would be good to add more GC specific stuff to the test suites. Should think about how best to do that...

I was thinking about that, the unit tests in the project are not adequate at all. They use weak inference and hand-made tests. (I tried to raise this question, but it seems even with simple tests the CI already takes 20 minutes, and most devs don't like that, but the problem is real)

The best options, without talking about working on another testing framework, would just push to the development branch and ask people to test it widely.

EDIT: If one day we also adopt property-based tests or something along those lines, it could even run separately from the normal CI. Let it run and take as much time as necessary for the kind of test it is needed. That's how I see it.

JordanHendersonMusic · 2024-04-23T15:21:07Z

Should think about how best to do that...

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

smoge · 2024-04-23T16:29:58Z

Wonder if it is possible to add some bookkeeping to the internal allocator (s) so they can check their memory is being addressed properly?

That's a good idea, but I believe the first bit you mentioned is pretty involved. You would need to introduce some "metadata", and some pattern of bytes before and after each allocation. 2) keep track of pointers to each allocation. 3) record each allocation and deallocation.

In a way, there are already tools to help with some of that: real-time tracing isn't already done by toolings like Valgrind??

Property-based (and even "fuzz testing") would continue to be a tool to reveal problems, just like the test that triggered this fix. (Unfortunately, it is not even discussed right now in the community, maybe some CI trauma? Or something else?)

There was an unfortunate myth that good tests can be simple and hand-made, and if the same developers could also come up with that, and if something works with "1", "10" and "20" it will work everywhere.

Property-based tests can be very powerful in systems that take care not to mix too many pure computations and side effects. That's not the case with SC, for which additional care would be necessary. But it doesn't mean it is not also powerful and helpful, it just needs adequate design.

I would make an analogy. In a purely functional system, you could test the keys and the embouchure of the saxophone. But in a system like SC, the interaction between embouchure and fingering will cause extra complexities, like Evan Parker improvisation techniques, and we need to be aware of that.

At the same time, I see some unit tests in a project with those qualities that we use with very little help in the real world, testing simple hand-made cases in complete isolation. It almost seems that the test is designed to never fail nor inform anything))

(Hope it made sense)

-- In short, with the tools we have presently, I believe the best test is to push the change to develop and announce a call for testing.

muellmusik · 2024-04-23T17:00:18Z

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

Some discussion and unmerged work here: #3384

smoge · 2024-04-23T17:15:24Z

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

Some discussion and unnerved work here: #3384

@muellmusik There is a history of ideas and documented thought behind this that I don't have the time to study now. IF THAT'S THE CASE HERE, sorry if I made an uninformed noise. Eventually, I will catch up and check that.

muellmusik · 2024-04-23T18:49:20Z

@muellmusik There is a history of ideas and documented thought behind this that I don't have the time to study now. IF THAT'S THE CASE HERE, sorry if I made an uninformed noise. Eventually, I will catch up and check that.

What I linked to was an attempt to improve GC safety through an enhanced implementation. Not a test, so something a bit different.

muellmusik · 2024-04-23T18:52:56Z

There is also the GC_SANITYCHECK flag. But it makes things very slow in tests, so probably not great for testing large things like this.

You can trigger the sanity check ad hoc with the gcSanity message.

JordanHendersonMusic · 2024-04-23T20:26:13Z

What I linked to was an attempt to improve GC safety through an enhanced implementation. Not a test, so something a bit different.

That's pretty cool! I was thinking about the same thing recently and was going to try having a small stack inside the allocator class that all allocations are added to and cleared from periodically. This way, at least within a primitive scope which doesn't call other sc code, you guarantee the object exists.

It wouldn't have caught this bug though.

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

That would have been useful for this bug.

smoge · 2024-04-23T23:57:59Z

I thought about if RAII could supplement the more inadequate parts of the GC. I don't remember if I never wrote or deleted it.

I started to read some of the messages (it's a long history to catch everything), and it seems Tim Blechmann mentioned "smart pointers", which is a similar feature of modern C++. It's more or less the same general idea.

Maybe there is one way to fix but also introduce C++ modern features that will have more support and knowledge around to fine-tune it.

All those talks about refactoring never resulted in a major refactor? Or?

EDIT:

RAII official reference: https://en.cppreference.com/w/cpp/language/raii

Interesting to see: RAII is the favorite recent feature in C++11 and later according to Bjarne Stroustrup.
https://www.youtube.com/watch?v=LlZWqkCMdfk

@JordanHendersonMusic Do you think they would supplement the GC? How do you understand that?

JordanHendersonMusic · 2024-04-24T09:03:11Z

the more inadequate parts of the GC

I think the GC in principle is pretty awesome, incremental tricolour — I can't quite see what flip does, is it a to-and-from space? — but it pretty much does what you'd do today, maybe there'd be a generation system so new objects are scanned more often than old ones?

Its main issue is that lack of comments, pretty odd naming conventions, and some bad practices that only trigger under odd conditions — the bug I found with PyrGC calling its own methods in the constructor is an example of that. There are some interfaces that could be improved, e.g., passing the whole stack to each primitive is quite dangerous and would be better to wrap this in something like std::span<Slot>, that way it is harder for the primitive to destroy the stack. The temporary object problem is another obvious area for improvement, @muellmusik's pr used RAII to address this.

Smart pointers seem problematic because of reference cycles, you'd also probably want to roll your own because they are atomic counted in C++ and sc is single threaded.

Anyway, this conversation is drifting away from the PR and might be better moved to a github discussion? This PR thread should either focus on the code itself, or on how best to test it — there is a lot to unpack just there. It would also be nice to gather all previous discussions on gc in one place.

smoge · 2024-04-24T09:22:34Z

Anyway, this conversation is drifting away from the PR and might be better moved to a github discussion? This PR thread should either focus on the code itself, or on how best to test it — there is a lot to unpack just there. It would also be nice to gather all previous discussions on gc in one place.

I'm behind on everything GC, I will need a few days to catch up to be able to engage with details. You're right that this last message opened up a door that steps back from using GC as it is. So, I'm off-topic here, sorry. But eventually, let's discuss all this when I get more time.

smoge · 2024-04-24T09:40:59Z

[Some preliminary notes I did, JUST TO CONCLUDE THIS PARALLEL SPECULATION ON HERE]

Reading some discussions about whether to choose RAII or GC in an interpreter like SC, just taking note of the technical advantages, it would make more sense to implement mostly RAII, including for soft real-time applications, predictability.

Also, if in the future SCLang receives an update regarding concurrency and parallelism, RAII would also be a better choice, according to what I researched.

GC still is more idiomatic for OOP, in any case.

Some references:

Hans-J. Boehm and Mike Spertus. 2009. Garbage collection in the next C++ standard. In Proceedings of the 2009 international symposium on Memory management (ISMM '09). Association for Computing Machinery, New York, NY, USA, 30–38. https://doi.org/10.1145/1542431.1542437
[just to get an idea for future improvements] "C++ Concurrency in Action" by Anthony Williams
re: advanced C++ usage, including RAII in multithreaded environments
https://dl.acm.org/doi/abs/10.1145/1542431.1542437

muellmusik · 2024-04-24T10:18:08Z

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

Just to be clear, what do you mean by 'alive' here?

the bug I found with PyrGC calling its own methods in the constructor is an example of that.

Did I miss that?

Smart pointers seem problematic because of reference cycles, you'd also probably want to roll your own because they are atomic counted in C++ and sc is single threaded.

Yeah I did that in the PR. We need something very light weight and simple. The normal C++ smart pointers seemed like overkill.

passing the whole stack to each primitive is quite dangerous and would be better to wrap this in something like std::span<Slot>, that way it is harder for the primitive to destroy the stack.

In general the way the receiver and args are passed to the primitives makes it easy to get wrong. QC_LANG_PRIMITIVEdoes a better job of this. It would be nice to see it hidden behind a safer interface.

Its main issue is that lack of comments
...
I'm behind on everything GC

No worries. I've probably spent as much time mucking around in the GC as most people currently active, but not much in recent years so need to spend some more time myself. Would be good to add some comments and documentation and look at this more generally.

JordanHendersonMusic · 2024-04-24T10:55:58Z

the bug I found with PyrGC calling its own methods in the constructor is an example of that.

Did I miss that?

I mentioned it in a comment, just submitted an issue: #6268

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

Just to be clear, what do you mean by 'alive' here?

Post allocation, but before the GC marks it as unreachable. It would be a way to check the GC has done its job. I don't know if this is covered in any of the sanity check — to be honest I haven't looked that closely at them.

muellmusik · 2024-04-24T14:27:48Z

Post allocation, but before the GC marks it as unreachable. It would be a way to check the GC has done its job. I don't know if this is covered in any of the sanity check — to be honest I haven't looked that closely at them.

So you mean reachable or unexamined? Forgive me but I'm still a little unclear. What mechanism can we use to determine if an object is unreachable besides the GC itself?

If you mean making sure that newly created objects aren't unused, then the pointer thing I did in #3384 does that. If you mean checking that created objects are made reachable between calls to object creation functions the count in that covers that.

Or is this addressing something else? The issues we identified earlier were:

Make sure that created objects are made reachable between calls to object creation functions.
Make sure that objects made reachable by storing in another object get the correct corresponding calls to GCWrite.
Ensure that parent object sizes are correctly updated.
Make use of GCWriteNew rather than GCWrite where appropriate.
Catch any objects created which are unused. (This seems like it must be an error.)
Make common errors in things like stack access less likely.

JordanHendersonMusic · 2024-04-24T14:55:27Z

Or is this addressing something else?

So with this PR's initial bug, an pyrobject pointer was being incremented (actually decremented) and dereferenced.
What I was suggesting was a way to pass the pointer (before dereferencing) to the allocators, and they could confirm that the memory belonged to them. They could also tell you if the object had been released by the GC and the memory was ready to be recycled. If the object ptr does not belong to any allocator, something has gone wrong. This wouldn't be useful in release builds, but would be an additional way to debug and confirm that the GC is doing the correct thing, a prerequisite of testing, which we lack. It would also let you test the pointer arithmetic on pyrObjects.

The PR you linked wouldn't help with this, but would certainly be useful in other areas — although I think it could be called SmartPyrObject!

smoge · 2024-04-25T04:18:47Z

Would be good to add some comments and documentation and look at this more generally.

@muellmusik I think this could even become a new guideline. SC codebase is an extreme case of avoiding comments.

Let's take a look at the other end of this spectrum, and see how many informative comments one can find in the GHC compiler source code: https://github.com/ghc/ghc/blob/master/compiler/GHC/Core.hs

GC-related code could be a start in this respect. If agreed, those with more experience could contribute with comments as part of the next patches pushed into develop.

Scott, while you're revisiting it, it would be a great contribution to write down something.

muellmusik · 2024-04-25T14:05:57Z

What I was suggesting was a way to pass the pointer (before dereferencing) to the allocators, and they could confirm that the memory belonged to them. They could also tell you if the object had been released by the GC and the memory was ready to be recycled. If the object ptr does not belong to any allocator, something has gone wrong.

Okay. So this tells us more than the segfault, in that it would catch cases that wouldn't crash in debug builds (only by luck)? That makes sense!

JordanHendersonMusic · 2024-05-09T10:43:19Z

This conversation seems to have dwindled. Was there any thing I should be doing?

As far as I can see, the only real way to test this is just to put it into debug branch and see if anyone has any issues. Ideally, we'd have a proper bank of test pieces to run this on — has there ever been a conversation about community members 'donating' piece to be tests or benchmarks?

smoge · 2024-05-09T11:06:01Z

This conversation seems to have dwindled. Was there any thing I should be doing?

As far as I can see, the only real way to test this is just to put it into debug branch and see if anyone has any issues. Ideally, we'd have a proper bank of test pieces to run this on — has there ever been a conversation about community members 'donating' piece to be tests or benchmarks?

I understand that's the only test available. I just wonder why so many users and developers are not even able to build their systems for so long, which will affect this method.

I can build and run a few more hairy codes here on Linux, but many more people would be needed.

Maybe release a test/debug binary, since it seems a few people are being able to build? That would work.

JordanHendersonMusic force-pushed the topic/fix-segfault-place-limit-on-num-temporary-objects branch 2 times, most recently from bb5f359 to 3bb9821 Compare April 21, 2024 15:21

JordanHendersonMusic force-pushed the topic/fix-segfault-place-limit-on-num-temporary-objects branch from 3bb9821 to 17ff2f1 Compare April 23, 2024 10:19

JordanHendersonMusic mentioned this pull request Apr 24, 2024

SClang: PyrGC calls own methods in its constructor before fully initialised. #6268

Open

JordanHendersonMusic removed the bug Issues that relate to unexpected/unwanted behavior. Don't use for PRs. label Apr 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

JordanHendersonMusic commented Apr 21, 2024 •

edited

JordanHendersonMusic commented Apr 21, 2024

smoge commented Apr 22, 2024

JordanHendersonMusic commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 •

edited

muellmusik commented Apr 23, 2024

smoge commented Apr 23, 2024 •

edited

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 •

edited

muellmusik commented Apr 23, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

muellmusik commented Apr 23, 2024

muellmusik commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 •

edited

JordanHendersonMusic commented Apr 24, 2024 •

edited

smoge commented Apr 24, 2024

smoge commented Apr 24, 2024 •

edited

muellmusik commented Apr 24, 2024

JordanHendersonMusic commented Apr 24, 2024

muellmusik commented Apr 24, 2024

JordanHendersonMusic commented Apr 24, 2024 •

edited

smoge commented Apr 25, 2024 •

edited

muellmusik commented Apr 25, 2024

JordanHendersonMusic commented May 9, 2024 •

edited

smoge commented May 9, 2024 •

edited

Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

Are you sure you want to change the base?

Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

Conversation

JordanHendersonMusic commented Apr 21, 2024 • edited

Purpose and Motivation

Benchmarks

Types of changes

To-do list

JordanHendersonMusic commented Apr 21, 2024

smoge commented Apr 22, 2024

JordanHendersonMusic commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 • edited

muellmusik commented Apr 23, 2024

smoge commented Apr 23, 2024 • edited

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 • edited

muellmusik commented Apr 23, 2024 • edited

smoge commented Apr 23, 2024 • edited

muellmusik commented Apr 23, 2024

muellmusik commented Apr 23, 2024

JordanHendersonMusic commented Apr 23, 2024

smoge commented Apr 23, 2024 • edited

@JordanHendersonMusic Do you think they would supplement the GC? How do you understand that?

JordanHendersonMusic commented Apr 24, 2024 • edited

smoge commented Apr 24, 2024

smoge commented Apr 24, 2024 • edited

muellmusik commented Apr 24, 2024

JordanHendersonMusic commented Apr 24, 2024

muellmusik commented Apr 24, 2024

JordanHendersonMusic commented Apr 24, 2024 • edited

smoge commented Apr 25, 2024 • edited

muellmusik commented Apr 25, 2024

JordanHendersonMusic commented May 9, 2024 • edited

smoge commented May 9, 2024 • edited

JordanHendersonMusic commented Apr 21, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

muellmusik commented Apr 23, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

smoge commented Apr 23, 2024 •

edited

JordanHendersonMusic commented Apr 24, 2024 •

edited

smoge commented Apr 24, 2024 •

edited

JordanHendersonMusic commented Apr 24, 2024 •

edited

smoge commented Apr 25, 2024 •

edited

JordanHendersonMusic commented May 9, 2024 •

edited

smoge commented May 9, 2024 •

edited