Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. #6261

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

JordanHendersonMusic
Copy link
Contributor

@JordanHendersonMusic JordanHendersonMusic commented Apr 21, 2024

Purpose and Motivation

When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others.

This code cause a crash on some systems.

b = Pwhite.new(0, 10, inf).asStream;
Array.fill(1810000, {b.next});`

This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory.

While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets 'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug.

While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex.

Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.


Fixes #6234, also see #6257 for history.

Benchmarks

Test involving numerous temporary functions (that doesn't crash with the old behaviour).

b = Pwhite.new(0, 10, inf).asStream; 
100.collect{ 
   { 
      Array.fill(100000, {b.next})
   }.bench(false) 
}.mean

Old: 0.02174366042
New: 0.01934624595
Delta: +12%

A test without temporary functions that should be unaffected by this PR.

100.collect{ 
   {   
      var a = (\v: 0);   
      while { a[\v] < 100000 } {   
         a[\v] = a[\v].pow(1) + 0.1   
      }     
   }.bench(false)  
}.mean

Old: 0.08627185895
New: 0.0867351175
Delta: -0.5%

Types of changes

  • Bug fix
  • Change in GC strategy, observed performance gains, but possible performance regressions although I haven't observed any significant losses.

To-do list

  • Code is tested
  • All tests are passing
  • This PR is ready for review

@JordanHendersonMusic JordanHendersonMusic force-pushed the topic/fix-segfault-place-limit-on-num-temporary-objects branch 2 times, most recently from bb5f359 to 3bb9821 Compare April 21, 2024 15:21
@JordanHendersonMusic
Copy link
Contributor Author

The two force pushes were for the linter (which I can't run on my computer) and replacing 'or' with '||' because windows.

@JordanHendersonMusic JordanHendersonMusic added bug Issues that relate to unexpected/unwanted behavior. Don't use for PRs. comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library" crash things which cause a crash in the interpreter, servers, or IDE. do not use for PRs and removed crash things which cause a crash in the interpreter, servers, or IDE. do not use for PRs labels Apr 21, 2024
@smoge
Copy link
Contributor

smoge commented Apr 22, 2024

Have you run the tests available in the project? I'll give it a try.

@JordanHendersonMusic JordanHendersonMusic changed the title Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. PyrGC: Allocate & Collect Sclang: PyrGC: Fix GC segfault caused by uncollected temporary function objects by placing an upper limit on un-scanned objects. Apr 22, 2024
When there are a large number of temporary objects mNumToScan can overflow leading to undefined behaviour (from the overflow of a signed int) and the implementation defined behaviour of right shifting a negative value, creating a GC bug that only appears on some systems, and incorrect, potentially leaking, but non-crashing behaviour on others.

This code cause a crash on some systems.
```supercollider
b = Pwhite.new(0, 10, inf).asStream;
Array.fill(1810000, {b.next});`
```

This commit places an upper limit on the number of objects to scan (5'000'000). Note, SC uses an incremental allocator, meaning this isn't a limit on the amount of memory, but amount of unchecked memory.

While all allocations should cause a collection, when PyrParseNode.cpp creates a PyrMethod, sometimes it sets  'needsHeapContext' to false which disables collection. This is (presumably) done for optimization. What this fails to consider is that frames in supercollider are themselves heap allocated, meaning, no matter how small, they increase mNumToScan, leading to the above bug.

While a change to PyrParseNode.cpp could be possible, the code is without comments and very complex.

Setting a limit on mNumToScan has a potential minuscule impact on performance, as the additional scan happens only rarely, and in some cases, actually improves performance as the GC is allowed to recycle the memory, reducing the number of calls to malloc — this is only noticeable in tight loops with many iterations that create functions.
@JordanHendersonMusic JordanHendersonMusic force-pushed the topic/fix-segfault-place-limit-on-num-temporary-objects branch from 3bb9821 to 17ff2f1 Compare April 23, 2024 10:19
@JordanHendersonMusic
Copy link
Contributor Author

While benchmarking this, I tried setting the threshold very low (500). This causes a crash because PyrGC's constructor calls its own member functions before it is initialised.

@JordanHendersonMusic
Copy link
Contributor Author

Have you run the tests available in the project? I'll give it a try.

Which tests are you referring to?

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

Have you run the tests available in the project? I'll give it a try.

Which tests are you referring to?

In general, just to double check

@JordanHendersonMusic
Copy link
Contributor Author

In general, just to double check

Yes, no crashes in the unit testing, nor with your original code

a = Pwhite.new(1802903, 1924888, inf).asStream;
b = Pwhite.new(0, 10, inf).asStream;

10 do: {
  Array.fill(a.next.round.asInteger.postln, {b.next})
}

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

well done, my man. that was a sweet good job

@muellmusik
Copy link
Contributor

I think we need to test across a variety of use cases for performance and stability. Large sclang data arrays are not typical usage, and mostly we're prioritising realtime safety. Not that I see any obvious issues with this, but as it's at the core of everything...

Would be good to add more GC specific stuff to the test suites. Should think about how best to do that...

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

I think we need to test across a variety of use cases for performance and stability. Large sclang data arrays are not typical usage, and mostly we're prioritising realtime safety. Not that I see any obvious issues with this, but as it's at the core of everything...

Would be good to add more GC specific stuff to the test suites. Should think about how best to do that...

I was thinking about that, the unit tests in the project are not adequate at all. They use weak inference and hand-made tests. (I tried to raise this question, but it seems even with simple tests the CI already takes 20 minutes, and most devs don't like that, but the problem is real)

The best options, without talking about working on another testing framework, would just push to the development branch and ask people to test it widely.

EDIT: If one day we also adopt property-based tests or something along those lines, it could even run separately from the normal CI. Let it run and take as much time as necessary for the kind of test it is needed. That's how I see it.

@JordanHendersonMusic
Copy link
Contributor Author

Should think about how best to do that...

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

Wonder if it is possible to add some bookkeeping to the internal allocator (s) so they can check their memory is being addressed properly?

That's a good idea, but I believe the first bit you mentioned is pretty involved. You would need to introduce some "metadata", and some pattern of bytes before and after each allocation. 2) keep track of pointers to each allocation. 3) record each allocation and deallocation.

In a way, there are already tools to help with some of that: real-time tracing isn't already done by toolings like Valgrind??

Property-based (and even "fuzz testing") would continue to be a tool to reveal problems, just like the test that triggered this fix. (Unfortunately, it is not even discussed right now in the community, maybe some CI trauma? Or something else?)

There was an unfortunate myth that good tests can be simple and hand-made, and if the same developers could also come up with that, and if something works with "1", "10" and "20" it will work everywhere.

Property-based tests can be very powerful in systems that take care not to mix too many pure computations and side effects. That's not the case with SC, for which additional care would be necessary. But it doesn't mean it is not also powerful and helpful, it just needs adequate design.

I would make an analogy. In a purely functional system, you could test the keys and the embouchure of the saxophone. But in a system like SC, the interaction between embouchure and fingering will cause extra complexities, like Evan Parker improvisation techniques, and we need to be aware of that.

At the same time, I see some unit tests in a project with those qualities that we use with very little help in the real world, testing simple hand-made cases in complete isolation. It almost seems that the test is designed to never fail nor inform anything))

(Hope it made sense)

-- In short, with the tools we have presently, I believe the best test is to push the change to develop and announce a call for testing.

@muellmusik
Copy link
Contributor

muellmusik commented Apr 23, 2024

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

Some discussion and unmerged work here: #3384

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

Wonder if it is possible to add some bookkeeping to the internal allocater(s) so they can check their memory is being addressed properly?

Some discussion and unnerved work here: #3384

@muellmusik There is a history of ideas and documented thought behind this that I don't have the time to study now. IF THAT'S THE CASE HERE, sorry if I made an uninformed noise. Eventually, I will catch up and check that.

@muellmusik
Copy link
Contributor

@muellmusik There is a history of ideas and documented thought behind this that I don't have the time to study now. IF THAT'S THE CASE HERE, sorry if I made an uninformed noise. Eventually, I will catch up and check that.

What I linked to was an attempt to improve GC safety through an enhanced implementation. Not a test, so something a bit different.

@muellmusik
Copy link
Contributor

There is also the GC_SANITYCHECK flag. But it makes things very slow in tests, so probably not great for testing large things like this.

You can trigger the sanity check ad hoc with the gcSanity message.

@JordanHendersonMusic
Copy link
Contributor Author

What I linked to was an attempt to improve GC safety through an enhanced implementation. Not a test, so something a bit different.

That's pretty cool! I was thinking about the same thing recently and was going to try having a small stack inside the allocator class that all allocations are added to and cleared from periodically. This way, at least within a primitive scope which doesn't call other sc code, you guarantee the object exists.

It wouldn't have caught this bug though.

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

That would have been useful for this bug.

@smoge
Copy link
Contributor

smoge commented Apr 23, 2024

I thought about if RAII could supplement the more inadequate parts of the GC. I don't remember if I never wrote or deleted it.

I started to read some of the messages (it's a long history to catch everything), and it seems Tim Blechmann mentioned "smart pointers", which is a similar feature of modern C++. It's more or less the same general idea.

Maybe there is one way to fix but also introduce C++ modern features that will have more support and knowledge around to fine-tune it.

All those talks about refactoring never resulted in a major refactor? Or?


EDIT:

RAII official reference: https://en.cppreference.com/w/cpp/language/raii

Interesting to see: RAII is the favorite recent feature in C++11 and later according to Bjarne Stroustrup.
https://www.youtube.com/watch?v=LlZWqkCMdfk

@JordanHendersonMusic Do you think they would supplement the GC? How do you understand that?

@JordanHendersonMusic
Copy link
Contributor Author

JordanHendersonMusic commented Apr 24, 2024

the more inadequate parts of the GC

I think the GC in principle is pretty awesome, incremental tricolour — I can't quite see what flip does, is it a to-and-from space? — but it pretty much does what you'd do today, maybe there'd be a generation system so new objects are scanned more often than old ones?

Its main issue is that lack of comments, pretty odd naming conventions, and some bad practices that only trigger under odd conditions — the bug I found with PyrGC calling its own methods in the constructor is an example of that. There are some interfaces that could be improved, e.g., passing the whole stack to each primitive is quite dangerous and would be better to wrap this in something like std::span<Slot>, that way it is harder for the primitive to destroy the stack. The temporary object problem is another obvious area for improvement, @muellmusik's pr used RAII to address this.

Smart pointers seem problematic because of reference cycles, you'd also probably want to roll your own because they are atomic counted in C++ and sc is single threaded.

Anyway, this conversation is drifting away from the PR and might be better moved to a github discussion? This PR thread should either focus on the code itself, or on how best to test it — there is a lot to unpack just there. It would also be nice to gather all previous discussions on gc in one place.

@smoge
Copy link
Contributor

smoge commented Apr 24, 2024

Anyway, this conversation is drifting away from the PR and might be better moved to a github discussion? This PR thread should either focus on the code itself, or on how best to test it — there is a lot to unpack just there. It would also be nice to gather all previous discussions on gc in one place.

I'm behind on everything GC, I will need a few days to catch up to be able to engage with details. You're right that this last message opened up a door that steps back from using GC as it is. So, I'm off-topic here, sorry. But eventually, let's discuss all this when I get more time.

@smoge
Copy link
Contributor

smoge commented Apr 24, 2024

[Some preliminary notes I did, JUST TO CONCLUDE THIS PARALLEL SPECULATION ON HERE]

Reading some discussions about whether to choose RAII or GC in an interpreter like SC, just taking note of the technical advantages, it would make more sense to implement mostly RAII, including for soft real-time applications, predictability.

Also, if in the future SCLang receives an update regarding concurrency and parallelism, RAII would also be a better choice, according to what I researched.

GC still is more idiomatic for OOP, in any case.

Some references:

  1. Hans-J. Boehm and Mike Spertus. 2009. Garbage collection in the next C++ standard. In Proceedings of the 2009 international symposium on Memory management (ISMM '09). Association for Computing Machinery, New York, NY, USA, 30–38. https://doi.org/10.1145/1542431.1542437

  2. [just to get an idea for future improvements] "C++ Concurrency in Action" by Anthony Williams
    re: advanced C++ usage, including RAII in multithreaded environments
    https://dl.acm.org/doi/abs/10.1145/1542431.1542437

@muellmusik
Copy link
Contributor

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

Just to be clear, what do you mean by 'alive' here?

the bug I found with PyrGC calling its own methods in the constructor is an example of that.

Did I miss that?

Smart pointers seem problematic because of reference cycles, you'd also probably want to roll your own because they are atomic counted in C++ and sc is single threaded.

Yeah I did that in the PR. We need something very light weight and simple. The normal C++ smart pointers seemed like overkill.

passing the whole stack to each primitive is quite dangerous and would be better to wrap this in something like std::span<Slot>, that way it is harder for the primitive to destroy the stack.

In general the way the receiver and args are passed to the primitives makes it easy to get wrong. QC_LANG_PRIMITIVEdoes a better job of this. It would be nice to see it hidden behind a safer interface.

Its main issue is that lack of comments
...
I'm behind on everything GC

No worries. I've probably spent as much time mucking around in the GC as most people currently active, but not much in recent years so need to spend some more time myself. Would be good to add some comments and documentation and look at this more generally.

@JordanHendersonMusic
Copy link
Contributor Author

the bug I found with PyrGC calling its own methods in the constructor is an example of that.

Did I miss that?

I mentioned it in a comment, just submitted an issue: #6268

I might try making the allocator (when a flag is enabled) keep a record of all the alive objects and their size, this way you can ask the allocator if the object ptr is valid, how many objects are alive, etc.

Just to be clear, what do you mean by 'alive' here?

Post allocation, but before the GC marks it as unreachable. It would be a way to check the GC has done its job. I don't know if this is covered in any of the sanity check — to be honest I haven't looked that closely at them.

@muellmusik
Copy link
Contributor

Post allocation, but before the GC marks it as unreachable. It would be a way to check the GC has done its job. I don't know if this is covered in any of the sanity check — to be honest I haven't looked that closely at them.

So you mean reachable or unexamined? Forgive me but I'm still a little unclear. What mechanism can we use to determine if an object is unreachable besides the GC itself?

If you mean making sure that newly created objects aren't unused, then the pointer thing I did in #3384 does that. If you mean checking that created objects are made reachable between calls to object creation functions the count in that covers that.

Or is this addressing something else? The issues we identified earlier were:

  1. Make sure that created objects are made reachable between calls to object creation functions.
  2. Make sure that objects made reachable by storing in another object get the correct corresponding calls to GCWrite.
  3. Ensure that parent object sizes are correctly updated.
  4. Make use of GCWriteNew rather than GCWrite where appropriate.
  5. Catch any objects created which are unused. (This seems like it must be an error.)
  6. Make common errors in things like stack access less likely.

@JordanHendersonMusic
Copy link
Contributor Author

JordanHendersonMusic commented Apr 24, 2024

Or is this addressing something else?

So with this PR's initial bug, an pyrobject pointer was being incremented (actually decremented) and dereferenced.
What I was suggesting was a way to pass the pointer (before dereferencing) to the allocators, and they could confirm that the memory belonged to them. They could also tell you if the object had been released by the GC and the memory was ready to be recycled. If the object ptr does not belong to any allocator, something has gone wrong. This wouldn't be useful in release builds, but would be an additional way to debug and confirm that the GC is doing the correct thing, a prerequisite of testing, which we lack. It would also let you test the pointer arithmetic on pyrObjects.

The PR you linked wouldn't help with this, but would certainly be useful in other areas — although I think it could be called SmartPyrObject!

@smoge
Copy link
Contributor

smoge commented Apr 25, 2024

Would be good to add some comments and documentation and look at this more generally.

@muellmusik I think this could even become a new guideline. SC codebase is an extreme case of avoiding comments.

Let's take a look at the other end of this spectrum, and see how many informative comments one can find in the GHC compiler source code: https://github.com/ghc/ghc/blob/master/compiler/GHC/Core.hs

GC-related code could be a start in this respect. If agreed, those with more experience could contribute with comments as part of the next patches pushed into develop.

Scott, while you're revisiting it, it would be a great contribution to write down something.

@muellmusik
Copy link
Contributor

What I was suggesting was a way to pass the pointer (before dereferencing) to the allocators, and they could confirm that the memory belonged to them. They could also tell you if the object had been released by the GC and the memory was ready to be recycled. If the object ptr does not belong to any allocator, something has gone wrong.

Okay. So this tells us more than the segfault, in that it would catch cases that wouldn't crash in debug builds (only by luck)? That makes sense!

@JordanHendersonMusic JordanHendersonMusic removed the bug Issues that relate to unexpected/unwanted behavior. Don't use for PRs. label Apr 28, 2024
@JordanHendersonMusic
Copy link
Contributor Author

JordanHendersonMusic commented May 9, 2024

This conversation seems to have dwindled. Was there any thing I should be doing?

As far as I can see, the only real way to test this is just to put it into debug branch and see if anyone has any issues. Ideally, we'd have a proper bank of test pieces to run this on — has there ever been a conversation about community members 'donating' piece to be tests or benchmarks?

@smoge
Copy link
Contributor

smoge commented May 9, 2024

This conversation seems to have dwindled. Was there any thing I should be doing?

As far as I can see, the only real way to test this is just to put it into debug branch and see if anyone has any issues. Ideally, we'd have a proper bank of test pieces to run this on — has there ever been a conversation about community members 'donating' piece to be tests or benchmarks?

I understand that's the only test available. I just wonder why so many users and developers are not even able to build their systems for so long, which will affect this method.

I can build and run a few more hairy codes here on Linux, but many more people would be needed.

Maybe release a test/debug binary, since it seems a few people are being able to build? That would work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: sclang sclang C++ implementation (primitives, etc.). for changes to class lib use "comp: class library"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation Fault Crash with Specific Large Array Sizes
3 participants