Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit the rate of interruption checks #942

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

szhorvat
Copy link
Member

@szhorvat szhorvat commented Nov 4, 2023

This is a suggestion for improving on (if not fixing) #940.

It uses a combination of limiting the number of times the function runs (currently every 4th call) and limiting the number of checks to a millisecond rate. Both are needed, as retrieving the time can be expensive.

I tested this quite a bit on macOS, and I think the current parameters are fine. The timing check is not cheap (even though using the fastest available timing method I could find), but limiting it to every 4th call helps.

I included code for Linux and FreeBSD but I could not test either. Testing is important: we need to make sure that the timing calls are not too expensive. Can you help with this for Linux @Antonov548 @krlmlr ? Here's what needs to be done:

  • Check performance with interruption checks disabled, using command line R. Just put a return IGRAPH_SUCCESS at the beginning of the interrupt handler.
  • Check performance without the changes in this PR, using command line R.
  • Check performance with this PR, and see where performance is relative to the previous two benchmarks. It may be necessary to adjust the number of skips (currently 4) by platform. Ideally this would be 1, but if timing checks are too slow, we need to put higher numbers, like I did on macOS. I would not go higher than 10-20 under any circumstances.

Note that CLOCK_MONOTONIC_COARSE is platform-specific, and the Linux version is unrelated to the FreeBSD one. They just happen to have the same name.

If we go ahead with this approach, we need to add a Windows version as well (perhaps @Antonov548 can try?)

@szhorvat szhorvat force-pushed the fix/interruption-check-frequency branch from 3a4f145 to ec6c05c Compare November 4, 2023 23:38
@szhorvat
Copy link
Member Author

szhorvat commented Nov 4, 2023

With this PR, on macOS, I get:

RStudio:

> g<-make_ring(100000000)
> system.time(is_connected(g))
   user  system elapsed 
  1.566   0.158   1.724 

Command line:

   user  system elapsed 
  1.569   0.033   1.602 

R GUI:

   user  system elapsed 
  1.574   0.102   1.674 

This is basically the same performance as with interruption checks disabled.


Without this PR, I get:

RStudio:

   user  system elapsed 
 24.681   0.355  25.030 

Command line:

   user  system elapsed 
  2.970   0.048   3.018 

R GUI:

   user  system elapsed 
  3.741   0.158   3.899 

@szhorvat
Copy link
Member Author

szhorvat commented Nov 4, 2023

Using this PR, but not limiting timing checks to every 4th call gives me this timing:

   user  system elapsed 
  1.939   0.025   1.963 

This shows that in general timing checks are non-negligible.

@szhorvat
Copy link
Member Author

szhorvat commented Nov 5, 2023

One problem with this timing approach is that it assumes that the interruption handler would be called at least once per second, but that may not always be the case.

@szhorvat
Copy link
Member Author

szhorvat commented Nov 6, 2023

If we do go with this approach (which is something to be decided—I'm not 100% comfortable with this), the best function to use on Windows is likely QueryPerformanceCounter

@ntamas
Copy link
Member

ntamas commented Nov 7, 2023

I wonder why RStudio is so slow compared to the CLI or the R GUI. Maybe there's some inter-thread synchronization involved? (R code is probably running on its own thread but it needs to call back to the GUI thread to check whether there was an interruption, and that call may be blocking until the next iteration of the event loop of the GUI thread).

Anyhow, I'm okay with this PR and I don't think we could do any better than this given the constraints involved.

@ntamas
Copy link
Member

ntamas commented Nov 7, 2023

I'm inclined to propose throttling interruption checks to 10 msec or even slower - no one would notice the difference, but for instance the resolution of the coarse monotonic clock on Linux (Ubuntu 20.04, kernel version 5.4) is 4 msec according to a response on SO, and if the kernel happens to round upwards to the nearest 4 msec, then the rate limiting won't have any effect at all. With 10 msec we are more likely to land in a regime that is coarser than the granularity of the coarse monotonic clock.

@ntamas
Copy link
Member

ntamas commented Nov 7, 2023

Also, maybe we should move this to the C core so that all interfaces benefit from it?

@szhorvat
Copy link
Member Author

szhorvat commented Nov 7, 2023

and if the kernel happens to round upwards to the nearest 4 msec, then the rate limiting won't have any effect at all

I don't think that's true (I actually saw that response on SO). The code measures from the last interruption check until now, not from the last call to the interruption handler to now. Right now the rate is set to 1 ms. That means that the check will happen at most once per millisecond, but if the clock granularity is 4 ms, then it will happen only every 4 ms and that's perfectly fine. My point is that we don't need to set a rate that is lower than the clock resolution. Nor can we, since different platforms might have very different resolutions.

We can't possibly aim to check resolutions, and adapt. The code has to be such that it works regardless resolutions. Also, good to be aware: what clock_getres() returns is not the rate at which the timer updates. I checked this on macOS, and CLOCK_MONOTONIC_RAW_APPROX has a "resolution" of 42 ns according to getres(), but it updates much less frequently (presumably on each context switch).

What I am concerned about is:

  • If the interruption handler gets called less often than once per second (which can happen, e.g. with GLPK) then this code won't work properly anymore.
  • All this timing voodoo is extremely platform-specific and even hardware-specific (I know the behaviour is different on Apple Silicon vs Intel on macOS).
  • Calling timing functions an already be slow, and I worry that on some platforms it can be very slow. Even with the fastest option available on macOS, it has a noticeable performance impact. This is why I had to limit the rate of checks both by counting calls and timing. Calling the timer each time was too slow.

Also, maybe we should move this to the C core so that all interfaces benefit from it?

I really don't want to do that. The interruption handler in Python and Mathematica is faster than the call to the fastest timer on macOS! This code would hurt those interfacers, not benefit them. Plus consider how non-portable all this is, and how little we are able to test such timing issues on exotic hardware. The C core is very portable, and I don't want to sacrifice that.


So, all in all, I am not quite happy with this, but it's the best I could come up with given R's very slow interruption mechanism.

@krlmlr
Copy link
Contributor

krlmlr commented Nov 9, 2023

If we want to take this upstream to fix RStudio, what's a good way to replicate it?

I still propose thinning out calls to the interruption handler on a case-by-case basis. Nothing's faster than a function not being called at all. Can we somehow do this as well for the problem here? The "thinning out" by a factor of four, part of this PR, is it good for all cases? We started with 10 IIRC.

Alternatively: Could we somehow make the "thinning out" adaptive? Start with checking every x iterations, check timing, and adapt to check less (or more) frequently?

Copy link
Contributor

aviator-app bot commented Apr 13, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This pull request is currently open (not queued).

How to merge

To merge this PR, comment /aviator merge or add the mergequeue label.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@krlmlr
Copy link
Contributor

krlmlr commented Apr 13, 2024

The best solution, with zero cost (in the ideal case) for the interrupt checks, would be to run the igraph computation in a separate thread. The main thread would only be responsible for interruption checks and for memory allocation, and sleep the rest of the time.

Let's not do timers but keep this open for now as a reference. I might be able to raise this with Posit based on the original example in #942.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants