Timeout used by park_timeout seems way too short, causes heavy CPU use #5

Open

hutch opened this issue Aug 3, 2016 · 13 comments

hutch commented Aug 3, 2016

I have a small test program (implementing a component of a real application) using Bus that was idling at about 100-110% CPU utilization (1+ cores of 8), which is excessive.

When I changed the timeouts used by the two calls to park_timeout from the 1µs in the official release to 1ms (Duration::from_millis(1)), CPU utilization under a realistic, fairly heavy real-world load (not just idle, and including some computation) fell to about 3.5%.
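
Concretely, the change I made at each call site looks like this (a sketch only, not the actual bus source):

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // Before: park for 1µs, so an idle reader wakes up on the order of a
    // million times per second and keeps a core busy.
    thread::park_timeout(Duration::new(0, 1_000));

    // After: park for 1ms, which caps the idle wake-up rate at roughly a
    // thousand per second.
    thread::park_timeout(Duration::from_millis(1));
}
```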

I think this 3.5% CPU utilization is pretty impressive. Nice library you've got there :-)

jonhoo commented Aug 4, 2016

I agree. The goal is to switch to adaptive back-off spinning, but I'd prefer not to implement it myself. I'm half-waiting for mvdnes/spin-rs#29 to be fixed, so I can then simply use the SpinWait that library would expose. I don't think I'd want to change the static wait time to 1ms, as that would significantly increase the expected wait time for very busy applications. If you want to give writing an exponential back-off scheme a go, though, I'd be happy to look at a PR!
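
For anyone who wants to take a stab at it, here is the rough shape I have in mind (purely illustrative; wait_until and the constants are made up for this sketch, not bus code):

```rust
use std::thread;
use std::time::Duration;

/// Wait for `ready()` to return true, spinning briefly before falling
/// back to parking with an exponentially growing timeout.
fn wait_until(ready: impl Fn() -> bool) {
    const SPIN_LIMIT: u32 = 100;
    const MAX_PARK: Duration = Duration::from_millis(1);

    // Phase 1: busy-spin for a short while in case the condition
    // becomes true almost immediately.
    for _ in 0..SPIN_LIMIT {
        if ready() {
            return;
        }
        std::hint::spin_loop();
    }

    // Phase 2: park with a timeout that doubles each round, so an idle
    // waiter quickly backs off toward the 1ms ceiling.
    let mut park = Duration::from_micros(10);
    while !ready() {
        thread::park_timeout(park);
        park = (park * 2).min(MAX_PARK);
    }
}
```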

Also, thanks :)

arthurprs commented

Is this really fixable by back-off spinning? I mean, spinning burns CPU instead of properly yielding the thread's time slice.

I think the proper fix would be to use some sort of CondVar or other real parking/unparking support.
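
Something along these lines (a generic Condvar sketch for illustration, not tied to bus's internals):

```rust
use std::sync::{Condvar, Mutex};

/// Generic sketch of blocking on a condition variable; `ready` is just
/// an illustrative flag, not anything from bus.
struct Signal {
    ready: Mutex<bool>,
    cond: Condvar,
}

impl Signal {
    fn wait(&self) {
        let mut ready = self.ready.lock().unwrap();
        while !*ready {
            // The thread sleeps here without burning CPU until notify() runs.
            ready = self.cond.wait(ready).unwrap();
        }
        *ready = false;
    }

    fn notify(&self) {
        *self.ready.lock().unwrap() = true;
        self.cond.notify_one();
    }
}
```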

jonhoo commented Aug 22, 2016

Given that there aren't any locks, a Condvar won't work. This is also why we need park_timeout instead of plain park (note that we do use park/unpark): there are races in which the other peer won't be woken up. We're not using a spin lock here; it's more like we're implementing a condition variable ourselves. Spinning initially and then parking for increasingly long times is a pretty sensible approach (parking_lot does something similar).
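
In sketch form, the wait loop is roughly this (simplified for illustration; try_recv and register_parked_reader stand in for the real bookkeeping and are not actual bus APIs):

```rust
use std::thread;
use std::time::Duration;

fn recv_sketch<T>(
    mut try_recv: impl FnMut() -> Option<T>,
    register_parked_reader: impl Fn(),
) -> T {
    loop {
        if let Some(v) = try_recv() {
            return v;
        }
        // Let the writer know there is a parked reader to unpark().
        register_parked_reader();
        // If the writer checked for parked readers just before we
        // registered, it will never call unpark(), so a plain park()
        // could sleep forever. The timeout bounds that worst case to
        // one timeout period.
        thread::park_timeout(Duration::new(0, 100_000)); // 100µs
    }
}
```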

jonhoo added a commit that referenced this issue Aug 29, 2016
This may fix the concerns raised in #5.

jonhoo commented Aug 29, 2016

@hutch can you give it a try with the new SpinWait now in place?

bbigras commented Sep 1, 2016

I think I'm seeing the high CPU usage problem with 1.1.0.

jonhoo commented Sep 1, 2016

@BrunoQC Try 1.1.1?

bbigras commented Sep 1, 2016

It seems better. The average is about 15% and it peaks at about 30%. With 1.1.0 I had peaks at about 70%.

Do you think the CPU usage could be even lower?

jonhoo commented Sep 1, 2016

Well, it depends on what you're doing, really. If you're sending and receiving in a tight loop, I'd expect it to hit 100%. If you're doing other compute-heavy work, I'd expect the overhead of bus to be marginal.

xobs commented Mar 2, 2017

Using bus 1.1.3 on a Raspberry Pi, each thread doing a recv() uses 15-35% CPU with the parking time set to 1µs. If I set it to 1ms, I have trouble measuring the usage because it's so low.

jonhoo commented Mar 2, 2017

@xobs I'm confused -- the default parking time in bus now is 100µs (thread::park_timeout(Duration::new(0, 100000)); 100000ns = 100µs). Your diff (xobs@c8bb287) is using 100000000ns = 100000µs = 100ms.
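
Spelled out with Duration constructors (just to make the units explicit, nothing bus-specific):

```rust
use std::time::Duration;

fn main() {
    let bus_default = Duration::new(0, 100_000);     // 100,000ns = 100µs
    let xobs_patch = Duration::new(0, 100_000_000);  // 100,000,000ns = 100ms
    assert_eq!(bus_default, Duration::from_micros(100));
    assert_eq!(xobs_patch, Duration::from_millis(100));
}
```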

xobs commented Mar 3, 2017

@jonhoo You're right, I set it to 100ms instead of 1ms.

With the default 100µs value, a single instance waiting in BusReader::recv() burns 16% CPU according to "top -H". If I add more BusReaders, the program's usage goes up with each instance.

The accidentally high latency in my own patch is actually fine for my purposes, but I'll check what happens when I set it to 1ms instead of 100ms.

xobs commented Mar 6, 2017

With the default of 100µs, the output of "top -H" looks like this with three listeners (each in a thread called B-Hook):

  688 pi        20   0   46568  13964   5780 S 16.2  1.4   0:04.48 B-Hook
  683 pi        20   0   46568  13964   5780 S 15.8  1.4   0:04.48 B-Hook
  685 pi        20   0   46568  13964   5780 S 15.5  1.4   0:04.49 B-Hook
  677 pi        20   0    8604   3380   2872 R  1.3  0.3   0:00.44 top

There are four cores on this system, so each thread gets a core to itself. The bus is idle, so they are burning that CPU simply polling for data.

If I correct my patch and set the poll time to 1ms, the output looks like this:

 1405 pi        20   0   40424   6864   5732 S  2.3  0.7   0:00.30 B-Hook
 1407 pi        20   0   40424   6864   5732 S  2.3  0.7   0:00.31 B-Hook
 1410 pi        20   0   40424   6864   5732 S  2.3  0.7   0:00.27 B-Hook
  677 pi        20   0    8604   3380   2872 R  1.0  0.3   0:03.42 top

Each instance of bus.recv() is still the busiest thing on the system, but it's much better now.

I'd like to be able to increase the value to 10ms or even 100ms, but 100µs is definitely using too much power.

jonhoo commented Mar 6, 2017

So, the reason I'm hesitant to raise the timeout is that, as the comment above alludes to, it dictates how long a thread might take to notice that a free slot has opened up. To put it differently, with a timeout of 100ms, a reader or writer may, in certain racy conditions, stay blocked on a read or write for 100ms after it could have unblocked and continued. That's a long time for performance-critical systems... I guess one thing to do here would be to make the timeout configurable per Bus, so you'd have some control over it. @hutch @BrunoQC, thoughts on this?
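
To make that concrete, the per-Bus configuration could look something like this (purely hypothetical; bus exposes no such API today, and the names below are invented):

```rust
use std::time::Duration;

// Hypothetical configuration sketch; not part of the bus crate.
pub struct BusConfig {
    /// Number of slots in the ring buffer.
    pub capacity: usize,
    /// Upper bound on how long a reader or writer can stay parked past the
    /// moment it could have made progress (today this is hard-coded to 100µs).
    pub park_timeout: Duration,
}

impl Default for BusConfig {
    fn default() -> Self {
        BusConfig {
            capacity: 16,
            park_timeout: Duration::new(0, 100_000), // 100µs
        }
    }
}
```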
