Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization coding pass over AudioMixerSlavePool and AvatarMixerSlavePool pt. 2. #1361

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

odysseus654
Copy link
Contributor

This is an optimization pass over the mutex logic in AudioMixerSlavePool and AvatarMixerSlavePool (which is effectively the same code).

This is a continuation of the changes in #1358 , with more of a focus on contracts and invariants and effectively no use of mutexes. This should (almost by definition) have reduced lock contention, but "thread-safety through the use of code comments" may have negative side effects. I don't know which of these PRs will end up being used (perhaps a mixture of both?)

Additional changes made to this PR:

  • Removed all use of "friend" by factoring out shared variables into a private class that the slaves can see but pool users cannot
  • A number of counters have been changed from "int" to "std::atomic"
  • Much documentation of invariants and code contracts

Testing is almost identical to #1358 :

  • Smoketesting: make sure that audio and avatars still work on a server built with this (requires a server with multiple users)
  • Race conditions: exercise a server long enough to know whether there are any lockups or crashes that occur in these two areas
  • Profile: see if this improves the profiling numbers that have been described in issue High CPU usage of assignment client when idle on Linux High CPU usage of assignment client when idle on Linux #1355

@digisomni digisomni changed the title Optimization coding pass over AudioMixerSlavePool and AvatarMixerSlavePool (again!) Optimization coding pass over AudioMixerSlavePool and AvatarMixerSlavePool pt. 2. Sep 24, 2021
@digisomni digisomni added allow-build-upload Allows the upload of a build for non white-listed users domain-server Issues related to the domain server in general enhancement New feature or request threading/deadlock (beware) needs CR (code review) labels Sep 24, 2021
@odysseus654
Copy link
Contributor Author

Hey, if anyone wants to use std::shared_mutex, there's a polyfill in the commits (that was later yanked out) ;-)

- changed "slave" to "worker" when dealing with classes in this file
- moved the "data" struct to be a private member instead of a base class
@odysseus654
Copy link
Contributor Author

Made suggested changes to this PR:

  • changed "slave" to "worker" when dealing with classes in this file
  • moved the "data" struct to be a private member instead of a base class

Attempted to factor out common code to a template base class but... can't do that and still use Qt

@digisomni digisomni added this to In progress in 2022.1.0 Selene Release via automation Sep 24, 2021
@digisomni digisomni added this to the 2021.2.0 Selene Release milestone Sep 24, 2021
{
Lock poolLock(_data.poolMutex);
if (_data.numFinished < _data.numThreads) {
_data.poolCondition.wait(poolLock, [&] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm…is it possible that all the threads will finish and notify between line 188 above and here? so this wait would never finish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I thought about that this morning. The most recent commit now has all "condition.notify" calls holding the corresponding lock when calling it (thus forcing the notify to be before line 192 or after line 194) and the "if" statement wrapping all "condition.wait" calls bypasses it if the conditions for it to be notified already exist.

@odysseus654
Copy link
Contributor Author

Added commit: renamed _pool to _data as suggested.

2022.1.0 Selene Release automation moved this from In progress to Reviewer approved Sep 25, 2021
Copy link
Collaborator

@HifiExperiments HifiExperiments left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, definitely want to stress test this with a bunch of people on a server and do another perf analysis to see if this made a difference

@digisomni digisomni removed this from Reviewer approved in 2022.1.0 Selene Release Oct 30, 2021
@digisomni digisomni added this to In progress in 2022.1.1 Selene Release via automation Oct 30, 2021
@odysseus654
Copy link
Contributor Author

I managed to piece together a linux debugging environment and debugged it enough to no longer assert (note to self: deleting assert commands is a great way to remove those pesky asserts).

Anyhow, really hoping this is the last patch necessary in order to successfully smoketest this. Also hoping to have a chance to flatten these test commits (there's an awful lot of them) although I understand if that doesn't happen.

@ksuprynowicz
Copy link
Contributor

I just gave it a try, and nothing is crashing right now, I'm able to connect, but only audio mixer starts. Other types of assignment clients don't start.

@ksuprynowicz
Copy link
Contributor

I'm sorry. I ran assignment server with wrong command line parameters. Everything seems to work correctly now, but CPU usage is still very high even when no client is connected.

@odysseus654
Copy link
Contributor Author

Heh, finally got it! So I guess the question is what the profiler is saying about this now.

@ksuprynowicz
Copy link
Contributor

Congratulations!
I will do a profiler run today when I finish work and see what is shows :)

@ksuprynowicz
Copy link
Contributor

CPU use seems to be significantly lower now compared to master when built with the same compiler flags - it drops almost 2 times when no user is logged in.
Here are the profiling results for audio mixer:

-   83.79%     0.55%  AudioMixerWorke  [kernel.kallsyms]            [k] entry_SYSCALL_64_after_hwframe
   - 83.24% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         - 81.87% __x64_sys_futex
            - do_futex
               - 60.03% futex_wait
                  - 44.78% futex_wait_queue_me
                     - 43.66% schedule
                        - __schedule
                           - 27.91% pick_next_task_fair
                              - 27.66% newidle_balance
                                 - 24.60% load_balance
                                    - 16.37% find_busiest_group
                                       - update_sd_lb_stats.constprop.0
                                          - 2.43% cpumask_next_and
                                               _find_next_bit
                                            1.51% idle_cpu
                                    + 5.40% raw_spin_rq_lock_nested
                                   1.18% update_blocked_averages
                           - 6.39% finish_task_switch.isra.0
                              - 5.11% __perf_event_task_sched_in
                                 + 3.33% ctx_sched_in
                                 + 0.59% perf_event_sched_in
                           + 3.25% dequeue_task_fair
                           + 2.46% psi_task_switch
                           + 1.04% __perf_event_task_sched_out
                       0.74% plist_add
                  + 14.39% futex_wait_setup
               + 20.80% futex_wake
         + 1.04% syscall_exit_to_user_mode

And for avatar mixer:

-   82.22%     0.52%  AvatarMixerWork  [kernel.kallsyms]        [k] entry_SYSCALL_64_after_hwframe
   - 81.69% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         - 80.28% __x64_sys_futex
            - do_futex
               - 55.44% futex_wait
                  + 36.85% futex_wait_queue_me
                  + 17.69% futex_wait_setup
               - 23.86% futex_wake
                  + 16.12% _raw_spin_lock
                  + 5.26% wake_up_q
                  + 0.96% mark_wake_futex
         + 1.01% syscall_exit_to_user_mode

@ksuprynowicz
Copy link
Contributor

Audio mixer uses about 32% of a core right now when no one is logged in, and avatar mixer uses about 15% with compiler flags as in PR 1424. I suggest pulling changes from master to make comparisons easier.

@ksuprynowicz
Copy link
Contributor

The weird thing is that high CPU usage disappears completely on PR 1429. I have no idea why.

@ksuprynowicz
Copy link
Contributor

We tested it with Kalila and PR 1429 indeed seems to solve high CPU usage completely. It dropped to 3% for audio mixer and 3% for avatar mixer when nobody is logged in.

@odysseus654
Copy link
Contributor Author

Okay kewl! I'm gonna want to flatten all those commits I made while testing this (because ew), but other than that it sounds like we've cleared the "at least as good as master" bar.

@odysseus654
Copy link
Contributor Author

Okay I looked at the history and... it may not be as bad as I thought it was. I'll squish it if it makes sense to others, otherwise I'll say this is good to merge.

@HifiExperiments
Copy link
Collaborator

sorry if I’m missing it in the above comments: did this PR have a measurable difference then? if this isn’t an improvement, maybe we don’t need to merge it?

@ksuprynowicz
Copy link
Contributor

sorry if I’m missing it in the above comments: did this PR have a measurable difference then? if this isn’t an improvement, maybe we don’t need to merge it?

With current version it does have a measurable improvement, but I suggest to wait with testing it until PR 1429 is merged. I will benchmark it again then and see if it also improves things or not.

@digisomni
Copy link
Member

Needs another QA pass to confirm whether or not it performs better than current master.

@digisomni digisomni requested review from namark and removed request for HifiExperiments February 2, 2022 07:52
@namark
Copy link
Contributor

namark commented Feb 4, 2022

I tried to test audio mixer locally, just idling as suggested, and couldn't really find any significant performance difference between this and the master branch. Here is what the CPU utilization looks like for me:
plot1
plot2

I tried a profiler as well, and it does seem that this version spends a bit less time in the run and wait functions of the two classes, but it's just by a few percent and not very consistent between runs, so nothing conclusive there either.

Since the high CPU usage when idle is fixed now on master, I guess it would make more sense to try some sort of a stress test, but I don't fully understand the changes so not sure what would be a good way to do it. I could maybe try using the web sdk to connect a bunch of dummy clients and send some random audio samples, if that's something we expect to be improved here.

@stale
Copy link

stale bot commented Aug 4, 2022

Hello! Is this still an issue?

@stale stale bot added the stale Issue / PR has not had activity label Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
allow-build-upload Allows the upload of a build for non white-listed users domain-server Issues related to the domain server in general enhancement New feature or request needs CR (code review) needs testing (QA) The PR is ready for testing stale Issue / PR has not had activity threading/deadlock (beware)
Projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants