
Multithreaded rubberband #13143

Open · acolombier wants to merge 4 commits into main from feat/multithreaded-rubberband
Conversation

acolombier
Contributor

@acolombier acolombier commented Apr 20, 2024

Slightly outdated info

This is a naive attempt to parallelise the RubberBand processing, as suggested here.

(Tested using 44100 Hz / 11.6 ms output settings)

So far the results are very promising! Previously, audio dropouts would occur as soon as a second stem deck started playing when using the powersave governor, and at the third stem deck when using performance, with both R2 and R3.

Now I can play 4 stem decks with R3 in powersave with nearly no audio dropouts, although they increase when I raise the pitch on the decks. There are no issues in performance, although the CPU indicator is very high.

With R2 there are no audio dropouts at all, and the CPU indicator remains in the green, even in powersave!

$ nproc --all
20
$ cat /proc/cpuinfo
...
model name	: 12th Gen Intel(R) Core(TM) i7-12700H
cpu MHz (max): 2200 in powersave, 4100 in performance

Adds a worker pool for RubberBand. This pool should only be used from a single-threaded environment (the engine thread) and will have to be reviewed if the engine is moved to a multi-threaded design. This decision was made since such a refactor would likely invalidate the need for such a task-specific pool.
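
For illustration, here is a minimal sketch of this worker-pool pattern using plain C++ threads. The names are illustrative only (this is not the actual code in the PR): the single engine thread enqueues one scaling job per stem channel and blocks until the pool has processed all of them.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ScalerWorkerPool {
  public:
    explicit ScalerWorkerPool(std::size_t numWorkers) {
        for (std::size_t i = 0; i < numWorkers; ++i) {
            m_workers.emplace_back([this] { workerLoop(); });
        }
    }

    ~ScalerWorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }
        m_taskAvailable.notify_all();
        for (std::thread& worker : m_workers) {
            worker.join();
        }
    }

    // Called from the (single) engine thread: submit one job per stem channel
    // and block until every job has been processed by the pool.
    void runAndWait(std::vector<std::function<void()>> jobs) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            for (std::function<void()>& job : jobs) {
                m_tasks.push(std::move(job));
            }
            m_pending += jobs.size();
        }
        m_taskAvailable.notify_all();
        std::unique_lock<std::mutex> lock(m_mutex);
        m_allDone.wait(lock, [this] { return m_pending == 0; });
    }

  private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_taskAvailable.wait(lock,
                        [this] { return m_stopping || !m_tasks.empty(); });
                if (m_stopping && m_tasks.empty()) {
                    return;
                }
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // e.g. run RubberBand for one stem channel
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                if (--m_pending == 0) {
                    m_allDone.notify_one();
                }
            }
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_taskAvailable;
    std::condition_variable m_allDone;
    std::size_t m_pending = 0;
    bool m_stopping = false;
};
```

The single blocking caller matches the single-threaded engine assumption above; a multi-threaded engine would invalidate this design.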

@JoergAtGithub
Member

JoergAtGithub commented Apr 20, 2024

This looks indeed very promising! I can now run 2 stem decks in parallel and it works stably.

But the thread utilization is not optimal yet. Here you see 3 stem decks running in parallel (green is calculation time, red is synchronisation between threads, e.g. waiting for a mutex):
[screenshot: per-thread timeline for 3 stem decks]

Here you see one deck playing a normal stereo track, and two stem decks (main engine thread is on top):
[screenshot: per-thread timeline with one stereo deck and two stem decks]

@acolombier
Contributor Author

I assume you would like to see the decks computing in parallel, right?

I'm afraid that would require not only a refactor of the RubberBand integration, but also a refactor of the audio engine, which processes the channels sequentially.

Also, note that plain stereo decks will not use the threaded version and will run RubberBand in the calling thread.

Or did you mean something else?

@acolombier
Contributor Author

Note that this is only a PoC; a production-ready solution could include a global thread pool instead of threads owned per channel (meaning a maximum of 4 threads, rather than the current 4 (decks) x 4 (stems)).

@acolombier acolombier changed the title Multithreaded rubberband [PoC] Multithreaded rubberband Apr 20, 2024
@JoergAtGithub
Member

I assume you would like to see the decks computing in parallel, right?

Everything in parallel is too much, at least for my laptop. A work-stealing thread pool with e.g. 4 workers seems more appropriate.

I'm afraid that would require not only a refactor of the RubberBand integration, but also a refactor of the audio engine, which processes the channels sequentially.

This would of course be beyond the scope of the stems project, but from a purely technical viewpoint, I think this is the way to go. There can be many running decks if we think about the use of samplers.

@acolombier
Contributor Author

With the number of in-flight PRs for the STEM feature, I will pause this one for now. I guess that proves we have some quick wins to action in order to make STEM more usable.
Once we have some of these out of the way (at least up to #13086, as this is based on it), I will look into implementing this properly, if that's okay with you.

@JoergAtGithub JoergAtGithub added this to the 2.6-beta milestone May 8, 2024
@acolombier
Contributor Author

I think this feature should now be ready for review and testing. I believe the thread usage should now be optimal. I have tested several parameter sets, and the current one seems to be the best, slightly better than my initial commit.

@acolombier acolombier force-pushed the feat/multithreaded-rubberband branch from 470a227 to c3f57ee Compare May 13, 2024 20:54
@acolombier acolombier marked this pull request as ready for review May 13, 2024 20:54
@acolombier acolombier changed the title [PoC] Multithreaded rubberband Multithreaded rubberband May 13, 2024
@acolombier acolombier force-pushed the feat/multithreaded-rubberband branch from c3f57ee to 94d3f99 Compare May 13, 2024 21:05
@JoergAtGithub

This comment was marked as resolved.

@daschuer
Member

The last commit contains a lot of useful refactoring.
Unfortunately, it makes it hard to review the essence of the multithreading.

Can you please create a separate PR with these changes? I expect that we can merge them quickly.

@acolombier
Contributor Author

No provider(s) registered for file extension "stem.mp4"

That's very odd! Just to double check, did you use STEM=ON?

Can you please create a separate PR with these changes? I expect that we can merge them quickly.

Yes, no problem, I will remove the STEM changes. The only dependency is the stem channel count constant, so I will hardcode it for now and put a TODO.

@daschuer
Member

This solution unfortunately involves a priority inversion.
The critical high-priority real-time thread waits for the lower-priority workers, which the OS can freely reassign to other tasks. The whole engine is at risk.

In the best case you gain better CPU utilisation of otherwise idling cores. This comes at the cost of losing deterministic behavior, which is more important IMHO.

A better solution would be a buffer-swap algorithm that just swaps the prepared buffers into the engine thread and does the preparation in the low-priority workers. The workers would have a whole cycle to finish the task, which may succeed even in time-critical situations; if not, only one stem is affected.

This may double the latency, but likely allows us to compensate for that by halving the minimum possible latency.
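
For illustration, a minimal sketch of this buffer-swap idea with hypothetical names (not Mixxx code). It assumes one low-priority worker per stem that is kicked off once per cycle, right after the engine thread has swapped:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <vector>

struct StemDoubleBuffer {
    explicit StemDoubleBuffer(std::size_t bufferSize)
            : buffers{std::vector<float>(bufferSize), std::vector<float>(bufferSize)} {
    }

    // Worker thread (low priority): prepare the back buffer for the *next*
    // engine cycle, e.g. by running the time stretcher, then mark it ready.
    void prepareBackBuffer() {
        std::vector<float>& back =
                buffers[1 - frontIndex.load(std::memory_order_acquire)];
        // ... fill `back` with scaled audio here ...
        (void)back;
        backBufferReady.store(true, std::memory_order_release);
    }

    // Engine thread (real-time): never blocks. Swap if the worker finished in
    // time; otherwise keep the previous buffer, so only this stem is affected.
    const std::vector<float>& swapAndGetFront() {
        if (backBufferReady.exchange(false, std::memory_order_acq_rel)) {
            frontIndex.store(1 - frontIndex.load(std::memory_order_relaxed),
                    std::memory_order_release);
        }
        return buffers[frontIndex.load(std::memory_order_relaxed)];
    }

    std::array<std::vector<float>, 2> buffers;
    std::atomic<int> frontIndex{0};
    std::atomic<bool> backBufferReady{false};
};
```

The key property is that the real-time thread only flips an index and never waits on a lower-priority thread, which avoids the priority inversion described above.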

@daschuer
Member

Did you consider the poor man's solution of mixing the stems down to only two buffers before doing the sound stretching?

@acolombier
Contributor Author

Can you move the refactoring of the stereo route into a separate PR, or at least a commit with a reasonable commit message?

I have removed all impact on the stereo channels, so they will proceed in the same way as previously and scale directly in the engine thread. Happy to split into multiple commits, but I'm not sure I can see a clear cut in the PR; could you highlight the bits that should live in a dedicated commit?

@daschuer
Member

My idea was to have the introduction of the RubberbandWrapper in a first commit and the introduction of the threading in another.
Does that make sense?

It would also be nice to have enginebufferscalerubberbandthread.h split into three files, one per class.

m_pRubberBand needs to become m_rubberBand because it is no longer just a pointer.

@acolombier acolombier force-pushed the feat/multithreaded-rubberband branch 3 times, most recently from b179ec7 to 8cad156 Compare May 19, 2024 14:48
@acolombier
Contributor Author

@daschuer I have addressed your request

@JoergAtGithub I found out why the mono split wasn't working. I know you mentioned that you had some difficulties having 4 decks on R3 with your setup. If you can, could you try this version, edit this const, and set it to mono? If you see an improvement, I'd be happy to slightly extend this PR to add an option in the preferences to distribute stereo scaling across two mono jobs.
[screenshot: the constant to edit to switch stereo scaling to mono]

I think the last thing we are missing is the use of QThread::idealThreadCount to adjust the pool size to the best value depending on the hardware.
Also, are we happy with not introducing a QRunnable-based QThreadPool at this stage yet?
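
As a hypothetical sizing rule only (not part of this PR), the pool size could be derived from QThread::idealThreadCount() while leaving a core free for the engine thread; the exact policy is an open question, as discussed below.

```cpp
#include <QThread>
#include <algorithm>

// Hypothetical sizing rule, illustrative only: use the logical core count
// that Qt reports, keep one core free for the engine thread, and never
// exceed an arbitrary upper bound (kMaxWorkers is a made-up limit).
int chooseRubberBandPoolSize() {
    constexpr int kMaxWorkers = 4;
    return std::clamp(QThread::idealThreadCount() - 1, 1, kMaxWorkers);
}
```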

@JoergAtGithub
Member

you mentioned that you had some difficulties having 4 decks on R3 with your setup. If you can, could you try this version, edit this const, and set it to mono?

I tested this with mono() and stereo(), and mono() is significantly better! With stereo(), the 4 decks with RubberBand R3 sound awful due to the large number of buffer underruns; with mono() I had only 3 buffer underruns at the beginning, after loading the tracks.

@JoergAtGithub
Member

I think the last thing we are missing is the use of QThread::idealThreadCount to adjust the pool size to the best value depending on the hardware.

I recommend keeping this out of this PR, because this function returns just the number of logical processors, and we also have to consider other threads and processes. This is a topic on its own.

@JoergAtGithub
Member

Also, are we happy with not introducing a QRunnable-based QThreadPool at this stage yet?

@daschuer Do you agree with that?

@acolombier
Contributor Author

I tested this with mono() and stereo(), and mono() is significantly better! With stereo(), the 4 decks with RubberBand R3 sound awful due to the large number of buffer underruns; with mono() I had only 3 buffer underruns at the beginning, after loading the tracks.

Glad to hear it! @daschuer, what do you think of making this accessible in the options? I know we said we don't want to change the RubberBand configuration for stereo by default, but it looks like this could help on some setups.

@daschuer
Member

Yes, sure. However, I am unsure what this means for stereo compatibility. I have just skimmed through the RubberBand documentation but have not found anything yet. On the other hand, I remember a discussion.
Maybe we can just compare the output of both options. If they are bit-perfect equal, mono can become our new default.

@daschuer
Member

It is the ability to later mix the stereo signal down to a mono signal without frequencies being cancelled out.

Just found this in the 3.2 CHANGELOG:
* Fix inability of the R3 engine to produce mono-compatible output

@Swiftb0y
Member

See #11361

@JoergAtGithub
Member

I fear that the problem with the mono downmix functionality could catch up with us when we introduce stem mixing in the engine. After all, we're doing nothing other than mixing 4 independent RubberBand channels together. But the OptionChannelsTogether option only seems to work for a single stereo channel pair.
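
To illustrate the trade-off being discussed, here is a sketch of the two configurations (assumed usage of the RubberBand API, not Mixxx code): a single stereo stretcher created with OptionChannelsTogether versus two independent mono stretchers as used by the per-channel split.

```cpp
#include <cstddef>
#include <rubberband/RubberBandStretcher.h>

using RubberBand::RubberBandStretcher;

// Sketch only: a stereo stretcher configured with OptionChannelsTogether can
// keep the channel pair phase-coherent (mono-compatible), whereas two
// independent mono stretchers cannot share any information between the left
// and right channels.
void createStretchers(std::size_t sampleRate) {
    RubberBandStretcher stereoTogether(sampleRate, 2,
            RubberBandStretcher::OptionProcessRealTime |
                    RubberBandStretcher::OptionChannelsTogether);
    RubberBandStretcher leftMono(sampleRate, 1,
            RubberBandStretcher::OptionProcessRealTime);
    RubberBandStretcher rightMono(sampleRate, 1,
            RubberBandStretcher::OptionProcessRealTime);
    // Blocks would be fed via process() and fetched via retrieve() as usual.
}
```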

@acolombier
Contributor Author

I don't think this would be a problem for stems: in principle, a stem is only like playing different tracks on different decks, unlike the channels of a stereo pair.

If they are bit-perfect equal, mono can become our new default.

By the sound of it, there is too much risk in going with this as the default. Should I still put the option in the settings, perhaps with a warning that references #11361?

@JoergAtGithub
Member

An option would be great, as only some users will be affected by the Mono downmixing issue. But I wonder how to label it, to make the behavior clear to the user.

@acolombier
Contributor Author

Perhaps I could put a ⚠️ next to it with a tooltip explaining the risk?

@acolombier
Contributor Author

Good enough? I guess the message will need to be slightly adjusted, but I'm just wondering about the UX here.

[screen recording: Kooha-2024-05-22-16-07-16.mp4]

I tried to make the setting adjustable live, but it would require more synchronization, which may not be a very good trade-off.

@JoergAtGithub
Member

I tested this locally on Windows 11 and the new UI setting works great!

@JoergAtGithub
Member

There are CI failures now!

@acolombier
Contributor Author

Yep, the tests are broken. Just waiting to see what @daschuer and @Swiftb0y think of it! :)
I will revert it otherwise.

Member

@Swiftb0y Swiftb0y left a comment


This comment was pending for a long time. I don't know when I'll find the time/energy to properly review this.

Review comment on src/engine/bufferscalers/enginebufferscalerubberband.cpp (outdated, resolved)
@acolombier acolombier force-pushed the feat/multithreaded-rubberband branch from 3475899 to 5a1caa1 Compare May 24, 2024 10:54