Parallelisation for functors #3596

Draft
DavidBauer1984 wants to merge 14 commits into develop
Conversation

@DavidBauer1984 (Contributor) commented Feb 11, 2024

This PR adds the functionality to process functors in parallel and enables parallelisation for a few functors where this is straightforward. It should be seen as a starting point demonstrating how parallelisation can be exploited; parallelising a significant part of the layout requires considerably more work.

A bit of profiling shows that we achieve mild runtime improvements by processing in parallel.
For a medium-sized piece (10 pages):

| Number of threads | ResetAligner runtime [ms] | LayoutVertically runtime [ms] |
| --- | --- | --- |
| 1 | 52 | 121 |
| 2 | 36 | 101 |
| 4 | 38 | 100 |
| 8 | 35 | 103 |

For a large piece (266 pages):

| Number of threads | ResetAligner runtime [ms] | LayoutVertically runtime [ms] |
| --- | --- | --- |
| 1 | 845 | 1560 |
| 2 | 635 | 1400 |
| 4 | 657 | 1230 |
| 8 | 651 | 1430 |

Further remarks:

  • Parallelisation is switched on by setting the new --max-threads option.
  • It is enabled on a functor by overriding the methods GetProcessingStrategy and CloneFunctor (see the sketch after this list). The first one decides whether concurrent processing happens on measures (horizontal layout) or on systems (vertical layout). The second one clones the functor before parallel processing and is quite similar to Object::Clone.
  • The method MergeFunctor can be overridden and is designed to merge the results of the calculations on several functors back into one functor after parallel processing. One example where this makes sense is the m_crossStaffSlurs flag in AdjustSlurs.
  • It seems tempting to simply add parallelisation to all functors by adding the appropriate methods. However, several functors cannot be parallelised in a straightforward way. One obstacle is a functor that keeps information living outside measures or systems (e.g., by overriding VisitScore or VisitPage) or that transports information from one measure/system to the next (e.g., PrepareTimeSpanning or GenerateMIDI). Another obstacle is a functor that changes global information (e.g., in the document); this can lead to undefined behaviour and requires mutex protection. One needs to check carefully which data a functor changes before adding parallelisation.
  • The entire processing code in Object is currently duplicated for const and non-const functors. Not sure if we want to simplify this with templates.
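To illustrate the points above, here is a minimal, self-contained sketch of what a parallel-enabled functor could look like. The base class, the ProcessingStrategy enum, and all signatures are assumptions made for illustration only, not the actual Verovio API introduced by this PR.

```cpp
// Illustrative sketch only: base class, enum, and signatures are assumed,
// not taken from the actual PR.
#include <memory>

enum class ProcessingStrategy { None, MeasureParallel, SystemParallel };

class FunctorBase {
public:
    virtual ~FunctorBase() = default;
    // By default, a functor is processed sequentially.
    virtual ProcessingStrategy GetProcessingStrategy() const { return ProcessingStrategy::None; }
    // Returns an independent copy that a worker thread can run on its chunk.
    virtual std::unique_ptr<FunctorBase> CloneFunctor() const { return nullptr; }
    // Folds the results gathered by a clone back into this functor.
    virtual void MergeFunctor(const FunctorBase & /* other */) {}
};

// Hypothetical functor in the spirit of AdjustSlurs: it can run per measure
// and only needs to merge a single flag afterwards.
class AdjustSlursLike : public FunctorBase {
public:
    ProcessingStrategy GetProcessingStrategy() const override
    {
        // Concurrent processing happens on measures (horizontal layout).
        return ProcessingStrategy::MeasureParallel;
    }
    std::unique_ptr<FunctorBase> CloneFunctor() const override
    {
        return std::make_unique<AdjustSlursLike>(*this);
    }
    void MergeFunctor(const FunctorBase &other) override
    {
        // Combine the per-thread results, e.g. OR-ing a cross-staff flag.
        const auto &adjusted = static_cast<const AdjustSlursLike &>(other);
        m_crossStaffSlurs = m_crossStaffSlurs || adjusted.m_crossStaffSlurs;
    }

private:
    bool m_crossStaffSlurs = false;
};
```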

@craigsapp (Contributor)

Can parallelization be used in the Emscripten-compiled JavaScript version of Verovio?

It is interesting that the LayoutVertically runtime goes up from 4 to 8 threads.

@DavidBauer1984 (Contributor, Author)

> It is interesting that the LayoutVertically runtime goes up from 4 to 8 threads.

Yes, maybe this is just my machine. It also depends on what other software is running at the moment (I had Xcode and the profiler open).

@ahankinson (Contributor)

At a certain point thread overhead overtakes work done. That’s a pretty normal phenomenon.

Is this true parallelisation or is it concurrency? JS has concurrency now (async/await) even though it's still single-threaded.

@DavidBauer1984 (Contributor, Author)

It is true parallelisation, i.e. multi-threaded. Async/await-style suspension corresponds to coroutines in C++, which is something different.
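For illustration, here is a minimal sketch of what this multi-threaded dispatch could look like, reusing the FunctorBase sketch from the earlier comment. The Measure stub, ProcessInParallel, and the round-robin work split are assumptions for illustration, not the code of this PR.

```cpp
// Illustrative sketch only: Measure, ProcessInParallel, and the work split
// are assumed; FunctorBase, CloneFunctor, and MergeFunctor refer to the
// sketch in the earlier comment.
#include <algorithm>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

struct Measure {
    // Placeholder: visit the contained objects with the given functor.
    void Process(FunctorBase & /* functor */) {}
};

void ProcessInParallel(FunctorBase &functor, std::vector<Measure *> &measures, int maxThreads)
{
    const int numThreads = std::max(1, maxThreads);

    // Each worker thread gets its own clone so that no functor state is shared.
    std::vector<std::unique_ptr<FunctorBase>> clones;
    for (int t = 0; t < numThreads; ++t) clones.push_back(functor.CloneFunctor());

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t]() {
            // Static round-robin split of the measures across the threads.
            for (std::size_t i = t; i < measures.size(); i += numThreads) {
                measures[i]->Process(*clones[t]);
            }
        });
    }
    for (auto &worker : workers) worker.join();

    // Fold the per-thread results back into the original functor.
    for (const auto &clone : clones) functor.MergeFunctor(*clone);
}
```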

@DavidBauer1984 (Contributor, Author)

@lpugin Do we want to proceed with this? The effect on total runtime is marginal at the moment. I guess we can currently achieve greater performance improvements by making the (single-threaded) implementation as efficient as possible (e.g., along the lines of #3587 or #3630). Maybe just keep this as an idea.

@lpugin (Contributor) commented Apr 19, 2024

@DavidBauer1984 sorry for the delay on the follow-up. I wanted to test the different bindings, and as far as I can see it works as expected. I still want to investigate more precisely the impact (e.g., size) on the JS one.

As you pointed out, the performance improvement remains small. One follow-up refactoring might be to turn the drawing process into a functor. That way we could enable parallelization for some bounding-box drawing steps, which might lead to additional gains. However, this would be quite a significant refactoring and might raise some memory-access problems to deal with. What do you think?

@DavidBauer1984 (Contributor, Author)

@lpugin I think the first step should be to move the beam and (initial) slur calculation from the drawing process into functors, along with any other layout calculation that is done during drawing. Those calculations are currently done repeatedly, and it might be sufficient to do them just once or twice. It would also lead to a cleaner code structure.
Anyway, I can't find the time for another big refactoring project (like the one for functors) at the moment. But sure, take your time to investigate the PR. From my point of view, this is something like a proof of concept for parallelisation: it demonstrates that it works and has the potential to improve performance, but we still have to figure out how to utilise it in an optimal way.

@lpugin (Contributor) commented Apr 29, 2024

Yes, moving some layout calculation steps into functors would already enable parallelization for those jobs and would be good. I would like to keep the PR as a draft so we can keep an eye on it and keep it "mergeable".

@lpugin marked this pull request as draft on April 29, 2024.