Multi-threading primitives that use a fixed thread pool #1105

matko · 2022-04-13T22:41:00Z

This pull request provides some predicates that can be used to do multi-threading in terminusdb. They are implemented using a thread pool that is initialized once at startup with as many threads as there are hardware threads available to terminusdb. Its purpose is to run cpu-bound work, that is, work that mostly (if not only) does calculation, and no I/O.
For truly cpu-bound workloads there is no point in having more threads than hardware threads, and in fact doing so hurts performance. So we don't want every request to potentially spawn its own set of threads, causing the thread count to fluctuate widely. Instead, parts of the code that need to do cpu-bound multi-threading can all go through this common thread pool.

Currently implemented:

cpu_concurrent_findall/4 takes a template, a generator goal and an action goal, generates results on the calling thread using the generator, and schedules the action goal to be run on the thread pool, collecting the results. cpu_concurrent_findall/4 is stable in its result order, including error results. Use case is schema checking where we want all witnesses.
cpu_concurrent_forall/2 takes a generator goal and an action goal. The generator goal generates results on the calling thread, and the action goal is scheduled to run on the thread pool. This is a drop-in replacement for forall/2. Use case is document elaboration and insertion.
cpu_concurrent_findfirst/4 is like cpu_concurrent_findall/4, but stops at the first result and returns that, rather than collecting all results. Again, this is stable, always returning the same first result, rather than the result which happened to complete first. Use case is schema checking where we are interested in the first witness.
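
To make the "stable result order" property concrete, here is a minimal sketch in plain SWI-Prolog. It is not the implementation in this PR: the predicate name sketch_concurrent_findall/4, its argument order (Template, Generator, Action, Results), and the choice to rethrow the earliest error in generator order are all assumptions made for illustration. It also starts its own workers per call instead of using a shared pool, purely to stay self-contained. The idea is to number each generated job, let workers finish in any order, and sort by job number before building the result.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).
:- use_module(library(pairs)).

:- meta_predicate sketch_concurrent_findall(?, 0, 0, -).

% Hypothetical sketch, not the PR's code. Not exception-safe if the
% generator itself throws (workers and queues would be leaked).
sketch_concurrent_findall(Template, Generator, Action, Results) :-
    (   current_prolog_flag(cpu_count, N) -> true ; N = 2 ),
    message_queue_create(Jobs),
    message_queue_create(Done),
    length(Workers, N),
    maplist(start_worker(Jobs, Done), Workers),
    % Run the generator on the calling thread, numbering each job.
    Counter = count(0),
    forall(Generator, submit_job(Counter, Jobs, Template, Action)),
    arg(1, Counter, Total),
    % One stop message per worker, then wait for all of them.
    forall(between(1, N, _), thread_send_message(Jobs, stop)),
    maplist(join_worker, Workers),
    collect(Total, Done, Pairs),
    message_queue_destroy(Jobs),
    message_queue_destroy(Done),
    % Restore generator order regardless of completion order.
    keysort(Pairs, Sorted),
    pairs_values(Sorted, Outcomes),
    outcomes_results(Outcomes, Results).

start_worker(Jobs, Done, Id) :-
    thread_create(worker_loop(Jobs, Done), Id, []).

join_worker(Id) :-
    thread_join(Id, _Status).

submit_job(Counter, Jobs, Template, Action) :-
    arg(1, Counter, I0),
    I is I0 + 1,
    nb_setarg(1, Counter, I),
    % Template and Action are copied together, so they keep sharing variables.
    thread_send_message(Jobs, job(I, Template, Action)).

worker_loop(Jobs, Done) :-
    thread_get_message(Jobs, Msg),
    (   Msg == stop
    ->  true
    ;   Msg = job(I, Template, Action),
        catch(( findall(Template, Action, Sols), Outcome = ok(Sols) ),
              Error,
              Outcome = error(Error)),
        thread_send_message(Done, I-Outcome),
        worker_loop(Jobs, Done)
    ).

collect(0, _, []) :- !.
collect(K, Done, [I-Outcome|Rest]) :-
    thread_get_message(Done, I-Outcome),
    K1 is K - 1,
    collect(K1, Done, Rest).

% Walk outcomes in generator order; the first error met is rethrown.
outcomes_results([], []).
outcomes_results([error(E)|_], _) :-
    throw(E).
outcomes_results([ok(Sols)|Outcomes], Results) :-
    append(Sols, Rest, Results),
    outcomes_results(Outcomes, Rest).
```

With this sketch, sketch_concurrent_findall(Y, member(X, [1,2,3,4,5]), Y is X*X, Rs) gives Rs = [1,4,9,16,25] no matter which worker finishes first.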

Planned:

cpu_generate/5: Like cpu_concurrent_findall/4 but backtracking over results rather than collecting a list. Extra argument is to specify how far ahead results should be generated. Use case is document retrieval (generate documents to return in the background while the main thread writes them to the cgi stream).
cpu_grouped_findall/5, cpu_grouped_findfirst/5 and cpu_grouped_forall/3, where an extra numerical argument N is given. N specifies how many elements should be generated before being handed off to a background thread, instead of processing each generated item on a background thread individually. This will likely improve performance due to less messaging overhead. However, it's a tradeoff, as this will also keep the cpu-bound thread pool busy longer, preventing other requests from getting work done, possibly leading to starvation issues.
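
As a rough illustration of the grouping idea only, the sketch below splits a list of already generated items into batches of at most N, so that each batch could be handed to a background thread as a single message. The predicate name batches/3 is hypothetical and is not part of this PR; the planned predicates would presumably group while generating rather than on a finished list.

```prolog
:- use_module(library(lists)).
:- use_module(library(error)).

% batches(+N, +Items, -Batches): hypothetical helper, not part of the PR.
% Splits Items into consecutive groups of N; the last group may be shorter.
batches(N, Items, Batches) :-
    must_be(positive_integer, N),
    batches_(Items, N, Batches).

batches_([], _, []) :- !.
batches_(Items, N, [Batch|Batches]) :-
    length(Batch, N),
    append(Batch, Rest, Items),
    !,
    batches_(Rest, N, Batches).
batches_(Items, _, [Items]).   % fewer than N items left: final short batch
```

For example, batches(2, [a,b,c,d,e], Bs) gives Bs = [[a,b],[c,d],[e]]: three messages instead of five, at the cost of each worker being occupied for longer per message.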

This is a draft for now while those planned primitives are still under development.

matko · 2022-04-19T16:51:09Z

This draft has evolved a bit from its original form. I'm now implementing a task framework. The reason for doing this is that a task running on the thread pool will often itself want to fork off other tasks, and these too need to run on the thread pool. The original implementation would not support this, as submitting work to the thread pool would block if no workers were available.
Such a task mechanism will come in handy especially for recursive algorithms, such as our document elaboration, where enough work can be done to warrant the overhead of multi-threading. This is the case, for example, for enormous documents that consist of large lists of subdocuments.

Currently running into some weird deadlock behavior in task.pl with multiple uses of thread_wait/2, which I think is a problem in the core library. I'll be rewriting this to use a message queue to wake up the main scheduler instead.
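
The message queue approach could look roughly like the sketch below. This is not the code in task.pl: sketch_scheduler/2, sketch_task/2 and the task_done/2 message shape are hypothetical names chosen for illustration, and each task simply squares its id as stand-in work. The scheduler blocks in thread_get_message/2 and is woken once per update, so no thread_wait/2 is involved.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical task body: do some work, then notify the scheduler's queue.
sketch_task(Queue, TaskId) :-
    Result is TaskId * TaskId,
    thread_send_message(Queue, task_done(TaskId, Result)).

% Hypothetical scheduler: start one thread per task id and sleep on the
% queue until every task has reported back.
sketch_scheduler(TaskIds, Results) :-
    message_queue_create(Queue),
    maplist(spawn_task(Queue), TaskIds, Threads),
    length(TaskIds, Pending),
    await_updates(Pending, Queue, Unordered),
    maplist(join_task_thread, Threads),
    message_queue_destroy(Queue),
    msort(Unordered, Results).

spawn_task(Queue, TaskId, Thread) :-
    thread_create(sketch_task(Queue, TaskId), Thread, []).

join_task_thread(Thread) :-
    thread_join(Thread, _Status).

await_updates(0, _, []) :- !.
await_updates(Pending, Queue, [TaskId-Result|Rest]) :-
    % Blocks until some task posts an update; this is the wakeup.
    thread_get_message(Queue, task_done(TaskId, Result)),
    Pending1 is Pending - 1,
    await_updates(Pending1, Queue, Rest).
```

Calling sketch_scheduler([3,1,2], Rs) gives Rs = [1-1,2-4,3-9]; the updates may arrive in any order and are sorted only at the end.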


matko added 4 commits April 14, 2022 00:48

Implement concurrent_findall

d3f0feb

cpu_concurrent_forall and cpu_concurrent_findfirst

79d24e5

remove resolved todo

5ac9a37

Fix test to expect error

0c699b1

matko force-pushed the multi_threading branch from 67987a8 to 0c699b1 Compare April 13, 2022 22:48

matko added 3 commits April 14, 2022 00:56

fix linting issues

a9490d6

beginnings of a task framework

e32b2f4

Implement tasks waiting for other tasks

cef147d

matko added 21 commits April 19, 2022 19:58

use message queues rather than thread_wait to communicate updates

e035cae

make wait_for_result unreified

3627eb7

nested example program a bit more

e35b9a0

perform more work on every wakeup

28aeb8b

implement task cleanup and killing

2d050e8

rewrite task queue to have an associated task which can be used for cleanup

af5178b

make sure non-started tasks are properly destroyed on cleanup

3ea4eda

properly destroy engines when cleaning rather than rely on gc for that

af82501

logic for getting more results out of a backtracking task

0a84e09

ensure tasks with more results available are properly reaped

665e32d

remove debug prints

1001a56

implement waiting for more than one task at once

e2b5a3f

Merge branch 'main' into multi_threading

a7237b8

Merge branch 'main' into multi_threading

df2be05

Merge branch 'main' into multi_threading

d435a28

Merge branch 'main' into multi_threading

279f6ff

linter fix

07255cc

fix throw call

0001ddf

initial work for task message queues

1435ac2

Bunch of work into concurrent tasking

c7fb8f4

Merge branch 'main' into multi_threading

dcc57a2

matko added 4 commits April 28, 2022 14:20

generator which generates N steps ahead

995758f

configuration options for task_concurrent_goal

1d68b6b

task_concurrent_goal and task_generate take less arguments and have their argument list reordered to be more easy to use. Also scaling args.

6d09156

typo fix

f7f16db

spl assigned matko May 16, 2022
