task-thread-pool

task-thread-pool is a fast and lightweight thread pool for C++11 and newer.

Easily add parallelism to your project without introducing heavy dependencies.

  • Focus on correctness, ease of use, simplicity, and performance.
  • A small single header file and permissive licensing make integration easy.
  • Tested on all major platforms and compilers:
    • Linux, macOS, Windows
    • GCC, LLVM/Clang, MSVC, MinGW, Emscripten
      • CI tests run on GCC 7+ and LLVM 5+; older versions should also work
    • C++11, C++14, C++17, C++20, C++23
  • Comprehensive test suite, including stress tests.
  • Benchmarks help confirm good performance.

Usage

#include <task_thread_pool.hpp>
task_thread_pool::task_thread_pool pool; // num_threads = std::thread::hardware_concurrency()
// or
task_thread_pool::task_thread_pool pool{4}; // num_threads = 4

Submit a function, a lambda, std::packaged_task, std::function, or any Callable, and its arguments:

pool.submit_detach( [](int arg) { std::cout << arg; }, 123456 );

To track a task's return value (and any thrown exception), use submit(), which returns a std::future:

std::future<int> future = pool.submit([] { return 1; });

int result = future.get(); // returns 1

std::future::get() waits for the task to complete.
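
Exceptions thrown inside a task are also captured by the returned std::future and rethrown by get(). A minimal sketch (the exception type and message here are purely illustrative):

std::future<int> failing = pool.submit([]() -> int {
    throw std::runtime_error("something went wrong");
});

try {
    failing.get();                 // rethrows the task's exception
} catch (const std::runtime_error& e) {
    std::cout << e.what();         // prints "something went wrong"
}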

To wait for all tasks to complete:

pool.wait_for_tasks();
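
For example, a common pattern is to submit a batch of detached tasks and then block until the queue drains (a small sketch):

for (int i = 0; i < 8; ++i) {
    pool.submit_detach([i] { std::cout << i << '\n'; });
}
pool.wait_for_tasks();  // returns once every submitted task has finished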

Parallel Loops and More

Use poolSTL to parallelize loops, transforms, sorts, and other standard library algorithms using this thread pool. This approach is easy to start with and also keeps your code future-proof by employing standard C++ mechanisms. It is easy to later change parallelism libraries (or start using the compiler-provided ones, once they're available to you).

For example, use std::for_each to parallelize for and for-each loops:

std::vector<int> v = {0, 1, 2, 3, 4, 5};
task_thread_pool::task_thread_pool pool;

// parallel for
using poolstl::iota_iter;

std::for_each(poolstl::par.on(pool), iota_iter<int>(0), iota_iter<int>(v.size()), [&](int i) {
    std::cout << v[i];  // loop body
});

// parallel for-each
std::for_each(poolstl::par.on(pool), v.cbegin(), v.cend(), [](int value) {
    std::cout << value;  // loop body
});
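
poolSTL covers other standard algorithms as well. For instance, a parallel std::transform or std::sort can run on the same pool; a sketch, assuming the same par.on(pool) policy applies to these algorithms:

std::vector<int> squares(v.size());

// parallel transform: squares[i] = v[i] * v[i]
std::transform(poolstl::par.on(pool), v.cbegin(), v.cend(), squares.begin(),
               [](int x) { return x * x; });

// parallel sort
std::sort(poolstl::par.on(pool), squares.begin(), squares.end());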

Example

#include <iostream>
// Use #include "task_thread_pool.hpp" if the header is in your project tree,
// or #include <task_thread_pool.hpp> if it is installed in your include path.
#include "task_thread_pool.hpp"

int sum(int a, int b) { return a + b; }

int main() {
    // Create a thread pool. The number of threads is equal to the number of cores in the system,
    // as given by std::thread::hardware_concurrency().
    // You can also specify the number of threads, like so: pool(4),
    // or resize the thread pool later using pool.set_num_threads(4).
    task_thread_pool::task_thread_pool pool;

    //---------------------------------------------
    // Submit a task that returns a value.
    std::future<int> one_future = pool.submit([] { return 1; });
    // Use std::future::get() to wait for the task to complete and return the value.
    std::cout << "Task returned: " << one_future.get() << std::endl;

    //---------------------------------------------
    // Tasks may have arguments:
    std::future<int> sum_future = pool.submit(&sum, 1, 2);
    std::cout << "Sum = " << sum_future.get() << std::endl;

    //---------------------------------------------
    // Submit a task that we don't need to track the execution of:
    pool.submit_detach([](int arg) {
        std::cout << "The argument is: " << arg << std::endl;
    }, 42);

    //---------------------------------------------
    // Wait for all tasks to complete:
    pool.wait_for_tasks();

    //---------------------------------------------
    // The pool can be paused:
    pool.pause();

    // Submit a task that won't be started until the pool is unpaused.
    std::future<void> paused_future = pool.submit([] {
        std::cout << "Paused task executes" << std::endl;
    });

    // prints 1
    std::cout << "Number of tasks in the pool: " << pool.get_num_tasks() << std::endl;
    
    // Resume executing queued tasks.
    pool.unpause();

    // Wait for the task to finish.
    paused_future.get();

    // prints 0
    std::cout << "Number of tasks in the pool: " << pool.get_num_tasks() << std::endl;

    //---------------------------------------------
    // All queued tasks are executed before the pool is destroyed:
    pool.submit_detach([]{
        std::cout << "One last task" << std::endl;
    });

    return 0;
}

Installation

Copy

task-thread-pool is a single header file.

You may simply copy task_thread_pool.hpp into your project or your system include/.

CMake

You may use CMake to fetch directly from GitHub:

include(FetchContent)
FetchContent_Declare(
        task-thread-pool
        GIT_REPOSITORY https://github.com/alugowski/task-thread-pool
        GIT_TAG main
        GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(task-thread-pool)

target_link_libraries(YOUR_TARGET task-thread-pool::task-thread-pool)

Use GIT_TAG main to use the latest version, or replace main with a version number to pin a fixed version.

vcpkg

vcpkg install task-thread-pool

Note for Clang and GCC <9 users

Some compilers, including non-Apple Clang and GCC 8 and older, require the -lpthread linker flag to use C++11 threads. The above CMake instructions will do that automatically.

How it works

Simplicity is a major goal, so this thread pool does what you'd expect: submitted tasks are added to a queue, and worker threads pull from that queue.

Care is taken that this process is efficient. The submit methods are optimized to only do what they need. Worker threads only lock the queue once per task. Excess synchronization is avoided.
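
To make this concrete, here is a heavily simplified, self-contained sketch of the same single-queue idea (not the actual implementation): a mutex-protected task queue plus a condition variable that idle workers block on. The real pool adds futures, pausing, resizing, and more.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch only: a single locked queue, one condition variable, N worker threads.
class mini_pool {
public:
    explicit mini_pool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] { worker_loop(); });
    }

    // Workers drain any remaining tasks, then exit.
    ~mini_pool() {
        { std::lock_guard<std::mutex> lock(m); stop = true; }
        cv.notify_all();
        for (auto& t : workers) t.join();
    }

    void submit_detach(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(m); tasks.push(std::move(task)); }
        cv.notify_one();  // wake one idle worker
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [this] { return stop || !tasks.empty(); });
                if (stop && tasks.empty()) return;
                task = std::move(tasks.front());
                tasks.pop();
            }        // the queue lock is held only while popping, not while running
            task();
        }
    }

    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    bool stop = false;
    std::vector<std::thread> workers;
};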

That said, this simple design is best used in low-contention scenarios. If you have many tiny tasks or many (10+) physical CPU cores, the single queue becomes a hotspot. In that case, avoid lightweight pools like this one and use something like Threading Building Blocks, whose work-stealing executors avoid this hotspot at the cost of extra complexity and an added project dependency.

Benchmarking

Google Benchmark suites for several pool operations are included in benchmark/.

If there is an operation you care about, feel free to open an issue or contribute your own benchmark code.

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
pool_create_destroy                       46318 ns        26004 ns        26489
submit_detach_packaged_task/paused:1        254 ns          244 ns      2875157
submit_detach_packaged_task/paused:0        362 ns          304 ns      2296008
submit_detach_void_lambda/paused:1          263 ns          260 ns      3072412
submit_detach_void_lambda/paused:0          418 ns          374 ns      2020779
submit_void_lambda/paused:1                 399 ns          385 ns      1942879
submit_void_lambda/paused:0                 667 ns          543 ns      1257161
submit_void_lambda_future/paused:1          391 ns          376 ns      1897255
submit_void_lambda_future/paused:0          649 ns          524 ns      1238653
submit_int_lambda_future/paused:1           395 ns          376 ns      1902789
submit_int_lambda_future/paused:0           643 ns          518 ns      1146038
run_1k_packaged_tasks                    462965 ns       362080 ns         1939
run_1k_void_lambdas                      492022 ns       411069 ns         1712
run_1k_int_lambdas                       679579 ns       533813 ns         1368