GSoC 2017 Project Ideas

Table of Contents:

Introduction
Requirements
Vanguard Project Ideas (New or Revamped Ideas for GSoC 2017)

HPX Backend for OpenCV
Applying Machine Learning Techniques on HPX Parallel Algorithms
Stack Overflow Detection on Linux
Create Generic Histogram Performance Counter
Concurrent Data Structure Support
Add More Arithmetic Performance Counters
Re-implement hpx::util::unwrapped
Work on Parallel Algorithms for HPX
Add Vectorization to the par_unseq Implementations of the Parallel Algorithms
Dynamic Load Balancing Using HPX Migration for LibGeoDecomp and STORM Project
HPXCL - Asynchronous Integration of CUDA and OpenCL to HPX
Graphical and Terminal User Interface for Scimitar
SYCL backend for HPX Compute

Classic Project Ideas (Legacy Projects That Still Need Doing)

Extension/Evaluation of LibGeoDecomp::Region As An Alternative Adjacency Container to Boost Graph Library
Add Mask Move/Assign Wrappers for Vectorization Intrinsics
Newtonian Physics Sandbox
Implement Your Favorite Parcelport Backend
Implement A Faster Associative Container for GIDs
HPX in the Cloud
Hybrid Cloud-Batch-Scheduling
Port HPX to iOS
Augment CSV Files
Create A Parcelport Based on WebSockets
Script Language Bindings
All to All Communications
Distributed Component Placement
A C++ Runtime Replacement
Resumable Function Implementation
Coroutine-like Interface
Bug Hunter
Port Graph500 to HPX
Port Mantevo MiniApps to HPX
Create an HPX Communicator for Trilinos Project
Project Template

Introduction

Welcome to the HPX home page for Google Summer of Code (GSoC). This page provides information about student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started with HPX. This page is also used to collect project ideas for the Google Summer of Code 2017. The STE||AR Group will apply as an organization and our goal is to get at least two students funded.

We are looking to fund work on a number of different kinds of proposals (for more details about concrete project ideas, see below):

Extensions to existing library features
New distributed data structures and algorithms
Multiple competing proposals for the same project

Requirements

Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for HPX discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked before those that do not.

If the descriptions of these projects seem a little vague... Well, that is intentional. We are looking for students to develop requirements for their proposals by doing initial background research on the topic, and interacting with the community on the HPX mailing list (hpx-users) to help identify expectations.

Vanguard Project Ideas

HPX Backend for OpenCV
Applying Machine Learning Techniques on HPX Parallel Algorithms
Stack Overflow Detection on Linux
Create Generic Histogram Performance Counter
Concurrent Data Structure Support
More Arithmetic Performance Counters
Re-Implementation of hpx::util::unwrapped
Parallel Algorithms for HPX
Vectorizing par_unseq Implementations of Parallel Algorithms
Dynamic Load balancing using HPX migration for LibGeoDecomp and STORM Project
HPXCL - Asynchronous integration of CUDA and OpenCL to HPX
SYCL backend for HPX Compute

Classic Project Ideas

Extension/Evaluation of LibGeoDecomp::Region as an alternative adjacency container for Boost Graph Library
Add Mask Move/Assign Wrappers for Vectorization Intrinsics
Newtonian Physics Sandbox
Implement Your Favorite Parcelport Backend
Implement a Faster Associative Container for GIDs
HPX in the Cloud
Hybrid Cloud-Batch-Scheduling
Port HPX to iOS
Augment CSV Files
Create A Parcelport Based on WebSockets
Script Language Bindings
All to All Communications
Distributed Component Placement
A C++ Runtime Replacement
Resumable Function Implementation
Coroutine-like Interface
Bug Hunter
Port Graph500 to HPX
Port Mantevo MiniApps to HPX
Create an HPX Communicator for Trilinos project
Project Template

HPX Backend for OpenCV

Abstract: The image processing toolbox OpenCV supports multithreading via TBB or OpenMP (see here), but not HPX. The aim of this project is to develop an HPX threading backend for OpenCV with the same API as the OpenMP class, such that all parallel algorithms can be easily used within HPX. The existing algorithms should be tested with small examples. As a test case, a simple Qt based application that displays a webcam captured image and applies an OpenCV filter to it should be provided.
Difficulty: Medium
Expected result: A demo application that runs OpenCV on HPX threads
Knowledge Prerequisite: C++, STL
Mentor: John Biddiscombe and Patrick Diehl

Applying Machine Learning Techniques on HPX Parallel Algorithms

Abstract: Runtime information is often speculative; relying on it alone doesn’t guarantee maximal parallel performance. In general, the parallel performance of an application depends both on values measured at runtime and on related transformations performed at compile time, such as loop skewing and loop scheduling. Collecting the outcome of the static analysis performed by the compiler could significantly improve runtime performance. This information needs to be analyzed in order to choose the optimizations that the runtime system should apply. Manually tuning the parameters becomes ineffective and almost impossible when the program exposes too many features. Hence, we rely on specific machine learning algorithms to predict the optimization method. Such algorithms have been studied as a way to automate this process, since they can choose the optimum decision by considering different and non-linear static information. There is much existing work on automatically tuning compiler optimizations based on static information extracted at compile time. However, none of it considers both static information obtained at compile time and dynamic information obtained at runtime. Also, most of these approaches require users to compile their application twice: a first compilation to gather information and a second one to recompile the application based on the extracted data. The goal of this project is to optimize HPX performance by considering both static and dynamic information, and to develop a technique that spares the application the extra compilation. We believe that this optimization will outperform the current HPX performance.
The aim of the project is to a) study different machine learning algorithms and implement them with HPX, so that they not only scale when analyzing big data but can also learn an efficient model from both static and dynamic information extracted from an application, and b) modify different parallel HPX algorithms, such as for_each and transform, so that they can use that learned model to predict their parameters efficiently, for example by controlling chunk sizes, determining prefetching distances, and determining execution policies.
Difficulty: Medium/Hard
Expected result: Analyzing performance of the implemented ML algorithms in HPX and studying their results on different benchmarks.
Knowledge Prerequisite: C++, STL
Mentor: Zahra Khatami, Hartmut Kaiser and Lukas Troska
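
As a rough illustration of part b), the sketch below picks a chunk size from a few candidates by scoring each one with a linear model over static and dynamic features. Everything in it is hypothetical: the `loop_features` fields, the `chunk_candidate` weights, and the `predict_chunk_size` name are assumptions made for this example and are not part of HPX. A real model would be trained from measurements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical feature vector (not an HPX type): two static features known
// at compile time and two dynamic features measured at runtime.
struct loop_features
{
    double ops_per_iteration;    // static: operations in the loop body
    double loop_nest_depth;      // static: depth of the loop nest
    double iteration_count;      // dynamic: number of iterations
    double worker_threads;       // dynamic: number of cores in use
};

// One candidate chunk size with illustrative model weights:
// a bias term followed by one weight per feature.
struct chunk_candidate
{
    std::size_t chunk_size;
    std::array<double, 5> weights;
};

// Linear score of one candidate for the given features.
double score(chunk_candidate const& c, loop_features const& f)
{
    return c.weights[0] + c.weights[1] * f.ops_per_iteration +
        c.weights[2] * f.loop_nest_depth + c.weights[3] * f.iteration_count +
        c.weights[4] * f.worker_threads;
}

// Return the chunk size of the highest-scoring candidate.
std::size_t predict_chunk_size(
    std::vector<chunk_candidate> const& candidates, loop_features const& f)
{
    assert(!candidates.empty());
    chunk_candidate const* best = &candidates.front();
    for (auto const& c : candidates)
    {
        if (score(c, f) > score(*best, f))
            best = &c;
    }
    return best->chunk_size;
}
```

The predicted value would then be handed to the parallel algorithm as its chunk size; how that hand-over looks in HPX is left open here.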

Concurrent Data Structure Support

Abstract: STL containers such as vectors, maps, and sets are not thread safe. One cannot safely add or remove elements in one thread while another thread iterates over or modifies the same container without potentially catastrophic consequences (usually segmentation faults leading to eventual program failure). Some work has begun on implementing concurrent structures in HPX: a concurrent unordered map with a reader/writer lock and a partial implementation of a concurrent vector exist, but they have not all been completed, do not have unit tests, and need to be unified into an hpx::concurrent namespace. A number of libraries implementing thread-safe (sometimes lock-free) containers already exist; they can be used for ideas and, where the code uses a Boost-compatible license, can be integrated into HPX. The aim of the project is to collect as much information about, and as many implementations of, thread-safe containers as possible and to create or integrate them into the HPX library.
Difficulty: Medium/Hard
Expected result: A contribution of an hpx::concurrent namespace including as many STL compatible containers (and/or helper structures) as possible, with unit testing and examples that use them.
Knowledge Prerequisite: Thread safe programming.
Mentor: John Biddiscombe and Hartmut Kaiser
See issue #2235 on HPX bug tracker
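
To make the reader/writer-lock idea concrete, here is a minimal sketch of an unordered map guarded by a shared mutex. It is not the existing HPX implementation: the class name `rw_locked_map` is hypothetical, and it uses the C++17 standard library (`std::shared_mutex`) as a stand-in, whereas code inside HPX would presumably use HPX's own synchronization primitives so that waiting does not block a worker thread.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: many concurrent readers, one writer at a time.
template <typename Key, typename Value>
class rw_locked_map
{
public:
    // Writers take the lock exclusively. Returns false if the key existed.
    bool insert(Key const& key, Value value)
    {
        std::unique_lock<std::shared_mutex> lock(mtx_);
        return map_.emplace(key, std::move(value)).second;
    }

    // Returns true if an element was removed.
    bool erase(Key const& key)
    {
        std::unique_lock<std::shared_mutex> lock(mtx_);
        return map_.erase(key) != 0;
    }

    // Readers share the lock. The value is returned by copy so that no
    // reference into the map outlives the lock.
    std::optional<Value> find(Key const& key) const
    {
        std::shared_lock<std::shared_mutex> lock(mtx_);
        auto it = map_.find(key);
        if (it == map_.end())
            return std::nullopt;
        return it->second;
    }

    std::size_t size() const
    {
        std::shared_lock<std::shared_mutex> lock(mtx_);
        return map_.size();
    }

private:
    mutable std::shared_mutex mtx_;
    std::unordered_map<Key, Value> map_;
};
```

Returning copies avoids dangling references but rules out in-place updates and safe iteration; those are among the interface questions a proposal would need to address.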

Create Generic Histogram Performance Counter

Abstract: HPX supports performance counters that return a set of values for each invocation. We have used this to implement performance counters collecting histograms for various characteristics related to parcel coalescing (such as the histogram of the time intervals between parcels). The idea of this project is to create a general purpose performance counter which collects the value of any other given performance counter at given time intervals and calculates a histogram for those values. This project could be combined with Add More Arithmetic Performance Counters.
Difficulty: Medium
Expected result: Implement a functioning performance counter which returns the histogram for any other given performance counter as collected at given time intervals.
Knowledge Prerequisite: Minimal knowledge of statistical analysis is required.
Mentor: Hartmut Kaiser
See issue #2237 on HPX bug tracker
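
The core of such a counter is the bucketing of sampled values. The sketch below shows only that part, under stated assumptions: the class name `value_histogram`, the fixed bucket width, and the clamping of out-of-range samples are choices made for this example, and the periodic sampling of the underlying counter is represented simply by calls to `add_sample`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width histogram over [lower, upper). Samples outside
// that range are clamped into the first or last bucket.
class value_histogram
{
public:
    value_histogram(double lower, double upper, std::size_t buckets)
      : lower_(lower), upper_(upper), counts_(buckets, 0)
    {
        assert(buckets > 0 && lower < upper);
    }

    // Record one value sampled from the underlying counter.
    void add_sample(double value)
    {
        std::size_t index = 0;
        if (value >= upper_)
        {
            index = counts_.size() - 1;
        }
        else if (value > lower_)
        {
            index = static_cast<std::size_t>(
                (value - lower_) / (upper_ - lower_) * counts_.size());
        }
        // Guard against floating point rounding at the upper edge.
        if (index >= counts_.size())
            index = counts_.size() - 1;
        ++counts_[index];
    }

    // One count per bucket, in ascending bucket order.
    std::vector<std::int64_t> const& counts() const
    {
        return counts_;
    }

private:
    double lower_;
    double upper_;
    std::vector<std::int64_t> counts_;
};
```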

Add More Arithmetic Performance Counters

Abstract: HPX already supports performance counters that can be used to dynamically calculate the result of addition, subtraction, multiplication, and division of values gathered from a set of other given performance counters. The idea of this project is to create more performance counters which are very similar to the existing ones, except that they calculate various other statistical results, such as the minimum, maximum, mean, and median value (more are possible). This project could be combined with Create Generic Histogram Performance Counter.
Difficulty: Easy/Medium
Expected result: Implement a set of functioning performance counters which return the result of various statistical operations for a set of other given performance counters.
Knowledge Prerequisite: Minimal knowledge of statistical analysis is required.
Mentor: Hartmut Kaiser
See issue #2455 on HPX bug tracker
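
The statistical operations themselves are small. A minimal sketch, assuming the values of the underlying counters have already been gathered into a vector (the names `counter_statistics` and `compute_statistics` are hypothetical and not part of HPX):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type for the statistics named in the abstract.
struct counter_statistics
{
    double minimum;
    double maximum;
    double mean;
    double median;
};

// The vector is taken by value because it is sorted locally.
counter_statistics compute_statistics(std::vector<double> values)
{
    assert(!values.empty());
    std::sort(values.begin(), values.end());

    std::size_t const n = values.size();
    double const sum = std::accumulate(values.begin(), values.end(), 0.0);

    // For an even number of values the median is the mean of the two
    // middle elements.
    double const median = (n % 2 == 1) ?
        values[n / 2] :
        (values[n / 2 - 1] + values[n / 2]) / 2.0;

    return counter_statistics{
        values.front(), values.back(), sum / static_cast<double>(n), median};
}
```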

Re-implement `hpx::util::unwrapped`

Abstract: Our helper facility hpx::util::unwrapped exposes a couple of undesired problems (see for instance tickets #1404, #1400, and #1126). A (partial) re-implementation/refactoring of this facility will help solve these issues.
Difficulty: Medium/Hard
Expected result: A new implementation of hpx::util::unwrapped which does not expose the problems described and a set of tests verifying its functionality.
Knowledge Prerequisite: This project requires sufficient knowledge in using C++ templates and might also require some template metaprogramming experience.
Mentor: Hartmut Kaiser
See issue #2456 on HPX bug tracker

Work on Parallel Algorithms for HPX

Abstract: N4409 is a C++ standardization proposal which has been voted into the new C++17 standard. It provides an abstraction of C++ parallel algorithms well aligned with the existing STL algorithms. HPX sorely misses an implementation of some of those algorithms. This project should implement some of the missing algorithms on top of HPX. Some work in this direction has already been finished (see #1141 for our progress). While there is still some work to be done for #1141, we're especially interested in extending those parallel algorithms to work with distributed data structures, like hpx::partitioned_vector. Some work has been done in this direction as well (see #1338), but we are looking for interested students to continue implementing the distributed parallel algorithms.
Difficulty: Medium/Hard
Expected result: Implementations of various parallel and distributed algorithms for HPX
Knowledge Prerequisite: C++, STL
Mentor: Hartmut Kaiser and Thomas Heller
See #1141 and #1338

Add Vectorization to `par_unseq` Implementations of Parallel Algorithms

Abstract: Our parallel algorithms currently don't support the par_unseq execution policy. This project is centered on implementing this execution policy for at least some of the existing algorithms (such as for_each and similar).
Difficulty: Medium/Hard
Expected result: The result should be functioning parallel algorithms when used with the par_unseq execution policy. The loop body should end up being vectorized.
Knowledge Prerequisite: Vectorization, parallel algorithms.
Mentor: Bryce Lelbach
See issue #2271 on HPX bug tracker

Adding Lustre Backend to hpxio

Abstract: hpxio is a side project to HPX which uses HPX’s facilities to achieve asynchronous I/O on top of POSIX libraries as well as the OrangeFS distributed file system. Lustre is a parallel distributed file system that is used in many clusters. Adding a Lustre backend would be a great addition to hpxio, since many clusters already use the Lustre file system.
Difficulty: Easy/Medium
Expected result: The hpxio library will be able to use the Lustre file system as a backend.
Knowledge Prerequisite: Good C++ knowledge
Mentor: Alireza Kheirkhahan, Hartmut Kaiser

Dynamic Load Balancing Using HPX Migration for LibGeoDecomp and STORM Project

Abstract: The STORM project is a collaborative effort to use HPX and LibGeoDecomp to enable the hurricane storm surge forecasting application ADCIRC to efficiently run on large HPC systems. Currently, the ADCIRC code uses Fortran and MPI, and uses no load balancing at all. We now have a C++ application code that runs on LibGeoDecomp with an HPX backend for parallelization. A dynamic load balancing module for LibGeoDecomp that uses HPX's migration feature would greatly improve performance of this critical application.
Difficulty: Medium
Expected result: Dynamic load balancing for LibGeoDecomp that employs HPX migration
Knowledge Prerequisite: Good C++ knowledge
Mentor: Zach Byerly
See LibGeoDecomp and the STORM project

Stack Overflow Detection on Linux

Abstract: In contrast to Microsoft Windows, Linux isn't able to reliably detect stack overflows. Instead, it simply reports segmentation faults. As HPX supports a very high number of threads, it can only allocate comparably small stacks. Therefore, stack overflows can occur even in the context of correct code. To help developers detect stack overflows, this project should either create a stack overflow detector or integrate a good existing library into HPX.
Difficulty: Medium
Expected result: Stack overflows reported as stack overflows and not as segmentation faults
Knowledge Prerequisite: Good C or C++ knowledge
Mentor: Bryce Lelbach
See issue #2408 on HPX bug tracker
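
One common approach, which is an assumption here and not something the project description prescribes, is to place an inaccessible guard region (for example via `mprotect`) directly below each thread stack and to install a SIGSEGV handler with `sigaction` that runs on an alternate stack set up with `sigaltstack`. The handler then compares the faulting address (`si_addr`) against the known guard regions. The sketch below shows only that comparison; `stack_guard`, `fault_kind`, and `classify_fault` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of the guard region that sits directly below
// the lowest usable address of a downward-growing thread stack.
struct stack_guard
{
    std::uintptr_t begin;    // lowest address of the guard region
    std::size_t size;        // size of the guard region in bytes
};

enum class fault_kind
{
    stack_overflow,
    other_segfault
};

// A fault inside the guard region means the stack grew past its limit;
// any other faulting address is an ordinary segmentation fault.
fault_kind classify_fault(stack_guard const& guard, std::uintptr_t fault_address)
{
    std::uintptr_t const end = guard.begin + guard.size;
    bool const in_guard = fault_address >= guard.begin && fault_address < end;
    return in_guard ? fault_kind::stack_overflow : fault_kind::other_segfault;
}
```

A single guard page only catches overflows that touch it; a large stack frame can jump past the guard, which is one reason the detection is hard to make reliable.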

Port HPX to iOS

Abstract: HPX has already proven to run efficiently on ARM based systems. This has been demonstrated with an application written for Android tablet devices. A port to handheld devices running iOS would be the next logical step! In order to be able to run HPX efficiently there, we need to adapt our build system to be able to cross compile for iOS and add code to interface with the iOS GUI and other system services.
Difficulty: Easy-Medium
Expected result: Provide a prototype HPX application running on an iPhone or iPad
Knowledge Prerequisite: C++, Objective-C, iOS
Mentor: Hartmut Kaiser and Thomas Heller

Augment CSV Files

Abstract: The counter destination .csv file has a header that contains short counter labels, so that results of multiple samples with multiple input parameters can be logged with counters, but this does not work correctly when counters are queried for multiple OS threads. This should be a simple fix and should be extended to multiple localities as well. Make additions to HPX that add user-defined parameters to the counter destination file. This would enable a user to have their information along with the HPX counter info in one CSV file. This could include input parameters as well as output such as execution time, or other pertinent output from the application or the runtime (e.g. number of threads). Then write some Python/pandas or R code to do statistical processing and/or plots. Database ideas are welcome, or get familiar with the APEX and Tau interfaces with HPX to do some data processing and visualization using them and write up examples for users.
Difficulty: Easy
Expected result: Augment CSV files with user-defined parameters, fix hpx-counters for multiple OS threads and interval, and provide statistics and plotting capabilities.
Knowledge Prerequisite: familiarity and willingness to work with C++, Python and pandas
Mentors: Patricia Grubel, Hartmut Kaiser

Create A Parcelport Based on WebSockets

Abstract: Create a new parcelport which is based on WebSockets. The WebSockets++ library seems to be a perfect starting point to avoid having to dig into the WebSocket protocol too deeply.
Difficulty: Medium-Hard
Expected result: A proof of concept parcelport based on WebSockets with benchmark results
Knowledge Prerequisite: C++, knowing WebSockets is a plus
Mentor: Hartmut Kaiser and Thomas Heller

Script Language Bindings

Abstract: Design and implement Python bindings for HPX exposing all or parts of the HPX functionality with a 'Pythonic' API. This should be possible as Python has a much more dynamic type system than C++. Using Boost.Python seems to be a good choice for this.
Difficulty: Medium
Expected result: Demonstrate functioning bindings by implementing small example scripts for different simple use cases
Knowledge Prerequisite: C++, Python
Mentor: Hartmut Kaiser and Adrian Serio

All to All Communications

Abstract: Design and implement efficient all-to-all communication LCOs. While MPI provides mechanisms for broadcasting, scattering and gathering with all MPI processes inside a communicator, HPX currently misses this feature. It should be possible to exploit the Active Global Address Space to mimic global all-to-all communications without the need to actually communicate with every participating locality. Different strategies should be implemented and tested. A first and very basic implementation of broadcast already exists which tries to tackle the problem described above; however, more strategies for granularity control and locality exploitation need to be investigated and implemented. We also have a first version of a gather utility implemented.
Difficulty: Medium-Hard
Expected result: Implement benchmarks and provide performance results for the implemented algorithms
Knowledge Prerequisite: C++
Mentor: Thomas Heller and Andreas Schäfer

Distributed Component Placement

Abstract: Implement an EDSL to specify the placement policies for components. This could be done similar to Chapel's Domain Maps (http://chapel.cray.com/tutorials/SC12/SC12-6-DomainMaps.pdf). In addition, allocators can be built on top of those domain maps for use with C++ standard library containers. This is one of the key features that allow users to write parallel algorithms efficiently without having to worry too much about the initial placement of their distributed objects in the Global Address Space.
Difficulty: Medium-Hard
Expected result: Provide at least one policy which automatically creates components in the global address space
Knowledge Prerequisite: C++
Mentor: Thomas Heller and Hartmut Kaiser

A C++ Runtime Replacement

Abstract: Turn HPX into a replacement for the C++ runtime. We currently need to manually "lift" regular functions to HPX threads in order to have all the information for user-level threading available. This project should research the steps that need to be taken to implement an HPX C++ runtime replacement and provide a first proof of concept implementation for a platform of choice.
Difficulty: Easy-Medium
Expected result: A proof of concept implementation and documentation on how to run an HPX application without the need of an hpx_main
Knowledge Prerequisite: C++, Dynamic Linker, Your favorite OS ABI to start programs/link executables
Mentor: Hartmut Kaiser and Thomas Heller

Resumable Function Implementation

Abstract: Implement resumable functions either in GNU g++ or Clang. This should be based on the corresponding proposal to the C++ standardization committee (see N4286). While this is not a project which is directly related to HPX, having resumable functions available and integrated with hpx::future would improve the performance and readability of asynchronous code. This project sounds huge, but it actually should not be too difficult to realize.
Difficulty: Medium-Hard
Expected result: Demonstrating the await functionality with appropriate tests
Knowledge Prerequisite: C++, knowledge of how to extend clang or gcc is clearly advantageous
Mentor: Hartmut Kaiser

Coroutine-like Interface

Abstract: HPX is an excellent runtime system for doing task-based parallelism. In its current form, however, results of tasks can only be expressed in terms of returning from a function. There are scenarios where this is not sufficient. One example would be lazy ranges of integers (for example Fibonacci numbers, 0 to n, etc.). For those, a generator/yield construct would be perfect!
Difficulty: Easy-Medium
Expected result: Implement yield and demonstrate on at least one example
Knowledge Prerequisite: C++
Mentor: Hartmut Kaiser and Thomas Heller

Bug Hunter

Abstract: In addition to our extensive ideas list, there are several active tickets listed in our issue tracker which are worth tackling as a separate project. Feel free to talk to us if you find something which is interesting to you. A prospective student should pick at least one ticket with medium to hard difficulty and discuss how it could be solved.
Difficulty: Medium-Hard
Expected result: The selected issues need to be fixed
Knowledge Prerequisite: C++
Mentor: Thomas Heller

Graphical and Terminal User Interface for Scimitar

Abstract: Scimitar, the HPX debugger, is a distributed front-end for GDB with HPX support. It is a command-line application, and to improve the user experience it needs a graphical interface and a terminal interface. This task is not difficult but is very time consuming.
Difficulty: Easy-Medium
Expected result: A GUI and terminal interface (ncurses, etc.) on top of Scimitar.
Knowledge Prerequisite: Python, C++, Qt or a comparable library, and possibly x86 Assembly
Mentor: Parsa Amini

Port Graph500 to HPX

Abstract: Implement Graph500 using the HPX Runtime System. Graph500 is the benchmark used by the HPC industry to model important factors of many modern parallel analytical workloads. The Graph500 list is a performance list of systems using the benchmark and was designed to augment the Top 500 list. The current Graph500 benchmarks are implemented using OpenMP and MPI. HPX is well suited for the fine-grain and irregular workloads of graph applications. Porting Graph500 to HPX would require replacing the inherent barrier synchronization with the asynchronous communication of HPX, producing a new benchmark for the HPC community as well as an addition to the HPX benchmark suite. See http://www.graph500.org/ for information on the present Graph500 implementations.
Difficulty: Medium
Expected result: New implementation of the Graph500 benchmark.
Knowledge Prerequisite: C++
Mentor: Patricia Grubel and Thomas Heller

Port Mantevo MiniApps to HPX

Abstract: Implement a version of one or more mini apps from the Mantevo project (http://mantevo.org/) using the HPX Runtime System. We are interested in mini applications ported to HPX that have irregular workloads. Some of these are under development and we will have access to them in addition to those listed on the site. On the site, MiniFE and phdMESH would be good additions to include in the HPX benchmark suite. Porting the mini apps would require porting the apps from C to C++ and replacing the inherent barrier synchronization with HPX's asynchronous communication. This project would be a great addition to the HPX benchmark suite and the HPC community.
Difficulty: Medium
Expected result: New implementation of a Mantevo mini app or apps.
Knowledge Prerequisite: C, C++
Mentor: Patricia Grubel and Thomas Heller

Create An HPX Communicator for Trilinos Project Teuchos Subpackage

Abstract: The Trilinos project (http://trilinos.org/) consists of many libraries for HPC applications in several capability areas (http://trilinos.org/capability-areas/). Communication between parallel processes is handled by an abstract communication API (http://trilinos.org/docs/dev/packages/teuchos/doc/html/index.html#TeuchosComm_src) which currently has implementations for MPI and serial only. Extending the implementation with an HPX backend would permit any of the Teuchos enabled Trilinos libraries to run in parallel using HPX in place of MPI. Of particular interest is the mesh partitioning library Zoltan2 (http://trilinos.org/packages/zoltan2/) which would be used as a test case for the new communications interface. Note that some new collective HPX algorithms may be required to fulfill the API requirements (see the All to All Communications project above).
Difficulty: Medium-Hard
Expected result: A demo application for partitioning meshes using HPX and Zoltan.
Knowledge Prerequisite: C, C++, (MPI)
Mentor: John Biddiscombe and Thomas Heller

Extension/Evaluation of `LibGeoDecomp::Region` As An Alternative Adjacency Container to Boost Graph Library

Abstract: The Boost Graph Library (BGL) offers a set of data structures to store various kinds of graphs, together with generic algorithms to operate on these. For certain classes of graphs which are relevant to high performance computing (HPC), the adjacency information could be stored more efficiently via a data structure we have developed for LibGeoDecomp: the Region class. Region basically stores a set of 1D/2D/3D coordinates with run-length compression. A set of 2D coordinates is equivalent to a set of directed edges of a graph. For this project we'd be interested in an adaptation of the Region interface to make it usable within BGL. The expected interface of adjacency classes is well defined within BGL.
Difficulty: Medium
Expected result: Adapter class or extended Region class for use in BGL, evaluation via set of relevant benchmarks
Knowledge Prerequisite: basic C++ and basic graph theory
Mentor: Andreas Schaefer
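
To illustrate what run-length compression of 2D coordinates means for adjacency data, the sketch below compresses a set of (x, y) pairs into per-row runs. It is not LibGeoDecomp's actual Region API: `streak` and `compress_to_streaks` are hypothetical names, and reading row y as the source vertex and column x as the target vertex is just one possible mapping of edges to coordinates.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical run of consecutive coordinates in one row:
// covers columns [x_begin, x_end) of row y.
struct streak
{
    int y;
    int x_begin;
    int x_end;
};

// Compress a set of (x, y) coordinates into row-wise streaks.
// Read as a graph, row y holds the out-neighbors x of vertex y.
std::vector<streak> compress_to_streaks(std::vector<std::pair<int, int>> coords)
{
    // Order by row first, then by column, and drop duplicates.
    std::sort(coords.begin(), coords.end(),
        [](std::pair<int, int> const& a, std::pair<int, int> const& b) {
            return a.second != b.second ? a.second < b.second :
                                          a.first < b.first;
        });
    coords.erase(std::unique(coords.begin(), coords.end()), coords.end());

    std::vector<streak> result;
    for (auto const& c : coords)
    {
        // Extend the current streak if this coordinate directly follows it.
        if (!result.empty() && result.back().y == c.second &&
            result.back().x_end == c.first)
        {
            ++result.back().x_end;
        }
        else
        {
            result.push_back(streak{c.second, c.first, c.first + 1});
        }
    }
    return result;
}
```

The compression pays off when neighbor indices are largely consecutive, as is typical for meshes with good vertex numbering.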

Add Mask Move/Assign Wrappers for Vectorization Intrinsics

Abstract: Vectorization is a key technique to leverage the full potential of modern CPUs. LibFlatArray is a C++ library which helps with transitioning scalar numerical algorithms on objects to vectorized implementations. It comes with expression templates that enable the user to write code that encapsulates vector intrinsics but appears to the user like standard mathematical datatypes and operations. These templates (dubbed short_vec in LibFlatArray) currently lack a mechanism to selectively set certain lanes of the vector registers via conditional masks. If we had this functionality, we'd be able to represent if/then/else constructs much more idiomatically. Intrinsics for mask generation/application are readily available in all current vector instruction sets (Intel/ARM/IBM); we simply lack convenient/efficient wrappers to utilize them.
Difficulty: Medium
Expected result: Wrapper functions for comparison (to generate masks) and conditional assignment (using masks)
Knowledge Prerequisite: basic C++, vectorization via SSE, AVX/AVX2/AVX512
Mentor: Andreas Schaefer
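
The semantics of the two requested wrappers can be shown with a portable scalar stand-in. This sketch deliberately uses plain loops instead of intrinsics so that it compiles anywhere; a real wrapper would map each function to the matching instruction of the target instruction set. The names `vec4`, `mask4`, `less_than`, and `blend` are hypothetical and are not LibFlatArray's short_vec interface.

```cpp
#include <array>
#include <cstddef>

// Scalar emulation of a 4-lane float vector and its comparison mask.
constexpr std::size_t lane_count = 4;

struct vec4
{
    std::array<float, lane_count> lane;
};

struct mask4
{
    std::array<bool, lane_count> lane;
};

// Comparison: generates a mask with one flag per lane.
mask4 less_than(vec4 const& a, vec4 const& b)
{
    mask4 result{};
    for (std::size_t i = 0; i < lane_count; ++i)
        result.lane[i] = a.lane[i] < b.lane[i];
    return result;
}

// Conditional assignment: takes if_true where the mask is set and
// if_false elsewhere, which replaces a per-element if/then/else.
vec4 blend(mask4 const& mask, vec4 const& if_true, vec4 const& if_false)
{
    vec4 result{};
    for (std::size_t i = 0; i < lane_count; ++i)
        result.lane[i] = mask.lane[i] ? if_true.lane[i] : if_false.lane[i];
    return result;
}
```

With these two operations, `blend(less_than(a, b), a, b)` computes the lane-wise minimum of `a` and `b` without any branch in user code.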

Newtonian Physics Sandbox

Abstract: Develop a tool that allows users to create physical simulations of simple interacting solid objects (spheres, tetrahedrons) with little effort. A force-based, time-discrete algorithm would be best suited for this. Ideally users could model a scene in Blender, export it to a text file, import that file in the tool and simulate it with LibGeoDecomp. LibGeoDecomp (https://www.libgeodecomp.org) is a library for scalable computer simulations. It can leverage HPX for parallelization, which is extremely relevant in this context to even out the computational load. LibGeoDecomp has been used in a number of science projects, but their complexity means they are not well suited to introducing new users. The purpose of this project is to show the capabilities of LibGeoDecomp as well as to provide a non-trivial, yet manageable example code. The focus of this project is the physical modelling (not too hard) and embedding it in LibGeoDecomp. Stretch goals could be in situ visualization via the existing VisIt interface, more complicated object geometries, or output to POVRay. A rudimentary sketch of what we'd like to accomplish can be seen in this video: https://www.youtube.com/watch?v=x8tQGzJQmbo
Difficulty: Easy-Medium
Expected result: Command line tool for force-based, time-discrete simulation of Newtonian physics of simple solids based on LibGeoDecomp
Knowledge Prerequisite: physical modeling, basic C++
Mentor: Andreas Schaefer
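
A force-based, time-discrete update can be as small as the following sketch, which advances one body by one time step using semi-implicit Euler integration. The choice of integrator and the names `vec3`, `body`, and `advance` are assumptions for illustration; collision handling, rotation, and the embedding into LibGeoDecomp are left out.

```cpp
// Hypothetical minimal types for a point-like body.
struct vec3
{
    double x;
    double y;
    double z;
};

struct body
{
    vec3 position;
    vec3 velocity;
    double mass;
};

// One time step of semi-implicit Euler: first update the velocity from
// the acceleration a = F / m, then move the body with the new velocity.
void advance(body& b, vec3 const& force, double dt)
{
    b.velocity.x += force.x / b.mass * dt;
    b.velocity.y += force.y / b.mass * dt;
    b.velocity.z += force.z / b.mass * dt;

    b.position.x += b.velocity.x * dt;
    b.position.y += b.velocity.y * dt;
    b.position.z += b.velocity.z * dt;
}
```

In a full simulation the force passed in would be the sum of all forces acting on the body during that step, such as gravity and contact forces from neighboring objects.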

Implement Your Favorite Parcelport Backend

Abstract: The HPX runtime system uses a module called Parcelport to deliver packages over the network. An efficient implementation of this layer is indispensable and we are searching for new backend implementations based on CCI, ucx or libfabric. All of these mentioned abstractions over various network transport layers offer the ability to do fast, one-sided RDMA transfers. The purpose of this project is to explore one of these and implement a parcelport using it.
Difficulty: Medium-Hard
Expected result: A proof of concept for a chosen backend implementation with performance results
Knowledge Prerequisite: C++, Basic understanding of Network transports
Mentor: Thomas Heller

Implement a Faster Associative Container for GIDs

Abstract: The HPX runtime system uses Active Global Address Space (AGAS) to address global objects. Objects in HPX are identified by a 128-bit unique global identifier, abbreviated as a GID. The performance of HPX relies on fast lookups of GIDs in associative containers. We have experimented with binary search trees (std::map) and hash maps (std::unordered_map). However, we believe that we can implement a search data structure based on n-ary trees, tries or radix trees that exploit the structure of GIDs such that it allows us to have faster lookup and insertion.
Difficulty: Medium-Hard
Expected result: Various different container approaches to choose from together with realistic benchmarks to show the performance properties
Knowledge Prerequisite: C++, Algorithms
Mentor: Thomas Heller
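
As a starting point for discussion, the sketch below shows an uncompressed radix trie that consumes a 128-bit key one byte per level. It is only an illustration of the idea: the `gid` struct with `msb` and `lsb` fields and the `gid_trie` class are hypothetical stand-ins, not HPX's actual GID type or AGAS data structures, and a competitive implementation would need path compression and a much smaller memory footprint per node.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical 128-bit identifier split into two 64-bit halves.
struct gid
{
    std::uint64_t msb;
    std::uint64_t lsb;
};

// Radix trie with one byte of the key per level (16 levels, fan-out 256).
template <typename Value>
class gid_trie
{
    struct node
    {
        std::array<std::unique_ptr<node>, 256> children;
        std::optional<Value> value;
    };

    // Byte of the key used at the given level; level 0 is the most
    // significant byte of msb, level 15 the least significant byte of lsb.
    static std::size_t digit(gid const& g, int level)
    {
        std::uint64_t const word = level < 8 ? g.msb : g.lsb;
        int const shift = 56 - 8 * (level % 8);
        return static_cast<std::size_t>((word >> shift) & 0xff);
    }

public:
    // Insert or overwrite the value stored for the given key.
    void insert(gid const& g, Value v)
    {
        node* n = &root_;
        for (int level = 0; level < 16; ++level)
        {
            std::unique_ptr<node>& child = n->children[digit(g, level)];
            if (!child)
                child = std::make_unique<node>();
            n = child.get();
        }
        n->value = std::move(v);
    }

    // Lookup cost is bounded by the key length, not by the element count.
    std::optional<Value> find(gid const& g) const
    {
        node const* n = &root_;
        for (int level = 0; level < 16; ++level)
        {
            n = n->children[digit(g, level)].get();
            if (!n)
                return std::nullopt;
        }
        return n->value;
    }

private:
    node root_;
};
```

Keys that share a long prefix, as identifiers allocated in consecutive ranges would, share most of their path, which is the structural property the abstract hopes to exploit.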

HPX in the Cloud

Abstract: The HPX runtime system permits the dynamic (re-)allocation of computing resources during execution. This feature is especially interesting for dynamic execution environments, i.e., Cloud Computing, as the availability of resources is not necessarily statically defined before the deployment of a computational task. The implementation of a demonstrator is not restricted to writing source code: as there is currently little experience with dynamic resizing and redistribution of tasks during runtime, we also want to develop a set of heuristics for HPX on dynamic execution environments.
Difficulty: Medium-Hard
Expected result: Sample Benchmarks and Implementations for dynamic scaling using HPX
Knowledge Prerequisite: C++, MPI, virtualization technologies (VMs and containers)
Mentor: Alexander Ditter

Hybrid Cloud-Batch-Scheduling

Abstract: Currently HPX is mostly executed on conventional batch-scheduled HPC systems, using tools like SLURM, TORQUE or PBS for the deployment of jobs. Along with the trend towards more dynamic execution environments such as cloud computing, there is a growing need to make resources from existing cluster systems available in order to deploy software in virtual machines or containers. For this reason, we want to provide an intermediate scheduler that allows the concurrent use of batch scheduling and cloud middleware on the same physical infrastructure. This meta scheduler receives requests for the deployment of virtual machines and maps them onto the batch scheduling system that manages the cluster infrastructure. An existing approach may be extended, refactored or just used for inspiration.
Difficulty: Medium-Hard
Expected result: An extensible open source meta scheduler for the concurrent use of cloud middleware (e.g., OpenNebula or OpenStack) on top of batch systems.
Knowledge Prerequisite: Basic knowledge of virtualization (e.g., libvirt), software engineering and scripting languages (e.g., Python)
Mentor: Alexander Ditter

HPXCL - Asynchronous Integration of CUDA and OpenCL to HPX

Abstract: To fully utilize the resources of modern supercomputers, combining CPUs and accelerator cards such as GPUs is essential. HPXCL therefore provides an asynchronous integration of CUDA and OpenCL into the HPX framework, so that kernel launches and data transfers can be integrated into the asynchronous execution graph of HPX. In this project the existing CUDA or OpenCL implementation should be extended and improved.
Difficulty: Medium-Hard
Expected result: For both backends, we expect the existing implementation to be adapted to the latest version of HPX and old methods to be reimplemented with new HPX features. For the OpenCL part, we expect all functionality to be compatible with the latest OpenCL version. For the CUDA part, only the basics are implemented so far; here we expect the functionality that is missing relative to the OpenCL part to be implemented. In both cases we expect a benchmark documenting that the changes do not reduce the existing performance or, better, improve it.
Knowledge Prerequisite: Basic knowledge of CUDA or OpenCL and good knowledge of C++
Mentor: Patrick Diehl and Thomas Heller

SYCL backend for HPX Compute

Abstract: HPX.Compute provides a new approach to work and data distribution for parallel algorithms. The existing implementation supports execution on CUDA-enabled GPUs, and we have worked on alternative backends for AMD HCC and Khronos SYCL. In this project a preliminary implementation of the SYCL backend should be extended with new features. One missing feature that makes a good starting point is a tuple implementation that is a C++ standard-layout type. Having that would vastly simplify the implementation of the executor and make it possible to integrate an index-based for loop. Other tasks involve implementing and testing other parallel algorithms, implementing a concurrent executor, or supporting work dispatch to multiple devices.
Difficulty: Medium-Hard
Expected result: The backend is comparable with the CUDA backend in terms of supported features and can schedule at least a few algorithms, including an index-based for loop
Knowledge Prerequisite: Basic knowledge of either OpenCL or SYCL, good knowledge of C++
Mentor: Marcin Copik
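
As an illustration of the suggested starting point, the following is a minimal sketch of a tuple that is a standard-layout type. The C++ standard does not guarantee that std::tuple is standard-layout, and implementations that store elements in several base classes are not. The sketch avoids this by using composition only. The names sl_tuple and element are hypothetical and not taken from HPX or SYCL, and the result is standard-layout only if every element type is.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical standard-layout tuple: elements are stored as plain public
// members (head + nested tail), never in base classes.
template <typename... Ts>
struct sl_tuple;

template <>
struct sl_tuple<>
{
};

template <typename Head, typename... Tail>
struct sl_tuple<Head, Tail...>
{
    Head head;
    sl_tuple<Tail...> tail;
};

// Index-based access: walk I steps down the tail chain.
template <std::size_t I>
struct sl_get;

template <>
struct sl_get<0>
{
    template <typename Head, typename... Tail>
    static constexpr Head& apply(sl_tuple<Head, Tail...>& t)
    {
        return t.head;
    }
};

template <std::size_t I>
struct sl_get
{
    template <typename Head, typename... Tail>
    static constexpr auto& apply(sl_tuple<Head, Tail...>& t)
    {
        return sl_get<I - 1>::apply(t.tail);
    }
};

template <std::size_t I, typename... Ts>
constexpr auto& element(sl_tuple<Ts...>& t)
{
    static_assert(I < sizeof...(Ts), "tuple index out of range");
    return sl_get<I>::apply(t);
}

// The property the project description asks for, checked at compile time.
static_assert(std::is_standard_layout<sl_tuple<int, double, char>>::value,
    "sl_tuple of standard-layout elements must be standard-layout");

// Small usage check for the sketch above.
inline void sl_tuple_example()
{
    sl_tuple<int, double> t{1, {2.5, {}}};
    element<0>(t) = 3;
    assert(t.head == 3);
    assert(element<1>(t) == 2.5);
}
```

A complete version would also need const access, construction helpers and comparison operators; the sketch only shows that the layout requirement can be met with a few lines of composition.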

Project: Template

Abstract:
Difficulty:
Expected result:
Knowledge Prerequisite:
Mentor:
