-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make task dependency graph construction atomic #191
base: master
Are you sure you want to change the base?
Make task dependency graph construction atomic #191
Conversation
…d deadlock Submitted by Anton Schreiner from Intel Poland with a deep analysis of the problem: The important points are (should not happen as per my understanding of the spec): • Producers don’t wait for consumers • The order of definitions of accessors in CG does matter – if we submit two equal tasks (with different accessor order of two buffers) from two different threads deadlock happens because one buffer has the first task as producer and the second buffer has the second task as producer – cyclic dependency. triSYCL dependency handling: a. Consumer – a task with read only access, Producer – a task with any other type of accessor. b. CG - command group How it should be by spec: The elementary node in dependency graph in SYCL is CG. Dependencies for the CG are specified with accessors. The order of definitions of accessors should not matter. How it is implemented: In triSYCL buffer keeps a use counter and a reference for the latest producer and all the modifications are protected with mutexes. A handler instance is made per CG submission which is called task. At the moment when accessor is created it increments use counter of the buffer and if the accessor wasn’t read only then it replaces the reference to last producer of the buffer with itself and saves the previous producer into a list inside task class. Kernel invocations are specified with parallel_for kind of methods of the task instance. Before the task runs its kernels it first waits for all the producers in the tasks list of producers. Problems: • Producers don’t wait for consumers • The order of definitions of accessors in CG does matters – that’s why deadlock happens
Introduce the concept of finalization of a command group.
…ssors This remove one of the dead-lock cases submitted by Intel.
Simplify also the code by removing ad-hoc test or intermediate container.
Do not add the third test yet because the anti-dependency is not managed yet.
Resolve conflict on include/CL/sycl/handler.hpp
associated to buffers */ | ||
std::vector<std::unique_lock<std::mutex>> locks; | ||
for (auto &b : buffers_in_use) | ||
// Create a lock for the mutex, but doe not lock it yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doe -> do
include/CL/sycl/handler.hpp
Outdated
@@ -31,6 +31,8 @@ | |||
namespace cl { | |||
namespace sycl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant to the commit but if it's possible might as well change it to the newer nested namespaces definition e.g. namespace cl::sycl
as was done for task.hpp, could also do it for queue.hpp if it requires it so we gradually move towards the new style.
tests/queue/deadlock_symetrical.cpp
Outdated
/* RUN: %{execute}%s | ||
|
||
Test that there is no deadlock in with massive parallel launch of | ||
kernels using accessors in an asymmetrical creation, A B || B A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the name of this test be deadlock_asymmetrical.cpp rather than deadlock_symmetrical.cpp? Or are the comments for deadlock_symetrical.cpp and deadlock.cpp in reverse? I believe it's the former if something is incorrect
This should answer the review from Andrew Gozillon in triSYCL#191
This should address some parts of #190 and other problems.