Is your feature request related to a problem? Please describe
SYCL allows submitting native backend operations to (in-order) queues without paying the synchronization overhead cost of a host_task through sycl::get_native(queue).
sycl::queue q(device, sycl::property::queue::in_order{});
const auto evt1 = q.submit(/* some SYCL operation */);
auto cuda_stream = sycl::get_native<sycl::backend::ext_oneapi_cuda>(q);
cudaNativeFunctionAsync(..., cuda_stream); // does not wait for evt1 on the host, only on device!
const auto evt2 = ??;
As seen above, we can get a sycl::event for the SYCL operation / kernel submission, but not for the cudaNative submission. Such an event would, however, be desirable so that another operation (on a different queue) could specify a dependency on that exact submission, something that is not possible when manually calling cudaEventRecord.
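If such an event existed, the intended usage might look like the following sketch. Note that ext_record_event is purely hypothetical here (it is one of the ideas proposed below, not an existing API):

```cpp
// Hypothetical sketch: ext_record_event() does not exist in DPC++ today.
sycl::queue q2(device, sycl::property::queue::in_order{});

auto evt2 = q.ext_record_event(); // would wrap a cudaEventRecord on q's native stream
q2.submit([&](sycl::handler &cgh) {
  cgh.depends_on(evt2);           // q2 would wait for the native submission on q
  cgh.single_task([] { /* ... */ });
});
```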
Describe the solution you would like
Multiple ideas, in descending complexity:
- Support for AdaptiveCpp's enqueue_custom_operation, see docs
- An extension function sycl::event sycl::queue::ext_record_event() that performs the equivalent of a cudaEventRecord on an in-order queue
- A working implementation of sycl::make_event<backend::ext_oneapi_cuda>(cudaEvent_t, context &)
- A pointer to the internal function that needs to be called as a workaround in the meantime
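For reference, the first idea would look roughly like this under AdaptiveCpp (a sketch based on the AdaptiveCpp documentation; the member name AdaptiveCpp_enqueue_custom_operation and the backend enumerator sycl::backend::cuda are AdaptiveCpp-specific and not valid DPC++):

```cpp
// AdaptiveCpp-only sketch -- the runtime sees the custom operation as a
// regular DAG node, so the returned sycl::event covers it like any other submission.
auto evt = q.submit([&](sycl::handler &cgh) {
  cgh.AdaptiveCpp_enqueue_custom_operation([=](sycl::interop_handle &ih) {
    auto stream = ih.get_native_queue<sycl::backend::cuda>();
    cudaNativeFunctionAsync(..., stream); // enqueued asynchronously, no host wait
  });
});
```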
Describe alternatives you have considered
I have attempted a workaround through internal APIs, but the returned event does not make progress when queried using event.get_info<sycl::info::event::command_execution_status>().
Using the official API
return sycl::make_event<sycl::backend::ext_oneapi_cuda>(event, context);
instead fails to compile with
include/sycl/backend.hpp:356:1: note: candidate template ignored: requirement 'detail::InteropFeatureSupportMap<sycl::backend::ext_oneapi_cuda>::MakeEvent == true' was not satisfied [with Backend = sycl::backend::ext_oneapi_cuda]
Additional context
Using host_task as a replacement is not desirable because it needs to wait (on the host) for the previous operations on the (in-order) queue to complete, negating the latency-hiding benefits of eagerly submitting device work in-order.
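The host_task pattern being ruled out looks roughly like this (standard SYCL 2020 interop; get_native_queue for the CUDA backend is assumed to behave as in current DPC++):

```cpp
// Undesirable: on an in-order queue, this lambda only runs on a host thread
// after all previously submitted work has completed, serializing submission.
q.submit([&](sycl::handler &cgh) {
  cgh.host_task([=](sycl::interop_handle ih) {
    auto stream = ih.get_native_queue<sycl::backend::ext_oneapi_cuda>();
    cudaNativeFunctionAsync(..., stream);
  });
});
```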
Please advise if there is any workaround using (non-portable / unstable) internal APIs at the moment to create an event from such a manual submission, or to convert (wrap) a cudaEvent_t to a sycl::event so that other in-order queues may wait on it.
The above code and workarounds were tried with DPC++ e330855 (May 7, 2024).
Thanks, I've looked into the implementation of ext_oneapi_submit_barrier and it appears to just return the event from the last SYCL submission that was made to the in-order queue. It does not seem to record new events meaning it is blind to my manual submission through the native queue.
I've found another extension with a promising name, sycl_ext_oneapi_in_order_queue_events which provides queue::ext_oneapi_set_external_event - but the external event is again a SYCL event and I have no way of creating one from a cudaEvent_t, so I'm back to the start.
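If make_event were implemented for the CUDA backend, the two pieces could compose as in this hypothetical sketch (make_event for ext_oneapi_cuda currently fails to compile, so this is the desired shape, not working code):

```cpp
// Hypothetical: requires make_event support for the CUDA backend.
cudaEvent_t cu_evt;
cudaEventCreate(&cu_evt);
cudaEventRecord(cu_evt, cuda_stream); // record after the manual native submission

sycl::event ext =
    sycl::make_event<sycl::backend::ext_oneapi_cuda>(cu_evt, q.get_context());
other_queue.ext_oneapi_set_external_event(ext); // later work waits on cu_evt
```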
Thinking about this some more, I'm beginning to wonder if my example code can even be made thread-safe at all. As far as I understand, DPC++ is free to perform its own native-queue kernel submissions in a background thread and return control immediately, which means that even if CUDA streams themselves are / were thread-safe, I could experience spurious re-orders between a kernel launch in q.submit() and my own subsequent operation on the native queue.
Having an equivalent to AdaptiveCpp's enqueue_custom_operation really seems like the best option for semantics that are both unambiguous to the user and allow the implementation to see all of the user's interop work.