-
Notifications
You must be signed in to change notification settings - Fork 957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfcs: add proposal on reorganizing GPU abstractions #1840
base: rfcs
Are you sure you want to change the base?
rfcs: add proposal on reorganizing GPU abstractions #1840
Conversation
0c7a706
to
1d5a726
Compare
* There is currently no plan to extend OpenCL support to other vendors | ||
therefore the name of the final class (`ocl_gpu_engine_t`) doesn't | ||
include the vendor's name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer if Intel was put in the name of the engine?
Engine is the most heavily affected abstraction by the changes. | ||
There is currently no `gpu_engine_t` class as `compute_engine_t` is used as | ||
an Intel centric alternative of it. In order to have a common basic | ||
abstraction at the GPU level the `gpu_engine_t` class will be introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a list of functions to put into gpu_engine_t
? I'm not sure we need a common engine class for GPU engines considering all interactions with GPU engines are either generic (we only need engine_t
) or runtime-specific (where we have to cast to sycl_engine_base_t
or ocl_gpu_engine_t
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that when I was writing this paragraph. I don't have a use case that would require gpu_engine_t
class to exist at this point, it could be that there is none. The reason why I decided to add this class was to have something common at the GPU level. I'm currently working on an implementation and if I can't find any use cases I may remove it as it won't be a big deal to add it when the need arises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea that I'm mulling over is getting rid of sycl_engine_base_t
(in the current RFC it's still there). The rational behind the idea is that I don't see how we can have a single version of sycl_engine_base_t
given that it should be inherited from vendor specific classes (e.g. compute_engine_t
) and I don't think that duplicating it is a good idea.
One option that I can see is to move the intel specific functionality over to sycl_gpu_intel_engine_t
. After that, sycl_engine_base_t
will just contain sycl::context
, sycl::device
and backend_t
, and some basic interfaces such as create_stream
, create_memory_storage
and so on. With that in mind, I think we could go from inheritance to composition, for example, sycl_engine_base_t
can be renamed to sycl_engine_impl_t
that would be included as data member into vendor specific classes such as sycl_gpu_intel_engine_t
, sycl_gpu_nvidia_engine_t
and sycl_gpu_amd_engine_t
. The vendor specific classes would just need to provide interfaces that would redirect to the interfaces of sycl_engine_impl_t
.
Any thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like this idea, this should be cleaner than what we have now. Below is how it can be implemented (as I understand), this may or may not implementable but I think this should make things simpler:
- For Intel GPU engine we can have simple inheritance:
engine_t
->gpu_intel_engine_t
gpu_intel_engine_t
can includeshared_ptr<engine_impl_t>
and storesycl_engine_impl_t
orocl_engine_impl_t
. Thengpu_intel_engine_t
can do simple redirection via virtual methods ofengine_impl_t
sycl_gpu_engine_t
can still be in the hierarchy for simplicity (as well assycl_cpu_engine_t
). It can storesycl_engine_impl_t
and perform simple redirection. This way Nvidia/AMD engine classes can rely on simple inheritance and do not reimplement the same functionalityocl_gpu_engine_t
is not really needed as all Intel-specific primitives will be implemented on top ofgpu_intel_engine_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@echeresh, I updated the RFC, please take a look.
|
||
With the new schema they will be changed to: | ||
* SYCL Intel and generic: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t` | ||
* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it's orthogonal to the RFC I feel like this might be a chance to drop prefixes in all related classes: e.g. dnnl::impl::gpu::nvidia::sycl_gpu_nvidia_engine_t
-> dnnl::impl::gpu::nvidia::engine_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can definitely do that however I'm concerned about possible clashing problems like the one we have with two sycl
namespaces.
In your example you skipped that sycl
namespace, was it on purpose? Do you suggest to assume that everything that is nvidia/amd related is sycl related? If so, it looks a bit confusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).
Another observation is that current src/gpu/ocl
combines OpenCL-specific functionality (opposite to SYCL which is split between src/sycl
and src/gpu/sycl
) and "pseudo"-OpenCL implementations which are in fact used with both OpenCL/SYCL (moreover there are fully OpenCL-agnostic implementations in this directory, e.g. this). So if you have any ideas how to make it more consistent - this would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).
The extra directory doesn't make much sense indeed because it doesn't look like we'll need anything other than SYCL to support Nvidia and AMD. On the other hand, we will have sycl
namespace for intel vendor (gpu::intel::sycl
). Anyway, in the bigger scheme of things it's not going to make a big difference so I'm fine with either way.
So if you have any ideas how to make it more consistent - this would be nice.
- I don't think we could get rid of
src/sycl
because we have to support CPU and GPU engines and that is the place for common SYCL code. If we want to get rid of it then we will have to move the common code tosrc/common/sycl
and the CPU code tosrc/cpu/sycl
. I don't know if we really need it because it won't give us much. - We have a bunch of runtime agnostic primitives. They are basically comprised of nested primitives (e.g. sum, concat, gemm). We can put them directly in
src/gpu/generic
without any subdirectories.
├── cpu/ # CPU code | ||
├── sycl/ # A common place for basic CPU and GPU SYCL abstractions, e.g. `sycl_cpu_engine_t`, `sycl_gpu_engine_t`, etc. | ||
└── gpu/ # GPU code. Basic GPU code resides in the GPU directory directly, e.g. `gpu_engine_t`. | ||
├── intel/ # Everything that is related to Intel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the proposal also includes adjustment of namespaces (like moving all Intel-specific GPU functionality into dnnl::impl::gpu::intel
, is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the plan, we already have it for amd and nvidia so we will get one for intel too. I'll add information about name spaces to the RFC.
04aaa14
to
abaf407
Compare
* SYCL Intel: `engine_t` -> `gpu_engine_t` -> `gpu_intel_engine_t` -> `compute_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t` | ||
* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> `gpu_nvidia_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t` | ||
* SYCL AMD: `engine_t` -> `gpu_engine_t` -> `gpu_amd_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_amd_engine_t` | ||
* OpenCL: `engine_t` -> `gpu_engine_t` -> `gpu_intel_engine_t` -> `compute_engine_t` -> `ocl_gpu_engine_t` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels to me like we are missing some abstractions of the runtime which is causing unnecessary inheritance. A
lot of the runtime abstraction is currently coming from the compute_*_t
code. What if we broke that out into its own thing and had something more like:
// runtime_t encapsulates runtime kernel object creation, and runtime support info, at its core,
// I believe this should be a runtime_device_id, and runtime_device_context.
ocl_runtime_t
sycl_runtime_t
// Engines implementations contain a `runtime_t` object used for implementing the engine_t interface
// along with any <vendor>_device_info_t that may not be available from the the runtime. We likely need
// an extra layer of inheritance or a modification to `engine_t` to expose the `<vendor>_device_info_t`
// and a few runtime interfaces in a generic way.
SYCL NVIDIA: engine_t -> nvidia_sycl_engine_t
SYCL AMD: engine_t -> amd_sycl_engine_t
SYCL INTEL: engine_t -> intel_sycl_engine_t
OCL INTEL: engine_t -> intel_ocl_engine_t
Then rather than have a generic
folder, we have some runtime folders opencl
and sycl
where the runtime implementation and generic kernel implementations are stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get it correctly, the idea sounds very similar to what I described in my comment above (the sycl_engine_base_t
part). I think we will probably end up having something like that anyway because as I said in the comment duplicating sycl_engine_base_t
doesn't look right. I haven't thought about OpenCL though as it doesn't have the engine base class problem but I like the general idea and I'm currently figuring out what the design should look like.
Also, we need to keep in mind that for SYCL runtime we need to be able to support the following scenarios: only generic kernels enabled, NVIDIA + generic kernels enabled and AMD + generic kernels enabled. That's why we should not tie the generic kernels to any particular vendor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjoursler, I updated the RFC, please take a look.
│ ├── sycl/ # AMD related functionality/abstractions (MIOpen/rocBLAS based and HIP specific SYCL kernels) | ||
│ └── ... | ||
└── generic/ # Everything that is related to generic kernels | ||
└── sycl/ # generic SYCL kernels and related abstractions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the criteria for an OpenCL/SYCL kernel to not be generic? Most of our OpenCL kernels are generic in theory, so long as the runtime supports the necessary features. Do we have a method for separating dispatching/configuration heuristics from implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be driven by usage scenarios. If no plans to cover non-Intel hardware with OpenCL kernels - then we can assume all OpenCL kernels are Intel-specific (e.g. they can rely on Intel-specific features). Anything else will result in extra effort to support separation, etc. SYCL kernels may target both Nvidia/AMD hardware - so reusability potential is there but probably need more exploration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SYCL spec defines what's generic and what's not: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:generic-vs-non-generic.
I would say that a SYCL code that uses any vendor specific extensions would be considered as non-generic (e.g. sycl_ext_intel_grf_size, etc).
This proposal is a result of our collaboration with UXL Foundation. We agreed to provide generic SYCL kernels that should work with any vendor if they are supported by the compiler. In addition, we need to pave the way for more vendors to be supported. Therefore, we need to look beyond Intel vendor when reviewing the proposed architecture.
abaf407
to
0871878
Compare
0871878
to
78bd1e0
Compare
* SYCL NVIDIA: `engine_t` -> `gpu::engine_t` -> `gpu::nvidia::engine_t` | ||
* SYCL AMD: `engine_t` -> `gpu::engine_t` -> `gpu::amd::engine_t` | ||
* OpenCL: `engine_t` -> `gpu::engine_t` -> `gpu::intel::engine_t` -> `gpu::intel::ocl::engine_t` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on the plan to share generic implementation lists across sycl engines?
Would we keep separate implementation list for each vendor (and add generic impls in each of those), make the generic impl list common somehow? ...
const dnnl::impl::engine_impl_t *impl() const; | ||
|
||
// Points to either dnnl::impl::sycl::engine_impl_t or dnnl::impl::intel::ocl::engine_impl_t | ||
std::unique_ptr<dnnl::impl::engine_impl_t> impl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the type erasure here necessary? What if we just templated this to be:
template <impl_kind_t>
struct engine_t: public dnnl::impl::engine_t {
const impl_kind_t * impl() const;
// We could also remove indirection from `unique_ptr` if desired.
std::unique_ptr<impl_kind_t> impl;
}
Rendered doc