rfcs: add proposal on reorganizing GPU abstractions #1840

densamoilov · 2024-03-25T17:20:29Z

rfcs/20240325-reorganize-gpu/README.md

t4c1 · 2024-03-27T16:44:26Z

rfcs/20240325-reorganize-gpu/README.md

+    * There is currently no plan to extend OpenCL support to other vendors
+    therefore the name of the final class (`ocl_gpu_engine_t`) doesn't
+    include the vendor's name.


Would it be clearer if Intel was put in the name of the engine?

echeresh · 2024-03-27T20:53:07Z

rfcs/20240325-reorganize-gpu/README.md

+Engine is the most heavily affected abstraction by the changes.
+There is currently no `gpu_engine_t` class as `compute_engine_t` is used as
+an Intel centric alternative of it. In order to have a common basic
+abstraction at the GPU level the `gpu_engine_t` class will be introduced.


Do you have a list of functions to put into gpu_engine_t? I'm not sure we need a common engine class for GPU engines considering all interactions with GPU engines are either generic (we only need engine_t) or runtime-specific (where we have to cast to sycl_engine_base_t or ocl_gpu_engine_t).

I thought about that when I was writing this paragraph. I don't have a use case that would require gpu_engine_t class to exist at this point, it could be that there is none. The reason why I decided to add this class was to have something common at the GPU level. I'm currently working on an implementation and if I can't find any use cases I may remove it as it won't be a big deal to add it when the need arises.

Another idea that I'm mulling over is getting rid of sycl_engine_base_t (in the current RFC it's still there). The rational behind the idea is that I don't see how we can have a single version of sycl_engine_base_t given that it should be inherited from vendor specific classes (e.g. compute_engine_t) and I don't think that duplicating it is a good idea.

One option that I can see is to move the intel specific functionality over to sycl_gpu_intel_engine_t. After that, sycl_engine_base_t will just contain sycl::context, sycl::device and backend_t, and some basic interfaces such as create_stream, create_memory_storage and so on. With that in mind, I think we could go from inheritance to composition, for example, sycl_engine_base_t can be renamed to sycl_engine_impl_t that would be included as data member into vendor specific classes such as sycl_gpu_intel_engine_t, sycl_gpu_nvidia_engine_t and sycl_gpu_amd_engine_t. The vendor specific classes would just need to provide interfaces that would redirect to the interfaces of sycl_engine_impl_t.

Any thoughts on that?

Yeah, I like this idea, this should be cleaner than what we have now. Below is how it can be implemented (as I understand), this may or may not implementable but I think this should make things simpler:

For Intel GPU engine we can have simple inheritance: engine_t -> gpu_intel_engine_t

gpu_intel_engine_t can include shared_ptr<engine_impl_t> and store sycl_engine_impl_t or ocl_engine_impl_t. Then gpu_intel_engine_t can do simple redirection via virtual methods of engine_impl_t

sycl_gpu_engine_t can still be in the hierarchy for simplicity (as well as sycl_cpu_engine_t). It can store sycl_engine_impl_t and perform simple redirection. This way Nvidia/AMD engine classes can rely on simple inheritance and do not reimplement the same functionality

ocl_gpu_engine_t is not really needed as all Intel-specific primitives will be implemented on top of gpu_intel_engine_t

@echeresh, I updated the RFC, please take a look.

echeresh · 2024-03-27T21:26:47Z

rfcs/20240325-reorganize-gpu/README.md

+
+With the new schema they will be changed to:
+* SYCL Intel and generic: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t`
+* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t`


Though it's orthogonal to the RFC I feel like this might be a chance to drop prefixes in all related classes: e.g. dnnl::impl::gpu::nvidia::sycl_gpu_nvidia_engine_t -> dnnl::impl::gpu::nvidia::engine_t.

We can definitely do that however I'm concerned about possible clashing problems like the one we have with two sycl namespaces.

In your example you skipped that sycl namespace, was it on purpose? Do you suggest to assume that everything that is nvidia/amd related is sycl related? If so, it looks a bit confusing to me.

Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).

Another observation is that current src/gpu/ocl combines OpenCL-specific functionality (opposite to SYCL which is split between src/sycl and src/gpu/sycl) and "pseudo"-OpenCL implementations which are in fact used with both OpenCL/SYCL (moreover there are fully OpenCL-agnostic implementations in this directory, e.g. this). So if you have any ideas how to make it more consistent - this would be nice.

Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).

The extra directory doesn't make much sense indeed because it doesn't look like we'll need anything other than SYCL to support Nvidia and AMD. On the other hand, we will have sycl namespace for intel vendor (gpu::intel::sycl). Anyway, in the bigger scheme of things it's not going to make a big difference so I'm fine with either way.

So if you have any ideas how to make it more consistent - this would be nice.

I don't think we could get rid of src/sycl because we have to support CPU and GPU engines and that is the place for common SYCL code. If we want to get rid of it then we will have to move the common code to src/common/sycl and the CPU code to src/cpu/sycl. I don't know if we really need it because it won't give us much.

We have a bunch of runtime agnostic primitives. They are basically comprised of nested primitives (e.g. sum, concat, gemm). We can put them directly in src/gpu/generic without any subdirectories.

echeresh · 2024-03-27T21:31:29Z

rfcs/20240325-reorganize-gpu/README.md

+├── cpu/ # CPU code
+├── sycl/ # A common place for basic CPU and GPU SYCL abstractions, e.g. `sycl_cpu_engine_t`, `sycl_gpu_engine_t`, etc.
+└── gpu/ # GPU code. Basic GPU code resides in the GPU directory directly, e.g. `gpu_engine_t`.
+    ├── intel/ # Everything that is related to Intel


I guess the proposal also includes adjustment of namespaces (like moving all Intel-specific GPU functionality into dnnl::impl::gpu::intel, is this correct?

Yeah, that's the plan, we already have it for amd and nvidia so we will get one for intel too. I'll add information about name spaces to the RFC.

rjoursler · 2024-03-28T16:14:45Z

rfcs/20240325-reorganize-gpu/README.md

+* SYCL Intel: `engine_t` -> `gpu_engine_t` -> `gpu_intel_engine_t` -> `compute_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t`
+* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> `gpu_nvidia_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t`
+* SYCL AMD: `engine_t` -> `gpu_engine_t` -> `gpu_amd_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_amd_engine_t`
+* OpenCL: `engine_t` -> `gpu_engine_t` ->  `gpu_intel_engine_t` -> `compute_engine_t` -> `ocl_gpu_engine_t`


This feels to me like we are missing some abstractions of the runtime which is causing unnecessary inheritance. A
lot of the runtime abstraction is currently coming from the compute_*_t code. What if we broke that out into its own thing and had something more like:

// runtime_t encapsulates runtime kernel object creation, and runtime support info, at its core, // I believe this should be a runtime_device_id, and runtime_device_context. ocl_runtime_t sycl_runtime_t // Engines implementations contain a `runtime_t` object used for implementing the engine_t interface // along with any <vendor>_device_info_t that may not be available from the the runtime. We likely need // an extra layer of inheritance or a modification to `engine_t` to expose the `<vendor>_device_info_t` // and a few runtime interfaces in a generic way. SYCL NVIDIA: engine_t -> nvidia_sycl_engine_t SYCL AMD: engine_t -> amd_sycl_engine_t SYCL INTEL: engine_t -> intel_sycl_engine_t OCL INTEL: engine_t -> intel_ocl_engine_t

Then rather than have a generic folder, we have some runtime folders opencl and sycl where the runtime implementation and generic kernel implementations are stored.

If I get it correctly, the idea sounds very similar to what I described in my comment above (the sycl_engine_base_t part). I think we will probably end up having something like that anyway because as I said in the comment duplicating sycl_engine_base_t doesn't look right. I haven't thought about OpenCL though as it doesn't have the engine base class problem but I like the general idea and I'm currently figuring out what the design should look like.

Also, we need to keep in mind that for SYCL runtime we need to be able to support the following scenarios: only generic kernels enabled, NVIDIA + generic kernels enabled and AMD + generic kernels enabled. That's why we should not tie the generic kernels to any particular vendor.

@rjoursler, I updated the RFC, please take a look.

rjoursler · 2024-03-28T16:59:25Z

rfcs/20240325-reorganize-gpu/README.md

+    │   ├── sycl/ # AMD related functionality/abstractions (MIOpen/rocBLAS based and HIP specific SYCL kernels)
+    │   └── ...
+    └── generic/ # Everything that is related to generic kernels
+        └── sycl/ # generic SYCL kernels and related abstractions


What would be the criteria for an OpenCL/SYCL kernel to not be generic? Most of our OpenCL kernels are generic in theory, so long as the runtime supports the necessary features. Do we have a method for separating dispatching/configuration heuristics from implementation?

I think it should be driven by usage scenarios. If no plans to cover non-Intel hardware with OpenCL kernels - then we can assume all OpenCL kernels are Intel-specific (e.g. they can rely on Intel-specific features). Anything else will result in extra effort to support separation, etc. SYCL kernels may target both Nvidia/AMD hardware - so reusability potential is there but probably need more exploration.

The SYCL spec defines what's generic and what's not: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:generic-vs-non-generic.

I would say that a SYCL code that uses any vendor specific extensions would be considered as non-generic (e.g. sycl_ext_intel_grf_size, etc).

This proposal is a result of our collaboration with UXL Foundation. We agreed to provide generic SYCL kernels that should work with any vendor if they are supported by the compiler. In addition, we need to pave the way for more vendors to be supported. Therefore, we need to look beyond Intel vendor when reviewing the proposed architecture.

mgouicem · 2024-04-09T13:27:15Z

rfcs/20240325-reorganize-gpu/README.md

+* SYCL NVIDIA: `engine_t` -> `gpu::engine_t` -> `gpu::nvidia::engine_t`
+* SYCL AMD: `engine_t` -> `gpu::engine_t` -> `gpu::amd::engine_t`
+* OpenCL: `engine_t` -> `gpu::engine_t` ->  `gpu::intel::engine_t` -> `gpu::intel::ocl::engine_t`
+


Could you elaborate on the plan to share generic implementation lists across sycl engines?
Would we keep separate implementation list for each vendor (and add generic impls in each of those), make the generic impl list common somehow? ...

rjoursler · 2024-04-17T18:00:27Z

rfcs/20240325-reorganize-gpu/README.md

+    const dnnl::impl::engine_impl_t *impl() const;
+
+    // Points to either dnnl::impl::sycl::engine_impl_t or dnnl::impl::intel::ocl::engine_impl_t
+    std::unique_ptr<dnnl::impl::engine_impl_t> impl;


Is the type erasure here necessary? What if we just templated this to be:

template <impl_kind_t> struct engine_t: public dnnl::impl::engine_t { const impl_kind_t * impl() const; // We could also remove indirection from `unique_ptr` if desired. std::unique_ptr<impl_kind_t> impl; }

densamoilov added the RFC A design document label Mar 25, 2024

densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch 2 times, most recently from 0c7a706 to 1d5a726 Compare March 25, 2024 20:39

t4c1 reviewed Mar 27, 2024

View reviewed changes

echeresh reviewed Mar 27, 2024

View reviewed changes

densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch 3 times, most recently from 04aaa14 to abaf407 Compare March 28, 2024 06:29

rjoursler reviewed Mar 28, 2024

View reviewed changes

densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch from abaf407 to 0871878 Compare April 8, 2024 08:09

rfcs: add proposal on reorganizing GPU abstractions

78bd1e0

densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch from 0871878 to 78bd1e0 Compare April 8, 2024 08:16

mgouicem approved these changes Apr 9, 2024

View reviewed changes

rjoursler reviewed Apr 17, 2024

View reviewed changes

densamoilov added 13 commits April 24, 2024 11:21

Update README.md

d2735c0

Update README.md

b8e04b4

Update README.md

5d71aad

Update README.md

ce2fe2a

Update README.md

f17f3a8

Update README.md

af49e58

Update README.md

3602a87

Update README.md

aadb853

Update README.md

21fcd5e

Update README.md

6fac647

Update README.md

8e90f29

Update README.md

fd65e13

Update README.md

068e47b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rfcs: add proposal on reorganizing GPU abstractions #1840

rfcs: add proposal on reorganizing GPU abstractions #1840

densamoilov commented Mar 25, 2024 •

edited

t4c1 Mar 27, 2024

echeresh Mar 27, 2024

densamoilov Mar 28, 2024 •

edited

densamoilov Mar 28, 2024

echeresh Mar 28, 2024

densamoilov Apr 8, 2024 •

edited

echeresh Mar 27, 2024

densamoilov Mar 28, 2024

echeresh Mar 28, 2024

densamoilov Mar 29, 2024 •

edited

echeresh Mar 27, 2024

densamoilov Mar 28, 2024

rjoursler Mar 28, 2024 •

edited

densamoilov Mar 28, 2024

densamoilov Apr 8, 2024

rjoursler Mar 28, 2024

echeresh Mar 28, 2024

densamoilov Mar 28, 2024 •

edited

mgouicem Apr 9, 2024

rjoursler Apr 17, 2024

rfcs: add proposal on reorganizing GPU abstractions #1840

Are you sure you want to change the base?

rfcs: add proposal on reorganizing GPU abstractions #1840

Conversation

densamoilov commented Mar 25, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

densamoilov Mar 28, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

densamoilov Apr 8, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

densamoilov Mar 29, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rjoursler Mar 28, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

densamoilov Mar 28, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

densamoilov commented Mar 25, 2024 •

edited

densamoilov Mar 28, 2024 •

edited

densamoilov Apr 8, 2024 •

edited

densamoilov Mar 29, 2024 •

edited

rjoursler Mar 28, 2024 •

edited

densamoilov Mar 28, 2024 •

edited