Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfcs: add proposal on reorganizing GPU abstractions #1840

Open
wants to merge 14 commits into
base: rfcs
Choose a base branch
from

Conversation

densamoilov
Copy link
Contributor

@densamoilov densamoilov commented Mar 25, 2024

@densamoilov densamoilov added the RFC A design document label Mar 25, 2024
@densamoilov densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch 2 times, most recently from 0c7a706 to 1d5a726 Compare March 25, 2024 20:39
rfcs/20240325-reorganize-gpu/README.md Outdated Show resolved Hide resolved
rfcs/20240325-reorganize-gpu/README.md Outdated Show resolved Hide resolved
Comment on lines 94 to 99
* There is currently no plan to extend OpenCL support to other vendors
therefore the name of the final class (`ocl_gpu_engine_t`) doesn't
include the vendor's name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be clearer if Intel was put in the name of the engine?

Engine is the most heavily affected abstraction by the changes.
There is currently no `gpu_engine_t` class as `compute_engine_t` is used as
an Intel centric alternative of it. In order to have a common basic
abstraction at the GPU level the `gpu_engine_t` class will be introduced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a list of functions to put into gpu_engine_t? I'm not sure we need a common engine class for GPU engines considering all interactions with GPU engines are either generic (we only need engine_t) or runtime-specific (where we have to cast to sycl_engine_base_t or ocl_gpu_engine_t).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that when I was writing this paragraph. I don't have a use case that would require gpu_engine_t class to exist at this point, it could be that there is none. The reason why I decided to add this class was to have something common at the GPU level. I'm currently working on an implementation and if I can't find any use cases I may remove it as it won't be a big deal to add it when the need arises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea that I'm mulling over is getting rid of sycl_engine_base_t (in the current RFC it's still there). The rational behind the idea is that I don't see how we can have a single version of sycl_engine_base_t given that it should be inherited from vendor specific classes (e.g. compute_engine_t) and I don't think that duplicating it is a good idea.

One option that I can see is to move the intel specific functionality over to sycl_gpu_intel_engine_t. After that, sycl_engine_base_t will just contain sycl::context, sycl::device and backend_t, and some basic interfaces such as create_stream, create_memory_storage and so on. With that in mind, I think we could go from inheritance to composition, for example, sycl_engine_base_t can be renamed to sycl_engine_impl_t that would be included as data member into vendor specific classes such as sycl_gpu_intel_engine_t, sycl_gpu_nvidia_engine_t and sycl_gpu_amd_engine_t. The vendor specific classes would just need to provide interfaces that would redirect to the interfaces of sycl_engine_impl_t.

Any thoughts on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like this idea, this should be cleaner than what we have now. Below is how it can be implemented (as I understand), this may or may not implementable but I think this should make things simpler:

  • For Intel GPU engine we can have simple inheritance: engine_t -> gpu_intel_engine_t
    • gpu_intel_engine_t can include shared_ptr<engine_impl_t> and store sycl_engine_impl_t or ocl_engine_impl_t. Then gpu_intel_engine_t can do simple redirection via virtual methods of engine_impl_t
  • sycl_gpu_engine_t can still be in the hierarchy for simplicity (as well as sycl_cpu_engine_t). It can store sycl_engine_impl_t and perform simple redirection. This way Nvidia/AMD engine classes can rely on simple inheritance and do not reimplement the same functionality
  • ocl_gpu_engine_t is not really needed as all Intel-specific primitives will be implemented on top of gpu_intel_engine_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@echeresh, I updated the RFC, please take a look.


With the new schema they will be changed to:
* SYCL Intel and generic: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t`
* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> (`gpu_intel_engine_t` -> `compute_engine_t`) -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it's orthogonal to the RFC I feel like this might be a chance to drop prefixes in all related classes: e.g. dnnl::impl::gpu::nvidia::sycl_gpu_nvidia_engine_t -> dnnl::impl::gpu::nvidia::engine_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely do that however I'm concerned about possible clashing problems like the one we have with two sycl namespaces.

In your example you skipped that sycl namespace, was it on purpose? Do you suggest to assume that everything that is nvidia/amd related is sycl related? If so, it looks a bit confusing to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).

Another observation is that current src/gpu/ocl combines OpenCL-specific functionality (opposite to SYCL which is split between src/sycl and src/gpu/sycl) and "pseudo"-OpenCL implementations which are in fact used with both OpenCL/SYCL (moreover there are fully OpenCL-agnostic implementations in this directory, e.g. this). So if you have any ideas how to make it more consistent - this would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if Nvidia/AMD implementations are all SYCL-based than having an extra directory is not that useful (maybe only for consistency).

The extra directory doesn't make much sense indeed because it doesn't look like we'll need anything other than SYCL to support Nvidia and AMD. On the other hand, we will have sycl namespace for intel vendor (gpu::intel::sycl). Anyway, in the bigger scheme of things it's not going to make a big difference so I'm fine with either way.

So if you have any ideas how to make it more consistent - this would be nice.

  • I don't think we could get rid of src/sycl because we have to support CPU and GPU engines and that is the place for common SYCL code. If we want to get rid of it then we will have to move the common code to src/common/sycl and the CPU code to src/cpu/sycl. I don't know if we really need it because it won't give us much.
  • We have a bunch of runtime agnostic primitives. They are basically comprised of nested primitives (e.g. sum, concat, gemm). We can put them directly in src/gpu/generic without any subdirectories.

├── cpu/ # CPU code
├── sycl/ # A common place for basic CPU and GPU SYCL abstractions, e.g. `sycl_cpu_engine_t`, `sycl_gpu_engine_t`, etc.
└── gpu/ # GPU code. Basic GPU code resides in the GPU directory directly, e.g. `gpu_engine_t`.
├── intel/ # Everything that is related to Intel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the proposal also includes adjustment of namespaces (like moving all Intel-specific GPU functionality into dnnl::impl::gpu::intel, is this correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the plan, we already have it for amd and nvidia so we will get one for intel too. I'll add information about name spaces to the RFC.

@densamoilov densamoilov force-pushed the dsamoylo/rfcs/reorganize-gpu branch 3 times, most recently from 04aaa14 to abaf407 Compare March 28, 2024 06:29
* SYCL Intel: `engine_t` -> `gpu_engine_t` -> `gpu_intel_engine_t` -> `compute_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_intel_engine_t`
* SYCL NVIDIA: `engine_t` -> `gpu_engine_t` -> `gpu_nvidia_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_nvidia_engine_t`
* SYCL AMD: `engine_t` -> `gpu_engine_t` -> `gpu_amd_engine_t` -> `sycl_engine_base_t` -> `sycl_gpu_amd_engine_t`
* OpenCL: `engine_t` -> `gpu_engine_t` -> `gpu_intel_engine_t` -> `compute_engine_t` -> `ocl_gpu_engine_t`
Copy link
Contributor

@rjoursler rjoursler Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels to me like we are missing some abstractions of the runtime which is causing unnecessary inheritance. A
lot of the runtime abstraction is currently coming from the compute_*_t code. What if we broke that out into its own thing and had something more like:

// runtime_t encapsulates runtime kernel object creation, and runtime support info, at its core,
// I believe this should be a  runtime_device_id, and runtime_device_context.
ocl_runtime_t 
sycl_runtime_t

// Engines implementations contain a `runtime_t` object used for implementing the engine_t interface
// along with any <vendor>_device_info_t that may not be available from the the runtime. We likely need
// an extra layer of inheritance or a modification to `engine_t` to expose the `<vendor>_device_info_t`
// and a few runtime interfaces in a generic way.
SYCL NVIDIA: engine_t -> nvidia_sycl_engine_t
SYCL AMD:  engine_t -> amd_sycl_engine_t
SYCL INTEL:  engine_t -> intel_sycl_engine_t
OCL INTEL: engine_t -> intel_ocl_engine_t

Then rather than have a generic folder, we have some runtime folders opencl and sycl where the runtime implementation and generic kernel implementations are stored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get it correctly, the idea sounds very similar to what I described in my comment above (the sycl_engine_base_t part). I think we will probably end up having something like that anyway because as I said in the comment duplicating sycl_engine_base_t doesn't look right. I haven't thought about OpenCL though as it doesn't have the engine base class problem but I like the general idea and I'm currently figuring out what the design should look like.

Also, we need to keep in mind that for SYCL runtime we need to be able to support the following scenarios: only generic kernels enabled, NVIDIA + generic kernels enabled and AMD + generic kernels enabled. That's why we should not tie the generic kernels to any particular vendor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjoursler, I updated the RFC, please take a look.

│ ├── sycl/ # AMD related functionality/abstractions (MIOpen/rocBLAS based and HIP specific SYCL kernels)
│ └── ...
└── generic/ # Everything that is related to generic kernels
└── sycl/ # generic SYCL kernels and related abstractions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the criteria for an OpenCL/SYCL kernel to not be generic? Most of our OpenCL kernels are generic in theory, so long as the runtime supports the necessary features. Do we have a method for separating dispatching/configuration heuristics from implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be driven by usage scenarios. If no plans to cover non-Intel hardware with OpenCL kernels - then we can assume all OpenCL kernels are Intel-specific (e.g. they can rely on Intel-specific features). Anything else will result in extra effort to support separation, etc. SYCL kernels may target both Nvidia/AMD hardware - so reusability potential is there but probably need more exploration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SYCL spec defines what's generic and what's not: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:generic-vs-non-generic.

I would say that a SYCL code that uses any vendor specific extensions would be considered as non-generic (e.g. sycl_ext_intel_grf_size, etc).

This proposal is a result of our collaboration with UXL Foundation. We agreed to provide generic SYCL kernels that should work with any vendor if they are supported by the compiler. In addition, we need to pave the way for more vendors to be supported. Therefore, we need to look beyond Intel vendor when reviewing the proposed architecture.

* SYCL NVIDIA: `engine_t` -> `gpu::engine_t` -> `gpu::nvidia::engine_t`
* SYCL AMD: `engine_t` -> `gpu::engine_t` -> `gpu::amd::engine_t`
* OpenCL: `engine_t` -> `gpu::engine_t` -> `gpu::intel::engine_t` -> `gpu::intel::ocl::engine_t`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on the plan to share generic implementation lists across sycl engines?
Would we keep separate implementation list for each vendor (and add generic impls in each of those), make the generic impl list common somehow? ...

const dnnl::impl::engine_impl_t *impl() const;

// Points to either dnnl::impl::sycl::engine_impl_t or dnnl::impl::intel::ocl::engine_impl_t
std::unique_ptr<dnnl::impl::engine_impl_t> impl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the type erasure here necessary? What if we just templated this to be:

template <impl_kind_t>
struct engine_t: public dnnl::impl::engine_t {
    const impl_kind_t * impl() const;
    // We could also remove indirection from `unique_ptr` if desired.
    std::unique_ptr<impl_kind_t> impl;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC A design document
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants