[Kernel] Add GPU kernels. #372
base: main
Conversation
    if (nbytes == 0) { return nullptr; }

    void *data;

#ifdef GPU
    if (device != nullptr) {
        data = sycl::malloc_device<char>(nbytes, *static_cast<sycl::queue *>(device));
The allocation may fail; shouldn't the failure case be handled?
src/layers/attention.h
Outdated
#ifdef GPU
        sycl::queue *q = static_cast<sycl::queue *>(ctx->device);
        int64_t size = ctx->batchSize * ctx->inputSeqLen * qkvCols * sizeof(float);
        q->memcpy(qkvMatMul.Data(), query.Data(), size).wait();
Why do we need a copy here?
src/models/model_factory.h
Outdated
There is no need to change this file.
src/layers/mlp_llama.h
Outdated
@@ -275,8 +275,7 @@ class LlamaMLP : public SingletonBase<LlamaMLP<WeiT>> {
         }
     }

-    template <typename T1, typename T2>
-    void catGateUpProj(DecoderContext *ctx, hpj::Matrix<T1> &input, hpj::Matrix<T2> &output, hpj::Matrix<T2> &siluBuf) {
+    void catGateUpProj(DecoderContext *ctx, hpj::Matrix<InT> &input, hpj::Matrix<ImT> &output, hpj::Matrix<ImT> &siluBuf) {
Was this changed because of a compiler error or warning? I'd suggest using OutT for the output.
@@ -349,8 +349,11 @@ class MMHelper {
         // W8A8
         else if constexpr (std::is_same_v<WeiT, w8a8_t>) {
             using dt = dnnl::memory::data_type;
+            dnnl::engine eng(dnnl::engine::kind::cpu, 0);
+            dnnl::stream stm(eng);
Why do we now need to create an engine and a stream on every call into this function? That may hurt performance.