
[Inductor] support masked vectorization for the tail_loop #126526

Open · wants to merge 24 commits into base: gh/jiayisunx/10/base

Conversation

@jiayisunx (Collaborator) commented May 17, 2024

Stack from ghstack (oldest at bottom):

Currently the tail_loop always uses the scalar kernel. This PR adds masked vectorization support for the tail_loop to improve performance: the tail iterations are handled with a partial (masked) vector load instead of falling back to scalar code.

Generated code:

  • Before:
    {
        #pragma GCC ivdep
        for(long x0=static_cast<long>(0L); x0<static_cast<long>(2L); x0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long x1=static_cast<long>(0L); x1<static_cast<long>(3L); x1+=static_cast<long>(1L))
            {
                {
                    Welford<float> tmp_acc0 = Welford<float>();
                    Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>();
                    static WeightRecp<at::vec::Vectorized<float>> weight_recps(static_cast<long>(67L));
                    for(long x2=static_cast<long>(0L); x2<static_cast<long>(36L); x2+=static_cast<long>(1L))
                    {
                        for(long x3=static_cast<long>(0L); x3<static_cast<long>(16L); x3+=static_cast<long>(16L))
                        {
                            auto tmp0 = at::vec::Vectorized<bfloat16>::loadu(in_ptr0 + static_cast<long>(x3 + (30L*x1) + (90L*x2) + (3240L*x0)), 16);
                            auto tmp1 = at::vec::convert<float>(tmp0);
                            tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp1, &weight_recps);
                        }
                        #pragma omp simd simdlen(8)
                        for(long x3=static_cast<long>(16L); x3<static_cast<long>(30L); x3+=static_cast<long>(1L))
                        {
                            auto tmp0 = in_ptr0[static_cast<long>(x3 + (30L*x1) + (90L*x2) + (3240L*x0))];
                            auto tmp1 = c10::convert<float>(tmp0);
                            tmp_acc0 = welford_combine(tmp_acc0, tmp1);
                        }
                    }
                    tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec));
                    out_ptr0[static_cast<long>(x1 + (3L*x0))] = static_cast<float>(tmp_acc0.mean);
                    out_ptr1[static_cast<long>(x1 + (3L*x0))] = static_cast<float>(tmp_acc0.m2);
                }
            }
        }
    }
  • After:
    {
        #pragma GCC ivdep
        for(long x0=static_cast<long>(0L); x0<static_cast<long>(2L); x0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long x1=static_cast<long>(0L); x1<static_cast<long>(3L); x1+=static_cast<long>(1L))
            {
                {
                    Welford<float> tmp_acc0 = Welford<float>();
                    Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>();
                    static WeightRecp<at::vec::Vectorized<float>> weight_recps(static_cast<long>(36L), static_cast<long>(16L), static_cast<long>(14L));
                    for(long x2=static_cast<long>(0L); x2<static_cast<long>(36L); x2+=static_cast<long>(1L))
                    {
                        for(long x3=static_cast<long>(0L); x3<static_cast<long>(16L); x3+=static_cast<long>(16L))
                        {
                            auto tmp0 = at::vec::Vectorized<bfloat16>::loadu(in_ptr0 + static_cast<long>(x3 + (30L*x1) + (90L*x2) + (3240L*x0)), 16);
                            auto tmp1 = at::vec::convert<float>(tmp0);
                            tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp1, &weight_recps);
                        }
                        for(long x3=static_cast<long>(16L); x3<static_cast<long>(30L); x3+=static_cast<long>(14L))
                        {
                            auto tmp0 = at::vec::Vectorized<bfloat16>::loadu(in_ptr0 + static_cast<long>(x3 + (30L*x1) + (90L*x2) + (3240L*x0)), 14);
                            auto tmp1 = at::vec::convert<float>(tmp0);
                            tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp1, 14, &weight_recps);
                        }
                    }
                    tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec));
                    out_ptr0[static_cast<long>(x1 + (3L*x0))] = static_cast<float>(tmp_acc0.mean);
                    out_ptr1[static_cast<long>(x1 + (3L*x0))] = static_cast<float>(tmp_acc0.m2);
                }
            }
        }
    }
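
The key change is in the tail loop over x3 = 16..29: before, the 14 trailing elements were reduced one scalar at a time under `#pragma omp simd`; after, a single partial vector load (`loadu` with an explicit count of 14) feeds a masked `welford_combine`, so the tail contributes to the vectorized accumulator `tmp_acc0_vec`, and the extended `WeightRecp` tracks the outer, main, and tail sizes separately. Below is a minimal standalone sketch of the same main-loop/masked-tail pattern, using a plain sum instead of Welford for brevity; the function name and sizes are illustrative, but `loadu(ptr, count)` is the same ATen call the generated code uses (it reads only `count` elements and zero-fills the remaining lanes):

    #include <ATen/cpu/vec/vec.h>

    // Sketch only: sum n floats with a full-width main loop and a single
    // masked (partial-load) tail step instead of a scalar tail loop.
    float sum_with_masked_tail(const float* in, long n) {
      using Vec = at::vec::Vectorized<float>;
      constexpr long VLEN = Vec::size();
      Vec acc(0.0f);
      const long main_end = n / VLEN * VLEN;
      for (long i = 0; i < main_end; i += VLEN) {
        acc = acc + Vec::loadu(in + i);  // full-width load, as in the main loop
      }
      if (n > main_end) {
        // Partial load: only n - main_end elements are read; the remaining
        // lanes are zero-filled, so they add nothing to the sum.
        acc = acc + Vec::loadu(in + main_end, n - main_end);
      }
      float buf[VLEN];
      acc.store(buf);
      float result = 0.0f;
      for (long i = 0; i < VLEN; ++i) result += buf[i];
      return result;
    }

For Welford, zero-filled lanes would skew the running mean, which is why the generated `welford_combine` also receives the tail size (14) and the reworked `WeightRecp` bookkeeping above.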

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented May 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126526

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (6 Unrelated Failures)

As of commit b72e62a with merge base bf2909b:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jiayisunx added a commit that referenced this pull request May 17, 2024
@jiayisunx jiayisunx marked this pull request as draft May 17, 2024 09:17
jiayisunx added further commits that referenced this pull request May 20 – May 27, 2024
CaoE pushed a commit to CaoE/pytorch that referenced this pull request May 26, 2024
@jiayisunx jiayisunx marked this pull request as ready for review May 27, 2024 03:02
Comment on lines +2818 to +2820

self.supported_dtypes_for_masked_vec: List[torch.dtype] = [
    torch.float,
    torch.bfloat16,
Collaborator:

Why do we want to limit this? Can we treat it the same as the normal vec check? We are simplifying the vec checker and aim to remove it entirely some day; it's not good to complicate it further.

Collaborator Author (@jiayisunx), May 27, 2024:

Currently, some operations on integer types do not support masked vectorization well, so those data types cannot yet use masked vectorization for the tail_loop. I will try to solve these issues in the near future.

Collaborator:

Please add a comment here noting that this restriction will be removed in the near future, once all data types are supported.

Collaborator Author:

Done.

Collaborator:

BTW, do you mind stacking a PR now to support all data types?


template <typename T>
T reduce(const T& a, const T& b, const std::string& reduction_type) {
  if (reduction_type == "max") {
Collaborator:

This essentially moves the compile-time checks to runtime. I don't think it is the right thing to do.

Collaborator Author:

I have modified this part, please review it again, thanks!
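
For context on the concern above: dispatching on a runtime string means every call pays a string comparison that the compiler generally cannot fold away, whereas the original compile-time style resolves the choice during template instantiation. A minimal sketch of the compile-time alternative (the enum and function names are illustrative, not the PR's actual code):

    #include <algorithm>

    // Illustrative only: select the reduction at compile time via a template
    // parameter, so no runtime string comparison is needed.
    enum class ReductionType { Max, Min, Sum };

    template <ReductionType kind, typename T>
    T reduce(const T& a, const T& b) {
      if constexpr (kind == ReductionType::Max) {
        return std::max(a, b);
      } else if constexpr (kind == ReductionType::Min) {
        return std::min(a, b);
      } else {
        return a + b;
      }
    }

    // Usage: reduce<ReductionType::Max>(x, y) instantiates to a plain max.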

// Guard against division by zero
wb_over_w = T::blendv(wb_over_w, T(0), new_weight == T(0));
auto new_mean = a.mean + delta * wb_over_w;
auto new_m2 = a.m2 + b.m2 + delta * delta * a.weight * wb_over_w;
Collaborator:

Can we avoid code duplication between the tail version and the main version?

Collaborator Author:

Done.
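
For readers following the thread: the duplication in question is between the full-width combine and the masked-tail combine, which share the Welford update arithmetic shown in the excerpt. A scalar sketch of that shared update (struct and field names follow the excerpt; this is not the PR's final code):

    // Chan's parallel Welford update, written once so that both the main
    // loop and the masked tail loop can call it.
    template <typename T>
    struct Welford {
      T mean = T(0);
      T m2 = T(0);
      T weight = T(0);
    };

    template <typename T>
    Welford<T> welford_combine(const Welford<T>& a, const Welford<T>& b) {
      Welford<T> out;
      out.weight = a.weight + b.weight;
      T delta = b.mean - a.mean;
      // Guard against division by zero when both inputs are empty (the
      // vectorized version uses T::blendv for this, as in the excerpt).
      T wb_over_w = out.weight == T(0) ? T(0) : b.weight / out.weight;
      out.mean = a.mean + delta * wb_over_w;
      out.m2 = a.m2 + b.m2 + delta * delta * a.weight * wb_over_w;
      return out;
    }

In the masked tail path only the construction of b changes (masked lanes carry zero weight), so the update itself need only be written once.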

: delta * w->weight_recps[new_index]);
}
auto new_delta = data - new_mean;
auto new_m2 = acc.m2 + delta * new_delta;
Collaborator:

Ditto.

Collaborator Author:

Done.

if self.tiling_idx >= self.reduction_depth:
    # calculate the reduction size that will be vectorized
    reduction_inner_size = (
        self.ranges[-1]
Collaborator:

self.ranges[-1] holds for either of the conditions, right?

Collaborator:

Also, is the assumption that self.tiling_idx == len(self.ranges) - 1, so that the vectorization happens on the inner-most loop?

Collaborator:

Yes, vectorization happens on the inner-most loop.

Collaborator:

According to what has been observed so far, self.ranges[-1] holds for either of the conditions.

Collaborator:

Then why can't we just use self.ranges[-1]?

Collaborator:

> Yes, vectorization happens on the inner-most loop.

To be safe, can we add an assertion here?

Collaborator Author:

Added, please review it again, thanks!

        else self.ranges[self.reduction_depth]
    )
    # calculate loops size outside the vectorized loop
    self.reduction_outer_size = reduction_size / reduction_inner_size
Collaborator:

Suggested change (use floor division):
-    self.reduction_outer_size = reduction_size / reduction_inner_size
+    self.reduction_outer_size = reduction_size // reduction_inner_size

Collaborator Author:

Done, thanks!

@jiayisunx jiayisunx added the ciflow/trunk Trigger trunk jobs on your pull request label May 27, 2024
@CaoE CaoE added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label May 27, 2024
jiayisunx added a commit that referenced this pull request May 27, 2024
@jiayisunx jiayisunx requested a review from jgong5 May 28, 2024 01:07
jiayisunx added commits that referenced this pull request May 28 – May 29, 2024
Collaborator @leslie-fang-intel left a comment:

  • I think we need UTs to check the vec mask used for the tail loop.

  • Would it be clearer to add a new subclass, maybe CppVecMaskKernel(CppVecKernel), and lift the common code from CppVecKernel into methods to be overridden or reused?

int64_t outer_size;
int64_t main_size;
int64_t tail_size;
std::vector<T> weight_recps;
Collaborator:

Why do we change the type from T::value_type to T?

Collaborator:

We use T, i.e., the vec type, instead of the scalar type because the element values of the weight (a vec) may no longer all be identical in the masked-vec Welford reduction, since the weight may be masked.
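
To make that concrete, here is a small standalone illustration (lane values are made up for the example): once a masked tail step contributes, the per-lane weights diverge, so a single scalar reciprocal per step can no longer describe every lane.

    #include <array>
    #include <cstdio>

    int main() {
      // Pretend the vector width is 4 and the tail mask enables 2 lanes.
      std::array<float, 4> weight{};

      // Full-width step: every lane accumulates one element.
      for (auto& w : weight) w += 1.0f;

      // Masked tail step: only the first two lanes accumulate an element.
      for (int lane = 0; lane < 2; ++lane) weight[lane] += 1.0f;

      // weight is now {2, 2, 1, 1}: no single scalar 1/w fits all lanes,
      // hence precomputed reciprocals are stored per vector (std::vector<T>)
      // rather than per scalar element (std::vector<T::value_type>).
      for (float w : weight) std::printf("%g ", w);
      std::printf("\n");
      return 0;
    }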

jiayisunx added a commit that referenced this pull request May 30, 2024
Collaborator Author @jiayisunx:
  • I think we need UTs to check the vec mask used for the tail loop.

Added, thanks!

  • Would it be clearer to add a new subclass, maybe CppVecMaskKernel(CppVecKernel), and lift the common code from CppVecKernel into methods to be overridden or reused?

I don't have a strong opinion, but adding a new subclass might introduce some code duplication. @jgong5, do you have any opinion?

Collaborator @jgong5 commented May 30, 2024:

  • I don't have a strong opinion, but adding a new subclass might introduce some code duplication. @jgong5, do you have any opinion?

I guess what @leslie-fang-intel meant was to factor out some functions from CppVecKernel to be overridden by CppVecMaskKernel, e.g., how the "load", "store", and "reduction" lines are generated. This can avoid the code duplication you mentioned.
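
A sketch of the factoring @jgong5 describes, shown as a template-method pattern (the real CppVecKernel/CppVecMaskKernel are Python classes in Inductor's cpp backend; the C++ below is only illustrative, with hypothetical names, to match the other snippets in this thread):

    #include <string>

    class VecKernel {
     public:
      virtual ~VecKernel() = default;
      // Shared driver: the overall codegen structure is written once...
      std::string gen_body(const std::string& ptr) {
        return gen_load(ptr) + gen_reduction();
      }

     protected:
      // ...and only the lines that differ are overridable hooks.
      virtual std::string gen_load(const std::string& ptr) {
        return "auto tmp = at::vec::Vectorized<float>::loadu(" + ptr + ");\n";
      }
      virtual std::string gen_reduction() {
        return "acc = welford_combine(acc, tmp);\n";
      }
    };

    class VecMaskKernel : public VecKernel {
     protected:
      // The masked variant overrides just the load/combine emission,
      // threading the tail size through.
      std::string gen_load(const std::string& ptr) override {
        return "auto tmp = at::vec::Vectorized<float>::loadu(" + ptr +
               ", tail_size);\n";
      }
      std::string gen_reduction() override {
        return "acc = welford_combine(acc, tmp, tail_size);\n";
      }
    };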

Labels: ciflow/inductor, ciflow/periodic, ciflow/trunk, module: inductor, open source, release notes: fx
5 participants