nn.functional.group_norm raises FatalError: `Erroneous arithmetic operation` with float16, NHWC data layout, and rank-3 input #63560

Eddie-Wang1120 · 2024-04-16T08:40:07Z

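Describe the Bug

nn.functional.group_norm raises an error when the input is float16, the data layout is NHWC, and the input tensor has rank 3.
With the same dtype and tensor rank, nn.functional.group_norm works with the NCHW data layout.

Steps to Reproduce (minimal code)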

import paddle

x = paddle.arange(72, dtype="float16").reshape((2, 6, 6))
group_norm_out = paddle.nn.functional.group_norm(x, num_groups=6, data_format='NHWC')
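
For comparison, here is a rough NumPy reference for what the call above would be expected to return once it works. It is only a sketch, not Paddle's kernel: the helper name `group_norm_channel_last` is made up for illustration, and it assumes that 'NHWC' on a rank-3 input means a channel-last (N, L, C) layout, that epsilon keeps its documented default of 1e-5, and that no weight or bias is applied.

```python
import numpy as np


def group_norm_channel_last(x, num_groups, epsilon=1e-5):
    """Hypothetical reference: group norm for a channel-last array (N, ..., C)."""
    n, c = x.shape[0], x.shape[-1]
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    # Compute in float32 so that float16 inputs do not lose precision.
    x32 = x.astype(np.float32)
    # Flatten the spatial dims and split channels into groups:
    # (N, spatial, num_groups, channels_per_group).
    grouped = x32.reshape(n, -1, num_groups, c // num_groups)
    # Statistics are shared across all positions and channels of one group.
    mean = grouped.mean(axis=(1, 3), keepdims=True)
    var = grouped.var(axis=(1, 3), keepdims=True)  # biased variance
    out = (grouped - mean) / np.sqrt(var + epsilon)
    return out.reshape(x.shape).astype(x.dtype)


# Same input as the reproduction above, built with NumPy instead of Paddle.
x = np.arange(72, dtype=np.float16).reshape(2, 6, 6)
expected = group_norm_channel_last(x, num_groups=6)
print(expected.shape, expected.dtype)
```

With 6 channels and 6 groups, each group holds a single channel, so every (sample, channel) column is normalized on its own along the length axis.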

Error Message

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
W0416 08:34:06.418946  6860 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.3, Runtime API Version: 12.0
W0416 08:34:06.448040  6860 gpu_resources.cc:164] device: 0, cuDNN Version: 8.8.


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::eager_api_group_norm(_object*, _object*, _object*)
1   group_norm_ad_func(paddle::Tensor const&, paddle::optional<paddle::Tensor> const&, paddle::optional<paddle::Tensor> const&, float, int, std::string)
2   paddle::experimental::group_norm_intermediate(paddle::Tensor const&, paddle::optional<paddle::Tensor> const&, paddle::optional<paddle::Tensor> const&, float, int, std::string const&)

----------------------
Error Message Summary:
----------------------
FatalError: `Erroneous arithmetic operation` is detected by the operating system.
  [TimeInfo: *** Aborted at 1713256446 (unix time) try "date -d @1713256446" if you are using GNU date ***]
  [SignalInfo: *** SIGFPE (@0x7fd7b9c6e5bb) received by PID 6860 (TID 0x7fd8415c2740) from PID 18446744072531404219 ***]

Floating point exception

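Environment

paddlepaddle-gpu develop
latest commit
NVIDIA GeForce RTX 3050 16G

Additional Supplementary Information

group_norm is used in a great many models, and the failing case occurs very frequently in model inference, so I very much hope this issue gets attention and a fix. Thank you!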


yuanlehome · 2024-04-16T09:13:18Z

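Issue received; thank you for using Paddle.
The cause is that the dynamic-graph API paddle.nn.functional.group_norm currently supports only NCHW-format input. As for plans to support NHWC, I need to check with colleagues before replying.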

Eddie-Wang1120 · 2024-04-16T11:03:04Z

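> Issue received; thank you for using Paddle. The cause is that the dynamic-graph API paddle.nn.functional.group_norm currently supports only NCHW-format input. As for plans to support NHWC, I need to check with colleagues before replying.

Thanks for the reply! I very much hope this gets resolved, and thanks again to the Paddle team!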

Eddie-Wang1120 · 2024-04-16T11:28:40Z

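> Issue received; thank you for using Paddle. The cause is that the dynamic-graph API paddle.nn.functional.group_norm currently supports only NCHW-format input. As for plans to support NHWC, I need to check with colleagues before replying.

One more thing: my testing shows that this API does support NHWC input at fp32 precision; only half precision fails. This test result may be a useful reference.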

yuanlehome · 2024-04-16T13:14:31Z

Are you running on GPU? From what I can see, GPU already supports the half-precision NHWC format, so the error may be due to a bug.

yuanlehome · 2024-04-16T13:26:33Z

Could you try rank 4? Currently only rank == 4 is supported.
Do you have a strong need for rank == 3?

Eddie-Wang1120 · 2024-04-17T02:57:42Z

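> Could you try rank 4? Currently only rank == 4 is supported. Do you have a strong need for rank == 3?

Yes, I'm running on GPU, and there is a strong need: rank == 3 inputs occur during model inference.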
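
Since the report says the NCHW layout works for the same dtype and rank, one possible stopgap is to permute a channel-last (N, L, C) input to channel-first (N, C, L), normalize there, and permute back. The NumPy sketch below only illustrates that the two routes are equivalent; the function names are hypothetical, and none of this has been run against Paddle. In Paddle the permutation itself would presumably be `paddle.transpose(x, perm=[0, 2, 1])`, combined with the default data_format.

```python
import numpy as np


def group_norm_channel_first(x, num_groups, epsilon=1e-5):
    """Hypothetical reference: group norm for a channel-first array (N, C, ...)."""
    n, c = x.shape[:2]
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    # Each row of the reshaped array holds one group's channels and positions.
    grouped = x.astype(np.float32).reshape(n, num_groups, -1)
    mean = grouped.mean(axis=-1, keepdims=True)
    var = grouped.var(axis=-1, keepdims=True)  # biased variance
    out = (grouped - mean) / np.sqrt(var + epsilon)
    return out.reshape(x.shape).astype(x.dtype)


def group_norm_via_transpose(x_nlc, num_groups):
    """Normalize a channel-last (N, L, C) array by routing through (N, C, L)."""
    x_ncl = np.transpose(x_nlc, (0, 2, 1))   # (N, L, C) -> (N, C, L)
    out_ncl = group_norm_channel_first(x_ncl, num_groups)
    return np.transpose(out_ncl, (0, 2, 1))  # back to (N, L, C)


# Same shape and dtype as the failing input in this issue.
x = np.arange(72, dtype=np.float16).reshape(2, 6, 6)
y = group_norm_via_transpose(x, num_groups=6)
print(y.shape, y.dtype)
```

The two extra transposes cost memory traffic, so this would only be a temporary measure until rank-3 channel-last input is supported natively.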

zhwesky2010 · 2024-04-22T07:11:38Z

@Eddie-Wang1120 Which is currently missing: rank=3, NHWC/NCHW, or float16?

Eddie-Wang1120 · 2024-04-22T08:08:00Z

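> @Eddie-Wang1120 Which is currently missing: rank=3, NHWC/NCHW, or float16?

What's missing is support for nn.functional.group_norm with float16, the NHWC data layout, and a rank-3 input; this combination currently raises an error.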

zhwesky2010 · 2024-04-22T08:09:54Z

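> > @Eddie-Wang1120 Which is currently missing: rank=3, NHWC/NCHW, or float16?
>
> What's missing is support for nn.functional.group_norm with float16, the NHWC data layout, and a rank-3 input; this combination currently raises an error.

Request received; we will work on it.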

Eddie-Wang1120 · 2024-04-28T09:19:15Z

How is development progressing? @yuanlehome @zhwesky2010

Eddie-Wang1120 added status/new-issue 新建 type/bug-report 报bug labels Apr 16, 2024

paddle-bot bot assigned yuanlehome Apr 16, 2024

paddle-bot bot added status/following-up 跟进中 and removed status/new-issue 新建 labels Apr 19, 2024

paddle-bot bot added status/developing 开发中 and removed status/following-up 跟进中 labels Apr 22, 2024

zhwesky2010 mentioned this issue May 7, 2024

API Improvement for paddle.nn.functional.group_norm and paddle.nn.GroupNorm #63881

Merged
