NEMeanStdDevNormalizationLayer returns nans for f16 tensors #1095

Open
alvoron opened this issue Feb 20, 2024 · 3 comments

@alvoron

alvoron commented Feb 20, 2024

NEMeanStdDevNormalizationLayer returns NaNs if the src/dst tensors are F16.
The issue was reproduced on ACL 23.08.

How ACL was built: scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1

How reproducer was built: clang++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include mvn_bug.c -o bug -L./ComputeLibrary/build/ -L./ComputeLibrary/build/tests/ -L./ComputeLibrary/build/tests/framework/ -larm_compute -lAssetsLibrary.o -lRawTensor.o -lExceptions.o -std=c++17

The issue was reproduced on an Apple M1.

Reproducer:

#include "arm_compute/core/TensorShape.h"

#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"

#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"

#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  size_t X = 128;
  size_t Y = 64;
  float epsValue_ = 0.00000999999974f;

  TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

  auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }

  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEMeanStdDevNormalizationLayer mvn;
  mvn.configure(&srcTensor, &dstTensor, epsValue_);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> distribution{ -2000.0f, 3000.0f };
  library.fill(Accessor(srcTensor), distribution, 0);

  srcTensor.print(std::cout);
  mvn.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}
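
To judge the F16 output, a plain float32 reference can be handy. The sketch below assumes the operator normalizes each row of a 2D tensor as (x - mean) / sqrt(var + epsilon). `reference_msd_norm` and `count_nans` are hypothetical helper names for illustration, not part of ACL.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ACL API): normalizes each row of a row-major
// rows x cols buffer as (x - mean) / sqrt(var + eps), accumulating in double.
std::vector<float> reference_msd_norm(const std::vector<float> &src,
                                      std::size_t rows, std::size_t cols, float eps)
{
    std::vector<float> dst(src.size());
    for(std::size_t r = 0; r < rows; ++r)
    {
        const float *in = src.data() + r * cols;
        double sum = 0.0;
        double sum_sq = 0.0;
        for(std::size_t c = 0; c < cols; ++c)
        {
            sum += in[c];
            sum_sq += static_cast<double>(in[c]) * in[c];
        }
        const double mean = sum / static_cast<double>(cols);
        // Clamp at zero so rounding error cannot produce a negative variance.
        const double var = std::max(sum_sq / static_cast<double>(cols) - mean * mean, 0.0);
        const double inv_std = 1.0 / std::sqrt(var + static_cast<double>(eps));
        for(std::size_t c = 0; c < cols; ++c)
        {
            dst[r * cols + c] = static_cast<float>((in[c] - mean) * inv_std);
        }
    }
    return dst;
}

// Hypothetical helper: counts NaN entries in a buffer.
std::size_t count_nans(const std::vector<float> &v)
{
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(), [](float x) { return std::isnan(x); }));
}
```

Copying the F16 source values into a float buffer and running them through this reference gives finite expected values to compare the operator's output against.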
@morgolock

Hi @alvoron

I managed to reproduce this. However, the range of input values in your test, [-2000.f, 3000.f], is not supported for float16_t in the operator NEMeanStdDevNormalizationLayer.

We only test values in the range [-1.f, 1.f]; see https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/MeanStdDevNormalizationLayerFixture.h#L61

I've also modified the test to use [-1000.f, 1000.f] and I see no NaNs:

int main(int argc, char *argv[]) {
  size_t X = 128;
  size_t Y = 64;
  float epsValue_ = 0.00000999999974f;

  TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

  auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }

  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEMeanStdDevNormalizationLayer mvn;
  mvn.configure(&srcTensor, &dstTensor, epsValue_);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  // Fill the source tensor with uniformly distributed values in [-1000, 1000].
  // `gen` is not shown in the posted excerpt; a local engine (needs <random>) is assumed here.
  std::mt19937 gen(std::random_device{}());
  std::uniform_real_distribution<float> distribution(-1000.0f, 1000.0f);
  Window window;
  window.use_tensor_dimensions(srcTensor.info()->tensor_shape());
  execute_window_loop(window,
                      [&](const Coordinates &id)
                      {
                        // Draw a float sample and store it as float16_t.
                        const auto value = static_cast<float16_t>(distribution(gen));
                        *reinterpret_cast<float16_t *>(srcTensor.ptr_to_element(id)) = value;
                      });

  srcTensor.print(std::cout);
  mvn.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}

What's the use case for the range of values [-2000.0f, 3000.0f]? Is there a model using this?

Hope this helps
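
On the range question above, some back-of-the-envelope F16 arithmetic may help. The largest finite IEEE 754 half-precision value is 65504. The sketch below assumes, purely for illustration, that intermediate sums are held in F16; the thread does not say how the kernel accumulates. The helper names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Largest finite value of IEEE 754 binary16 (half precision).
constexpr double kF16Max = 65504.0;

// Hypothetical helper: true if the worst-case sum of `count` squares of values
// with |x| <= max_abs exceeds the largest finite F16 value.
bool sum_sq_exceeds_f16(double max_abs, std::size_t count)
{
    return max_abs * max_abs * static_cast<double>(count) > kF16Max;
}

// Hypothetical helper: largest |x| for which a sum of `count` squares is
// guaranteed to stay within the finite F16 range.
double max_safe_abs(std::size_t count)
{
    return std::sqrt(kF16Max / static_cast<double>(count));
}

// Shows one way NaN can arise after overflow: a variance formed as
// E[x^2] - mean^2 becomes inf - inf when both terms have overflowed.
bool inf_minus_inf_is_nan()
{
    const float inf = std::numeric_limits<float>::infinity();
    return std::isnan(inf - inf);
}
```

Assuming normalization runs along the 128-element X dimension of the reproducer, `max_safe_abs(128)` is about 22.6, and a single value of 3000 already squares to 9,000,000. Overflow by itself yields infinity rather than NaN, so whether NaNs appear would depend on how the intermediate values are combined.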

@alvoron

alvoron commented Mar 15, 2024

The issue reproduces on a style transfer model.
I see values in the [-2000, 3000] range there.

I was able to reproduce the issue with the range [0, 1000].
Could you try?

@morgolock

Hi @alvoron

Thank you for sharing the details. The following patch fixes the problem: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11311

This fix will be included in the 24.05 release.

Hope this helps.

@morgolock morgolock added this to the v24.05 milestone Mar 19, 2024
@morgolock morgolock self-assigned this Apr 5, 2024