Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_map fails for the complex32 case with gfortran-10 with -O3 #19

Open
kyrtle opened this issue Nov 17, 2020 · 10 comments
Open

test_map fails for the complex32 case with gfortran-10 with -O3 #19

kyrtle opened this issue Nov 17, 2020 · 10 comments

Comments

@kyrtle
Copy link

kyrtle commented Nov 17, 2020

Hello,
Thank you for this excellent library.
Here, I would report an issue. As I invoked ctest after installation, following error message was shown:

96% tests passed, 1 tests failed out of 25
Total Test time (real) =   7.27 sec
The following tests FAILED:
	 16 - test_map (Failed)
Errors while running CTest

Would you please see to this?
Best regards,
Tamoghna

@milancurcic
Copy link
Member

Hi @tamoghna-4119,

Thanks for reporting this. I can't reproduce it with gfortran-9.3.0 or ifort-2021.1--all tests pass.

What compiler and version do you use?

Also, can you run bin/test_map from the build directory and let me know the output that you get?

@kyrtle
Copy link
Author

kyrtle commented Nov 25, 2020

Thank you for your answer and I am really sorry for replying only now.
It might be a problem with gcc-10.2.0.
bin/test_map fails for complex real32. All other 9tests have passed.
I would try with gfortran-9.

Regards,
dT

@milancurcic
Copy link
Member

I dig a little deeper into this. I put a few diagnostic prints in test_map.f90:

! test that fails
tests(n) = assert(all(map(xpowx_c4, c4) == c4**c4), 'map,  complex real32')
n = n + 1

print *, 'c4', c4
print *, 'map(xpowx_c4, c4)', map(xpowx_c4, c4)
print *, 'c4**c4', c4**c4
print *, 'map(xpowx_c4, c4) == c4**c4', map(xpowx_c4, c4) == c4**c4
print *, 'all(map(xpowx_c4, c4) == c4**c4)', all(map(xpowx_c4, c4) == c4**c4)

This test fails with gfortran-10 (not with 9, 8, or 7), and only with -O3 or higher optimization level.

$ gfortran-10 -O3 src/functional.f90 test/testing.f90 test/test_map.f90 && ./a.out
test map,  int8                                                           : PASS
test map,  int16                                                          : PASS
test map,  int32                                                          : PASS
test map,  int64                                                          : PASS
test map,  real32                                                         : PASS
test map,  real64                                                         : PASS
test map,  real128                                                        : PASS
test map,  complex real32                                                 : FAIL
 c4             (1.00000000,0.00000000)             (2.00000000,0.00000000)             (3.00000000,0.00000000)
 map(xpowx_c4, c4)             (1.00000000,0.00000000)             (4.00000000,0.00000000)             (27.0000019,0.00000000)
 c4**c4             (1.00000000,0.00000000)             (4.00000000,0.00000000)             (27.0000019,0.00000000)
 map(xpowx_c4, c4) == c4**c4 T T T
 all(map(xpowx_c4, c4) == c4**c4) F
test map,  complex real64                                                 : PASS
test map,  complex real128                                                : PASS
Ran a total of  10 tests.
  9 tests PASSED,    1 tests FAILED.
STOP 1

Although all 3 elements evaluate as equal, evaluating all() of that array comes back false:

 map(xpowx_c4, c4) == c4**c4 T T T
 all(map(xpowx_c4, c4) == c4**c4) F

My guess is that at high optimization level the compiler tries to do something too aggressive with either side of the == comparison when it's an argument to all().

Right now I don't have a solution but I will keep this open.

@milancurcic milancurcic changed the title test_map Failed test_map fails for the complex32 case with gfortran-10 with -O3 Feb 16, 2021
@zmoon
Copy link
Contributor

zmoon commented Mar 16, 2021

When I was trying to figure out how to make it pass (7bcf168 / #23), I discovered that it's not just with all() that weird things happen. Assigning map(xpowx_c4, c4) == c4**c4 to a variable I got T T F, even though map(xpowx_c4, c4) == c4**c4 evaluates to T T T (or at least prints out that way).

@nakib
Copy link
Contributor

nakib commented Dec 11, 2022

Hello all,

I can reproduce this test failure for v0.6.2 for gfortran 11.3.0 using the default build flags.

Best,
Nakib

@nakib
Copy link
Contributor

nakib commented Dec 11, 2022

Not sure if this adds to the confusion, the following evaluates to false:

tests(n) = assert(all(map(xpowx_c4, c4) == &
     [cmplx(1.0_real32, 0.0_real32), &
     cmplx(4.0_real32, 0.0_real32), &
     cmplx(27.0_real32, 0.0_real32)]), 'map,  complex real32')

whereas the following evaluates to true:

tests(n) = assert(all(c4**c4 == &
     [cmplx(1.0_real32, 0.0_real32), &
     cmplx(4.0_real32, 0.0_real32), &
     cmplx(27.0_real32, 0.0_real32)]), 'map,  complex real32')

for the given definition of c4 in the test. I have checked that it does not matter if the arguments are swapped across the == sign.

Interestingly, the following passes the test:

tests(n) = assert(all(map(xpowx_c8, c8) == c4**c4), 'map,  complex real64 LHS, complex real32 RHS')

This precision loss issue seems to be due to the application of the map function. Rather strange! Will play with it more.

Best,
Nakib

@nakib
Copy link
Contributor

nakib commented Dec 24, 2022

I have done further digging. It turns out that the compiler flag -fpeel-loops causes this particular test to fail. This is one of the flags that gets tacked on on top of the flags defining the -O2 optimization level when we use the -O3 optimization. Read more here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I quote from the page above the description of -fpeel-loops:

-fpeel-loops

Peels loops for which there is enough information that they do not roll much (from profile feedback or static analysis). It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations).

Enabled by -O3, -fprofile-use, and -fauto-profile.

It is unclear to me how this flag interacts with the application of the map_c4 function to result in the failed test. It seems like -fpeel-loops in conjuction with -O1 and -O2 will cause this issue. It works fine with the -O0 flag.

Perhaps it is better to ship with the -O2 optimization by default instead of -O3.

@milancurcic
Copy link
Member

Thanks for digging further! Are you sure -fpeel-loops causes it? I can still reproduce the issue with -O3 -fno-peel-loops.

@nakib
Copy link
Contributor

nakib commented Dec 25, 2022

Hi Milan,

Good point. I too see the test failing with -O3 -fno-peel-loops. However, the test passes with -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides. Here I have added all the additional O3 level flags by hand except for -fpeel-loops. Naively, I would think that this and -O3 -fno-peel-loops should give the same results.

@milancurcic
Copy link
Member

I see, thanks. In that case, I assume that it's some undocumented optimization going from -O2 to -O3 that breaks the map for complex values. I'm not sure yet whether it's better to use -O2 or -O3 for release. On one hand, all compiled code should work, and on the other hand, I don't want to penalize the entire library for one edge case. If nothing else, we can document the issue in README.md as a note for the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants