Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

521.wrf_r segfaults when built with -flto=full -O2 #1429

Open
bryanpkc opened this issue Nov 1, 2023 · 17 comments
Open

521.wrf_r segfaults when built with -flto=full -O2 #1429

bryanpkc opened this issue Nov 1, 2023 · 17 comments

Comments

@bryanpkc
Copy link
Collaborator

bryanpkc commented Nov 1, 2023

After upgrading to LLVM 17, the 521.wrf_r benchmark in SPEC CPU2017 is miscompiled at O3 with LTO enabled. Examination of the crash site shows that a large amount of code that used to be inlined into wrf_init (via alloc_and_configure_domain) has been deleted, leaving the head_grid pointer uninitialized.

The root cause of this problem is that upstream LLVM had strengthened the pruning of unreachable code to handle undef and poison. Specifically, these three patches are found to be relevant:

Classic Flang leaves variables uninitialized by default. The corresponding IR values are therefore considered to contain undef. In our downstream compiler, we have tested with initializing all variables to zeroes, and that successfully avoids the problem.

628.pop2_s is also affected by this problem.

@bryanpkc bryanpkc changed the title 521.wrf_r segfaults when built with -flto=full -O3 521.wrf_r segfaults when built with -flto=full -Ofast Nov 1, 2023
@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 1, 2023

As discussed at the call, I will follow up in this issue with information about which variables are left uninitialized in the source, and whether the problem occurs at -O2/-O3.

@bryanpkc bryanpkc changed the title 521.wrf_r segfaults when built with -flto=full -Ofast 521.wrf_r segfaults when built with -flto=full -O2/-O3 Nov 1, 2023
@bryanpkc bryanpkc changed the title 521.wrf_r segfaults when built with -flto=full -O2/-O3 521.wrf_r segfaults when built with -flto=full -O2 Nov 1, 2023
@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 1, 2023

The problem does occur at -O2 and -O3.

The compiler flags I used are:

-Mallocatable=03 -flto=full -O2 -mcpu=native -ffast-math -Mbyteswapio

The linker flags I used (with ld.lld) are:

-flto=full -O2 -mcpu=native -ffast-math

@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 1, 2023

The problematic undef values that caused some basic blocks to be pruned correspond to the dummy arguments ipex and jpex in the function compute_memory_dims_rsl_lite (module_dm.F90, line 1088). This function is called by patch_domain_rsl_lite, which passes a whole bunch of actual arguments, none of which is initialized before the call.

@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 1, 2023

@pawosm-arm @shivaramaarao Could you check things on your side to see if you can reproduce the problem, or determine why you don't see the problem?

@pawosm-arm
Copy link
Collaborator

pawosm-arm commented Nov 1, 2023

Hi @bryanpkc assuming the problem manifests itself at runtime, I need to ask, what input data set are you using? (test, ref, or something else). Also, with -mcpu=native being used at build time, what are the CPU features (namely, NEON vs SVE)?

@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 2, 2023

@pawosm-arm I have confirmed that the problem can be reproduced without specifying any -mcpu= option. The run-time segmentation fault happens during benchmark start-up; I am using the test data set.

@pawosm-arm
Copy link
Collaborator

Been testing it all night in our CI, with your flags, with O3 and even with Ofast, and it didn't fail. It's time to do it manually, but bear in mind, building spec's wrf with LTO take ages, and I want to try two different builds of the compiler, so it may take days before I'd confirm anything.

@pawosm-arm
Copy link
Collaborator

Just noticed, our CI always adds -funroll-loops to the explicitly specified set of flags, but I doubt it matters

@pawosm-arm
Copy link
Collaborator

...also we don't seem to use (or even build) lld, but it also should make no difference on what LTO does when optimizing the code

@pawosm-arm
Copy link
Collaborator

Should 628.pop2_s also fail during benchmark start-up? It takes forever to execute it, so I guess it's not affected in my builds...

@bryanpkc
Copy link
Collaborator Author

bryanpkc commented Nov 3, 2023

@pawosm-arm Can you confirm in the 521.wrf_r source files that the actual arguments I mentioned are not initialized? I am using an older version of the benchmark (v1.0.5) and perhaps SPEC has changed the source since then?

@shivaramaarao
Copy link
Collaborator

shivaramaarao commented Nov 3, 2023 via email

@pawosm-arm
Copy link
Collaborator

@pawosm-arm Can you confirm in the 521.wrf_r source files that the actual arguments I mentioned are not initialized? I am using an older version of the benchmark (v1.0.5) and perhaps SPEC has changed the source since then?

I've got two versions here: 1.0.1 and 1.1.5. The module_dm.F90 file is exactly the same in both. These params are not initialized at call site, but they don't have to, they have all INTENT(OUT) in the calleee, so if any of those variables is used below the call site, the compiler has no liberty in removing any code that contributes to the value they're having assigned.

Since I can't reproduce the problem with the most recent release of our compiler, I'll try to build spec2k17's wrf with vanilla classic.

@pawosm-arm
Copy link
Collaborator

pawosm-arm commented Nov 3, 2023

Since I can't reproduce the problem with the most recent release of our compiler, I'll try to build spec2k17's wrf with vanilla classic.

Indeed, wrf fails (with segfault) right after start when built like that (spec2k17 ver. 1.1.5), but pop2 doesn't.
Note that stack size was set to unlimited (this is to prevent usual reason for wrf's segfaulting)

@bryanpkc
Copy link
Collaborator Author

As I mentioned above, our downstream compiler has an option to zero-initialize variables automatically, so we have a workaround, which we may be able to upstream. gfortran has a similar set of options. Oracle's Fortran compiler has an -xcheck=init_local option that initializes variables to trapping values, for debugging purpose. ifort has a similar -check uninit option.

I had a quick look at the language standard. While it discusses definition and undefinition of variables, it does not specify any default initialization behaviour, and uses of uninitialized variables are undefined behaviour. From what I can see in online discussions, programmers are encouraged to define variables explicitly.

What should be our next step? Any opinion? @pawosm-arm @shivaramaarao

@pawosm-arm
Copy link
Collaborator

I'm slightly confused. Doesn’t classic-flang already zero-initialize variables (at least globals/module variables)?

@bryanpkc
Copy link
Collaborator Author

I think the global/module variables are allocated in the BSS section and zero-initialized only as a side effect. Local variables that are not explicitly initialized are left undefined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants