[BUG/ISSUE] Significant performance degradation on multiple platforms affecting GEOS-Chem Classic and GCHP #57

Open
LiamBindle opened this issue Oct 16, 2020 · 14 comments
Labels: category: Bug · never stale · topic: Performance

Comments

@LiamBindle (Contributor) commented Oct 16, 2020

There appears to be a significant performance hit in GEOS-Chem Classic and GCHP on some platforms, particularly those using gfortran, stemming from somewhere in HEMCO. This issue has been observed by @lizziel, @jimmielin, @WilliamDowns, and myself.

The four of us just finished a call discussing the issue and how to proceed. Some notes from that meeting are in the collapsed block below.

Zoom call notes
  • HEMCO slowdown call [2020-10-16 Fri 10:00]
    • Will has seen the slowdown in the calc-emissions (Get Current Emissions) code
    • Haipeng had a similar problem with get emissions
      • An OpenMP loop becomes a serial loop
      • Removing the subroutine call (copy-pasting its code directly into the get-emissions loop) fixed the issue
      • His slowdown was in boundary layer mixing (~100x)
      • Potentially an OpenMP issue
    • GCHP still uses OpenMP; only problematic at >1000 cores
    • Troubleshooting idea: check whether performance differs when HEMCO within GCHP is built without OpenMP
    • Will saw the slowdown in the Get Current Emissions code
      • Some (but not all) emissions containers hang for a while
      • Haipeng has seen some containers hang for a while too
    • Look at gfortran flags that could give warnings about performance issues
    • Can it be reproduced with the HEMCO standalone?
      • Lizzie is going to look into this
    • Liam will open an issue on HEMCO
    • Liam will check whether building HEMCO without OpenMP speeds things up
    • Will will continue running timing tests and narrowing down which emissions collections are problematic
    • Haipeng will try GEOS-Chem Classic with gfortran and some different compiler settings

Observations of the issue

Lizzie's internal benchmarks show that HEMCO's performance in GEOS-Chem Classic with gfortran has been deteriorating. Wall times for HEMCO in GEOS-Chem Classic:

| Version | gfortran | ifort |
|---------|----------|-------|
| alpha.4 | 396      | 232   |
| alpha.5 | 976      | 216   |
| alpha.9 | 1732     | 376   |

In some GCHP timing tests that Lizzie and I ran a few weeks ago, I observed a very significant drop-off in GCHP's scaling (see figure below). Note that the solid lines are Lizzie's tests with ifort and the dashed lines are my tests with gfortran. The drop-off in performance was dominated by a big slowdown in HCOI_GC_RUN(...).

[Figure: GCHP throughput vs. number of cores; solid lines are ifort tests, dashed lines are gfortran tests]

Further information

  • In GCHP timing tests, @WilliamDowns has seen a significant slowdown in the Get Current Emissions code. He has found that certain (but not all) emissions containers exhibit significant slowdowns.
  • @jimmielin saw a performance hit in GC-Classic due to a bad loop in "Get Emissions". He managed to resolve the slowdown by replacing a subroutine call in the bad loop with the subroutine's code (i.e., an inline copy-paste). He suspects his issue was OpenMP-related (a minimal sketch of this pattern is shown below).
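Below is a minimal, self-contained sketch of the loop pattern from the second bullet: an OpenMP parallel loop whose body is a per-iteration subroutine call, followed by the same work inlined. The names (emis_loop_demo, add_one) are illustrative placeholders, not HEMCO routines, and the sketch only shows the structure Haipeng described; it does not by itself reproduce the slowdown.

```fortran
! Minimal sketch of the pattern described above: an OpenMP parallel loop
! whose body is a call to a tiny subroutine, followed by the same work
! inlined.  All names here are placeholders, not HEMCO routines.
program emis_loop_demo
  implicit none
  integer, parameter :: n = 1000000
  real(8) :: a(n)
  integer :: i

  a = 0.0d0

  ! Variant 1: per-iteration subroutine call inside the parallel loop
  ! (the form that was observed to run as if serial).
  !$OMP PARALLEL DO DEFAULT( SHARED ) PRIVATE( i )
  do i = 1, n
     call add_one( a(i) )
  end do
  !$OMP END PARALLEL DO

  ! Variant 2: the subroutine body copy-pasted into the loop
  ! (the workaround that restored parallel speed).
  !$OMP PARALLEL DO DEFAULT( SHARED ) PRIVATE( i )
  do i = 1, n
     a(i) = a(i) + 1.0d0
  end do
  !$OMP END PARALLEL DO

  print *, 'a(1) =', a(1), ' (expect 2.0)'

contains

  subroutine add_one( x )
    real(8), intent(inout) :: x
    x = x + 1.0d0
  end subroutine add_one

end program emis_loop_demo
```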

Action items

  • @WilliamDowns will continue to run timing tests to narrow down the specific calls and containers that are slow
  • @lizziel will see if the issue can be reproduced with the HEMCO standalone
  • @LiamBindle will see if building HEMCO in GCHPctm without OpenMP improves anything
  • @jimmielin is going to try some different gfortran compiler options in GC-Classic

We will continue discussing in this thread.

@LiamBindle added the category: Bug label on Oct 16, 2020
@WilliamDowns commented Oct 16, 2020

Moving the entire code of get_current_emissions into the body of hco_calcemis has not resulted in a speed-up (my slowdown continues to occur somewhere in that function's work). I will continue timing tests to pin down where in there it is occurring.

@LiamBindle (Contributor, Author) commented Oct 16, 2020

As @jimmielin suggested, it definitely appears to be related to OpenMP.

Here are my timing test results for GCHP alpha.10 with OpenMP enabled and disabled. These were 2-week timing tests at C48 with 192 cores and gfortran 8.3.

| OpenMP   | Model throughput |
|----------|------------------|
| Enabled  | 113 days/day     |
| Disabled | 163 days/day     |

Here is the same figure I showed above, now including this new timing test with OpenMP disabled, which is marked by the blue X. Note that the blue X is the same timing test as the solid red point below it.

[Figure: GCHP throughput vs. number of cores, with the OpenMP-disabled test marked by a blue X]

There is a 40-90% increase in speed if OpenMP is disabled. I think this confirms it's related to OpenMP!
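As a side note, a quick way to confirm which build you are actually running is the standard _OPENMP preprocessor macro. The snippet below is a generic sketch, not part of GCHP, and assumes the file is compiled with preprocessing enabled (e.g. a .F90 extension or gfortran -cpp).

```fortran
! Generic sketch for verifying whether OpenMP was compiled in.
! Compile with preprocessing enabled, e.g.:
!   gfortran -cpp -fopenmp check_omp.F90   (OpenMP build)
!   gfortran -cpp          check_omp.F90   (no-OpenMP build)
program check_omp
#ifdef _OPENMP
  use omp_lib
#endif
  implicit none
#ifdef _OPENMP
  print *, 'OpenMP enabled, max threads =', omp_get_max_threads()
#else
  print *, 'OpenMP not compiled in'
#endif
end program check_omp
```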

@lizziel (Contributor) commented Oct 16, 2020

Interesting. I'll do my HEMCO standalone tests for both OpenMP enabled and disabled.

@lizziel (Contributor) commented Oct 16, 2020

I am seeing significant differences in run time in the HEMCO standalone due to OpenMP. The differences across compilers are negligible in comparison. For all compilers, disabling OpenMP reduces the run time.

Wall times for a 1-month run:

| Compiler     | OMP=y (24 threads) | OMP=n    |
|--------------|--------------------|----------|
| gfortran 8.3 | 00:19:37           | 00:15:04 |
| gfortran 9.3 | 00:21:32           | 00:14:59 |
| ifort 18     | 00:21:15           | 00:17:10 |
| ifort 19     | 00:21:41           | 00:16:10 |

All tests used HEMCO branch dev/GC_13.0.0. For the run directory I used GC-Classic fullchem HEMCO_Config.rc and HEMCO_Diagn.rc from GEOS-Chem branch dev/13.0.0.

@WilliamDowns:
For a 6-hour simulation in GCHPctm with 30 cores on 1 node, using gfortran 9.3 and Spack-built OpenMPI 4.0.5, disabling OpenMP yields a total run time of 14m14s. With OpenMP enabled, the same run has not finished after 1h40m (it is 3h40m into the run). Based on cursory glances at the pace of the run outside of the very slow emissions timesteps, my issue may not be unique to HEMCO. I'll have to rerun for more details, as Cannon is about to shut down.

@yantosca (Contributor) commented Mar 2, 2021

Has there been any more work on this issue? I am wondering if a combination of the COLLAPSE() and SCHEDULE( DYNAMIC, n ) clauses on the !$OMP parallel loops might help here. I can try to look into this, as I have been doing other profiling tests.
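For reference, here is a minimal sketch of what those clauses look like on a doubly nested (I,J) loop. The array name, grid dimensions, and chunk size are arbitrary placeholders, not HEMCO code.

```fortran
! Sketch of the COLLAPSE and SCHEDULE( DYNAMIC, n ) clauses on a nested
! (I,J) loop.  The loop body and array are placeholders only.
program collapse_dynamic_demo
  implicit none
  integer, parameter :: nx = 144, ny = 91
  real(8) :: flux(nx, ny)
  integer :: i, j

  !$OMP PARALLEL DO            &
  !$OMP DEFAULT( SHARED )      &
  !$OMP PRIVATE( i, j )        &
  !$OMP COLLAPSE( 2 )          &
  !$OMP SCHEDULE( DYNAMIC, 4 )
  do j = 1, ny
     do i = 1, nx
        ! Placeholder for per-cell emissions work
        flux(i,j) = real( i + j, 8 )
     end do
  end do
  !$OMP END PARALLEL DO

  print *, 'sum of flux =', sum( flux )
end program collapse_dynamic_demo
```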

@yantosca (Contributor) commented Mar 3, 2021

I have been looking into this. It seems that testing whether the pointers LevDct1 and LevDct2 are associated within routine GetVertIndx can carry a big performance hit. The pointers are associated in GET_CURRENT_EMISSIONS, so they can be tested there, and logical arguments can be passed to GetVertIndx indicating whether the pointers are null or not.

Also, a similar check on whether the PBLHEIGHT and BXHEIGHT fields are in the HcoState can be moved out of routine GetIdx, which is called for each (I,J). I am not sure if this breaks anything, though.

I will do more testing/profiling. At first glance, the COLLAPSE does not bring as big a benefit as I thought, but cleaning up these tests of whether pointers are associated seems to be the way to get more performance here.
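Here is a minimal sketch of that refactoring. The routine names GetVertIndx and GET_CURRENT_EMISSIONS come from the description above, but the argument lists, types, and loop body are illustrative stand-ins rather than HEMCO's actual interfaces.

```fortran
module vert_indx_sketch
  implicit none
contains
  ! The ASSOCIATED() tests have been replaced by two cheap logical
  ! arguments that the caller evaluates once, outside the (I,J) loop.
  subroutine GetVertIndx( isLevDct1, isLevDct2, k )
    logical, intent(in)  :: isLevDct1, isLevDct2
    integer, intent(out) :: k
    if ( isLevDct1 .and. isLevDct2 ) then
       k = 2
    else
       k = 1
    end if
  end subroutine GetVertIndx
end module vert_indx_sketch

program get_current_emissions_sketch
  use vert_indx_sketch
  implicit none
  integer, parameter :: nx = 10, ny = 10
  real(8), pointer   :: LevDct1(:) => null()
  real(8), pointer   :: LevDct2(:) => null()
  logical            :: isLevDct1, isLevDct2
  integer            :: i, j
  integer            :: kidx(nx,ny)

  allocate( LevDct1(5) )    ! LevDct2 is left unassociated on purpose

  ! Hoisted: evaluated once per call, not once per grid cell
  isLevDct1 = associated( LevDct1 )
  isLevDct2 = associated( LevDct2 )

  !$OMP PARALLEL DO DEFAULT( SHARED ) PRIVATE( i, j )
  do j = 1, ny
     do i = 1, nx
        call GetVertIndx( isLevDct1, isLevDct2, kidx(i,j) )
     end do
  end do
  !$OMP END PARALLEL DO

  print *, 'kidx(1,1) =', kidx(1,1)
  deallocate( LevDct1 )
end program get_current_emissions_sketch
```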

stale bot commented Apr 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

The stale bot added the stale label on Apr 2, 2021. @jimmielin added the never stale label and removed the stale label on Apr 3, 2021.
@lizziel (Contributor) commented Apr 9, 2021

From comment above: "Also a similar check on whether the PBLHEIGHT and BXHEIGHT fields are in the HcoState can be moved out of routine GetIdx, which is called on each (I,J). I am not sure if this breaks anything though."

It turns out this does break things in the new HEMCO intermediate grid update I am bringing in (GEOS-Chem PR 681). Initialization of State_Met%LandTypeFrac previously used HCO_GetPtr. With the update it instead uses HCO_GC_EvalFld, a new wrapper around HCO_EvalFld. With this method the data is retrieved via Get_Current_Emissions where the new checks are outside of the IJ loop.

The problem with this is that during initialization Emissions_Run has only been called with Phase=0. Phase 0 skips the call to GridEdge_Set, and by extension HCO_SetPBLm, resulting in HcoState%Grid%PBLHEIGHT%Val not yet being associated. I don't think the code that uses PBLHEIGHT would ever be reached for the 2D Olson land map, but the new code still checks that it is associated.

I'm considering two different possible fixes:
(1) Wrap the BXHEIGHT%Val and PBLHEIGHT%Val associated checks in Get_Current_Emissions so they are only done if isLevDct1 or isLevDct2 is true (a sketch of this guard is shown below). For this solution I'm not sure if the error handling would still cover all cases.
(2) Change the Olson data retrieval back to getting a pointer (@jimmielin, could you comment on why you made this change?)
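A sketch of option (1), using stand-in derived types: the field names (BXHEIGHT, PBLHEIGHT, %Val) and the isLevDct1/isLevDct2 flags follow the discussion above, but the surrounding structure is illustrative only, not HEMCO's actual HcoState.

```fortran
! Sketch of option (1): only test that BXHEIGHT%Val / PBLHEIGHT%Val are
! associated when a vertical level descriptor is actually in use.  The
! derived type here is a stand-in, not HEMCO's HcoState.
program guarded_assoc_check
  implicit none

  type :: FieldSketch
     real(8), pointer :: Val(:,:) => null()
  end type FieldSketch

  type(FieldSketch) :: BXHEIGHT, PBLHEIGHT
  logical :: isLevDct1, isLevDct2, ok

  ! During Phase 0 initialization PBLHEIGHT%Val has not been set yet,
  ! and no level descriptor is in use for a 2D field like the land map.
  isLevDct1 = .false.
  isLevDct2 = .false.

  ok = .true.
  if ( isLevDct1 .or. isLevDct2 ) then
     ! Only when the vertical dimension is needed is a missing
     ! met field an error.
     if ( .not. associated( BXHEIGHT%Val  ) ) ok = .false.
     if ( .not. associated( PBLHEIGHT%Val ) ) ok = .false.
  end if

  if ( ok ) then
     print *, 'checks passed (or not needed)'
  else
     print *, 'ERROR: required met field not associated'
  end if
end program guarded_assoc_check
```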

@jimmielin (Collaborator):
Hi Lizzie - I think we can just use HCO_GC_GetPtr instead. I changed it to EvalFld because we were planning to change the GetPtrs to EvalFlds, but it would be unphysical to scale land type fractions anyway. One thing to note with the new intermediate grid updates is that the data has to be copied out of the pointer.

I think simply replacing

CALL HCO_GC_EvalFld( Input_Opt, State_Grid, Name, State_Met%LandTypeFrac(:,:,T), RC, FOUND=FND )

with a call to CALL HCO_GC_GetPtr( ... Ptr2D ... ) and then copying the data, e.g. State_Met%LandTypeFrac(:,:,T) = Ptr2D(:,:), should resolve this issue.
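A minimal sketch of that get-pointer-then-copy pattern follows. Because the full HCO_GC_GetPtr argument list is not shown above, the routine here (get_ptr_2d) is a hypothetical stand-in; only the final copy line mirrors the suggested State_Met%LandTypeFrac(:,:,T) = Ptr2D(:,:) assignment.

```fortran
! Hypothetical stand-in for the get-pointer-then-copy pattern; not the
! real HCO_GC_GetPtr interface.  Only the LandTypeFrac copy line mirrors
! the suggestion above.
module get_ptr_sketch
  implicit none
  real(8), allocatable, target :: stored_field(:,:)   ! data owned elsewhere
contains
  subroutine get_ptr_2d( name, Ptr2D, RC )
    character(len=*), intent(in)  :: name
    real(8), pointer, intent(out) :: Ptr2D(:,:)
    integer,          intent(out) :: RC
    ! Point at the stored field (a stand-in for HEMCO's data container)
    Ptr2D => stored_field
    RC    = 0
  end subroutine get_ptr_2d
end module get_ptr_sketch

program landtypefrac_copy_sketch
  use get_ptr_sketch
  implicit none
  integer, parameter :: nx = 4, ny = 3, T = 1
  real(8), allocatable :: LandTypeFrac(:,:,:)   ! stand-in for State_Met%LandTypeFrac
  real(8), pointer     :: Ptr2D(:,:)
  integer :: RC

  allocate( stored_field(nx,ny) )
  stored_field = 0.25d0
  allocate( LandTypeFrac(nx,ny,T) )

  ! Get a pointer to the field, then copy the data out of the pointer
  call get_ptr_2d( 'LANDTYPE01', Ptr2D, RC )
  LandTypeFrac(:,:,T) = Ptr2D(:,:)

  print *, 'LandTypeFrac(1,1,T) =', LandTypeFrac(1,1,T)
end program landtypefrac_copy_sketch
```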

@lizziel (Contributor) commented Apr 10, 2021

Okay, I will put it back to GetPtr then. I previously kept it as GetPtr rather than changing it to EvalFld because, as you say, it doesn't make sense to scale/mask the land mask.

I'm still wondering if we could move those PBLHEIGHT checks elsewhere. It should only need to be checked once during a run.

@lizziel (Contributor) commented Aug 3, 2021

Does anyone know if this OpenMP problem in HEMCO is still an issue in GEOS-Chem 13? @jimmielin, @WilliamDowns, @LiamBindle

@LiamBindle (Contributor, Author):
Sorry, I don't. I'm not aware of any updates on it though so I suspect it still exists.

@yantosca added the topic: Performance label on Mar 23, 2022
@bbakernoaa:
Has there been any update on this? I'm noticing, when running MEGAN offline, that it doesn't scale at all with the number of CPUs.
