Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error messages not printing to HEMCO.log #236

Open
1 task done
cbutenhoff opened this issue Aug 22, 2023 · 1 comment
Open
1 task done

Error messages not printing to HEMCO.log #236

cbutenhoff opened this issue Aug 22, 2023 · 1 comment
Labels
category: Interface to External Model Related to HEMCO updates needed to interface with other models (GEOS, WRF, CESM, etc) never stale Never label this issue as stale topic: ESMF or MPI Related to issues in the ESMF and/or MPI environments

Comments

@cbutenhoff
Copy link

Name and Institution (Required)

Name: Chris
Institution: Butenhoff

Confirm you have reviewed the following documentation

Description of your issue or question

I'm running HEMCO 3.0.0 within GCHP 13.1.2.

Apologies if this issue has been addressed by a newer version of HEMCO, but I believe I found a case where an HEMCO error message should have been written to the HEMCO.log file but wasn't.

My GCHP job failed with the following messages in the GCHP output file:

HEMCO: array pointer vertically flipped relative to MAPL Import ENERGY_LEVS
 HEMCO: array pointer vertically flipped relative to MAPL Import INDUSTRY_LEVS
  --- Do convection now
===============================================================================
GEOS-Chem ERROR: Error encountered in "HCO_Run"!
 -> at HCOI_GC_Run (in module GeosCore/hco_interface_gc_mod.F90)

THIS ERROR ORIGINATED IN HEMCO!  Please check the HEMCO log file for
additional error messages!
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "HCOI_GC_Run"!
 -> at Emissions_Run (in module GeosCore/emissions_mod.F90)
===============================================================================

pe=00002 FAIL at line=01201    gchp_chunk_mod.F90                       <Error calling EMISSIONS_RUN>
pe=00002 FAIL at line=03619    Chem_GridCompMod.F90                     <status=1>
pe=00002 FAIL at line=02672    Chem_GridCompMod.F90                     <status=1>
pe=00002 FAIL at line=01839    MAPL_Generic.F90                         <needs informative message>
pe=00002 FAIL at line=00573    GCHP_GridCompMod.F90                     <status=1>
pe=00002 FAIL at line=01839    MAPL_Generic.F90                         <needs informative message>
pe=00002 FAIL at line=01162    MAPL_CapGridComp.F90                     <status=1>
pe=00002 FAIL at line=01086    MAPL_CapGridComp.F90                     <status=1>
pe=00002 FAIL at line=00792    MAPL_CapGridComp.F90                     <status=1>
pe=00002 FAIL at line=00922    MAPL_CapGridComp.F90                     <status=1>
pe=00002 FAIL at line=00247    MAPL_Cap.F90                             <status=1>
pe=00002 FAIL at line=00211    MAPL_Cap.F90                             <status=1>
pe=00002 FAIL at line=00154    MAPL_Cap.F90                             <status=1>
pe=00002 FAIL at line=00129    MAPL_Cap.F90                             <status=1>
pe=00002 FAIL at line=00030    GCHPctm.F90                              <status=1>
===============================================================================
GEOS-Chem ERROR: Error encountered in "HCO_Run"!
 -> at HCOI_GC_Run (in module GeosCore/hco_interface_gc_mod.F90)

THIS ERROR ORIGINATED IN HEMCO!  Please check the HEMCO log file for
additional error messages!

As you can see, the message states that the error occurred in HEMCO and to check for error messages in the HEMCO log file. However, even when I set the Verbose and Warning flags to the maximum value of 3, there were no error messages in HEMCO.log,

I eventually tracked the error down to an erroneous negative flux in one of my custom emissions inventories that caused the array TmpFlux in subroutine Hco_CalcEmis to be negative and it entered this IF-clause:

 ! Check for negative values according to the corresponding setting
       ! in the configuration file: 2 means allow negative values, 1 means
       ! set to zero and prompt a warning, else return with error.
       IF ( HcoState%Options%NegFlag /= 2 ) THEN
          ! CLB 08/21/2023
          PRINT*, 'CLB in hco_calc_mod.F90: CalcEmis4.1'

          IF ( ANY(TmpFlx < 0.0_hp) ) THEN
          
             ! CLB 08/21/2023
             PRINT*, 'CLB in hco_calc_mod.F90: CalcEmis4.2'

             ! Set to zero and prompt warning
             IF ( HcoState%Options%NegFlag == 1 ) THEN
                ! CLB 08/21/2023
                PRINT*, 'CLB in hco_calc_mod.F90: CalcEmis4.3'
                WHERE ( TmpFlx < 0.0_hp ) TmpFlx = 0.0_hp
                MSG = 'Negative emissions set to zero: '// TRIM(Dct%cName)
                CALL HCO_WARNING( HcoState%Config%Err, MSG, RC )

             ! Return with error
             ELSE
                ! CLB 08/21/2023
                PRINT*, 'CLB in hco_calc_mod.F90: CalcEmis4.4', Dct%HcoID
                MSG = 'Negative emissions in: '// TRIM(Dct%cName) // '. ' // &
                'To allow negatives, edit settings in the configuration file.'
                CALL HCO_ERROR( HcoState%Config%Err, MSG, RC )
                RETURN
             ENDIF
          ENDIF
       ENDIF

Since I didn't have the Negative values flag set to 1 in HEMCO_Config.rc, HEMCO should have written the error message in the ELSE clause.

The routine HCO_ERROR calls HCO_MSGErr:

SUBROUTINE HCO_MSGErr( Err, Msg, Sep1, Sep2, Verb )
!
! !INPUT PARAMETERS:
!
    TYPE(HcoErr),     POINTER                  :: Err
    CHARACTER(LEN=*), INTENT(IN   ), OPTIONAL  :: Msg
    CHARACTER(LEN=1), INTENT(IN   ), OPTIONAL  :: Sep1
    CHARACTER(LEN=1), INTENT(IN   ), OPTIONAL  :: Sep2
    INTEGER,          INTENT(IN   ), OPTIONAL  :: Verb
!
! !REVISION HISTORY:
!  23 Sep 2013 - C. Keller   - Initialization
!  See https://github.com/geoschem/hemco for complete history
!EOP
!------------------------------------------------------------------------------
!BOC
    LOGICAL  :: IsOpen
    INTEGER  :: LUN

    !======================================================================
    ! HCO_MSG begins here
    !======================================================================

    ! Check if Err object is indeed defined
    IF ( .NOT. ASSOCIATED(Err) ) THEN
       IsOpen = .FALSE.
    ELSE
       IsOpen = Err%LogIsOpen
       
    ! CLB 08/21/2023
       PRINT*, 'CLB In hco_error_mod.F90 l434', Msg

       ! Don't print if this is not the root CPU
       IF ( .NOT. Err%IsRoot ) RETURN
       
       ! CLB 08/21/2023
       PRINT*, 'CLB In hco_error_mod.F90 l440', Msg


       ! Don't print if verbose level is smaller than verbose level of this
       ! CPU.
       IF ( PRESENT( Verb ) ) THEN
          IF ( Verb < Err%Verbose ) RETURN
       ENDIF
    ENDIF

You can see the MSG is not printed if this is not the root CPU. I was running GCHP using multiple nodes/cores so apparently the process running this was not on the root CPU. When I comment this line out, the MSG is written to GCHP.out (or standard output) but not HEMCO.log.

I'm not sure how often this issue comes up, but fixing it could save considerable debugging time.

Thanks.

@jimmielin
Copy link
Collaborator

Hi, thanks for reporting this bug. I think that a lot of the HEMCO error handling is still reliant on the root thread and if something crashes at a non-root process it is very hard to debug.

I think to make HEMCO errors more MPI-safe it might be better to output HEMCO.log into separate files in the MPI environment, i.e., HEMCO.log.0000, HEMCO.log.0001, ... but there would be some performance impacts of I/O on non-root threads. Alternatively maybe the modeling framework would provide an I/O handle to print errors in (CESM captures standard output into one single file in cesm.log.... for example) but this is not universal (WRF still writes into separate files despite capturing STDOUT and STDERR). Not sure about MAPL so I am pinging @lizziel who knows much more about MAPL than I do. But we may have to work together to find a solution that works for all models coupling with HEMCO.

@jimmielin jimmielin added topic: ESMF or MPI Related to issues in the ESMF and/or MPI environments category: Interface to External Model Related to HEMCO updates needed to interface with other models (GEOS, WRF, CESM, etc) labels Aug 23, 2023
@yantosca yantosca added the never stale Never label this issue as stale label Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Interface to External Model Related to HEMCO updates needed to interface with other models (GEOS, WRF, CESM, etc) never stale Never label this issue as stale topic: ESMF or MPI Related to issues in the ESMF and/or MPI environments
Projects
None yet
Development

No branches or pull requests

3 participants