Running ngen using mpirun generates a NetCDF: HDF error #749
Comments
There is some old code at https://github.com/stcui007/ngen/tree/I481_netcdf that seems to run the MPI job to the finish without error, which might provide an initial starting point for a satisfactory solution.
Here is the netCDF-related info:
This netCDF 4.7.4 has been built with the following features:
Which machine is that on?
It is on UCS3, and probably some other UCS series also.
Just realized, there is more info in /local/lib/lib/pkgconfig.
$ h5c++ -showconfig
For reference, here is the NetCDF file in question, based on the realization config used:
The partition file:
We identified a potential cause for this issue: the NetCDF files are not being closed before MPI_Finalize() is called. An example of this issue (not exact, but close enough) is seen in the following minimal example:

#include <mpi.h>
#include <memory>
#include <iostream>
#include <netcdf>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    // The shared_ptr holding the NcFile is not destroyed until main()
    // returns, i.e. after MPI_Finalize(), so the file close runs once
    // the libraries have already begun shutting down.
    auto x = std::make_shared<netCDF::NcFile>(argv[1], netCDF::NcFile::read);
    for (const auto& kv : x->getVars()) {
        std::cout << kv.first << std::endl;
    }
    MPI_Finalize();
    return 0;
}

Compiling and running this results in:
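A possible fix, sketched here as a minimal illustration rather than the actual ngen change, is to make sure the file object is destroyed, and the file therefore closed, before MPI_Finalize() runs, e.g. by limiting the shared_ptr's scope. The ordering can be demonstrated with a stand-in RAII type; File and mpi_finalize_standin below are hypothetical names standing in for netCDF::NcFile and MPI_Finalize():

```cpp
#include <memory>
#include <string>
#include <vector>

// Records teardown order so the close-before-finalize ordering is visible.
std::vector<std::string> events;

// Stand-in for netCDF::NcFile: its destructor closes the file, which must
// happen while the underlying libraries are still initialized.
struct File {
    ~File() { events.push_back("close"); }
};

// Stand-in for MPI_Finalize().
void mpi_finalize_standin() { events.push_back("finalize"); }

void run() {
    {
        auto x = std::make_shared<File>();
        // ... read variables here ...
    }  // x goes out of scope: the file is closed here, first
    mpi_finalize_standin();  // finalization happens after the close
}
```

In the ngen case the same effect could presumably be achieved by resetting the shared_ptr (or otherwise letting it go out of scope) before MPI_Finalize() is called.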
The current ngen code runs correctly in serial mode with a NetCDF forcing file, but produces an HDF5 file close error after finishing the time steps in MPI parallel mode.
Current behavior
The mpirun finishes the 720 time steps but outputs an HDF5 file close error afterward, as follows.
.....
Finished 720 timesteps.
NGen top-level timings:
NGen::init: 0.651571
NGen::simulation: 0.416964
NGen::routing: 7.173e-06
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
#000: H5D.c line 332 in H5Dclose(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
NetCDF: HDF error
file: ncFile.cpp line:33
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
#000: H5D.c line 332 in H5Dclose(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
NetCDF: HDF error
file: ncFile.cpp line:33
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
#000: H5D.c line 332 in H5Dclose(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
NetCDF: HDF error
file: ncFile.cpp line:33
Expected behavior
......
Finished 720 timesteps.
NGen top-level timings:
NGen::init: 0.511167
NGen::simulation: 0.605978
NGen::routing: 1.281e-06
At least for the serial run.
Steps to replicate behavior (include URLs)
Build the codes with:
Run ngen as follows, where test_partition_cats3.json is a partition file generated with the catchment_data.geojson and nexus_data.geojson hydrofabric that come with the master branch. The run finishes the 720 time steps but outputs an HDF5 file close error afterward.