Regridding bathymetry from high resolution to output grid #349

Open
jvmcgovern opened this issue Mar 14, 2024 · 5 comments

@jvmcgovern

I'm trying to use xESMF (latest version on conda-forge) to conservatively remap bathymetry for a ROMS/CROCO model. I created lat_b and lon_b fields for the input (regular) and output (rotated) grids. Because of the size of the input data (~16000x12000) and output data (~1300x1050), I thought parallel=True would help. Instead, I'm getting the following error:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

I've tried this with and without chunking the input data, and with the output field chunked at roughly 550x700, to no avail. Any pointers would be appreciated. Do I have to go straight to ESMF?

@raphaeldussin
Contributor

You probably need to use the ESMF parallel remapping tool. See this for how to do it on HPC: https://xesmf.readthedocs.io/en/stable/large_problems_on_HPC.html

If that does not work, take a look at https://github.com/ESMG/gridtools
I also have a package that's still very much under development here: https://github.com/raphaeldussin/sloppy

@jvmcgovern
Author

Thanks @raphaeldussin. I'm now working on installing ESMF (I can't find the command line tools from the esmpy install).

For future reference, is there a convenient way, from within a Python environment, to query the system-level upper limits on computation (memory, cores, and so on)?
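As a partial answer to this kind of question, here is a minimal sketch using only the standard library. It assumes a Linux or other Unix system (`resource` and `os.sysconf` are not available on Windows), and the function name `system_limits` is made up for illustration. On an HPC cluster the batch scheduler may impose tighter limits than what is reported here.

```python
import os
import resource  # Unix only


def system_limits():
    """Collect a few machine- and process-level limits visible from Python."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    # Soft/hard cap on the address space of this process;
    # resource.RLIM_INFINITY means "no limit set".
    soft_as, hard_as = resource.getrlimit(resource.RLIMIT_AS)
    limits = {
        "logical_cpus": os.cpu_count(),
        "physical_ram_bytes": page_size * phys_pages,
        "address_space_soft_limit": soft_as,
        "address_space_hard_limit": hard_as,
    }
    # CPUs this process is actually allowed to run on (Linux only).
    if hasattr(os, "sched_getaffinity"):
        limits["usable_cpus"] = len(os.sched_getaffinity(0))
    return limits


for name, value in system_limits().items():
    print(f"{name}: {value}")
```

Comparing `physical_ram_bytes` with the expected size of the grids gives a first idea of whether a job can fit on a single node.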

@aulemahal
Collaborator

aulemahal commented Mar 14, 2024

On the one hand, I think the error comes from dask rather than from xESMF itself.

Maybe this issue can help. Activating parallel=True in xESMF forces dask to use processes, and this might be why the error popped up at that moment.
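The "proper idiom in the main module" that the error message refers to is the `if __name__ == "__main__":` guard. The sketch below illustrates it with the standard library only, not with dask or xESMF; the function `block_cell_count` and the block shapes are hypothetical stand-ins for per-block work.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def block_cell_count(shape):
    """Stand-in for per-block work, e.g. generating weights for one output chunk."""
    ny, nx = shape
    return ny * nx


def main():
    # Hypothetical output-grid blocks, loosely based on the ~550x700 chunks above.
    blocks = [(550, 700), (550, 350), (200, 700), (200, 350)]
    # "spawn" re-imports this module in every worker process,
    # which is why the guard at the bottom matters.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(block_cell_count, blocks))


if __name__ == "__main__":
    # Without this guard, each spawned worker would try to create its own pool
    # while importing the script, which triggers the "bootstrapping phase" error.
    print(main())
```

In a script that calls xESMF with parallel=True, the equivalent change would be to move the regridding calls into a function that is only invoked under this guard.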

HOWEVER, as noted in the docstring and in the notebook, parallel=True will perform the weights generation for multiple blocks of the output grid in parallel. The input grid is loaded completely in memory for each block of the output grid.
In your case, the output grid is much smaller than the input grid, which means that you won't be making any RAM gain with the option, it might be even worse.

I would try without it.
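To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the coordinates are held as 2D float64 arrays (centres plus lat_b/lon_b corners), takes the approximate grid and chunk sizes quoted above, and takes the worst case where every output block holds its own copy of the input grid at the same time. The helper names are made up for illustration, and the estimate ignores the weights and any ESMF-internal structures.

```python
import math


def coord_bytes(ny, nx, itemsize=8):
    """Bytes for 2D lat/lon centres plus lat_b/lon_b corners of an (ny, nx) grid."""
    centres = 2 * ny * nx * itemsize
    corners = 2 * (ny + 1) * (nx + 1) * itemsize
    return centres + corners


def n_blocks(shape, chunks):
    """Number of blocks when an array of `shape` is split into `chunks`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


src = coord_bytes(16000, 12000)                 # input grid, ~16000x12000
blocks = n_blocks((1300, 1050), (550, 700))     # output grid and its chunking

print(f"input coordinates per copy: {src / 1e9:.1f} GB")
print(f"output blocks: {blocks}")
print(f"worst case, one copy per block: {blocks * src / 1e9:.1f} GB")
```

Under these assumptions a single copy of the input coordinates is already about 6 GB, so duplicating it across the six output blocks costs far more memory than splitting the small output grid saves.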

EDIT: I'll add a note that if you have an environment with xESMF, then ESMF_RegridWeightGen is already installed inside it!

@raphaeldussin
Contributor

@jvmcgovern the ESMF command line tools should already be in the conda env where you installed xESMF; they come as part of the ESMF conda package.
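A quick way to check this from Python is to look the executable up on `PATH` with the conda env activated. This is a minimal sketch; the helper name `find_tool` is made up for illustration.

```python
import shutil


def find_tool(name="ESMF_RegridWeightGen"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_tool()
if path:
    print(f"found: {path}")
else:
    print("ESMF_RegridWeightGen not found on PATH; is the right conda env active?")
```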

@jvmcgovern
Author

I ran ESMF on my HPC server (with 396 cores) and got what appears to be an out-of-memory error. However, there's limited information on what went wrong:

Sun Mar 24 00:48:52 GMT 2024
submit MPI job
Starting weight generation with these inputs:
Source File: xe_input_grid.nc
Destination File: xe_output_grid.nc
Weight File: xe_input2output_grid_weights.nc
Source File is in CF Grid format
Source Grid is a global grid
Source Grid is a logically rectangular grid
Use the center coordinates of the source grid to do the regrid
Destination File is in CF Grid format
Destination Grid is a global grid
Destination Grid is a logically rectangular grid
Use the center coordinates of the destination grid to do the regrid
Regrid Method: conserve
Pole option: NONE
Line Type: greatcircle
Norm Type: dstarea
Extrap. Method: none

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 200 PID 38612 RUNNING AT n105
= KILLED BY SIGNAL: 9 (Killed)
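
One detail in the log above is that both grids are reported as global. If they are in fact regional, ESMF_RegridWeightGen has options to say so. The sketch below only assembles such a command line in Python; it is not the command used in this thread. The flags are written as recalled from the ESMF reference and should be checked against `ESMF_RegridWeightGen --help`; the `mpirun` launcher, the process count of 96, and the helper name are assumptions.

```python
import shlex


def regrid_weight_gen_cmd(src, dst, weights, nprocs, regional=True):
    """Build an MPI command line for conservative weight generation."""
    cmd = [
        "mpirun", "-np", str(nprocs),   # launcher and rank count are site-specific
        "ESMF_RegridWeightGen",
        "-s", src,                      # source grid file
        "-d", dst,                      # destination grid file
        "-w", weights,                  # output weight file
        "-m", "conserve",               # regrid method, as in the log above
    ]
    if regional:
        # Declare both grids as regional rather than global.
        cmd += ["--src_regional", "--dst_regional"]
    return cmd


cmd = regrid_weight_gen_cmd(
    "xe_input_grid.nc",
    "xe_output_grid.nc",
    "xe_input2output_grid_weights.nc",
    nprocs=96,  # hypothetical; fewer ranks per node leaves more memory per rank
)
print(shlex.join(cmd))
```

Printing the command first, rather than running it directly, makes it easy to paste into a job script and adjust for the local scheduler.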
