Clarification on xESMF treatment of masks and missing values #256

Open
blimlim opened this issue Apr 12, 2023 · 7 comments

Comments

@blimlim

blimlim commented Apr 12, 2023

Hi everyone!

I have been regridding some global ocean data and was hoping for some clarification on how xESMF handles masks and missing values, as I've noticed some behaviour I don't quite understand. I've read through #22 of the old repository but didn't see this behaviour appear there.

I have data covering the globe on a lon/lat grid, which I want to interpolate to a coarser grid using bilinear interpolation. I only want to use the ocean points, so I have tried a few different methods of masking out the land:

1. Supplying an input binary mask to the input grid in the xe.Regridder call:
This results in the following:
regrid_input_mask (Figure 1)

2. Setting all land values to nan, and regridding with no masks supplied to xe.Regridder
From what I understand, any output grid point which is next to a NaN in the input data will be set to NaN. This results in
regrid_input_nand (Figure 2)

Looking closer at the above two results, the first outputs values at more points, specifically along the western coasts. The difference between the two results is shown below:
diff_input_masked_input_nand (Figure 3)

I was wondering if anyone knows how these extra values are calculated? In #22 of the old repository, it sounds like providing an input mask is equivalent to setting the inputs under the mask to 0, and then regridding. However, if I do this manually without a mask, I get a different result again:

3. Setting land points to 0, and regridding with no masks supplied to xe.Regridder
regrid_input_zeroed (Figure 4)
This results in soft edges at the boundaries, which makes sense, however this is quite different to the first plot.
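
For reference, the three setups above might look roughly like the sketch below. It is not the code from the original post. It assumes the input is a 2D `xarray.DataArray` called `da` with dims `(lat, lon)`, 1D `lon`/`lat` coordinates and NaN over land, and that `ds_out` holds the coarser target grid. Those names, the helper functions, and `periodic=True` are illustrative assumptions. xESMF picks up a grid variable named `mask`, with 1 for valid cells and 0 for masked ones.

```python
import numpy as np


def ocean_mask(values):
    """1 where the field is finite (ocean), 0 where it is NaN (land)."""
    return np.isfinite(values).astype(int)


def zero_filled(values):
    """Copy of the field with NaN (land) replaced by 0."""
    return np.where(np.isfinite(values), values, 0.0)


def regrid_three_ways(da, ds_out):
    """Bilinear regridding with (1) an input mask, (2) NaN land, (3) zeroed land."""
    # Imported lazily so the two helpers above work without xarray/xESMF.
    import xarray as xr
    import xesmf as xe

    # Input grid without and with a `mask` variable (assumed dims: lat, lon).
    grid = xr.Dataset(coords={"lon": da["lon"], "lat": da["lat"]})
    grid_masked = grid.assign(mask=(da.dims, ocean_mask(da.values)))

    with_mask = xe.Regridder(grid_masked, ds_out, "bilinear", periodic=True)
    no_mask = xe.Regridder(grid, ds_out, "bilinear", periodic=True)

    out_masked = with_mask(da)                                  # method 1
    out_nan = no_mask(da)                                       # method 2
    out_zero = no_mask(da.copy(data=zero_filled(da.values)))    # method 3
    return out_masked, out_nan, out_zero
```

Differencing `out_masked` and `out_nan` (for example by comparing where each is finite) would reproduce the kind of comparison shown in Figure 3.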

The main thing I'm hoping to find out is how the extra values in Figure 1 compared to Figure 2 are calculated. If they are a result of an interpolation with 0 at the missing points, it might be best for me not to use them in case the values are artificially low.

Let me know if there's any data/code or more info I can provide that would be helpful. I tried to set up a simple artificial example but wasn't able to get the same behaviour with it (methods 1 and 2 gave the same results when I tried using some simple constructed data).

Thank you for your help with this!

@raphaeldussin
Contributor

To regrid to a coarser grid, I recommend you use the conservative_normed method, passing the masks for both input and output grids. Without masks defined, land values (NaN or zeros) can bleed into the regridded field.
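
To see why the normalisation matters at the coast, here is a toy calculation for a single coarse output cell. It is not xESMF code; it assumes the usual area-weighted picture of conservative regridding, where each source cell contributes in proportion to the fraction of the output cell it overlaps. The function names and numbers are made up for illustration.

```python
import numpy as np


def conservative(values, overlap, valid):
    """Area-weighted sum over valid source cells; masked cells contribute nothing."""
    w = np.asarray(overlap, dtype=float) * np.asarray(valid, dtype=float)
    return float(np.sum(w * np.nan_to_num(values)))


def conservative_normed(values, overlap, valid):
    """Same sum, renormalised by the total weight of the valid source cells."""
    w = np.asarray(overlap, dtype=float) * np.asarray(valid, dtype=float)
    return float(np.sum(w * np.nan_to_num(values)) / np.sum(w))


# One coastal output cell covered equally by four source cells:
# two ocean cells with value 10, two land cells (NaN, masked out).
values = [10.0, 10.0, np.nan, np.nan]
overlap = [0.25, 0.25, 0.25, 0.25]
valid = [1, 1, 0, 0]

print(conservative(values, overlap, valid))         # 5.0, diluted by the land half
print(conservative_normed(values, overlap, valid))  # 10.0, ocean-only average
```

In this picture the un-normed result is pulled toward zero in proportion to the land fraction of the cell, which is the "bleeding" effect, while the normed result is an average over the ocean part only.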

@Thomas-Moore-Creative

Hi @raphaeldussin - I've just bumped into this issue thread. Thanks to @blimlim for posting and your answer.

WRT the above, are there any good, clear examples of using the conservative_normed method? My experience is that some xESMF users may not be as careful and thoughtful as @blimlim, and just press ahead with bilinear and use the possibly problematic result. The worst case is that errors in coastal cells are not picked up and are used as good data.

One challenge might be understanding how to easily generate the "output" mask for the coarser grid of choice. Are there any examples we can point people to specifically on that?
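
One possible way to build an output mask is sketched below. It is an assumption, not a documented xESMF recipe: regrid the input ocean mask as a 0/1 field with the plain `conservative` method, which gives an ocean fraction for each coarse cell, then threshold it. The 0.5 threshold, the helper names, and the variable name `field` are illustrative choices. The sketch also assumes that `ds_in[var]` is 2D with dims `(lat, lon)` and NaN over land, and that both datasets already carry cell corners (`lon_b`, `lat_b`), which the conservative methods need.

```python
import numpy as np


def mask_from_fraction(ocean_fraction, threshold=0.5):
    """1 where a coarse cell is at least `threshold` ocean, else 0."""
    return (np.asarray(ocean_fraction) >= threshold).astype(int)


def regrid_with_both_masks(ds_in, ds_out, var="field", threshold=0.5):
    """Conservative-normed regridding with masks on both grids (hypothetical helper)."""
    import xesmf as xe  # imported lazily; the helper above needs only NumPy

    # Input mask: 1 over ocean (finite values), 0 over land (NaN).
    in_mask = np.isfinite(ds_in[var].values).astype(int)
    ds_in = ds_in.assign(mask=(ds_in[var].dims, in_mask))

    # Ocean fraction on the coarse grid, from an unmasked conservative regrid
    # of the 0/1 input mask.
    to_fraction = xe.Regridder(ds_in.drop_vars("mask"), ds_out, "conservative")
    fraction = to_fraction(ds_in["mask"].astype(float))

    # Output mask from the thresholded ocean fraction.
    out_mask = mask_from_fraction(fraction.values, threshold)
    ds_out = ds_out.assign(mask=(fraction.dims, out_mask))

    regridder = xe.Regridder(ds_in, ds_out, "conservative_normed")
    return regridder(ds_in[var])
```

A lower threshold keeps more coastal cells as ocean at the cost of averaging over fewer source cells; if the target model already defines its own land-sea mask, using that mask directly is likely the better choice.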

The answer probably is "just go RTF manual", but if there are any specific pages / examples you could point to, that would be helpful.

@raphaeldussin
Contributor

Indeed, the docs are a good starting point ;) https://xesmf.readthedocs.io/en/latest/notebooks/Masking.html

Also, conservative normed does a better job conserving statistics across resolutions.

@blimlim
Author

blimlim commented Apr 12, 2023

Thanks @raphaeldussin and @Thomas-Moore-Creative for your help with this!

Just hoping to confirm, to use the conservative normed regridding, will I need to also supply the corners of the grid cells? I'll try to obtain these.

I'll be using these files as ancillaries for an atmosphere model, and so am thinking it will be important for them to be periodic in longitude. I've read that the periodic parameter is forced to be False when using the conservative normed method. Would you know if this is still correct?

@raphaeldussin
Contributor

Yes to both. There is an open issue, #28, about periodicity and conservative regridding that is yet to be investigated.
That might be something that interests you, @blimlim; probably not many code changes are necessary here.

@aulemahal
Collaborator

@blimlim If your data is on a regular grid (1D lat and 1D lon), then cf-xarray can guess the bounds for you. It is a dependency of xESMF, so it could even be done automatically under the hood.

As for the periodicity issue, from my experience, as long as your bounds are periodic, the result should respect this periodicity. I mean, the west bound of the westmost cell should be exactly the same (modulo 360°) as the east bound of the eastmost cell. Depending on the exact coordinates, there is a chance that guessed bounds will not be perfect, in which case model-provided bounds would be the best alternative.
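
Whichever way the bounds are obtained (guessed or supplied by the model), the periodicity condition described above is easy to check. The sketch below is a NumPy-only illustration, not cf-xarray's implementation: it guesses 1D cell edges as midpoints between centres and tests whether the longitude edges span exactly 360 degrees. The function names and the tolerance are assumptions.

```python
import numpy as np


def guess_bounds_1d(centers):
    """Guess n+1 cell edges from n cell centres using midpoints.

    The two outer edges are extrapolated by mirroring the nearest midpoint.
    """
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[:-1] + centers[1:])
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])


def is_periodic(lon_edges, tol=1e-6):
    """True if the first and last longitude edges are 360 degrees apart."""
    lon_edges = np.asarray(lon_edges, dtype=float)
    return bool(abs((lon_edges[-1] - lon_edges[0]) - 360.0) < tol)


lon = np.arange(0.5, 360.0, 1.0)   # 1-degree global grid, centres 0.5 .. 359.5
lon_b = guess_bounds_1d(lon)       # edges 0 .. 360
print(is_periodic(lon_b))          # True

print(is_periodic(guess_bounds_1d(lon[:-10])))  # False: grid stops short of 360
```

Midpoint guessing is only exact for evenly spaced centres, so for stretched or irregular grids the model-provided bounds remain the safer input.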

@blimlim
Author

blimlim commented Apr 13, 2023

Thanks @raphaeldussin and @aulemahal for your help! I do have access to the grid bounds, so I'll have a go at the conservative normed regridding. At the moment I'm new to the field so still getting my head around everything, but if in the future the periodicity question hasn't yet been looked at, that would be interesting to investigate!
