Allow NaN's in input grids for FFT filters #396

mdtanker · 2023-03-15T22:44:30Z

Description of the desired feature:

As pointed out by @RichardScottOZ in #377, it would be good to be able to apply the FFT transformations to grids that contain NaNs. I thought I'd share how I'm currently doing this and start a discussion about whether and how we could include it in the filters.

I'm filling the grid either with a constant value (ideally the median of the grid values, to limit edge effects) or with nearest-neighbor interpolation. Here are a few options:

xarray.DataArray.fillna()

 filled = grid.fillna(10)   # constant value
 filled = grid.fillna(np.nanmedian(grid))   # median of the grid values

pygmt.grdfill()

 filled = pygmt.grdfill(grid, mode="n", verbose="q")   # pygmt nearest neighbor

rioxarray.interpolate_na()

import rioxarray  # registers the .rio accessor

filled = (
    grid.rio.write_crs("epsg:4326")  # rio needs a CRS set
    .rio.set_spatial_dims(grid.dims[1], grid.dims[0])  # rio needs the x and y dimension names set
    .rio.interpolate_na(method="nearest")
)

verde.KNeighbors()

df = vd.grid_to_table(grid)
df_dropped = df[df[grid.name].notna()]  # keep only the non-NaN points
coords = (df_dropped[grid.dims[1]], df_dropped[grid.dims[0]])
region = vd.get_region((df[grid.dims[1]], df[grid.dims[0]]))
filled = (
    vd.KNeighbors()
    .fit(coords, df_dropped[grid.name])
    .grid(region=region, shape=grid.shape)
    .scalars
)
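Another option, sketched here under the assumption that SciPy is available, is a nearest-neighbor fill on the raw array with scipy.ndimage.distance_transform_edt, which can return, for every cell, the index of the closest non-NaN cell. The helper name fill_nearest is hypothetical, and the sketch works in index space, so it assumes equal spacing along both axes and ignores the coordinate values.

```python
import numpy as np
from scipy import ndimage


def fill_nearest(values):
    """Fill NaNs in a 2D array with the value of the nearest non-NaN cell.

    Hypothetical helper. Distances are measured in grid cells, so this
    assumes the grid spacing is the same along both axes.
    """
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)
    if not nan_mask.any():
        return values.copy()
    # For each NaN cell (True), get the indices of the nearest valid cell
    # (False); valid cells map to their own indices.
    indices = ndimage.distance_transform_edt(
        nan_mask, return_distances=False, return_indices=True
    )
    return values[tuple(indices)]


grid = np.array(
    [
        [1.0, 2.0, np.nan],
        [4.0, np.nan, np.nan],
        [7.0, 8.0, 9.0],
    ]
)
print(fill_nearest(grid))
```

Unlike the Verde and rioxarray options, this never leaves array space, which keeps it fast on large grids, at the cost of measuring distance in cells instead of coordinate units.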

Are there any other interpolation techniques I'm missing here?

This filled grid can be padded, passed to the FFT filters, unpadded, and then masked by the original grid. I'm using xr.where for this:

result = xr.where(grid.notnull(), unpadded_grid, grid)
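The masking step can be checked on a toy grid: xr.where keeps the filtered values wherever the original grid has data and copies the original NaNs back everywhere else. The values are made up for illustration, and a constant offset stands in for the real filter output.

```python
import numpy as np
import xarray as xr

# Original grid with two missing values
grid = xr.DataArray(
    [[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]],
    dims=("northing", "easting"),
    coords={"northing": [0.0, 1.0, 2.0], "easting": [0.0, 1.0]},
)
# Stand-in for the unpadded filter output: complete, no NaNs
unpadded_grid = grid.fillna(0.0) + 10.0

# Keep filtered values where the original had data, original NaNs elsewhere
result = xr.where(grid.notnull(), unpadded_grid, grid)
print(result.values)
```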

Here is a function I use to combine all of this:

import numpy as np
import xarray as xr
import xrft
import harmonica as hm
import rioxarray  # noqa: F401 (registers the .rio accessor used below)


def filter_grid(
    grid,
    filter_width,
    filt_type="lowpass",
):
    # get coordinate names, ordered as (y, x), e.g. ("northing", "easting")
    original_dims = grid.dims

    # if there are NaNs, fill them with nearest neighbor interpolation
    if grid.isnull().any():
        filled = (
            grid.rio.write_crs("epsg:4326")
            .rio.set_spatial_dims(original_dims[1], original_dims[0])
            .rio.interpolate_na(method="nearest")
        )
        print("filling NaNs with nearest neighbor")
    else:
        filled = grid.copy()

    # reset coord names back to originals
    filled = filled.rename({
        filled.dims[0]: original_dims[0],
        filled.dims[1]: original_dims[1],
    })

    # define width of padding in each direction (a third of the grid size)
    pad_width = {
        original_dims[1]: grid[original_dims[1]].size // 3,
        original_dims[0]: grid[original_dims[0]].size // 3,
    }

    # apply padding
    padded = xrft.pad(filled, pad_width)

    if filt_type == "lowpass":
        filt = hm.gaussian_lowpass(padded, wavelength=filter_width).rename("filt")
    elif filt_type == "highpass":
        filt = hm.gaussian_highpass(padded, wavelength=filter_width).rename("filt")
    else:
        raise ValueError("filt_type must be 'lowpass' or 'highpass'")

    # unpad the grid
    unpadded = xrft.unpad(filt, pad_width)

    # reset coordinate values to original (avoid rounding errors)
    unpadded = unpadded.assign_coords(
        {
            original_dims[0]: grid[original_dims[0]].values,
            original_dims[1]: grid[original_dims[1]].values,
        })

    # mask the result with the original NaNs
    if grid.isnull().any():
        result = xr.where(grid.notnull(), unpadded, grid)
    else:
        result = unpadded.copy()

    # check that the coordinates match the input grid, whatever the dim names
    np.testing.assert_equal(grid[original_dims[1]].values, result[original_dims[1]].values)
    np.testing.assert_equal(grid[original_dims[0]].values, result[original_dims[0]].values)

    return result
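To see the fill, pad, filter, unpad, and mask sequence without the xrft, rioxarray, and Harmonica dependencies, here is a NumPy-only sketch. It is not Harmonica's implementation: the function name lowpass_with_nans is hypothetical, it swaps in a median fill for the nearest-neighbor fill, np.pad with mode="edge" for xrft.pad (an arbitrary choice), and a Gaussian transfer function exp(-(k/k_c)²/2) with k_c = 2π/wavelength, which is one common definition.

```python
import numpy as np


def lowpass_with_nans(values, spacing, wavelength):
    """Hypothetical sketch: Gaussian lowpass of a 2D grid that contains NaNs."""
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)
    # 1. Fill NaNs (median fill here; nearest neighbor is another choice)
    filled = np.where(nan_mask, np.nanmedian(values), values)
    # 2. Pad each side by a third of the grid size, repeating edge values
    pad_y, pad_x = values.shape[0] // 3, values.shape[1] // 3
    padded = np.pad(filled, ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    # 3. Gaussian lowpass in the wavenumber domain (angular wavenumbers)
    ky = 2 * np.pi * np.fft.fftfreq(padded.shape[0], d=spacing)
    kx = 2 * np.pi * np.fft.fftfreq(padded.shape[1], d=spacing)
    k_squared = ky[:, np.newaxis] ** 2 + kx[np.newaxis, :] ** 2
    k_cut = 2 * np.pi / wavelength
    transfer = np.exp(-0.5 * k_squared / k_cut**2)
    filtered = np.real(np.fft.ifft2(np.fft.fft2(padded) * transfer))
    # 4. Unpad back to the original shape
    unpadded = filtered[
        pad_y : pad_y + values.shape[0], pad_x : pad_x + values.shape[1]
    ]
    # 5. Put the original NaNs back
    return np.where(nan_mask, np.nan, unpadded)


rng = np.random.default_rng(0)
noisy = rng.normal(size=(30, 40))
noisy[10:15, 5:12] = np.nan  # a patch of missing data
smooth = lowpass_with_nans(noisy, spacing=1.0, wavelength=10.0)
```

The output keeps the input's shape and NaN pattern, so it can be plotted directly on top of the original grid.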

If we wanted to add this option to the transformations, I was thinking we could add the above filling and masking to the apply_filter() function, with an additional parameter: fill_value=None.

fill_value options would include:

* None (default): no filling; the current ValueError (Found nan(s) ...) is raised
* Float: fill with a constant value
* Callable (np.nanmedian or equivalent): fill with the value it computes from the grid
* String ("nearest"): use Verde, PyGMT, or rioxarray nearest-neighbor interpolation
    * PyGMT or rioxarray would require an additional dependency
    * rioxarray would also need an optional CRS kwarg (default to EPSG:4326?)

fill_value would then be added to each filter function as well.
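The proposed fill_value options could dispatch roughly as below. This is a hypothetical sketch, not existing Harmonica code: the helper name fill_grid and the error messages are assumptions, and "nearest" uses SciPy's distance_transform_edt as a stand-in for the Verde, PyGMT, or rioxarray backends listed above.

```python
import numbers

import numpy as np
from scipy import ndimage


def fill_grid(values, fill_value=None):
    """Hypothetical helper: fill NaNs in a 2D array according to fill_value."""
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)
    if not nan_mask.any():
        return values.copy()
    if fill_value is None:
        # Default: keep the current behavior and refuse grids with NaNs
        raise ValueError("Grid contains NaNs; set fill_value to fill them.")
    if isinstance(fill_value, str):
        if fill_value != "nearest":
            raise ValueError(f"Unknown fill_value string: {fill_value!r}")
        # Index of the nearest valid cell for every cell (in grid cells)
        indices = ndimage.distance_transform_edt(
            nan_mask, return_distances=False, return_indices=True
        )
        return values[tuple(indices)]
    if callable(fill_value):
        # e.g. np.nanmedian: compute one value from the valid data
        constant = fill_value(values)
    elif isinstance(fill_value, numbers.Real):
        constant = fill_value
    else:
        raise TypeError("fill_value must be None, a number, a callable or 'nearest'")
    return np.where(nan_mask, constant, values)


grid = np.array([[1.0, np.nan], [3.0, 5.0]])
print(fill_grid(grid, np.nanmedian))
```

Checking for a string before checking callable() keeps "nearest" from being mistaken for another option, and a None default preserves today's error for users who don't opt in.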

What are everyone's thoughts on this? Related to #390 as well.

Are you willing to help implement and maintain this feature?


RichardScottOZ · 2023-03-16T20:37:17Z

Sounds good. When I did some of this, it was much more ad hoc. Would "filter" cover any type of method in the branch there? Then I can test it on all of Australia.

leouieda · 2023-03-18T09:18:29Z

Hi all, I think this would be better as a separate function in Verde. The thing is that FFTs with NaNs aren't really supported, and what we actually do is interpolate them before filtering. So it would be better as a separate, explicit step. Also, I'd want to use this in contexts other than filtering (for example, satellite imagery).

The existing options are all a bit cumbersome in their own way. So this warrants a function.

Something like https://www.fatiando.org/verde/latest/api/generated/verde.project_grid.html#verde.project_grid

leouieda · 2023-03-18T09:19:54Z

To make the user experience better, we could check for NaNs in the filter functions and recommend using the new function to fill them.
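A check of that kind might look like the sketch below. The helper name check_grid_for_nans is hypothetical, and the message simply points to verde.fill_nans, the function proposed later in this thread (fatiando/verde#439); it does not reproduce Harmonica's actual error text.

```python
import numpy as np


def check_grid_for_nans(values):
    """Hypothetical check: raise an informative error if the grid has NaNs."""
    n_nans = int(np.count_nonzero(np.isnan(values)))
    if n_nans:
        raise ValueError(
            f"Found {n_nans} NaN(s) in the input grid. FFT-based filters need "
            "a complete grid: fill the missing values first (for example with "
            "verde.fill_nans) and mask the result with the original grid "
            "afterwards."
        )


check_grid_for_nans(np.ones((3, 3)))  # complete grid: passes silently
```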

Esteban82 · 2023-03-18T12:59:07Z

Hello!
I have a question about something that could perhaps be implemented in Harmonica.

Would it be a good approach to apply different methods to fill the NaNs, then calculate the FFT and plot the results together?

I think this way it would be possible to get an idea of how the NaNs affect the result.
In other words, to know which wavelengths are affected by the NaNs (and how much) and which are not.
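One way to explore that, sketched here with NumPy only, is to fill the same grid with several strategies and compare their azimuthally averaged Fourier amplitude spectra: wavenumber bins where the spectra agree are barely affected by the fill choice, while bins that differ are sensitive to it. The synthetic grid, the two fill strategies (zero and median), the helper name radial_amplitude, and the binning are all illustrative assumptions.

```python
import numpy as np


def radial_amplitude(values, spacing, n_bins=10):
    """Azimuthally averaged Fourier amplitude of a complete 2D grid."""
    amplitude = np.abs(np.fft.fft2(values))
    ky = np.fft.fftfreq(values.shape[0], d=spacing)
    kx = np.fft.fftfreq(values.shape[1], d=spacing)
    k = np.hypot(ky[:, np.newaxis], kx[np.newaxis, :])
    edges = np.linspace(0, k.max(), n_bins + 1)
    # Bin index of each wavenumber (the maximum goes in the last bin)
    bins = np.clip(np.digitize(k, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=amplitude.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)


# Synthetic grid: offset + long-wavelength signal + short-wavelength noise
rng = np.random.default_rng(42)
y, x = np.mgrid[0:64, 0:64]
grid = 5.0 + np.sin(2 * np.pi * x / 32) + 0.1 * rng.normal(size=x.shape)
grid[20:35, 25:45] = np.nan  # a large data gap

nan_mask = np.isnan(grid)
fills = {
    "zero": np.where(nan_mask, 0.0, grid),
    "median": np.where(nan_mask, np.nanmedian(grid), grid),
}
spectra = {
    name: radial_amplitude(filled, spacing=1.0)[1]
    for name, filled in fills.items()
}
# Relative difference per bin: large values flag fill-sensitive wavenumbers
relative_diff = np.abs(spectra["zero"] - spectra["median"]) / spectra["median"]
print(np.round(relative_diff, 3))
```

Plotting each spectrum against the bin centers (for example with matplotlib) gives the side-by-side view suggested above; more fill strategies can be added to the fills dictionary.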

mdtanker · 2023-03-19T19:01:35Z

I agree with @leouieda that it would be useful as its own function.

Would it be useful to implement a Chain option in Harmonica like the Verde one? This could then include projecting a grid, padding it, filling the NaNs, applying a filter, masking back to the original, and unpadding.

I think masking a grid based on another grid would also be a useful function in Verde.

RichardScottOZ · 2023-04-02T01:16:03Z

Is Verde not capable of dealing with data at satellite scale when used in the usual manner?

RichardScottOZ · 2023-04-12T04:29:10Z

I have a nice, irregular, highly patchy survey to test some of this with.

leomiquelutti · 2024-03-14T16:59:05Z

Hi everyone,

Do we have a strategy already? I'd like to address this, as I need to perform transformations on grids with NaNs.

leouieda · 2024-03-14T17:34:59Z

I agree with @Esteban82 that this is a tricky thing that can potentially bias the results by quite a lot. For large patches, the nearest neighbor interpolation can be very bad and introduce low and high frequency artifacts. So we can break this down:

Add a fill_nans function to Verde that takes a grid and fills in any missing values using some of the strategies @mdtanker pointed out: interpolation (nearest neighbor, linear, or cubic), mean/median value, or a constant.
In Harmonica, doing the transforms would then take 2 function calls, plus an optional masking step:

filled = vd.fill_nans(grid)
filtered = hm.gaussian_lowpass(filled, 10e3)
filtered_with_holes = xr.where(np.isnan(grid), np.nan, filtered)

Does this sound reasonable? If it's only 2 extra lines of code, maybe it's not worth a separate function or argument? Doing so would also mean accepting all of the options fill_nans takes, which would make the filter functions much more complicated to use and test.

RichardScottOZ · 2024-03-14T20:49:58Z

Tricky, but that's reality: it has to be done in general.

It's a complicated thing to do in a general sense, so I don't think that matters: basic tests for the generic use cases you mention are enough, not every possible range of everything, which we would never finish.

For something simple like constant or mean filling, has Verde been tested at scale?

leomiquelutti · 2024-03-15T15:59:08Z

@leouieda why not

filtered = hm.gaussian_lowpass(grid, 10e3, fill_nans="interpolate")

This would mean adding a vd.fill_nans(grid) call inside all transformations, but it seems more straightforward for users (in my humble user's opinion).

In this case, fill_nans can take several different arguments, each one for a special case.

leouieda · 2024-03-16T01:37:38Z

@RichardScottOZ nearest neighbor interpolation is fast and can handle large datasets. The Spline certainly wouldn't, and it is generally a bad extrapolator.

@leomiquelutti we could add a fill_nans argument that is only True or False (default) and explain in the docs that if people want something custom they need to use the verde function. What do you think?
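A minimal sketch of the True/False design, assuming nearest-neighbor filling as the default strategy: the wrapper name lowpass_grid and its Gaussian transfer function are illustrative assumptions rather than Harmonica's API, SciPy's distance_transform_edt stands in for the Verde fill function, and padding is omitted for brevity. With fill_nans=False, NaNs raise an error; with True, they are filled before filtering, and the caller masks the output with the original grid.

```python
import numpy as np
from scipy import ndimage


def lowpass_grid(values, spacing, wavelength, fill_nans=False):
    """Hypothetical Gaussian lowpass with an opt-in nearest-neighbor NaN fill."""
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)
    if nan_mask.any():
        if not fill_nans:
            raise ValueError(
                "Grid contains NaNs. Use fill_nans=True for a nearest-neighbor "
                "fill, or fill them beforehand with a method of your choice."
            )
        # Default strategy: copy the value of the nearest valid cell
        indices = ndimage.distance_transform_edt(
            nan_mask, return_distances=False, return_indices=True
        )
        values = values[tuple(indices)]
    # Gaussian transfer function exp(-(k / k_c)**2 / 2), k_c = 2 pi / wavelength
    ky = 2 * np.pi * np.fft.fftfreq(values.shape[0], d=spacing)
    kx = 2 * np.pi * np.fft.fftfreq(values.shape[1], d=spacing)
    k_squared = ky[:, np.newaxis] ** 2 + kx[np.newaxis, :] ** 2
    transfer = np.exp(-0.5 * k_squared * (wavelength / (2 * np.pi)) ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(values) * transfer))


grid = np.full((20, 20), 2.0)
grid[5:8, 5:8] = np.nan
smooth = lowpass_grid(grid, spacing=1.0, wavelength=5.0, fill_nans=True)
# The mask comes from the original grid, so no extra return value is needed
smooth_with_holes = np.where(np.isnan(grid), np.nan, smooth)
```

Anyone who needs a different fill strategy would keep fill_nans=False and call their own fill function first, as described above.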

RichardScottOZ · 2024-03-16T07:18:20Z

Sounds reasonable to me!

leomiquelutti · 2024-03-16T10:54:11Z

In this case, which strategy will be adopted for `fill_nans=True`?

leouieda · 2024-03-28T12:58:06Z

The default, which is nearest neighbor interpolation.

leomiquelutti · 2024-03-28T17:28:01Z

Should it also return a mask with the locations of the NaNs in the DataArray, so the NaNs can be hidden in the plots? Or some similar scheme?

leouieda · 2024-03-28T18:50:37Z

Not really. The mask can be generated from the original grid with a call to np.isnan, so it's not worth adding an extra return value only when fill_nans=True.

leomiquelutti · 2024-04-12T13:38:38Z

@leouieda I have to wait until fatiando/verde#439 is ready before I start implementing this, right?

leouieda · 2024-04-12T16:23:00Z

@leomiquelutti ready and released, yes.

mdtanker added the enhancement Idea or request for a new feature label Mar 15, 2023

santisoler mentioned this issue Mar 17, 2023: Put padding inside the transformation function #390 (Open)

leouieda mentioned this issue Mar 14, 2024: Add a function to fill NaNs in a grid fatiando/verde#439 (Open)
