Unit test for duplicated points in an STM #75

Open
rogerkuou opened this issue May 6, 2024 · 1 comment
Comments

@rogerkuou
Member

Issue coming from a discussion in PR #66

A duplicated coordinate (lat, lon, time) will cause the spatio-temporal query enrich_from_dataset to fail. We need to create a check function to validate that there are no duplicated 3D coordinates in an STM.

Also quoting Sarah's comments here, which are worth considering:

@rogerkuou to check if the points are unique, the test np.unique(ds['lat'].values).shape == ds['lat'].values.shape is not enough, because it only checks for duplicates in one dimension, here lat. Distinct points can, for example, be located on one line and share the same lat.
Instead, we need a test for cases where (lat, lon, time) are duplicated. Functions like xarray.Dataset.drop_duplicates and pandas.DataFrame.duplicated can be used to write a test. But these functions only work on dim and not on coords. In our case, lat and lon are coords and space is the dim, so we might need to use unstack, which leads to memory problems.
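
One possible way to test the coordinate tuples jointly, without unstack, is to stack the coordinate arrays into rows and count identical rows with NumPy. This is only a minimal sketch: the function name find_duplicated_coords is hypothetical, it assumes the coordinates are available as equal-length 1D arrays (for example ds['lat'].values and ds['lon'].values), and it compares floats by exact equality.

```python
import numpy as np


def find_duplicated_coords(*columns):
    """Return a boolean mask marking rows whose coordinate tuple occurs more than once.

    Hypothetical helper: each argument is a 1D array-like of equal length,
    e.g. lat and lon along the space dimension.
    """
    # One row per point, one column per coordinate.
    coords = np.column_stack([np.asarray(c) for c in columns])
    # Group identical rows; `inverse` maps each row to its group index.
    _, inverse, counts = np.unique(
        coords, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps this robust to the shape of `inverse` across NumPy versions.
    return counts[inverse.ravel()] > 1


lat = np.array([52.0, 52.0, 52.1, 52.0])
lon = np.array([4.0, 4.1, 4.0, 4.0])

# The lat-only check reports duplicates although points 0 and 1 are distinct.
lat_only_unique = np.unique(lat).shape == lat.shape

# The joint check flags only the truly duplicated pair (points 0 and 3).
mask = find_duplicated_coords(lat, lon)
print(lat_only_unique, mask)
```

A unit test could then assert that mask.any() is False for a valid STM. Memory use grows with the number of points rather than with the size of an unstacked grid.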

@SarahAlidoost
Contributor

SarahAlidoost commented May 6, 2024

and also this:

As discussed, scipy KDTree works if coordinates (lat, lon, time) are duplicated and the values of the variables, e.g. temperature, are the same too; I added a test for this. If the values of the variables are not the same for duplicated coordinates, macOS and Linux behave differently when picking the value of the nearest neighbor.

Note that duplicated coordinates with two different values for one variable are a rare case. However, such cases might happen during data preparation, for example if coordinates are somehow rounded up.
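
To illustrate the rounding case, here is a rough sketch that flags duplicated (lat, lon) pairs whose variable values disagree. The function name conflicting_duplicates and the sample numbers are made up for illustration, and np.round stands in for whatever rounding happens during data preparation.

```python
import numpy as np


def conflicting_duplicates(lat, lon, values):
    """Mark points that share (lat, lon) with another point but differ in value.

    Hypothetical helper; all inputs are 1D array-likes of equal length.
    """
    coords = np.column_stack([np.asarray(lat), np.asarray(lon)])
    _, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    values = np.asarray(values, dtype=float)

    # Per-group minimum and maximum of the variable.
    n_groups = inverse.max() + 1
    group_min = np.full(n_groups, np.inf)
    group_max = np.full(n_groups, -np.inf)
    np.minimum.at(group_min, inverse, values)
    np.maximum.at(group_max, inverse, values)

    # A group is in conflict when its values are not all identical.
    return (group_max > group_min)[inverse]


# Two nearby points collapse onto the same coordinate after rounding.
lat = np.round([52.00004, 52.00001, 52.1], 4)
lon = np.round([4.00002, 4.00003, 4.2], 4)
temperature = [10.0, 12.5, 11.0]

conflict = conflicting_duplicates(lat, lon, temperature)
print(conflict)
```

Points flagged this way are the ones for which a nearest-neighbor lookup has no unique answer.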
