Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geographical models #149

Open
MuellerSeb opened this issue May 1, 2020 · 10 comments
Open

Geographical models #149

MuellerSeb opened this issue May 1, 2020 · 10 comments

Comments

@MuellerSeb
Copy link
Member

MuellerSeb commented May 1, 2020

It was shown, that the current approach of plugging in the great-circle distance directly into the covariance/variogram function does not automatically result in a valid covariance model on the sphere.

To overcome this flaw, we should use the so called Yadrenko Covariance Function (See here).
Starting with a valid isotropic model in 3D, where the covariance between two points is given as a function of their distance:

cov(x1, x2) = func(dist(x1, x2))

We can construct the related yadrenko model on the sphere, that takes the great-circle distance zeta(x1, x2) with the following modification:

cov_sph(x1, x2) = func(2 * sin(zeta(x1, x2) / 2))

One should not, that 2 * sin(x / 2) is approximately x for small angles, which means, that the current approach in PyKrige is approximately correct for small angles.

@MuellerSeb MuellerSeb added this to the v2.0 milestone May 1, 2020
@MuellerSeb MuellerSeb self-assigned this May 1, 2020
@MuellerSeb MuellerSeb added this to Selected in PyKrige v2 May 1, 2020
@MuellerSeb MuellerSeb added this to lat-lon support in GS-Framework v2 May 1, 2020
@mjziebarth
Copy link
Contributor

That is an interesting paper (and line of research) that I had not been aware of when I first implemented the great-circle distance. Do you happen to know how conclusive the research is overall with regards to that models in Euclidean space do not perform worse than models using the great circle distance?

Then I have one suggestion: From what I understand, the Yadrenko model lifts the great circle distance back to Euclidean space and then uses those Eucliden distances (please correct me if I'm wrong). If the choice is made to use the Yadrenko model (seeing that paper it seems reasonable to me), I think it could make sense to skip the use of the great circle distance entirely by directly converting the geographic coordinates to Euclidean space even before the distances are computed. This would reduce code complexity by handling the coordinates very early on and might speed things up a bit, I guess. Maybe it could make sense compare the accuracy of both approaches.

Best, Malte

@MuellerSeb
Copy link
Member Author

Hey Malte,
Thanks for your thoughts on that. I don't think that it lifts back the distance to 3D, since it directly depends on the great circle distance. I also found some other papers about that and the keypoint is to create valid covaraince models, meaning "conditional negative semidefinite" ones.
I would suggest implementing a great-circle cdist, analog to scipys cdist to speed things up when using geo-coordinates, to easily create the distance matrix.

@mjziebarth
Copy link
Contributor

Hey Sebastian,

with the comment I was referring mostly to one sentence of the paper that you linked (p. 3):

The Euclidean distance between two points on a sphere, which is also known as the chordal distance, can be expressed in terms of great circle distance as [...]

Directly after this, there is an equation that equates the Euclidean or chordal distance to the expression 2*sin(theta/2) where theta is the great circle distance, i.e. the expression you suggested to use for plugging great circle distance into the Euclidean covariance models. So, as far as I understand, this expression actually converts great circle distance back to Euclidean distance and relies on the great circle distance only superficially in the two-step conversion process. Did I miss something there?

Anyway, that was just to note that things may possibly be simplified from a conceptional point of view. Whether this will have an impact on performance remains speculation on my part.

That being said, I think the point about the invalid covariance models is a great one and admittedly one that I have no experience with.
Best!

@MuellerSeb
Copy link
Member Author

Ahhh! Thanks for the enlightenment! This could change a lot of stuff. I need to think about that.

@MuellerSeb
Copy link
Member Author

I would suggest using the Yadrenko models as the standard interpretation for geo-coordinates. Then we can drop the whole haversine procedures and just calculate the fields in 3D (with a little memory overhead for the third positional dimension).

This way we can ensure, that the used model is really a conditionally negative semi-definite (CNSD, abbreviation used by Webster 2007) one. As mentioned above, this will result in approximately equal results for "small" angles. For example, for an great-circle distance of pi/2 (quarter of the globe) we have a deviation of 15% in the distance argument described above and for pi/4 it is only 2%.

This also means, that if the length scale parameter is less then pi/4 (~5000km on the globe) we can assume similar results for both methods (current approach and yadrenko model).

What do you think @rth, @mjziebarth, @bsmurphy, @LSchueler?

@mjziebarth
Copy link
Contributor

Hi Sebastian,

to me this sounds good! I have two minor comments:

  • I'm not sure how the policy of behavior changes is for GeoStat / PyKrige, but maybe it would make sense to deprecate the old geographic support and make sure that people are aware of the change. I think this could be easily done in context of the other refactoring.
  • One thing worth thinking about is to make sure people are aware that the distance used in the variogram is not the on-surface distance. I think there could the potential for confusion when analyzing the long-range spatial correlation structure. Of course this would be clear after careful study of the model, but I could, for instance, easily see myself plugging in geographic coordinates and not realizing that distances are chordial. This wouldn't have any impact on the kriging results but when trying to learn things from the spatial correlation model at large distances it could matter. I don't know how to practically implement this, maybe a prominent spot in the documentation of lat/lon arguments is enough.

Best, Malte

@MuellerSeb
Copy link
Member Author

@mjziebarth : I would suggest doing this within the GS-Framework 2.0 project.
By increasing the major version of Pykrige and GSTools we can break backward compatibility.
We can still keep v1.x of PyKrige alive, but I wouldn't introduce a deprecation cycle there.
Of course, we have to well-document this change and the behavior.

@MuellerSeb
Copy link
Member Author

This is now solved in GSTools and will be used in PyKrige in the future:

@MuellerSeb MuellerSeb added the solved upstream Solved in GSTools label Mar 30, 2021
@MuellerSeb MuellerSeb changed the title Correct geographical models Geographical models Sep 5, 2021
@MuellerSeb
Copy link
Member Author

This issue should also mention, that we need to implement geographic coordinates for universal kriging.

@MuellerSeb
Copy link
Member Author

#217 is related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
GS-Framework v2
  
lat-lon support
PyKrige v2
  
Selected
Development

No branches or pull requests

2 participants