Geographical models #149

MuellerSeb · 2020-05-01T10:15:06Z

It was shown, that the current approach of plugging in the great-circle distance directly into the covariance/variogram function does not automatically result in a valid covariance model on the sphere.

To overcome this flaw, we should use the so called Yadrenko Covariance Function (See here).
Starting with a valid isotropic model in 3D, where the covariance between two points is given as a function of their distance:

cov(x1, x2) = func(dist(x1, x2))

We can construct the related yadrenko model on the sphere, that takes the great-circle distance zeta(x1, x2) with the following modification:

cov_sph(x1, x2) = func(2 * sin(zeta(x1, x2) / 2))

One should not, that 2 * sin(x / 2) is approximately x for small angles, which means, that the current approach in PyKrige is approximately correct for small angles.

The text was updated successfully, but these errors were encountered:

mjziebarth · 2020-05-01T11:43:13Z

That is an interesting paper (and line of research) that I had not been aware of when I first implemented the great-circle distance. Do you happen to know how conclusive the research is overall with regards to that models in Euclidean space do not perform worse than models using the great circle distance?

Then I have one suggestion: From what I understand, the Yadrenko model lifts the great circle distance back to Euclidean space and then uses those Eucliden distances (please correct me if I'm wrong). If the choice is made to use the Yadrenko model (seeing that paper it seems reasonable to me), I think it could make sense to skip the use of the great circle distance entirely by directly converting the geographic coordinates to Euclidean space even before the distances are computed. This would reduce code complexity by handling the coordinates very early on and might speed things up a bit, I guess. Maybe it could make sense compare the accuracy of both approaches.

Best, Malte

MuellerSeb · 2020-05-01T11:48:56Z

Hey Malte,
Thanks for your thoughts on that. I don't think that it lifts back the distance to 3D, since it directly depends on the great circle distance. I also found some other papers about that and the keypoint is to create valid covaraince models, meaning "conditional negative semidefinite" ones.
I would suggest implementing a great-circle cdist, analog to scipys cdist to speed things up when using geo-coordinates, to easily create the distance matrix.

mjziebarth · 2020-05-01T12:18:23Z

Hey Sebastian,

with the comment I was referring mostly to one sentence of the paper that you linked (p. 3):

The Euclidean distance between two points on a sphere, which is also known as the chordal distance, can be expressed in terms of great circle distance as [...]

Directly after this, there is an equation that equates the Euclidean or chordal distance to the expression 2*sin(theta/2) where theta is the great circle distance, i.e. the expression you suggested to use for plugging great circle distance into the Euclidean covariance models. So, as far as I understand, this expression actually converts great circle distance back to Euclidean distance and relies on the great circle distance only superficially in the two-step conversion process. Did I miss something there?

Anyway, that was just to note that things may possibly be simplified from a conceptional point of view. Whether this will have an impact on performance remains speculation on my part.

That being said, I think the point about the invalid covariance models is a great one and admittedly one that I have no experience with.
Best!

MuellerSeb · 2020-05-01T15:12:06Z

Ahhh! Thanks for the enlightenment! This could change a lot of stuff. I need to think about that.

MuellerSeb · 2020-05-02T12:40:29Z

I would suggest using the Yadrenko models as the standard interpretation for geo-coordinates. Then we can drop the whole haversine procedures and just calculate the fields in 3D (with a little memory overhead for the third positional dimension).

This way we can ensure, that the used model is really a conditionally negative semi-definite (CNSD, abbreviation used by Webster 2007) one. As mentioned above, this will result in approximately equal results for "small" angles. For example, for an great-circle distance of pi/2 (quarter of the globe) we have a deviation of 15% in the distance argument described above and for pi/4 it is only 2%.

This also means, that if the length scale parameter is less then pi/4 (~5000km on the globe) we can assume similar results for both methods (current approach and yadrenko model).

What do you think @rth, @mjziebarth, @bsmurphy, @LSchueler?

mjziebarth · 2020-05-05T12:34:02Z

Hi Sebastian,

to me this sounds good! I have two minor comments:

I'm not sure how the policy of behavior changes is for GeoStat / PyKrige, but maybe it would make sense to deprecate the old geographic support and make sure that people are aware of the change. I think this could be easily done in context of the other refactoring.
One thing worth thinking about is to make sure people are aware that the distance used in the variogram is not the on-surface distance. I think there could the potential for confusion when analyzing the long-range spatial correlation structure. Of course this would be clear after careful study of the model, but I could, for instance, easily see myself plugging in geographic coordinates and not realizing that distances are chordial. This wouldn't have any impact on the kriging results but when trying to learn things from the spatial correlation model at large distances it could matter. I don't know how to practically implement this, maybe a prominent spot in the documentation of lat/lon arguments is enough.

Best, Malte

MuellerSeb · 2020-05-05T12:39:06Z

@mjziebarth : I would suggest doing this within the GS-Framework 2.0 project.
By increasing the major version of Pykrige and GSTools we can break backward compatibility.
We can still keep v1.x of PyKrige alive, but I wouldn't introduce a deprecation cycle there.
Of course, we have to well-document this change and the behavior.

MuellerSeb · 2021-03-30T21:33:51Z

This is now solved in GSTools and will be used in PyKrige in the future:

Add support for geographical coordinates GSTools#113

MuellerSeb · 2021-09-05T17:03:35Z

This issue should also mention, that we need to implement geographic coordinates for universal kriging.

MuellerSeb · 2022-04-20T10:36:15Z

#217 is related.

MuellerSeb added bug enhancement new feature labels May 1, 2020

MuellerSeb added this to the v2.0 milestone May 1, 2020

MuellerSeb self-assigned this May 1, 2020

MuellerSeb added this to Selected in PyKrige v2 May 1, 2020

MuellerSeb added this to lat-lon support in GS-Framework v2 May 1, 2020

MuellerSeb mentioned this issue May 1, 2020

Lat-Lon support (geographical coordinates) GeoStat-Framework/GSTools#54

Closed

mjziebarth mentioned this issue Sep 17, 2020

MemoryError: Unable to allocate array with shape (114671, 114671) and data type float64 #166

Closed

MuellerSeb added the solved upstream Solved in GSTools label Mar 30, 2021

MuellerSeb mentioned this issue Jun 8, 2021

Haversine Distance for Geographic Coordinates mmaelicke/scikit-gstat#108

Open

MuellerSeb changed the title ~~Correct geographical models~~ Geographical models Sep 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Geographical models #149

Geographical models #149

MuellerSeb commented May 1, 2020 •

edited

mjziebarth commented May 1, 2020

MuellerSeb commented May 1, 2020

mjziebarth commented May 1, 2020

MuellerSeb commented May 1, 2020

MuellerSeb commented May 2, 2020

mjziebarth commented May 5, 2020

MuellerSeb commented May 5, 2020

MuellerSeb commented Mar 30, 2021

MuellerSeb commented Sep 5, 2021

MuellerSeb commented Apr 20, 2022

Geographical models #149

Geographical models #149

Comments

MuellerSeb commented May 1, 2020 • edited

mjziebarth commented May 1, 2020

MuellerSeb commented May 1, 2020

mjziebarth commented May 1, 2020

MuellerSeb commented May 1, 2020

MuellerSeb commented May 2, 2020

mjziebarth commented May 5, 2020

MuellerSeb commented May 5, 2020

MuellerSeb commented Mar 30, 2021

MuellerSeb commented Sep 5, 2021

MuellerSeb commented Apr 20, 2022

MuellerSeb commented May 1, 2020 •

edited