betadisper calculate distance between samples and the centroid of a different group. #606

Open
joshgsmith opened this issue Oct 30, 2023 · 3 comments

joshgsmith commented Oct 30, 2023

The documentation is clear that betadisper() computes the distance between samples and their respective centroid or median while ensuring positive-definite eigenvalues. betadisper() also returns the principal coordinates of centroids, and these can be used to calculate the distances among centroids. However, I do not see any functionality to calculate the distance between samples and a centroid belonging to another group. For example, let's say we have 100 samples (we will call them 'sites'), 50 sites belonging to period == "Before" and 50 sites to period == "After". How can we determine the distances between each site belonging to period == "Before" and the centroid of period == "After"?

If m is a distance matrix, something like:
disper_mat <- betadisper(m, group = group_vars2$period, type = "centroid")
returns the distances between sites and their respective centroids, independently (in this case, one centroid for Before and one for After).

If we want the distances among the centroids, we could use:

shift_dist <- reshape2::melt(as.matrix(sqrt(dist(disper_mat$centroids[, disper_mat$eig > 0])^2 - dist(disper_mat$centroids[, disper_mat$eig < 0])^2))) %>% tibble::rownames_to_column("distance")

However, shift_dist only finds the distance between the two centroids, not the distance between each sample and the centroid of a different group.

In both chunks above, only the within-group distances are calculated (distances from sites to their within group centroid). Is it possible to calculate the distance both within group and across groups? Specifically, the across group component is the distances of samples belonging to group Before to the centroid belonging to group After.

This would be a fantastic utility, particularly when dealing with time series and ecological data to examine multivariate 'shift distance' relative to a centroid defined by a certain time period. As an example, let's say we have ecological abundance data spanning 2000-2023. We could use the centroid of years 2000-2005 to describe the 'reference' period, then examine the annual shift distances for each year of the time series to estimate how much the community changes during the reference period vs. each year after that.

Contributor

jarioksa commented Oct 31, 2023

There is no such function. However, this is R and you can always write such a function!

Here is a function that calculates distances from each sampling unit to each centroid:

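betadistances <-
    function(x)
{
    cnt <- x$centroids         # group centroids in principal coordinate space
    coord <- x$vectors         # principal coordinates of the sampling units
    pos <- which(x$eig >= 0)   # real axes
    neg <- which(x$eig < 0)    # imaginary axes (negative eigenvalues)
    ## squared distances from every unit (rows) to every centroid (columns)
    d <- apply(cnt[, pos, drop = FALSE], 1,
               function(z) rowSums(sweep(coord[, pos, drop = FALSE], 2, z)^2))
    ## imaginary axes contribute negative squared distances
    if (length(neg))
        d <- d - apply(cnt[, neg, drop = FALSE], 1,
                       function(z) rowSums(sweep(coord[, neg, drop = FALSE], 2, z)^2))
    d <- as.data.frame(sqrt(d))
    cbind("group" = x$group, d)
}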

This is a proof-of-concept implementation and may not cover all corner cases.

Is this the function you asked for? What do you think we should do with this? Comments @gavinsimpson

Note: vegan has a related function meandist, but it calculates mean distances among points and not distances to centroids.
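For readers who want to try this, here is a minimal usage sketch. It assumes vegan is installed and uses vegan's dune data with dune.env$Management as a stand-in for the Before/After grouping in the question; the object names d, mod, dd and own are only illustrative, and the function body is a compact copy of the proof of concept above, not an official vegan function.

```r
library(vegan)

# Compact copy of the proof-of-concept betadistances() from this thread
betadistances <- function(x) {
    cnt <- x$centroids
    coord <- x$vectors
    pos <- which(x$eig >= 0)
    neg <- which(x$eig < 0)
    # squared distances from all units to all centroids on a set of axes
    sqdist <- function(ax)
        apply(cnt[, ax, drop = FALSE], 1, function(z)
            rowSums(sweep(coord[, ax, drop = FALSE], 2, z)^2))
    d <- sqdist(pos)
    if (length(neg)) d <- d - sqdist(neg)
    cbind(group = x$group, as.data.frame(sqrt(d)))
}

data(dune)
data(dune.env)
d <- vegdist(dune, method = "bray")
# type = "centroid" so that mod$centroids holds group centroids, not medians
mod <- betadisper(d, group = dune.env$Management, type = "centroid")
dd <- betadistances(mod)

# distances from sites in group "BF" to the centroid of group "HF"
dd[dd$group == "BF", "HF"]

# distance of each site to its own group centroid, for comparison
# with mod$distances
own <- as.matrix(dd[, -1])[cbind(seq_len(nrow(dd)), as.integer(dd$group))]
```

The result has one row per sampling unit and one column per group centroid, so the within-group and across-group distances can be read from the same table.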

Author

joshgsmith commented Oct 31, 2023

@jarioksa this is very nice! I'm not sure I fully understand how the distances are calculated without calling dist() in that function, but I will apply it to my actual data today to test its functionality.

I was toying with something like:

shift_dist <- sqrt(dist(x$vectors[,x$eig>0]^2, x$centroids[,x$eig>0]^2)- dist(x$vectors[,x$eig<0]^2, x$centroids[,x$eig<0]^2))
Which doesn't seem to produce the same distances as the function you provided.

The betadistances function appears to work very well, and using usedist::dist_setNames() on the original distance matrix helps to keep track of the sample names through betadisper and betadistances.
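On how the distances come out without dist(): as far as I can tell, sweep() subtracts a centroid's coordinates from every row, and rowSums() of the squared differences gives squared Euclidean distances to that centroid. A small base R sketch with made-up data (pts and ctr are hypothetical names, not part of betadisper) suggests the two routes agree:

```r
set.seed(1)
pts <- matrix(rnorm(12), nrow = 4)   # 4 points in 3 dimensions
ctr <- colMeans(pts[1:2, ])          # centroid of the first two points

# sweep() subtracts the centroid from each row (default FUN is "-");
# rowSums() of the squares gives squared Euclidean distances
d_sweep <- sqrt(rowSums(sweep(pts, 2, ctr)^2))

# same distances via dist(): append the centroid as a fifth row and
# read its column from the full distance matrix
d_dist <- as.matrix(dist(rbind(pts, ctr)))[1:4, 5]

all.equal(d_sweep, unname(d_dist))
```

In betadistances the same calculation is done separately on the axes with positive and negative eigenvalues, and the squared distances on the negative axes are subtracted before taking the square root.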

@jarioksa
Contributor

@joshgsmith your way will not work. I saw you crossposted to StackOverflow. Crossposting is not a good way to collect answers. The usedist package suggested in StackOverflow won't work with semimetric dissimilarities (such as Jaccard, Bray-Curtis etc.). This is documented in usedist (but naturally, the developer may change that later). The betadistances function suggested above will also work with semimetric dissimilarities (matrices that are not positive semidefinite).
