NA generated with vegdist #520

Open
NcyFS opened this issue Jul 19, 2022 · 6 comments

Comments

@NcyFS

NcyFS commented Jul 19, 2022

Hello,

Thank you so much for developing this amazing package!

I don't understand the missing data generated by the vegdist function. Here is a reproducible example:
I thank you in advance for any help you can give me,

Nancy

@jarioksa
Contributor

Didn't find any example data. However, vegdist gives a warning 'you have empty rows: their dissimilarities may be meaningless in method "bray"'. I think this warning has come true. Empty means that you have no species and all entries are zero. When you compare two such empty rows, the dissimilarity simplifies to 0/0 and that is Not-a-Number.
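
To make the 0/0 case concrete, here is a minimal sketch that writes the Bray-Curtis formula out by hand. The helper `bray_curtis` and the example vectors are illustrative assumptions, not vegan's own implementation:

```r
# Bray-Curtis dissimilarity between two abundance vectors, written out by hand.
# Hypothetical helper for illustration only; vegdist() does this internally.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

a <- c(0, 0, 0)      # an empty row: every entry is zero
b <- c(0, 0, 0)      # a second empty row
s <- c(4.46, 0, 0)   # a row with one non-zero entry

bray_empty_empty <- bray_curtis(a, b)  # numerator 0, denominator 0
bray_empty_full  <- bray_curtis(a, s)  # numerator 4.46, denominator 4.46

print(bray_empty_empty)
print(bray_empty_full)
```

Two empty rows give 0/0, which R reports as NaN, while an empty row compared with a non-empty one gives the maximum dissimilarity of 1.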

@NcyFS
Author

NcyFS commented Jul 20, 2022

Thanks so much for your response. With my code I don't have empty rows so I am not sure it is the issue.

Here is the complete code with the dataset:

#packages
library(tidyverse)
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-7

#datasets
soc=tibble::tribble(
  ~...1, ~X1,  ~X2,  ~X3,  ~X4,  ~X5,
  1,  NA, 9.41, 7.92, 3.47, 5.45,
  2,  NA,   NA, 4.46,    0,    0,
  3,  NA,   NA,   NA, 6.93, 6.44,
  4,  NA,   NA,   NA,   NA,    0,
  5,  NA,   NA,   NA,   NA,   NA
)
soc
#> # A tibble: 5 x 6
#>    ...1 X1       X2    X3    X4    X5
#>   <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 NA     9.41  7.92  3.47  5.45
#> 2     2 NA    NA     4.46  0     0   
#> 3     3 NA    NA    NA     6.93  6.44
#> 4     4 NA    NA    NA    NA     0   
#> 5     5 NA    NA    NA    NA    NA

#distance conversion
(socdist<-vegdist(soc[-nrow(soc),-c(1,2)], method="bray",na.rm=TRUE))
#> Warning in vegdist(soc[-nrow(soc), -c(1, 2)], method = "bray", na.rm = TRUE):
#> you have empty rows: their dissimilarities may be meaningless in method "bray"
#> Warning in vegdist(soc[-nrow(soc), -c(1, 2)], method = "bray", na.rm = TRUE):
#> missing values in results
#>           1         2         3
#> 2 0.5812207                    
#> 3 0.1996411 1.0000000          
#> 4 1.0000000       NaN 1.0000000

Created on 2022-07-20 by the reprex package (v2.0.1)

@jarioksa
Contributor

jarioksa commented Jul 20, 2022

You have empty rows (and you really shouldn't argue with your computer: you will lose. If the computer finds an empty row, you have an empty row). You can see this if you look at your reduced data:

> soc[-nrow(soc), -c(1,2)]
# A tibble: 4 × 4
     X2    X3    X4    X5
  <dbl> <dbl> <dbl> <dbl>
1  9.41  7.92  3.47  5.45
2 NA     4.46  0     0   
3 NA    NA     6.93  6.44
4 NA    NA    NA     0   

The last row (number 4) has only three NAs and one 0, and that is empty. All zeros is empty, all NA is empty. Then the Bray-Curtis coefficient reduces to 0/0 and that is Not-a-Number (NaN: technically it is not a missing value NA but the result of the undefined mathematical operation 0/0, although Not-a-Numbers are treated similarly to NA in many R operations).
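
A quick way to check for such rows yourself is to sum each row while ignoring NA. This is only a sketch: the matrix `x` below retypes the reduced data by hand, and the names `row_totals` and `empty_rows` are made up for the example.

```r
# The reduced data from above, retyped as a plain numeric matrix
x <- matrix(
  c(9.41, 7.92, 3.47, 5.45,
    NA,   4.46, 0,    0,
    NA,   NA,   6.93, 6.44,
    NA,   NA,   NA,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("X2", "X3", "X4", "X5"))
)

# Row totals with NA ignored; a total of zero means the row is empty
row_totals <- rowSums(x, na.rm = TRUE)
empty_rows <- which(row_totals == 0)

print(row_totals)
print(empty_rows)
```

Only row 4 has a zero total here, which is the row involved in the NaN in the output above.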

With this kind of data you can use dissimilarities that do not divide by the row total (which is 0 in some cases), such as Euclidean or Manhattan distances. Alternatively, you should be satisfied with the correct result, which is NaN for 0/0. However, I am not sure that you should use any dissimilarity measures with your data. What kind of data do you have? The original full data soc is an upper triangular matrix without diagonal. This kind of data is used for (dis)similarities in several software packages, but not in standard R, which instead uses a lower triangular matrix without diagonal. Do you have (dis)similarities originally? If so, you should hardly calculate dissimilarities of (dis)similarities.
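
As a rough illustration of a distance that does not divide by the row total, the sketch below uses base R's `dist()` as a stand-in (an assumption made for this example; vegdist's own Euclidean and Manhattan methods may treat NA differently). `dist()` drops NA pairwise and rescales the sum by the number of columns actually used.

```r
# Same reduced data as above, retyped as a plain numeric matrix
x <- matrix(
  c(9.41, 7.92, 3.47, 5.45,
    NA,   4.46, 0,    0,
    NA,   NA,   6.93, 6.44,
    NA,   NA,   NA,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("X2", "X3", "X4", "X5"))
)

# Manhattan and Euclidean distances never divide by a row total,
# so an empty row cannot produce 0/0
d_man <- dist(x, method = "manhattan")
d_euc <- dist(x, method = "euclidean")

print(d_man)
print(d_euc)
```

Rows 2 and 4 now get a distance of 0 instead of NaN, because they agree on the single column they share. Whether that number is meaningful for these data is a separate question.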

If you have dissimilarities originally, you can cast them to a valid R distance structure using as.dist(t(soc[,-1])).

@NcyFS
Author

NcyFS commented Jul 20, 2022

OK, thanks so much! Biologically, the 0 makes sense, but fine, I get the issue with the division :-)
soc is a social interaction matrix between individuals.
Do you think I can use my data directly with vegdist and just as.dist? I must admit I was blindly following a tutorial.

Thank you so much for your feedback in any case, it was of great help!

@jarioksa
Contributor

Social interactions are not my field, and I have no idea about your goals nor about the tutorial. However, the minimum is to have complete symmetric data for further analysis, because rows or columns cannot be meaningfully compared if part of their valid data are arbitrary NA. The steps you need to take are:

  1. t() transposes data so that the lower triangle is filled and the upper triangle is NA.
  2. as.dist() changes this to an R distance structure.
  3. as.matrix() gives you a full symmetric matrix (with zero diagonal) that you can handle with, say, vegdist or dist (but I don't know if you should – but that's not my field).

In one line this is:

> as.matrix(as.dist(t(soc[,-1])))
     X1   X2   X3   X4   X5
X1 0.00 9.41 7.92 3.47 5.45
X2 9.41 0.00 4.46 0.00 0.00
X3 7.92 4.46 0.00 6.93 6.44
X4 3.47 0.00 6.93 0.00 0.00
X5 5.45 0.00 6.44 0.00 0.00
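
The same three steps can be followed one at a time in base R. This is a sketch: the matrix `upper` retypes columns X1 to X5 of soc by hand, and the names `lower`, `d` and `full` are illustrative.

```r
# Upper triangular input without diagonal, as in soc[, -1]
upper <- matrix(
  c(NA, 9.41, 7.92, 3.47, 5.45,
    NA, NA,   4.46, 0,    0,
    NA, NA,   NA,   6.93, 6.44,
    NA, NA,   NA,   NA,   0,
    NA, NA,   NA,   NA,   NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, paste0("X", 1:5))
)

lower <- t(upper)      # step 1: values move below the diagonal
d     <- as.dist(lower) # step 2: R distance structure (reads the lower triangle)
full  <- as.matrix(d)   # step 3: full symmetric matrix with zero diagonal

print(full)
```

`as.dist()` only reads the entries below the diagonal, which is why the transpose has to come first.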

@NcyFS
Author

NcyFS commented Jul 20, 2022

I get it! Once again, thank you so much for all this information, it has been so helpful. Thanks a lot! :-)
