NA generated with vegdist #520

NcyFS · 2022-07-19T16:17:42Z

Hello,

Thank you so much for developing this amazing package!

I don't understand the missing data generated by the vegdist function. Here is a reproducible example:
I thank you in advance for any help you can give me,

Nancy

jarioksa · 2022-07-19T19:43:51Z

Didn't find any example data. However, vegdist gives a warning 'you have empty rows: their dissimilarities may be meaningless in method "bray"'. I think this warning has come true. Empty means that you have no species and all entries are zero. When you compare two such empty rows, the dissimilarity simplifies to 0/0 and that is Not-a-Number.
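As a minimal sketch of the 0/0 case described above, the quantitative Bray-Curtis formula can be written in base R. The helper `bray_curtis` and the three example vectors are hypothetical illustrations, not part of vegan and not taken from the reporter's data.

```r
# Bray-Curtis dissimilarity between two abundance vectors.
# bray_curtis is a hypothetical helper for illustration, not vegan's code.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

empty_a <- c(0, 0, 0)  # an "empty" row: no species present
empty_b <- c(0, 0, 0)  # a second empty row
site    <- c(2, 0, 5)  # a row with some abundances

bray_curtis(empty_a, empty_b)  # numerator 0, denominator 0: Not-a-Number
bray_curtis(empty_a, site)     # 7 / 7: still defined
```

Only the comparison of two empty rows is undefined; an empty row against a non-empty one still has a non-zero denominator.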

NcyFS · 2022-07-20T07:01:46Z

Thanks so much for your response. With my code I don't have empty rows, so I am not sure that is the issue.

Here is the complete code with the dataset:

#packages
library(tidyverse)
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-7

#datasets
soc=tibble::tribble(
  ~...1, ~X1,  ~X2,  ~X3,  ~X4,  ~X5,
  1,  NA, 9.41, 7.92, 3.47, 5.45,
  2,  NA,   NA, 4.46,    0,    0,
  3,  NA,   NA,   NA, 6.93, 6.44,
  4,  NA,   NA,   NA,   NA,    0,
  5,  NA,   NA,   NA,   NA,   NA
)
soc
#> # A tibble: 5 x 6
#>    ...1 X1       X2    X3    X4    X5
#>   <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 NA     9.41  7.92  3.47  5.45
#> 2     2 NA    NA     4.46  0     0   
#> 3     3 NA    NA    NA     6.93  6.44
#> 4     4 NA    NA    NA    NA     0   
#> 5     5 NA    NA    NA    NA    NA

#distance conversion
(socdist<-vegdist(soc[-nrow(soc),-c(1,2)], method="bray",na.rm=TRUE))
#> Warning in vegdist(soc[-nrow(soc), -c(1, 2)], method = "bray", na.rm = TRUE):
#> you have empty rows: their dissimilarities may be meaningless in method "bray"
#> Warning in vegdist(soc[-nrow(soc), -c(1, 2)], method = "bray", na.rm = TRUE):
#> missing values in results
#>           1         2         3
#> 2 0.5812207                    
#> 3 0.1996411 1.0000000          
#> 4 1.0000000       NaN 1.0000000

Created on 2022-07-20 by the reprex package (v2.0.1)

jarioksa · 2022-07-20T07:24:49Z

You have empty rows (and you really shouldn't argue with your computer: you will lose – if the computer finds an empty row, you have an empty row). You can see this if you look at your reduced data:

> soc[-nrow(soc), -c(1,2)]
# A tibble: 4 × 4
     X2    X3    X4    X5
  <dbl> <dbl> <dbl> <dbl>
1  9.41  7.92  3.47  5.45
2 NA     4.46  0     0   
3 NA    NA     6.93  6.44
4 NA    NA    NA     0

The last row (number 4) has only three NAs and one 0 – and that is empty. All zeros is empty, all NA is empty. Then the Bray-Curtis coefficient reduces to 0/0, and that is Not-a-Number (NaN – technically it is not a missing value NA but the result of an undefined mathematical operation 0/0, although Not-a-Numbers are treated similarly to NA in many R operations).
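The following base R sketch reworks the numbers from the reduced data by hand. It assumes that `na.rm = TRUE` amounts to pairwise deletion, that is, for each pair of rows only the columns where both rows are non-missing are used. The helper `bray_pairwise` is hypothetical and is not vegan's implementation, but under that assumption it appears to reproduce the values printed in the reprex above.

```r
# Bray-Curtis with pairwise deletion of NA (assumption: this mirrors na.rm = TRUE).
# bray_pairwise is a hypothetical helper for illustration, not vegan's code.
bray_pairwise <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)  # columns observed in both rows
  sum(abs(x[ok] - y[ok])) / sum(x[ok] + y[ok])
}

# The reduced data from the thread (columns X2..X5, rows 1..4)
reduced <- rbind(
  c(9.41, 7.92, 3.47, 5.45),
  c(NA,   4.46, 0,    0),
  c(NA,   NA,   6.93, 6.44),
  c(NA,   NA,   NA,   0)
)

bray_pairwise(reduced[2, ], reduced[4, ])  # only X5 is shared, both 0: 0/0
bray_pairwise(reduced[1, ], reduced[2, ])  # 12.38 / 21.30
bray_pairwise(reduced[1, ], reduced[3, ])  # 4.45 / 22.29
```

Rows 2 and 4 share a single observed column, and both values there are zero, which is why that one cell is NaN while the other comparisons with row 4 are defined.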

With this kind of data you can use dissimilarities that do not divide by the row total (which is 0 in some cases), such as Euclidean or Manhattan distances – or alternatively, you should be satisfied with the correct result, which is NaN for 0/0. However, I am not sure that you should use any dissimilarity measures with your data. What kind of data do you have? The original full data soc is an upper triangular matrix without a diagonal. This kind of data is used for (dis)similarities in several software packages – but not in standard R, which instead uses a lower triangular matrix without a diagonal. Do you have (dis)similarities originally? If so, you should hardly calculate dissimilarities of (dis)similarities.

If you have dissimilarities originally, you can cast them to a valid R distance structure using as.dist(t(soc[,-1])).
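To illustrate the earlier point about distances that do not divide by a row total, here is a small base R sketch using `dist()` with the Manhattan method. The matrix `x` is a hypothetical example with two all-zero rows, not the reporter's data.

```r
# Hypothetical abundance matrix: rows "a" and "b" are empty (all zeros)
x <- rbind(
  a = c(0, 0, 0),
  b = c(0, 0, 0),
  c = c(2, 0, 5)
)

# Manhattan distance sums absolute differences and never divides by row totals,
# so two empty rows simply get distance 0 instead of NaN
d <- dist(x, method = "manhattan")
d
```

Whether a distance of 0 between two empty rows is meaningful depends on the data; the sketch only shows that the result is defined.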

NcyFS · 2022-07-20T07:37:58Z

OK, thanks so much! Biologically, the 0 makes sense, but fine, I get the issue with the division :-)
soc is a social interaction matrix between individuals.
Do you think I can use my data directly with as.dist, without vegdist? I must admit I was blindly following a tutorial.

Thank you so much for your feedback in any case, it was of great help!

jarioksa · 2022-07-20T08:30:57Z

Social interactions are not my field, and I have no idea about your goals nor about the tutorial. However, the minimum is to have complete symmetric data for further analysis, because rows or columns cannot be meaningfully compared if part of their valid data are arbitrary NA. The steps you need to take are:

t() transposes the data so that the lower triangle is filled and the upper triangle is NA.
as.dist() changes this to an R distance structure.
as.matrix() gives you a full symmetric matrix (with zero diagonal) that you can handle with, say, vegdist or dist (but I don't know if you should – that's not my field).

In one line this is:

> as.matrix(as.dist(t(soc[,-1])))
     X1   X2   X3   X4   X5
X1 0.00 9.41 7.92 3.47 5.45
X2 9.41 0.00 4.46 0.00 0.00
X3 7.92 4.46 0.00 6.93 6.44
X4 3.47 0.00 6.93 0.00 0.00
X5 5.45 0.00 6.44 0.00 0.00
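The three steps can also be run one at a time to inspect each intermediate result. The sketch below uses a small hypothetical 3 × 3 upper triangular matrix (`upper`, with made-up labels A, B, C) rather than the soc data, and only base R.

```r
# Hypothetical upper triangular input: values above the diagonal, NA elsewhere
upper <- matrix(NA_real_, nrow = 3, ncol = 3,
                dimnames = list(NULL, c("A", "B", "C")))
upper[1, 2] <- 2.5
upper[1, 3] <- 1.0
upper[2, 3] <- 4.0

lower <- t(upper)     # step 1: values now sit below the diagonal
d     <- as.dist(lower)  # step 2: as.dist() reads only the lower triangle
full  <- as.matrix(d)    # step 3: full symmetric matrix with zero diagonal

full
isSymmetric(full)     # check that the result really is symmetric
```

Because `as.dist()` reads only the entries below the diagonal, the NA values left in the upper triangle and on the diagonal after transposing do not reach the result.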

NcyFS · 2022-07-20T08:53:56Z

I get it! Once again, thank you so much for all this information, it has been so helpful. Thanks a lot! :-)
