gnr_resolve not matching the same name multiple times OR matches erroneously #920

Open
ErikKusch opened this issue Nov 14, 2023 · 0 comments

The Issue

Using gnr_resolve(), I never obtain the same matched name for more than one user-supplied name, even when doing so would clearly give a better match. These erroneous matches persist even in single-species gnr_resolve() queries.

Minimal Working Example

Running this code:

library(taxize)
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus", "Lagopus muta", "Lagopas lagopus")
GNR_df <- gnr_resolve(sci = sps, best_match_only = TRUE)
GNR_df

results in this output:

# A tibble: 5 × 5
  user_supplied_name submitted_name  matched_name              data_source_title score
* <chr>              <chr>           <chr>                     <chr>             <dbl>
1 Lagopus matu       Lagopus matu    Lagopus Brisson, 1760     Catalogue of Lif… 0.75 
2 Logopus muta       Logopus muta    Lagopus muta (Montin, 17… Catalogue of Lif… 0.75 
3 Lagopus lagopus    Lagopus lagopus Lagopus lagopus           Wikispecies       0.988
4 Lagopus muta       Lagopus muta    Lagopus muta              Wikispecies       0.988
5 Lagopas lagopus    Lagopas lagopus Lagopus lagopus (Linnaeu… Catalogue of Lif… 0.75 

Evidently, the best match for Lagopus matu (first row in the output) should be Lagopus muta as has been matched correctly in row four. Additionally, the matches to Lagopus lagopus (row 3) and Lagopas lagopus (row 5) ought to be the same - Lagopus lagopus.
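
To make the "better match" claim concrete, here is a rough sketch that compares plain Levenshtein edit distances using base R's utils::adist(). This is only an illustration: it is not how gnr_resolve() scores matches, and the two candidate strings are the matched names from the output above with authorship left out.

```r
# Query with the misspelled epithet, and two candidate names taken from the
# matched_name column above (authorship omitted for the comparison).
query      <- "Lagopus matu"
candidates <- c("Lagopus muta", "Lagopus")

# utils::adist() returns a matrix of Levenshtein distances
# (one row per query, one column per candidate); drop() flattens it.
distances <- drop(utils::adist(query, candidates))
names(distances) <- candidates
distances

# Candidate with the smallest edit distance to the query.
closest <- names(which.min(distances))
closest
```

By this simple measure the species-level name is much closer to the query than the bare genus, which is why the genus-only match in row 1 looks wrong.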

Interestingly, running gnr_resolve() on just the first species:

gnr_resolve(sci = sps[1], best_match_only = TRUE)

still results in the same erroneous match as above:

# A tibble: 1 × 5
  user_supplied_name submitted_name matched_name          data_source_title      score
* <chr>              <chr>          <chr>                 <chr>                  <dbl>
1 Lagopus matu       Lagopus matu   Lagopus Brisson, 1760 Catalogue of Life Che…  0.75
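
Until this is resolved, suspicious matches like this one can at least be flagged after the fact. The sketch below is a heuristic of my own, not part of taxize: has_epithet() is a hypothetical helper that treats a name as species-level when its second word starts with a lowercase letter. The data frame is re-typed by hand from rows 1, 3 and 4 of the first output; the rows whose matched_name is truncated in the printout are left out.

```r
# Rows 1, 3 and 4 of the first output above, re-typed by hand.
res <- data.frame(
  user_supplied_name = c("Lagopus matu", "Lagopus lagopus", "Lagopus muta"),
  matched_name       = c("Lagopus Brisson, 1760", "Lagopus lagopus", "Lagopus muta"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: TRUE when the second word of a name starts with a
# lowercase letter, i.e. looks like a species epithet rather than an author.
has_epithet <- function(x) {
  second <- vapply(
    strsplit(x, " +"),
    function(w) if (length(w) >= 2) w[2] else "",
    character(1)
  )
  grepl("^[[:lower:]]", second)
}

# Flag rows where a binomial was submitted but the match has no epithet.
res$genus_only <- has_epithet(res$user_supplied_name) & !has_epithet(res$matched_name)
res[res$genus_only, ]
```

This only catches genus-level fallbacks; it says nothing about whether a species-level match is the right species.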

Workaround

For now, I have put together a workaround with the rgbif package:

library(rgbif)
Fixed_Species <- sapply(sps, # loop over species names
    FUN = function(x){
        gbif_resolve <- rgbif::name_backbone_verbose(x) # retrieve gbif backbone matches
        # plain if/else instead of ifelse(): the test is a single value, and only one name is returned
        if (gbif_resolve$data$matchType[1] != "NONE") {
            gbif_resolve$data$canonicalName[1] # if match has been made, then pull matched canonical name
        } else {
            gbif_resolve$alternatives$canonicalName[1] # if no match, then pull the first alternative from fuzzy matching
        }
    }
)

which, to me, leads to the expected matches:

    Lagopus matu      Logopus muta   Lagopus lagopus      Lagopus muta   Lagopas lagopus 
   "Lagopus muta"    "Lagopus muta" "Lagopus lagopus"    "Lagopus muta" "Lagopus lagopus" 

Session Info

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Oslo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxize_0.9.100

loaded via a namespace (and not attached):
 [1] bold_1.3.0        gtable_0.3.4      jsonlite_1.8.7    crayon_1.5.2     
 [5] rgbif_3.7.7       dplyr_1.1.2       compiler_4.3.2    tidyselect_1.2.0 
 [9] Rcpp_1.0.11       xml2_1.3.4        stringr_1.5.0     parallel_4.3.2   
[13] scales_1.2.1      uuid_1.1-1        lattice_0.21-9    ggplot2_3.4.3    
[17] R6_2.5.1          plyr_1.8.8        generics_0.1.3    curl_5.0.2       
[21] oai_0.4.0         iterators_1.0.14  tibble_3.2.1      crul_1.4.0       
[25] munsell_0.5.0     pillar_1.9.0      rlang_1.1.1       utf8_1.2.3       
[29] httpcode_0.3.0    stringi_1.7.12    lazyeval_0.2.2    cli_3.6.1        
[33] magrittr_2.0.3    foreach_1.5.2     digest_0.6.31     grid_4.3.2       
[37] rstudioapi_0.15.0 lifecycle_1.0.3   nlme_3.1-163      vctrs_0.6.3      
[41] glue_1.6.2        data.table_1.14.8 whisker_0.4.1     zoo_1.8-12       
[45] codetools_0.2-19  ape_5.7-1         fansi_1.0.4       colorspace_2.1-0 
[49] conditionz_0.1.0  httr_1.4.7        tools_4.3.2       pkgconfig_2.0.3  