Update microorganisms data set to latest taxonomy #135

Open
msberends opened this issue Feb 21, 2024 · 9 comments
Labels
enhancement New feature or request
Milestone

Comments

@msberends
Owner

No description provided.

@msberends
Owner Author

Remember to add the taxons mentioned in #131

@silverfoxdoc

silverfoxdoc commented Mar 10, 2024

Hopefully this will be fixed by an update to the microorganisms data set, but I can't seem to match Clavispora (Candida) lusitaniae. I'm also hoping that Nakaseomyces glabratus and Pichia kudriavzevii will arrive with the update, with their old names redirected to these new names, as our lab is moving over to them.

@msberends
Owner Author

msberends commented Mar 11, 2024

That’s true, it’s a discussion at our lab as well. Do you know if it’s formal already? Meaning, accepted by authoritative taxonomy sources? I’m well aware of the publications suggesting these changes, but do you know whether they are formally adopted already?

@silverfoxdoc

I'm afraid I don't know for sure whether the international taxonomy groups have accepted all these changes. This is a useful paper detailing the most common changes from a medical perspective, if you've not already seen it: https://doi.org/10.1093/ofid/ofac559

It may well be a moving target unfortunately...

@msberends
Owner Author

Yes, I know that one. Unfortunately, Open Forum Infectious Diseases is ‘just’ a journal, not a taxonomically reliable source. The results/propositions of such papers must be ratified by taxonomic sources first.

But I found on MycoBank, a great and reliable taxonomic source for fungi, that they do have new names for many Candida species already. I found a couple of inconsistencies though, I’ll share them here and hope that we both could have a look at it. Will be later this week probably.

@msberends
Owner Author

From #144, look up these:

  • Penicillium marneffei
  • Candida rugosa
  • Candida utilis

Perhaps they are in MycoBank?

@silverfoxdoc

silverfoxdoc commented May 5, 2024

So I've had a further look at this.

I've taken the OFID paper as a starting point. I initially compared the new and old OFID names to the MycoBank downloadable dataset, but then discovered the rgbif package, which provides a wrapper around the GBIF databases and quite a nice way to check things without downloading massive files.

I've produced a reprex which will hopefully help with deciding which taxonomic names to go with. It's not perfect. In an ideal world I would probably use all the names within a biotech company's MALDI-TOF database and cross-reference each taxon name to its GBIF status.

I suspect the biotech companies are never going to give out that kind of information. Even if it is possible to pull it from the instrument, doing so is probably not allowed under the software licence, since it is commercially sensitive.

It's probably inevitable that biotech companies will move to the "new" names, and to me it seems reasonable to use GBIF as the reference standard, even if the GBIF accepted taxonomic name isn't the one we use most often in human/vet medicine, as long as we can convert all the synonyms to this reference taxonomic name.
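
A minimal sketch of that synonym-to-reference conversion, in base R. The `synonym_lookup` table and the `to_reference_name()` helper are hypothetical names used only for illustration; the three pairs are ones discussed in this thread, and a real table would presumably be built from a GBIF or MycoBank export rather than typed by hand.

```r
# Hypothetical lookup: synonym -> reference (accepted) name.
# Only pairs mentioned in this thread are listed.
synonym_lookup <- c(
  "Candida glabrata"      = "Nakaseomyces glabratus",
  "Candida lusitaniae"    = "Clavispora lusitaniae",
  "Penicillium marneffei" = "Talaromyces marneffei"
)

# Map reported names to the reference name. Names that are not in the
# lookup (already accepted, or unknown) are returned unchanged.
to_reference_name <- function(x, lookup = synonym_lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

reported <- c("Candida glabrata", "Talaromyces marneffei", "Penicillium marneffei")
to_reference_name(reported)
```

Leaving unknown names untouched, rather than turning them into NA, keeps anything the table does not cover visible for manual review.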

I've corrected some spelling mistakes within the OFID paper names. Some of the old OFID names are the accepted taxonomic name in GBIF but do not have a species column when GBIF is interrogated and therefore return NA. Hopefully the column names make sense. The GBIF search function also seems to have some fuzzy logic behind it, so some of the names get matched imprecisely, e.g. "Emmonsia "species 3"" gets matched to "Emmonisa".

library(tidyverse)
library(rvest)
library(rgbif)

# print all tbl rows
options(pillar.print_max = Inf)

# Download and extract OFID paper tables
url <- "https://academic.oup.com/ofid/article/10/1/ofac559/6974385"

tables <- url %>% 
  read_html() %>%
  html_table() %>% 
  .[seq(1, 24, 4)] # for some reason it is extracting each table 4 times

# Clean OFID tables up
clean_ofid_tlbs <- function(x) {
  
  janitor::clean_names(x) %>% 
    mutate(current_name = str_replace_all(current_name, "([a-z])(?=[A-Z][^A-Z]+)", "\\1 ")) %>% 
    separate_longer_delim(current_name, delim = regex("\\s(?=[A-Z])")) %>% 
    separate_longer_delim(previous_name_s, delim = regex("\\s(?=var )")) %>% 
    separate_longer_delim(previous_name_s, delim = ",") %>% 
    mutate(
      current_name = str_replace_all(
        current_name,
        c("Nakaseomyces bracarensisa" = "Nakaseomyces bracarensis",
          "Nakaseomyces glabrataa" = "Nakaseomyces glabratus",
          "Nakaseomyces nivariensisa" = "Nakaseomyces nivariensis",
          "Paracoccidioides restrepoanaa" = "Paracoccidioides restrepoana",
          "Talaromyces marneffeib" = "Talaromyces marneffei",
          "Moesziomyces antarticus" = "Moesziomyces antarcticus",
          "Apiotricum domesticum" = "Apiotrichum domesticum",
          "Trematospheria grisea" = "Trematosphaeria grisea",
          "Rhizopus arrhizus var delemar" = "Rhizopus arrhizus var. delemar"
        )
      ),
      current_name = str_remove(current_name, "\\(varieties no longer recognized\\)"),
      current_name = case_when(current_name == "" ~ NA,
                               .default = current_name),
      previous_name_s = str_replace_all(
        previous_name_s,
        c("var interdigitale" = "Trichophyton mentagrophytes var. interdigitale",
          "var mentagrophytes" = "Trichophyton mentagrophytes var. mentagrophytes",
          "genotype VIII" = "Trichophyton mentagrophytes genotype VIII",
          "var chinensis" = "Rhizopus microsporus var. chinensis",
          "var oligosporus" = "Rhizopus microsporus var. oligosporus",
          "var rhizopodiformis" = "Rhizopus microsporus var. rhizopodiformis"
          )
      )
    ) %>% 
    drop_na() %>% 
    rename(ofid_old = previous_name_s, ofid_new = current_name) %>% 
    select(ofid_old, ofid_new)
  
}

ofid_tbls <- map(tables, clean_ofid_tlbs)


# Check names against GBIF backbone dataset to see if they are a synonym or the accepted name
check_gbif_synonym <- function(x) {
  
  mutate(x, 
         ofid_old_GBIF_ = rgbif::name_backbone_checklist(ofid_old)["status"],
         GBIF_ = rgbif::name_backbone_checklist(ofid_old)["species"],
         .after = ofid_old) %>% 
    mutate(GBIF_ofid_new_match = case_when(GBIF_$species == ofid_new ~ TRUE,
                             GBIF_$species != ofid_new ~ FALSE,
                             GBIF_$species == NA_character_ ~ FALSE,
                             ),
           .after = ofid_new)
  
}

syn_status <- map(ofid_tbls, check_gbif_synonym)

syn_status
#> [[1]]
#> # A tibble: 41 × 5
#>    ofid_old     ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>    <chr>        <chr>                 <chr>         <chr>    <lgl>              
#>  1 Candida bra… SYNONYM               Nakaseomyces… Nakaseo… TRUE               
#>  2 Candida cat… SYNONYM               Diutina cate… Diutina… TRUE               
#>  3 Candida col… ACCEPTED              Candida coll… Torulas… FALSE              
#>  4 Candida fab… SYNONYM               Cyberlindner… Cyberli… TRUE               
#>  5 Candida fam… ACCEPTED              <NA>          Debaryo… NA                 
#>  6 Candida gla… SYNONYM               Nakaseomyces… Nakaseo… TRUE               
#>  7 Candida gui… SYNONYM               Meyerozyma g… Meyeroz… TRUE               
#>  8 Candida kru… SYNONYM               Issatchenkia… Pichia … FALSE              
#>  9 Candida kef… SYNONYM               Kluyveromyce… Kluyver… TRUE               
#> 10 Candida pse… SYNONYM               Kluyveromyce… Kluyver… TRUE               
#> 11 Candida lip… SYNONYM               Yarrowia lip… Yarrowi… TRUE               
#> 12 Candida lus… SYNONYM               Clavispora l… Clavisp… TRUE               
#> 13 Candida niv… SYNONYM               Nakaseomyces… Nakaseo… TRUE               
#> 14 Candida neo… SYNONYM               Diutina neor… Diutina… TRUE               
#> 15 Candida nor… SYNONYM               Pichia norve… Pichia … TRUE               
#> 16 Candida par… SYNONYM               Wickerhamiel… Diutina… FALSE              
#> 17 Candida pel… SYNONYM               Wickerhamomy… Wickerh… TRUE               
#> 18 Pichia anom… SYNONYM               Wickerhamomy… Wickerh… TRUE               
#> 19 Candida pse… SYNONYM               Diutina pseu… Diutina… TRUE               
#> 20 Candida rug… SYNONYM               Diutina rugo… Diutina… TRUE               
#> 21 Cryptococcu… SYNONYM               Naganishia a… Naganis… TRUE               
#> 22 Cryptococcu… SYNONYM               Cutaneotrich… Cutaneo… FALSE              
#> 23 Cryptococcu… SYNONYM               Cutaneotrich… Cutaneo… TRUE               
#> 24 Cryptococcu… SYNONYM               Papiliotrema… Papilio… TRUE               
#> 25 Pseudozyma … SYNONYM               Moesziomyces… Moeszio… TRUE               
#> 26 Pseudozyma … SYNONYM               Moesziomyces… Moeszio… TRUE               
#> 27 Pseudozyma … SYNONYM               Dirkmeia chu… Dirkmei… TRUE               
#> 28 Pseudozyma … ACCEPTED              <NA>          Triodio… NA                 
#> 29 Pseudozyma … SYNONYM               Moesziomyces… Moeszio… TRUE               
#> 30 Pseudozyma … ACCEPTED              <NA>          Ustilag… NA                 
#> 31 Geotrichum … SYNONYM               Saprochaete … Magnusi… FALSE              
#> 32 Geotrichum … SYNONYM               Magnusiomyce… Magnusi… TRUE               
#> 33 Saprochaete… SYNONYM               Magnusiomyce… Magnusi… TRUE               
#> 34 Pichia ohme… SYNONYM               Kodamaea ohm… Kodamae… TRUE               
#> 35 Trichosporo… SYNONYM               Cutaneotrich… Cutaneo… TRUE               
#> 36 Trichosporo… SYNONYM               Cutaneotrich… Cutaneo… TRUE               
#> 37 Trichosporo… SYNONYM               Apiotrichum … Apiotri… TRUE               
#> 38 Trichosporo… SYNONYM               Apiotrichum … Apiotri… TRUE               
#> 39 Trichosporo… SYNONYM               Cutaneotrich… Cutaneo… TRUE               
#> 40 Trichosporo… SYNONYM               Apiotrichum … Apiotri… TRUE               
#> 41 Trichosporo… ACCEPTED              Trichosporon… Apiotri… FALSE              
#> 
#> [[2]]
#> # A tibble: 37 × 5
#>    ofid_old     ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>    <chr>        <chr>                 <chr>         <chr>    <lgl>              
#>  1 Acremonium … SYNONYM               Sarocladium … Sarocla… TRUE               
#>  2 Acremonium … SYNONYM               Gliomastix r… Gliomas… TRUE               
#>  3 Acremonium … SYNONYM               Sarocladium … Sarocla… TRUE               
#>  4 Arthroderma… SYNONYM               Trichophyton… Trichop… TRUE               
#>  5 Cerinosteru… SYNONYM               Quambalaria … Quambal… TRUE               
#>  6 Sporothrix … SYNONYM               Quambalaria … Quambal… TRUE               
#>  7 Fusarium di… SYNONYM               Bisifusarium… Bisifus… TRUE               
#>  8 Fusarium fa… ACCEPTED              Fusarium fal… Neocosm… FALSE              
#>  9 Acremonium … SYNONYM               Fusarium fal… Neocosm… FALSE              
#> 10 Fusarium ke… ACCEPTED              Fusarium ker… Neocosm… FALSE              
#> 11 Fusarium li… ACCEPTED              Fusarium lic… Neocosm… FALSE              
#> 12 Fusarium pe… ACCEPTED              Fusarium pet… Neocosm… FALSE              
#> 13 Fusarium so… ACCEPTED              Fusarium sol… Neocosm… FALSE              
#> 14 Geosmithia … SYNONYM               Rasamsonia a… Rasamso… TRUE               
#> 15 Penicillium… SYNONYM               Rasamsonia a… Rasamso… TRUE               
#> 16 Gibberella … ACCEPTED              Gibberella f… Fusariu… FALSE              
#> 17 Lecythophor… SYNONYM               Coniochaeta … Conioch… TRUE               
#> 18 Phialophora… SYNONYM               Coniochaeta … Conioch… TRUE               
#> 19 Microsporum… SYNONYM               Paraphyton c… Paraphy… TRUE               
#> 20 Microsporum… ACCEPTED              Microsporum … Nannizz… FALSE              
#> 21 Microsporum… ACCEPTED              Microsporum … Lophoph… FALSE              
#> 22 Microsporum… ACCEPTED              Microsporum … Nannizz… FALSE              
#> 23 Microsporum… ACCEPTED              Microsporum … Nannizz… FALSE              
#> 24 Microsporum… SYNONYM               Trichophyton… Nannizz… FALSE              
#> 25 Neosartorya… SYNONYM               Aspergillus … Aspergi… TRUE               
#> 26 Neosartorya… SYNONYM               Aspergillus … Aspergi… TRUE               
#> 27 Aspergillus… SYNONYM               Aspergillus … Aspergi… TRUE               
#> 28 Neosartorya… SYNONYM               Aspergillus … Aspergi… TRUE               
#> 29 Paecilomyce… SYNONYM               Purpureocill… Purpure… TRUE               
#> 30 Paecilomyce… SYNONYM               Marquandomyc… Marquan… TRUE               
#> 31 Penicillium… SYNONYM               Talaromyces … Talarom… TRUE               
#> 32 Penicillium… SYNONYM               Talaromyces … Talarom… TRUE               
#> 33 Trichophyto… SYNONYM               Arthroderma … Arthrod… TRUE               
#> 34 Trichophyto… ACCEPTED              Trichophyton… Arthrod… FALSE              
#> 35 Trichophyto… SYNONYM               Trichophyton… Trichop… FALSE              
#> 36 Trichophyto… SYNONYM               Trichophyton… Trichop… TRUE               
#> 37 Trichophyto… ACCEPTED              Trichophyton… Trichop… FALSE              
#> 
#> [[3]]
#> # A tibble: 27 × 5
#>    ofid_old     ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>    <chr>        <chr>                 <chr>         <chr>    <lgl>              
#>  1 Emmonsia cr… ACCEPTED              <NA>          Emergom… NA                 
#>  2 Emmonsia he… SYNONYM               Blastomyces … Blastom… TRUE               
#>  3 Emmonsia pa… SYNONYM               Blastomyces … Blastom… TRUE               
#>  4 Emmonsia so… ACCEPTED              Emmonsia soli Emergom… FALSE              
#>  5 Emmonsia “s… ACCEPTED              <NA>          Blastom… NA                 
#>  6 Emmonsia “s… ACCEPTED              <NA>          Emergom… NA                 
#>  7 Emmonsia pa… SYNONYM               Emergomyces … Emergom… TRUE               
#>  8 Histoplasma… ACCEPTED              <NA>          Histopl… NA                 
#>  9 Histoplasma… ACCEPTED              <NA>          Histopl… NA                 
#> 10 Histoplasma… ACCEPTED              <NA>          Histopl… NA                 
#> 11 Histoplasma… ACCEPTED              <NA>          Histopl… NA                 
#> 12 Lacazia lob… SYNONYM               Paracoccidio… Paracoc… TRUE               
#> 13 Paracoccidi… ACCEPTED              Paracoccidio… Paracoc… FALSE              
#> 14 Paracoccidi… ACCEPTED              Paracoccidio… Paracoc… FALSE              
#> 15 Paracoccidi… ACCEPTED              Paracoccidio… Paracoc… FALSE              
#> 16 Paracoccidi… ACCEPTED              Paracoccidio… Paracoc… FALSE              
#> 17 Paracoccidi… ACCEPTED              Paracoccidio… Paracoc… FALSE              
#> 18 Penicillium… SYNONYM               Talaromyces … Talarom… TRUE               
#> 19 Sporothrix … ACCEPTED              Sporothrix s… Sporoth… FALSE              
#> 20 Sporothrix … ACCEPTED              Sporothrix s… Sporoth… FALSE              
#> 21 Sporothrix … ACCEPTED              Sporothrix s… Sporoth… FALSE              
#> 22 Sporothrix … ACCEPTED              Sporothrix s… Sporoth… FALSE              
#> 23 Sporothrix … ACCEPTED              Sporothrix p… Sporoth… FALSE              
#> 24 Sporothrix … ACCEPTED              Sporothrix p… Sporoth… FALSE              
#> 25 Sporothrix … ACCEPTED              Sporothrix p… Sporoth… FALSE              
#> 26 Sporothrix … ACCEPTED              Sporothrix p… Sporoth… FALSE              
#> 27 Sporothrix … ACCEPTED              Sporothrix p… Sporoth… FALSE              
#> 
#> [[4]]
#> # A tibble: 9 × 5
#>   ofid_old      ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>   <chr>         <chr>                 <chr>         <chr>    <lgl>              
#> 1 Bipolaris au… SYNONYM               Curvularia a… Curvula… TRUE               
#> 2 Bipolaris ha… SYNONYM               Curvularia h… Curvula… TRUE               
#> 3 Bipolaris sp… SYNONYM               Curvularia s… Curvula… TRUE               
#> 4 Ochroconis g… SYNONYM               Verruconis g… Verruco… TRUE               
#> 5 Phialophora … SYNONYM               Pleurostoma … Pleuros… TRUE               
#> 6 Pseudallesch… ACCEPTED              Pseudallesch… Scedosp… FALSE              
#> 7 Ramichloridi… SYNONYM               Rhinocladiel… Rhinocl… TRUE               
#> 8 Ramichloridi… SYNONYM               Myrmecridium… Myrmecr… TRUE               
#> 9 Scedosporium… ACCEPTED              Scedosporium… Lomento… FALSE              
#> 
#> [[5]]
#> # A tibble: 8 × 5
#>   ofid_old      ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>   <chr>         <chr>                 <chr>         <chr>    <lgl>              
#> 1 Leptosphaeri… SYNONYM               Falciformisp… Falcifo… TRUE               
#> 2 Leptosphaeri… SYNONYM               Falciformisp… Falcifo… TRUE               
#> 3 Scytalidium … SYNONYM               Neoscytalidi… Neoscyt… TRUE               
#> 4 Scytalidium … SYNONYM               Neoscytalidi… Neoscyt… TRUE               
#> 5 Hendersonula… SYNONYM               Neoscytalidi… Nattras… FALSE              
#> 6 Pyrenochaeta… SYNONYM               Medicopsis r… Medicop… TRUE               
#> 7 Pyrenochaeta… SYNONYM               Nigrograna m… Nigrogr… TRUE               
#> 8 Madurella gr… SYNONYM               Trematosphae… Tremato… TRUE               
#> 
#> [[6]]
#> # A tibble: 13 × 5
#>    ofid_old     ofid_old_GBIF_$status GBIF_$species ofid_new GBIF_ofid_new_match
#>    <chr>        <chr>                 <chr>         <chr>    <lgl>              
#>  1 Absidia cor… SYNONYM               Lichtheimia … Lichthe… TRUE               
#>  2 Mycocladus … SYNONYM               Lichtheimia … Lichthe… TRUE               
#>  3 Rhizopus az… SYNONYM               Rhizopus mic… Rhizopu… TRUE               
#>  4 Rhizopus de… SYNONYM               Rhizopus arr… Rhizopu… FALSE              
#>  5 Rhizopus mi… ACCEPTED              Rhizopus mic… Rhizopu… TRUE               
#>  6 Rhizopus mi… SYNONYM               Rhizopus mic… Rhizopu… TRUE               
#>  7 Rhizopus mi… SYNONYM               Rhizopus mic… Rhizopu… TRUE               
#>  8 Rhizopus mi… SYNONYM               Rhizopus mic… Rhizopu… TRUE               
#>  9 Rhizopus or… SYNONYM               Rhizopus arr… Rhizopu… TRUE               
#> 10 Rhizomucor … SYNONYM               Mucor irregu… Mucor i… TRUE               
#> 11 Saksenaea v… ACCEPTED              Saksenaea va… Saksena… FALSE              
#> 12 Saksenaea v… ACCEPTED              Saksenaea va… Saksena… FALSE              
#> 13 Saksenaea v… ACCEPTED              Saksenaea va… Saksena… FALSE

Created on 2024-05-05 with reprex v2.1.0
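
One detail in the reprex is worth noting: `GBIF_$species == NA_character_` always evaluates to `NA`, so that `case_when()` branch never fires. That is why rows without a GBIF species show `NA` rather than `FALSE` in the last column. A possible NA-safe comparison is sketched below in base R. The `names_match()` helper and the toy vectors are illustrative only; they are not part of the reprex and are not GBIF output.

```r
# TRUE only when a GBIF species name exists and equals the OFID name.
# A missing GBIF species counts as "no match" (FALSE) instead of NA.
names_match <- function(gbif_species, ofid_new) {
  !is.na(gbif_species) & gbif_species == ofid_new
}

# Toy vectors standing in for the GBIF species column and ofid_new.
gbif_species <- c("Genus alpha", NA, "Genus beta")
ofid_new     <- c("Genus alpha", "Genus gamma", "Genus delta")

names_match(gbif_species, ofid_new)
```

For the toy input this gives TRUE, FALSE, FALSE. Used in place of the three-branch `case_when()`, it would turn the `NA` entries in the last column into `FALSE`.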

@silverfoxdoc

In addition to the above, I have identified some other issues that came up when using AMR on some data. The following are the problems I identified from the output of mo_renamed():

Mycobacterium bovis            Mycobacterium tuberculosis            Karlson et al., 1970          Riojas et al., 2018       
Salmonella arizonae            Salmonella enterica arizonae          Kauffmann, 1964               Le Minor et al., 1987     
Salmonella enteritidis         Salmonella enterica                   Castellani et al., 1919       Le Minor et al., 1987     
Stenotrophomonas               Xanthomonas                           Ouattara et al., 2017         Vauterin et al., 1995 

Although LPSN lists M. bovis taxonomically as M. tb, UK Mycobacterial reference labs still refer to it as M. bovis -- this has clinical implications as M. bovis is intrinsically resistant to pyrazinamide and therefore requires a longer treatment duration than M. tb standard short course therapy.
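
One way to keep such a clinically relevant name while still following the reference taxonomy elsewhere is a short list of protected names that are never renamed. A minimal sketch in base R follows; `protected_names` and `report_name()` are hypothetical and are not part of the AMR package.

```r
# Names that should be reported as-is, even if the taxonomy lumps them.
protected_names <- c("Mycobacterium bovis")

# Return the reported name for protected taxa, the accepted name otherwise.
report_name <- function(reported, accepted, protected = protected_names) {
  ifelse(reported %in% protected, reported, accepted)
}

report_name(
  reported = c("Mycobacterium bovis", "Salmonella arizonae"),
  accepted = c("Mycobacterium tuberculosis", "Salmonella enterica arizonae")
)
```

Here M. bovis is kept as reported, while Salmonella arizonae still follows the rename listed above.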

Salmonellae are always difficult, but I thought Enteritidis was a serovar and so should be Salmonella enterica Enteritidis. This is the current WHO-accredited list of Salmonella serovars, in case you don't have it: https://www.pasteur.fr/sites/default/files/veng_0.pdf

Querying rgbif::name_backbone("Stenotrophomonas") returns Stenotrophomonas as the accepted GBIF name.
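
To see how a name was matched (exactly, fuzzily, or only at a higher rank), it can help to look at just a few fields of the result. The sketch below assumes the returned tibble carries columns such as `status`, `rank` and `matchType`, which recent rgbif versions appear to return but which are worth checking against your installed version. `pick_match_fields()` is a hypothetical helper, not an rgbif function.

```r
# Keep only the columns that describe how GBIF matched the name.
# Columns missing from the result are skipped silently.
pick_match_fields <- function(res,
                              fields = c("verbatim_name", "canonicalName",
                                         "rank", "status", "matchType")) {
  res[, intersect(fields, names(res)), drop = FALSE]
}

# Needs the rgbif package and network access:
# pick_match_fields(rgbif::name_backbone("Stenotrophomonas"))
```

Rows whose `matchType` is anything other than an exact match are the ones to review by hand, such as the "Emmonsia" case mentioned earlier.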

@msberends
Owner Author

Thanks for the great work up!

GBIF is not very up to date with bacterial taxonomy: even right after their annual release in November, there are still hundreds of species that are outdated according to LPSN, which strictly follows IJSEM publications. But the rgbif approach is nice.

I’ll look deeper into what you mentioned here, great to have this as a reference, so many thanks!
