akrinos/2023-euk-diversity

Missing microbial eukaryotes and misleading meta-omic conclusions

Supplemental Code for Krinos et al. 2023

Arianna I. Krinos, Margaret Mars Brisbin, Sarah K. Hu, Natalie R. Cohen, Tatiana A. Rynearson, Michael J. Follows, Frederik Schulz, Harriet Alexander

Taxonomic annotation is a critical problem in environmental microbial meta-omics. In protists (single-celled microbial eukaryotes) in particular, complex genomes and incomplete databases pose important threats to accurate interpretation. We conducted a careful analysis of protistan meta-omic datasets to quantify the extent of this problem. We also propose a two-stage approach for estimating uncertainty in microbial meta-omics more accurately.

This work would not have been possible without many very useful software tools, including but not limited to

And a couple of our own tools

Directory organization

Workhorse code: code/snakemake-workflows

These workflows are deployed on the cluster for heavier-lift parts of this analysis. The outputs of these workflows are often used in the analysis notebooks.

  1. 01-scale-genus_eukulele - run EUKulele against the Phaeocystis databases (stored on Zenodo) for Scale 1 of the paper as written on bioRxiv
  2. 01-scale-genus_functional - run eggnog-mapper to functionally annotate Phaeocystis sequences from the Tara Oceans metagenomes
  3. 01-scale-genus_tree - run alignment and phylogenetic tree tools for the Phaeocystis references
  4. 02-scale-family_eukulele - run EUKulele against the sequences from Narragansett Bay, as appears in Figure 3 of the paper
  5. 03-scale-phylum_deepclust - run DIAMOND DeepClust against the sequences from the BATS dataset, including/excluding the sequences from phylum Retaria as described in the paper
  6. 03-scale-phylum_eukulele - run EUKulele against the sequences from the BATS dataset, including/excluding the sequences from phylum Retaria as described in the paper
  7. XX-scale-all_deepclust - run all scales of analysis through DIAMOND DeepClust to provide input to the tax-aliquots steps
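The workflow directories listed above appear to follow a shared pattern: a scale prefix ("01", "02", "03", or "XX" for all scales), the taxonomic level, and the tool being run. The following is a minimal sketch, not part of the repository, that splits those names into their parts so outputs can be grouped by scale. The regular expression and the helper names (`parse_workflow_name`, `group_by_level`) are assumptions inferred from the seven names in the list.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed pattern, inferred from the directory names listed above:
#   <scale>-scale-<level>_<tool>, where <scale> is two digits or "XX".
WORKFLOW_NAME = re.compile(
    r"^(?P<scale>\d{2}|XX)-scale-(?P<level>[a-z]+)_(?P<tool>[a-z]+)$"
)

# Directory names exactly as given in the list above.
WORKFLOWS = [
    "01-scale-genus_eukulele",
    "01-scale-genus_functional",
    "01-scale-genus_tree",
    "02-scale-family_eukulele",
    "03-scale-phylum_deepclust",
    "03-scale-phylum_eukulele",
    "XX-scale-all_deepclust",
]


@dataclass(frozen=True)
class Workflow:
    scale: str  # "01", "02", "03", or "XX" (all scales)
    level: str  # taxonomic level, e.g. "genus"
    tool: str   # short tool label, e.g. "eukulele"


def parse_workflow_name(name: str) -> Optional[Workflow]:
    """Return the parts of a workflow directory name, or None if it does not match."""
    match = WORKFLOW_NAME.match(name)
    if match is None:
        return None
    return Workflow(match.group("scale"), match.group("level"), match.group("tool"))


def group_by_level(names: List[str]) -> Dict[str, List[str]]:
    """Map each taxonomic level to the tools run at that level, in input order."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        workflow = parse_workflow_name(name)
        if workflow is not None:
            groups.setdefault(workflow.level, []).append(workflow.tool)
    return groups


if __name__ == "__main__":
    for level, tools in group_by_level(WORKFLOWS).items():
        print(f"{level}: {', '.join(tools)}")
```

Names that do not fit the assumed pattern are skipped rather than raising an error, so other files in the same directory do not break the grouping.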

Visualization notebooks

Each notebook is connected to one of the main text and/or supplemental figures in the final paper. Data needed to run these notebooks can be generated by downloading source datasets and running the Snakemake workflows from the section above.

Notebooks are named according to the convention:

XXFIG_<descriptor>.ipynb

where "XX" is replaced by the figure number if the notebook corresponds to a main text figure, and is left as the literal "XX" if the notebook is strictly supplemental. "FIG" marks the file as a figure notebook, and the descriptor gives more detail about the notebook's objective(s).
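As an illustration of the naming convention, the sketch below parses a notebook filename into its figure label and descriptor. It is not part of the repository. It assumes the figure label is written as one or more digits (the README does not state the exact width), and the example filenames in the comments are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <figure>FIG_<descriptor>.ipynb, where <figure> is either
# a run of digits (main text figure) or the literal "XX" (supplemental only).
NOTEBOOK_NAME = re.compile(r"^(?P<figure>\d+|XX)FIG_(?P<descriptor>.+)\.ipynb$")


class FigureNotebook(NamedTuple):
    figure: Optional[int]  # main text figure number, or None if supplemental
    descriptor: str        # free-text description of the notebook's objective


def parse_notebook_name(filename: str) -> Optional[FigureNotebook]:
    """Parse a figure notebook filename; return None if it does not follow the convention."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    label = match.group("figure")
    figure = None if label == "XX" else int(label)
    return FigureNotebook(figure, match.group("descriptor"))


if __name__ == "__main__":
    # Hypothetical filenames, used only to exercise the parser.
    for name in ["03FIG_example-descriptor.ipynb", "XXFIG_supplemental-example.ipynb"]:
        print(name, "->", parse_notebook_name(name))
```

Returning `None` for the figure number keeps supplemental notebooks distinguishable from main text figures without relying on a sentinel integer.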

About

Clean version of repo for 2023 "euk-diversity" paper on taxonomic annotation of meta-omic sequences
