Merge pull request #110 from rrohwer/master

updates to clarify directions and add script to reformat dada2 output
McMahonLab · Jun 11, 2019 · 35e8693
2 parents bd08f9d + 6eea40e
commit 35e8693

Showing 6 changed files with 1,977 additions and 467 deletions.
diff --git a/FreshTrain-files/README.md b/FreshTrain-files/README.md
@@ -14,10 +14,16 @@ zipped file name        | description
 FreshTrain18Aug2016     | old version formatted for Greengenes (don't use)  
 FreshTrain25Jan2018Greengenes13_5.zip | current version formatted for Greengenes  
 FreshTrain30Apr2018SILVAv128.zip | current version formatted for SILVA v128  
-FreshTrain30Apr2018SILVAv132.zip | current version formatted for SILVA v132  
+**FreshTrain30Apr2018SILVAv132.zip** | **current version formatted for SILVA v132**  
 
-The different formats match the FreshTrain's coarse-level nomenclature to the nomenclature in the comprehensive database of choice. The FreshTrain defines lineage-clade-tribe (~family-genus-species) level phylogenies, so the phylum, class, and order names are changed in the different versions to be consistent with the chosen comprehensive database.
+The different formats match the FreshTrain's coarse-level nomenclature to the nomenclature in the comprehensive database of choice. The FreshTrain defines lineage-clade-tribe (~family-genus-species) level phylogenies, so the phylum, class, and order names are changed in the different FreshTrain versions to be consistent with the paired comprehensive database.  
 
 <br>
-The citation for the FreshTrain database is:  
-[Newton, R. J., Jones, S. E., Eiler, A., McMahon, K. D. & Bertilsson, S. A guide to the natural history of freshwater lake bacteria. Microbiol. Mol. Biol. Rev. 75, 14–49 (2011).](http://mmbr.asm.org/content/75/1/14.full)
+The citation for the original FreshTrain database and the arb version of it is:  
+
+[Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. 2011. A Guide to the Natural History of Freshwater Lake Bacteria. Microbiol Mol Biol Rev 75:14–49.](https://mmbr.asm.org/content/75/1/14.full)  The arb files are available at [github.com/McMahonLab/FWMFG](https://github.com/McMahonLab/FWMFG).
+
+<br>
+The citation for these taxonomy assignment-compatible formats of the FreshTrain and the TaxAss method is:  
+
+[Rohwer RR, Hamilton JJ, Newton RJ, McMahon KD. 2018. TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution. mSphere 3:e00327-18.](https://msphere.asm.org/content/3/5/e00327-18)  
diff --git a/README.md b/README.md
@@ -7,12 +7,9 @@ How do I TaxAss?
 
 **Step-by-step directions:** [tax-scripts/TaxAss_Directions.html](https://htmlpreview.github.io/?https://github.com/McMahonLab/TaxAss/blob/master/tax-scripts/TaxAss_Directions.html)
 
-Please cite our mSphere paper:  
-TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution
-Robin R Rohwer, Joshua J Hamilton, Ryan J Newton, Katherine D McMahon
-mSphere; doi: https://doi.org/10.1128/mSphere.00327-18
+**Please cite TaxAss:** [Rohwer RR, Hamilton JJ, Newton RJ, McMahon KD. 2018. TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution. mSphere 3:e00327-18.](https://msphere.asm.org/content/3/5/e00327-18)
 
-TaxAss uses a series of R, python, and bash scripts in addition to using BLAST+ and mothur's classify.seqs() command.  The scripts are sourced from the terminal window (mac or linux). You'll need to download this repository (green "Clone or download" button, top right), and then just add the tax-scripts folder to your working directory.
+TaxAss only assigns taxonomy, so you can use TaxAss after using mothur, dada2, vsearch, or whatever QC pipeline you prefer. TaxAss uses a series of R, python, and bash scripts in addition to using BLAST+ and mothur's classify.seqs() command.  The scripts are sourced from the terminal window (mac or linux). You'll need to download this repository (green "Clone or download" button, top right), and add the tax-scripts folder to your working directory.
 
 Where's the stuff I need?
 ---

diff --git a/tax-scripts/RunSteps_quickie.sh b/tax-scripts/RunSteps_quickie.sh
@@ -4,17 +4,19 @@
 # That means that you do not try different percent identity cutoffs to choose the best one.
 # That might make sense for you if you have already made a similarity choice, for example by
 # choosing a cutoff to cluster OTUs. Then just have pident match that cutoff.
+# In almost all of our test datasets we found a pident of 98 was best.
 # Note: this also skips the BLAST check (step 6). You could go back and just do that one.
 # Note: still run step 16 to tidy up.
+# Note: still gotta do the reformatting manually (step 0)
 
-# Choose pident.
+# USER CAN CHANGE THIS INPUT ---------------------------------
 
 pident=("98")
 fwbootstrap=("80")
 ggbootstrap=("80")
 processors=("2")
 
-# Note: still gotta do the reformatting manually (step 0)
+# -------------------------------------------------------------
 
 # 1
 makeblastdb -dbtype nucl -in custom.fasta -input_type fasta -parse_seqids -out custom.db &&