Skip to content

Commit

Permalink
Merge pull request #109 from rrohwer/master
Browse files Browse the repository at this point in the history
Minor Edits to Reflect mSphere publication
  • Loading branch information
rrohwer committed Sep 4, 2018
2 parents a7b218b + 5bccb66 commit bd08f9d
Show file tree
Hide file tree
Showing 29 changed files with 153 additions and 54 deletions.
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,15 @@ tax-scripts/.RData
FreshTrain-files/gg_13_5.fasta
FreshTrain-files/gg_13_5.taxonomy

# Expanded FreshTrain Folders
# Expanded FreshTrain Folders and unreleased FreshTrain versions
FreshTrain-files/FreshTrain18Aug2016/
FreshTrain-files/FreshTrain11Feb2016/
FreshTrain-files/FreshTrain11Feb2016.zip
FreshTrain-files/FreshTrain25Jan2018Greengenes13_5/
FreshTrain-files/FreshTrain25Mar2018SILVAv128/
FreshTrain-files/FreshTrain25Mar2018SILVAv128.zip
FreshTrain-files/FreshTrain25Mar2018SILVAv132/
FreshTrain-files/FreshTrain25Mar2018SILVAv132.zip
FreshTrain-files/FreshTrain30Apr2018SILVAv128/
FreshTrain-files/FreshTrain30Apr2018SILVAv132/

Expand Down
17 changes: 9 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,12 @@ How do I TaxAss?

**Step-by-step directions:** [tax-scripts/TaxAss_Directions.html](https://htmlpreview.github.io/?https://github.com/McMahonLab/TaxAss/blob/master/tax-scripts/TaxAss_Directions.html)

For now, please cite our preprint:
TaxAss: Leveraging Custom Databases Achieves Fine-Scale Taxonomic Resolution
Robin Rebecca Rohwer, Joshua J Hamilton, Ryan J Newton, Katherine D McMahon
bioRxiv 214288; doi: https://doi.org/10.1101/214288
Please cite our mSphere paper:
TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution
Robin R Rohwer, Joshua J Hamilton, Ryan J Newton, Katherine D McMahon
mSphere; doi: https://doi.org/10.1128/mSphere.00327-18

TaxAss uses a series of python, R, and bash scripts in addition to using BLAST+ and mothur's classify.seqs() command. The scripts are sourced from the terminal window (mac or linux). You'll need to download this repository (green "Clone or download" button, top right), and then just add the tax-scripts folder to your working diriectory.
TaxAss uses a series of R, python, and bash scripts in addition to using BLAST+ and mothur's classify.seqs() command. The scripts are sourced from the terminal window (mac or linux). You'll need to download this repository (green "Clone or download" button, top right), and then just add the tax-scripts folder to your working diriectory.

Where's the stuff I need?
---
Expand All @@ -24,16 +24,17 @@ Where's the stuff I need?

**The stuff you can ignore:**
1. Scripts to process & edit the FreshTrain arb files are in the `arb-scripts` folder
2. Scripts for making poster/paper figures are in `figure-scripts` folder.
2. Scripts for making the paper figures are in the `figure-scripts` folder.
3. A pdf of my ISME16 poster (that might be a good overview): `poster-ISME16.pdf`

What if it doesn't work?
---

Please tell me! If you find a bug (or if you get confused) I want to fix it/help you. If you're all github savvy you can submit an issue, or you can just send me an e-mail with TaxAss in the title: robin.rohwer@gmail.com
Please tell me! If you find a bug (or if you get confused) I want to fix it/help you. If you're all github savvy you can submit an issue, or you can just send me an e-mail with TaxAss in the subject: robin.rohwer@gmail.com

Who made TaxAss?
---
The McMahon lab studies lake microbial ecology (and some of it also studies wastewater). We are at the Univerisity of Wisconsin-Madison.
Lab Website: https://mcmahonlab.wisc.edu/
Lab Twitter: @mcmahonlab
Lab Twitter: @mcmahonlab
Robin Twitter: @RobinRohwer
14 changes: 4 additions & 10 deletions arb-scripts/README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,5 @@
These are scripts for reformatting the ARB output of the Freshwater Training Set
# README arb-scripts

Robin's in the process of pulling them together into a repeatable workflow.
That's gonna be in this repo soon.

The .pl script is the one Trina uses already.
The ones Robin's adding are to address bugs she's encountered while working on 16STaxAss.

When Robin is not writing code, she enjoys referring to herself in the 3rd person.

RRR 2-8-16
These are scripts for reformatting the ARB output of the Freshwater Training Set.

This folder also includes the scripts and files used for the TaxAss manuscript's Marathonas Validation (fig 1 in paper). These are located inside the `Marathonas_test` folder.
16 changes: 16 additions & 0 deletions figure-scripts/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# README figure-scripts

These scripts generate the final versions of figures and tables for the paper.
Many of the figures and tables are already generated in the workflow, but this is
the version of the script where I tweak colors, dimensions, resolution, etc.
For tables, a csv with the data is generated that msphere later formatted pretty.

Also included are figures generated for my ISME16 poster, figures that didn't make
the final cut, and a script for finding all the stats listed in the text.

The folder structure for reproducing the manuscript's data is `TaxAss-BatchFiles.zip`.
This folder pairs with the directions in `README-TaxAss-BatchFiles.html`.

The Marathonas validation batch files and directions are inside the `arb-scripts` folder.
The directions are in the file `arb-scripts/Marathonas_test/README_marathonas_validation.html`
and all necessary files are also in that folder.
29 changes: 0 additions & 29 deletions figure-scripts/README.txt

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ file.path.summary.table.v3v4 <- "../arb-scripts/Marathonas_test/v3v4_mara/v3v4_r
file.path.summary.table.v4v5 <- "../arb-scripts/Marathonas_test/v4v5_mara/v4v5_results/summary_table.rds"


# file.path.created.plot <- "~/Dropbox/PhD/Write\ It/draft\ 7/res-submission_figures/mara_validation.pdf"
# file.path.created.plot <- "~/Dropbox/PhD/TaxAss_manuscript/draft 7/re-submission_figures/mara_validation-referencelabel.pdf"
# file.path.created.table <- "~/Dropbox/PhD/Write\ It/draft\ 7/res-submission_figures/mara_validation.csv"

# ---- functions ----
Expand Down Expand Up @@ -69,6 +69,7 @@ make.example.table <- function(){ # all lazy calls to global env
col.wrong <- adjustcolor(col = "darkred", alpha.f = .3)
col.vector <- c(col.correct, col.correct, col.correct, col.correct, col.under, col.wrong, col.wrong, col.wrong)

# these are actual examples I found in the results, not just made-up examples
left.text.vect <- c("acI-A1", "bacI-unclassified", "n/a", "acI-A", "acI-B1", "n/a", "betI-A","acI-C")
right.text.vect <- c("acI-A1","bacI-unclassified", "Microcystaceae","acI", "Actinobacteria", "LD19","betI-B","acI-C1")

Expand All @@ -85,7 +86,7 @@ make.example.table <- function(){ # all lazy calls to global env

middle.left <- box.lefts + (box.middles - box.lefts) * (1/2)
middle.right <- box.middles + (box.rights - box.middles) * (1/2)
mtext(text = c("Correct","TaxAss"), side = 1, line = .5, at = c(middle.left, middle.right), cex = label.cex)
mtext(text = c("Reference","TaxAss"), side = 1, line = .5, at = c(middle.left, middle.right), cex = label.cex)
mtext(text = "Examples of Each Category:", side = 3, line = 1, at = box.middles, cex = label.cex)

}
Expand Down
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added figure-scripts/text_S1_BLAST_example_calc.pdf
Binary file not shown.
6 changes: 4 additions & 2 deletions tax-scripts/README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
Scripts for 16S Taxonomy Assignment Workflow
README tax-scripts
===

This folder contains the scripts used for OTU assignment. Detailed descriptions of them are in the `Clean Workflow.txt` document.
This folder contains the TaxAss scripts. Detailed descriptions of them are in `TaxAss_Directions.html`. You can read the directions online here: [tax-scripts/TaxAss_Directions.html](https://htmlpreview.github.io/?https://github.com/McMahonLab/TaxAss/blob/master/tax-scripts/TaxAss_Directions.html)

To use TaxAss, download this folder, add your data and databases, and use it as your working directory.
3 changes: 2 additions & 1 deletion tax-scripts/TaxAss_Directions.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -407,7 +407,8 @@ _i.e._ you're choosing a pident from the start, and just making the taxonomy tab
- **with headers (column names), but no row names**
- **no "totals" row/column**
(as might have been added by mother)
- **numbers only for all abundance values**
- **numbers only for all abundance values**
Note that if you had a sample that failed sequencing, there might be no reads left in it after QC. This sample should be removed before TaxAss or normalizing it could cause a NaN non-number value.

```
colname colname colname colname colname
Expand Down
6 changes: 5 additions & 1 deletion tax-scripts/TaxAss_Directions.html
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,9 @@
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
Expand Down Expand Up @@ -725,7 +728,8 @@ <h3><code>.abund</code> file format</h3>
</li>
<li><strong>no “totals” row/column</strong><br />
(as might have been added by mother)</li>
<li><strong>numbers only for all abundance values</strong></li>
<li><strong>numbers only for all abundance values</strong><br />
Note that if you had a sample that failed sequencing, there might be no reads left in it after QC. This sample should be removed before TaxAss or normalizing it could cause a NaN non-number value.</li>
</ul>
<pre><code>colname colname colname colname colname
seqID Abundance Abundance Abundance Abundance
Expand Down
Empty file modified tax-scripts/calc_full_length_pident.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/create_fastas_given_seqIDs.py
100644 → 100755
Empty file.
Empty file modified tax-scripts/filter_seqIDs_by_pident.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/find_classification_disagreements.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/find_seqIDs_blast_removed.py
100644 → 100755
Empty file.
Empty file modified tax-scripts/force_consistent_unclassifieds_on_database.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/plot_blast_hit_stats.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/plot_classification_disagreements.R
100644 → 100755
Empty file.
Empty file modified tax-scripts/plot_classification_improvement.R
100644 → 100755
Empty file.
106 changes: 106 additions & 0 deletions tax-scripts/reformat_dada2_seqtabs.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
# RRR 2018-7-20

# dada2 has an internal implementation of the Wang classifier/ RDP classifier / mothur default / TaxAss choice
# BUT, just keep using mothur with TaxAss because formatting is painful.
# this script takes the dada2 output file (seqtab_nochim) and cretes otus.fasta and otus.abund

# command line syntax:

# Rscript reformat.dada2.seqtabs.R seqtab_nochim.rds otus.fasta otus.abund

# ---- Accept Arguments from Terminal Command Line ----

userprefs <- commandArgs(trailingOnly = TRUE)
path.to.seqtab <- userprefs[1]
fasta.output <- userprefs[2]
abund.output <- userprefs[3]

cat("fuck you forgot to comment out the file paths in reformat_dada2_seqtabs.R!")
path.to.seqtab <- "/Users/athena/Desktop/dada2-meV45/dada2/seqtab_nochim.rds"
fasta.output <- "~/Desktop/otus.fasta"
abund.output <- "~/Desktop/otus.abund"
count.output <- "/Users/athena/Desktop/dada2-meV45/taxass_gg/data/otus.count"

# ---- define functions ----

import.dada2.file <- function(path){
is.rds <- grepl(pattern = "*.rds", x = path)
if (is.rds){
seqtab_nochim <- readRDS(file = path)
return(seqtab_nochim)
}else{
cat("input must be rds file ending in \".rds\"\nSave it in this format at the end of dada2 pipeline using\nsaveRDS(object = seqtab_nochim, file = \"filename\")")
}
}

make.fasta.file <- function(dadatable, fasta.path){
fasta.seqs <- colnames(dadatable)
otu.names <- paste("otu", 1:length(fasta.seqs), sep = "_")
fasta.names <- paste(">", otu.names, sep = "")
fasta.file <- paste(fasta.names, fasta.seqs, sep = "\n")
write.table(x = fasta.file, file = fasta.path, quote = F, row.names = F, sep = "\n", col.names = F)
cat("Made file: ", fasta.path, "\n")
return(otu.names)
}

find.zero.samples <- function(tot.reads, otu.table){
read.stats <- boxplot(x = tot.reads, plot = F)
cat("These samples have outlier read counts. Samples with zero reads are being removed.\n",paste(names(read.stats$out), " : ", read.stats$out, sep = "", "\n"))
index <- which(tot.reads == 0)
return(index)
}

convert.to.rel.abund <- function(OTUs){
sample.totals <- rowSums(OTUs)
# vectors are applied to matrices by stepping down rows in a column
norm.otus <- OTUs / sample.totals * 100
return(norm.otus)
}

reformat.for.taxass <- function(OTUs){
# rows = OTUs, cols = samples, col 1 = seqIDs
otu.table <- t(OTUs)
otu.table <- cbind(row.names(otu.table), otu.table)
colnames(otu.table)[1] <- "seqID"
return(otu.table)
}

make.abund.file <- function(OTUs, FilePath){
write.table(x = OTUs, file = FilePath, quote = FALSE, sep = "\t", row.names = FALSE)
cat("Made file: ", FilePath, "\n")
}

make.count.file <- function(Count, FilePath){
Count <- cbind(names(Count), Count)
colnames(Count) <- c("Sample.Name", "Total.Reads")
write.table(x = Count, file = FilePath, quote = F, sep = "\t", row.names = F)
cat("Made file: ", FilePath, "\n")
}

# ---- use functions ----

seqtab_nochim <- import.dada2.file(path = path.to.seqtab)

seqIDs <- make.fasta.file(dadatable = seqtab_nochim, fasta.path = fasta.output)

colnames(seqtab_nochim) <- seqIDs

read.count <- rowSums(seqtab_nochim)

index <- find.zero.samples(tot.reads = read.count, otu.table = seqtab_nochim)
if(length(index) > 0){
read.count <- read.count[-index]
seqtab_nochim <- seqtab_nochim[-index, ]
}

seqtab_nochim <- convert.to.rel.abund(OTUs = seqtab_nochim)

seqtab_nochim <- reformat.for.taxass(OTUs = seqtab_nochim)

make.abund.file(OTUs = seqtab_nochim, FilePath = abund.output)

make.count.file(Count = read.count, FilePath = count.output)




Empty file modified tax-scripts/reformat_mothur_OTU_tables.R
100644 → 100755
Empty file.

0 comments on commit bd08f9d

Please sign in to comment.