Methods used for quality control, clustering, and analysis of a tag-sequencing (18S rRNA gene) survey of benthic samples from Guaymas Basin.
Alexis Pasulka, Sarah K. Hu, Lisa Mesrop, Craig Cary, Kathryn Coyne, Karla Heidelberg, Peter Countway, & David A. Caron. SSU-rRNA sequencing survey of sediment-hosted microbial eukaryotes from Guaymas Basin hydrothermal vent. In prep.
Benthic samples were collected from a hydrothermally active … in Guaymas Basin, MX. Raw sequences are available from the NCBI Sequence Read Archive (SRA) under project ID SRP110312 or BioSample accession numbers SAMN07274333 to SAMN07274355.
Script for quality filtering sequences and generating an OTU table: 'SeqQC_OTUclustering_Pasulka_et_al.pl'
Import OTU table into R for all downstream analyses.
Initial OTU QC
- Import OTU table
- Calculate total number of sequences per OTU, per sample
- Import 'NameSchematic.txt' to rename samples with more informative names, then join it with the OTU data.
- Remove global singletons (OTUs that are found in only 1 sample with 1 sequence)
- Plot supplemental figures showing the total number of sequences and the distribution of OTUs in each sample.
- Run the 'pr2_rename_taxa' function to manually curate taxonomic group names (used when summing sequences)
- Remove samples with low sequence counts or a high abundance of metazoan sequences.
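The QC steps above are run in R. As a language-neutral illustration, the Python sketch below shows the per-OTU and per-sample totals, global-singleton removal, and a low-depth sample filter on a toy table. It assumes the OTU table is a dict of OTU ID to per-sample counts; the function names and the `min_seqs` threshold are hypothetical, and the metazoan filter is not shown.

```python
# Hypothetical sketch in Python; the actual workflow runs in R.
# Assumption: the OTU table is a dict of OTU ID -> {sample name: count}.


def otu_totals(table):
    """Total number of sequences per OTU, summed over all samples."""
    return {otu: sum(counts.values()) for otu, counts in table.items()}


def sample_totals(table):
    """Total number of sequences per sample, summed over all OTUs."""
    totals = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


def remove_global_singletons(table):
    """Drop OTUs with exactly one sequence in the whole data set,
    i.e. OTUs found in only one sample with one sequence."""
    totals = otu_totals(table)
    return {otu: counts for otu, counts in table.items() if totals[otu] > 1}


def drop_low_depth_samples(table, min_seqs):
    """Remove samples whose total sequence count is below min_seqs
    (min_seqs is a hypothetical, user-chosen threshold)."""
    keep = {s for s, n in sample_totals(table).items() if n >= min_seqs}
    return {
        otu: {s: n for s, n in counts.items() if s in keep}
        for otu, counts in table.items()
    }


# Toy example (invented counts, not study data).
table = {
    "OTU_1": {"S1": 10, "S2": 4},
    "OTU_2": {"S1": 1, "S2": 0},  # global singleton
    "OTU_3": {"S1": 0, "S2": 2},
}
filtered = remove_global_singletons(table)
print(sorted(filtered))         # ['OTU_1', 'OTU_3']
print(sample_totals(filtered))  # {'S1': 10, 'S2': 6}
```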
Whole community plots
- Sum the number of sequences in each sample by the manually designated taxonomic group ("Taxa" column)
- Pool replicate samples
- Plot (ggplot2) relative abundance of each taxonomic group in the community
- Plot OTU richness
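To make the pooling, relative-abundance, and richness arithmetic concrete, here is a minimal Python sketch (the workflow itself uses R and ggplot2). It assumes that pooling replicates means summing their counts, which may differ from what the original scripts do; `sum_by_taxa`, `pool_replicates`, and the toy sample names are hypothetical.

```python
# Hypothetical sketch; the workflow itself uses R and ggplot2.
from collections import defaultdict


def sum_by_taxa(table, taxa):
    """Sum sequences per sample for each curated taxonomic group.
    table: OTU -> {sample: count}; taxa: OTU -> group name ("Taxa")."""
    out = defaultdict(lambda: defaultdict(int))
    for otu, counts in table.items():
        for sample, n in counts.items():
            out[sample][taxa[otu]] += n
    return out


def pool_replicates(by_sample, replicate_of):
    """Assumption: pooling means summing the counts of replicate samples
    that map to the same site/sample label."""
    pooled = defaultdict(lambda: defaultdict(int))
    for sample, groups in by_sample.items():
        for group, n in groups.items():
            pooled[replicate_of[sample]][group] += n
    return pooled


def relative_abundance(by_sample):
    """Convert group counts to proportions of each sample's total."""
    rel = {}
    for sample, groups in by_sample.items():
        total = sum(groups.values())
        rel[sample] = {g: n / total for g, n in groups.items()} if total else {}
    return rel


def otu_richness(table):
    """Number of OTUs with at least one sequence in each sample."""
    rich = defaultdict(int)
    for counts in table.values():
        for sample, n in counts.items():
            if n > 0:
                rich[sample] += 1
    return dict(rich)


# Toy example: two replicates (A1, A2) of one hypothetical site.
table = {
    "OTU_1": {"A1": 3, "A2": 2},
    "OTU_2": {"A1": 2, "A2": 0},
    "OTU_3": {"A1": 0, "A2": 1},
}
taxa = {"OTU_1": "Ciliates", "OTU_2": "Rhizaria", "OTU_3": "Ciliates"}
pooled = pool_replicates(sum_by_taxa(table, taxa), {"A1": "SiteA", "A2": "SiteA"})
print(relative_abundance(pooled))  # {'SiteA': {'Ciliates': 0.75, 'Rhizaria': 0.25}}
print(otu_richness(table))         # {'A1': 2, 'A2': 2}
```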
Composition of ciliates
- Aggregate data to major taxonomic group and Level 4 (approximately Class level).
- Plot relative abundance of ciliate reads at the class level.
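A sketch of the class-level aggregation, assuming PR2-style semicolon-delimited taxonomy strings in which rank 3 is "Ciliophora" and rank 4 is the (approximately) class-level name. The helper names are hypothetical and the counts are invented; relative abundance here is computed within ciliate reads only.

```python
# Hypothetical sketch. Assumption: each OTU has a PR2-style, semicolon-
# delimited taxonomy string in which rank 3 holds "Ciliophora" and
# rank 4 is the (approximately) class-level name.


def taxonomy_level(tax_string, level, sep=";"):
    """Name at 1-based rank `level`, or 'Unassigned' if absent."""
    ranks = [r.strip() for r in tax_string.split(sep) if r.strip()]
    return ranks[level - 1] if len(ranks) >= level else "Unassigned"


def ciliate_class_composition(counts, taxonomy, ciliate_name="Ciliophora"):
    """Relative abundance of each Level 4 class among ciliate reads
    in one sample. counts: OTU -> count; taxonomy: OTU -> string."""
    by_class = {}
    for otu, n in counts.items():
        if taxonomy_level(taxonomy[otu], 3) != ciliate_name:
            continue  # not a ciliate OTU
        cls = taxonomy_level(taxonomy[otu], 4)
        by_class[cls] = by_class.get(cls, 0) + n
    total = sum(by_class.values())
    return {c: n / total for c, n in by_class.items()} if total else {}


taxonomy = {
    "OTU_1": "Eukaryota;Alveolata;Ciliophora;Spirotrichea;Choreotrichida",
    "OTU_2": "Eukaryota;Alveolata;Ciliophora;Oligohymenophorea",
    "OTU_3": "Eukaryota;Rhizaria;Cercozoa",
}
counts = {"OTU_1": 3, "OTU_2": 1, "OTU_3": 10}
print(ciliate_class_composition(counts, taxonomy))
# {'Spirotrichea': 0.75, 'Oligohymenophorea': 0.25}
```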
Bubble plots - 3 groups
- Calculate relative abundance
- Subset to three taxonomic groups of interest: Rhizaria, Ciliates, and Apicomplexa
- Generate bubble plot
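A minimal sketch of how bubble-plot input might be prepared: relative abundance is computed against each sample's total reads before subsetting to Rhizaria, Ciliates, and Apicomplexa, and the result is returned in long format with one row per bubble. The `(group, label)` taxonomy tuples and the function name are assumptions for illustration.

```python
# Hypothetical sketch; the workflow builds the bubble plot in R.
from collections import defaultdict

GROUPS_OF_INTEREST = ("Rhizaria", "Ciliates", "Apicomplexa")


def bubble_rows(table, taxa, groups=GROUPS_OF_INTEREST):
    """Long-format (sample, group, label, rel_abund) rows for a bubble plot.
    table: OTU -> {sample: count}; taxa: OTU -> (group, finer label).
    Relative abundance is taken against each sample's total reads
    before subsetting; zero cells are skipped so no empty bubbles appear."""
    totals = defaultdict(int)
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] += n
    agg = defaultdict(float)
    for otu, counts in table.items():
        group, label = taxa[otu]
        if group not in groups:
            continue
        for sample, n in counts.items():
            if n > 0:
                agg[(sample, group, label)] += n / totals[sample]
    return sorted((s, g, lab, v) for (s, g, lab), v in agg.items())


table = {"OTU_1": {"S1": 2}, "OTU_2": {"S1": 1}, "OTU_3": {"S1": 1}}
taxa = {
    "OTU_1": ("Ciliates", "Spirotrichea"),
    "OTU_2": ("Ciliates", "Spirotrichea"),
    "OTU_3": ("Stramenopiles", "Diatoms"),
}
print(bubble_rows(table, taxa))  # [('S1', 'Ciliates', 'Spirotrichea', 0.75)]
```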
Presence-absence UpSetR
- Aggregate the count of OTUs by habitat/sample type
- Change to binary
- Repeat with only ciliate data
- Plot using UpSetR
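The binary habitat aggregation behind an UpSet plot can be sketched as follows. It assumes a mapping from each sample to its habitat/sample type (the labels here are hypothetical) and counts the OTUs in each exact combination of habitats, which is how UpSet intersection bars are typically read. Function names are illustrative only.

```python
# Hypothetical sketch; the workflow plots with the UpSetR package in R.
from collections import Counter, defaultdict


def presence_by_habitat(table, habitat_of):
    """Sum each OTU's counts per habitat, then convert to binary:
    return OTU -> frozenset of habitats where the summed count > 0."""
    summed = defaultdict(lambda: defaultdict(int))
    for otu, counts in table.items():
        for sample, n in counts.items():
            summed[otu][habitat_of[sample]] += n
    return {
        otu: frozenset(h for h, n in per_hab.items() if n > 0)
        for otu, per_hab in summed.items()
    }


def exclusive_intersections(presence):
    """Number of OTUs in each exact combination of habitats."""
    return Counter(habs for habs in presence.values() if habs)


habitat_of = {"S1": "Mat", "S2": "Mat", "S3": "Control"}  # hypothetical labels
table = {
    "OTU_1": {"S1": 1, "S2": 0, "S3": 3},
    "OTU_2": {"S1": 0, "S2": 5, "S3": 0},
    "OTU_3": {"S1": 2, "S2": 0, "S3": 0},
    "OTU_4": {"S1": 0, "S2": 0, "S3": 1},
}
inter = exclusive_intersections(presence_by_habitat(table, habitat_of))
for habs, n in inter.items():
    print(sorted(habs), n)
```

For the ciliate-only version, the table would be filtered to ciliate OTUs before calling `presence_by_habitat`.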
OTU richness - Ciliates only
- Subset ciliate reads from main data frame
- Convert to binary: if a value is greater than 1, change it to 1
- Sum to get the total number of OTUs in each ciliate class
- Generate plot bubbles
- Import the distribution of ciliate OTUs (the samples in which each OTU was found)
- Generate a shaded grey area for each bubble plot
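A sketch of the ciliate richness counts, assuming a dict that maps each ciliate OTU to its class (this mapping doubles as the subsetting step). Counts are converted to presence/absence so each OTU contributes at most 1 per sample; `samples_with_otu` is a hypothetical helper for the per-OTU distribution. Drawing the bubbles and grey shading is left to the plotting code.

```python
# Hypothetical sketch of the ciliate OTU richness step (workflow is in R).
from collections import defaultdict


def ciliate_class_richness(table, ciliate_class):
    """Number of OTUs per (sample, ciliate class).
    table: OTU -> {sample: count}; ciliate_class: OTU -> class name,
    listing ciliate OTUs only (this is the subsetting step).
    Counts are converted to presence/absence (>0 -> 1, 0 stays 0)
    before summing, so abundant OTUs are not counted more than once."""
    richness = defaultdict(int)
    for otu, cls in ciliate_class.items():
        for sample, n in table.get(otu, {}).items():
            richness[(sample, cls)] += 1 if n > 0 else 0
    return dict(richness)


def samples_with_otu(table, otu):
    """Samples in which an OTU was found (its distribution)."""
    return sorted(s for s, n in table.get(otu, {}).items() if n > 0)


table = {
    "OTU_1": {"S1": 5, "S2": 0},
    "OTU_2": {"S1": 1, "S2": 2},
    "OTU_3": {"S1": 7, "S2": 0},
    "OTU_4": {"S1": 9, "S2": 9},  # not a ciliate; ignored
}
ciliate_class = {
    "OTU_1": "Spirotrichea",
    "OTU_2": "Spirotrichea",
    "OTU_3": "Oligohymenophorea",
}
rich = ciliate_class_richness(table, ciliate_class)
print(rich)
```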
Beta diversity metrics - MDS and ANOSIM
- Transpose data and convert to numeric
- Calculate relative abundance
- Transform the data to test for best fit: 4th-root, square-root, and presence-absence transformations.
- Calculate NMDS for each transformed data set
- Compare stress values to identify which transformation gives the lowest stress
- Import "meta_Vent.csv" and merge with data
- This analysis uses 4th root transformed data for MDS plots.
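The data preparation for the transformation comparison can be sketched in Python as below. NMDS fitting and stress are not reimplemented here (in R this is typically vegan's `metaMDS()`, whose default dissimilarity is Bray-Curtis); the sketch only builds the dissimilarity matrices that NMDS would take as input. Using Bray-Curtis is an assumption about the distance the authors chose, and all names are hypothetical.

```python
# Hypothetical sketch of the data preparation for NMDS (workflow is in R).
import math


def transpose(otu_rows):
    """OTU-by-sample rows -> sample-by-OTU rows."""
    return [list(col) for col in zip(*otu_rows)]


def relative_abundance(row):
    """Proportion of each OTU within one sample."""
    total = sum(row)
    return [x / total for x in row] if total else [0.0] * len(row)


# Candidate transformations, to be compared by NMDS stress.
TRANSFORMS = {
    "fourth_root": lambda row: [x ** 0.25 for x in row],
    "square_root": lambda row: [math.sqrt(x) for x in row],
    "presence_absence": lambda row: [1 if x > 0 else 0 for x in row],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    denom = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / denom if denom else 0.0


def dissimilarity_matrix(otu_rows, transform):
    """Transpose, convert to relative abundance, transform, then
    compute all pairwise Bray-Curtis dissimilarities between samples."""
    samples = [transform(relative_abundance(r)) for r in transpose(otu_rows)]
    return [[bray_curtis(a, b) for b in samples] for a in samples]


otu_rows = [[4, 0, 2], [0, 4, 2]]  # 2 OTUs x 3 samples (toy data)
for name, fn in TRANSFORMS.items():
    print(name, dissimilarity_matrix(otu_rows, fn)[0])
```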
NMDS Figure
- Input the NMDS points calculated in the section above
- Set sample-type factors with the appropriate colors and shapes
- Use ggplot2 to plot NMDS figure
ANOSIM, SIMPER, & Alpha diversity
ANOSIM
- Import 4th root transformed data
- Run ANOSIM with various grouping factors: habitat, sediment depth, and mat color
- Save output results as .txt files.
- Repeat ANOSIM with the control samples removed
SIMPER
- Import 4th root transformed data
- Run SIMPER, again with various factors: habitat, mat color, and sediment horizon/depth
- Save output as a text file
Alpha diversity
- Import R objects from earlier steps; this analysis requires subsampled data
- Randomly subsample
- Use 'diversity()' function on subsampled data to calculate Shannon and Inverse Simpson diversity metrics
- Plot box plots to show the distribution by sample type
Rarefaction curve
- Use subsampled data
- Set sample types as factors with the desired colors for plotting the rarefaction curve.
- Use 'rarecurve()' function to generate rarefaction curves
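A minimal sketch of the subsampling, alpha-diversity, and rarefaction calculations, assuming one sample's counts are held as a dict of OTU to reads. Shannon uses the natural log, and `expected_richness` implements Hurlbert's rarefaction expectation. These are meant to mirror what vegan's `rrarefy()`, `diversity()`, and `rarecurve()` compute, but the code is an illustration with hypothetical names, not the authors' scripts.

```python
# Hypothetical sketch of the alpha-diversity and rarefaction steps
# (the workflow uses vegan in R). counts: dict OTU -> reads in one sample.
import math
import random


def subsample(counts, depth, seed=None):
    """Draw `depth` reads at random without replacement (rarefy to an
    even depth). Raises ValueError if depth exceeds the sample's reads."""
    rng = random.Random(seed)
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    drawn = {}
    for otu in rng.sample(pool, depth):
        drawn[otu] = drawn.get(otu, 0) + 1
    return drawn


def shannon(counts):
    """Shannon index H' = -sum p_i ln p_i (natural log)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum p_i^2."""
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


def expected_richness(counts, n):
    """Hurlbert's rarefaction: expected OTUs in a random subsample of n reads,
    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], valid for 1 <= n <= N."""
    N = sum(counts.values())
    denom = math.comb(N, n)
    return sum(1 - math.comb(N - Ni, n) / denom for Ni in counts.values() if Ni > 0)


def rarefaction_curve(counts, step=1):
    """(depth, expected richness) points for plotting a rarefaction curve."""
    N = sum(counts.values())
    return [(n, expected_richness(counts, n)) for n in range(1, N + 1, step)]


sample = {"OTU_1": 5, "OTU_2": 5}
print(shannon(sample), inverse_simpson(sample))  # about 0.693 and 2.0
print(rarefaction_curve(sample, step=3))
print(subsample(sample, 4, seed=1))
```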
Alexis Pasulka & Sarah Hu - last updated October 2018