By: Dr Umer Zeeshan Ijaz Last Updated: 27/08/2019
Assuming you have organized the samples in separate folder with the folder name specifying the sample names, and within each folder, there is a "Raw" folder containing the paired-end forward and reverse files.
Step 1: All you have to do is to run the following one liners to create the fictious barcodes and save them as sample_metadata.tsv
file. Please make sure you
d="/PATH_TO_FOLDER/";
t=$(ls $d | wc -l);
paste <(ls $d) <(perl -le 'sub p{my $l=pop @_;unless(@_){return map [$_],@$l;}return map { my $ll=$_; map [@$ll,$_],@$l} p(@_);} @a=[A,C,G,T]; print join("", @$_) for p(@a,@a,@a,@a,@a,@a,@a,@a);' | awk -v k=$t 'NR<=k{print}') | awk 'BEGIN{print "sample-id\tbarcode-sequence\n#q2:types\tcategorical"}1' > sample_metadata.tsv
Tip: In the above, the perl one liner is used to generate a unique 8bp barcode for every sample folder. Below, we are generating a 3bp as an example.
perl -le 'sub p{my $l=pop @_; unless(@_){return map [$_],@$l;}return map { my $ll=$_; map [@$ll,$_],@$l} p(@_);} @a=[A,C,G,T]; print join("", @$_) for p(@a,@a,@a)'
AAA
AAC
AAG
AAT
ACA
ACC
ACG
ACT
AGA
AGC
AGG
AGT
ATA
ATC
ATG
ATT
CAA
CAC
CAG
CAT
CCA
CCC
CCG
CCT
CGA
CGC
CGG
CGT
CTA
CTC
CTG
CTT
GAA
GAC
GAG
GAT
GCA
GCC
GCG
GCT
GGA
GGC
GGG
GGT
GTA
GTC
GTG
GTT
TAA
TAC
TAG
TAT
TCA
TCC
TCG
TCT
TGA
TGC
TGG
TGT
TTA
TTC
TTG
TTT
Step 2: Generate barcodes
(for i in $(ls $d); do bc=$(awk -v k=$i '$1==k{print $2}' sample_metadata.tsv); bioawk -cfastx -v k=$bc '{print "@"$1" "$4"\n"k"\n+";for(i=0;i< length(k);i++){printf "#"};printf "\n"}' $d/$i/Raw/*_R1.fastq ; done) > barcodes.fastq
Step 3: Assemble forward files
(for i in $(ls $d); do bioawk -cfastx '{print "@"$1" "$4"\n"$seq"\n+\n"$qual}' $d/$i/Raw/*_R1.fastq ; done) > forward.fastq
Step 4: Assemble reverse files
(for i in $(ls $d); do bioawk -cfastx '{print "@"$1" "$4"\n"$seq"\n+\n"$qual}' $d/$i/Raw/*_R2.fastq ; done) > reverse.fastq
Step 5: Zip all the files and move them to
gzip *.fastq
mkdir emp-paired-end-sequences; mv *.gz emp-paired-end-sequences/.
Step 6: Enable Qiime2 on Orion cluster
export PATH=/home/opt/miniconda2/bin:$PATH
source activate qiime2-2019.7
See if qiime is working properly:
qiime --help
Usage: qiime [OPTIONS] COMMAND [ARGS]...
QIIME 2 command-line interface (q2cli)
--------------------------------------
To get help with QIIME 2, visit https://qiime2.org.
To enable tab completion in Bash, run the following command or add it to
your .bashrc/.bash_profile:
source tab-qiime
To enable tab completion in ZSH, run the following commands or add them to
your .zshrc:
autoload bashcompinit && bashcompinit && source tab-qiime
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
info Display information about current deployment.
tools Tools for working with QIIME 2 files.
dev Utilities for developers and advanced users.
alignment Plugin for generating and manipulating alignments.
composition Plugin for compositional data analysis.
cutadapt Plugin for removing adapter sequences, primers, and
other unwanted sequence from sequence data.
dada2 Plugin for sequence quality control with DADA2.
deblur Plugin for sequence quality control with Deblur.
demux Plugin for demultiplexing & viewing sequence quality.
diversity Plugin for exploring community diversity.
emperor Plugin for ordination plotting with Emperor.
feature-classifier Plugin for taxonomic classification.
feature-table Plugin for working with sample by feature tables.
fragment-insertion Plugin for extending phylogenies.
gneiss Plugin for building compositional models.
longitudinal Plugin for paired sample and time series analyses.
metadata Plugin for working with Metadata.
phylogeny Plugin for generating and manipulating phylogenies.
quality-control Plugin for quality control of feature and sequence data.
quality-filter Plugin for PHRED-based filtering and trimming.
sample-classifier Plugin for machine learning prediction of sample
metadata.
taxa Plugin for working with feature taxonomy annotations.
vsearch Plugin for clustering and dereplicating with vsearch.
Step 7: Import the data according to the recommendations given at https://docs.qiime2.org/2019.7/tutorials/importing/#importing-seqs
qiime tools import \
--type EMPPairedEndSequences \
--input-path emp-paired-end-sequences \
--output-path emp-paired-end-sequences.qza
Step 8: Demultiplex and visualise
qiime demux emp-paired --p-no-golay-error-correction --i-seqs emp-paired-end-sequences.qza --m-barcodes-file sample_metadata.tsv --m-barcodes-column barcode-sequence --o-per-sample-sequences demux.qza --o-error-correction-details demux-details.qza
qiime demux summarize --i-data ./demux.qza --o-visualization ./demux.qzv
qiime tools export --input-path demux.qzv --output-path output
Step 9: Now try Dada2
qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 240 --p-trunc-len-r 200 --p-n-threads 0 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza --verbose
Step 10: Create phylogenetic tree
qiime phylogeny align-to-tree-mafft-fasttree --i-sequences rep-seqs.qza --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --p-n-threads 0 --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza
Step 11: Assign taxonomy
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/silva-132-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_A_amoA_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_B_amoA_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_nirK_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_nirS_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_nrfA_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime feature-classifier classify-sklearn --i-classifier /software/qiime2_databases/Fungene_nxrB_20092019-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv
Exporting data to use with R:
qiime tools export --input-path table.qza --output-path output
qiime tools export --input-path rep-seqs.qza --output-path output
qiime tools export --input-path rooted-tree.qza --output-path output
qiime tools export --input-path taxonomy.qza --output-path output
Making Biome file compatible with R and phyloseq: Please check joey711/phyloseq#821 on how to make a biome table compatible with phyloseq. Go inside the "output" folder generated in the previous step and write these commands:
biom convert -i feature-table.biom -o feature-table.tsv --to-tsv
sed -i s/Taxon/taxonomy/ taxonomy.tsv | sed -i s/Feature\ ID/FeatureID/ taxonomy.tsv
biom add-metadata \
-i feature-table.tsv \
-o feature_w_tax.biom \
--observation-metadata-fp taxonomy.tsv \
--observation-header FeatureID,taxonomy,Confidence \
--sc-separated taxonomy --float-fields Confidence
Constructing a classifier for a new reference database: Follow the tutorial given at https://docs.qiime2.org/2019.7/tutorials/feature-classifier/
qiime tools import --type 'FeatureData[Sequence]' --input-path fungene_nirK_28_08_2019.fasta --output-path fungene_nirK_28_08_2019.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path fungene_nirK_28_08_2019.tax --output-path fungene_nirK_28_08_2019-taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads fungene_nirK_28_08_2019.qza --i-reference-taxonomy fungene_nirK_28_08_2019-taxonomy.qza --o-classifier fungene_nirK_28_08_2019-classifier.qza