Skip to content
dariader edited this page Oct 5, 2018 · 1 revision

What causes antibiotic resistance

Time: 13.10, 1.10.2018

  • Unzip reference data
  • Unzip Illumina data
    • Format coltrol: head -20 [filename.format]
      • Each read has 4 lines of information, and then the next read starts on the following line.

      • The first line starts with the @ symbol, and contains identifiers and information about this read.

      • The next line contains the actual sequence, then on line three there is a ‘+’, which may sometimes have the identifier and info repeated.

      • Line 4 contains the quality string, where ASCII characters encode the quality score for each base.

        • The quality score ranges from 0 to about 40; the higher the number, the greater the accuracy of the base call.
  • Quality control
  • we used word count with the lines flag to see how many lines there are in each fastq file. wc -l [filename.fastq] * calculated the number of reads in each file * 1823504 amp_res_1.fastq * 1823504 amp_res_2.fastq
  • Inspecting raw sequencing data with fastqc. Filtering the reads.

apt-get install fastqc fastqc -h fastqc -o amp_res_1.fastq amp_res_2.fastq

  • The basic statistics match what we calculated for the number of reads manually.

  • Run Trimmomatic

    • Run Trimmomatic in paired end mode, with following parameters:
      • select the correct phred scale - you can figure it out from the first part of the FastQC reports “Basic statistics” - “Encoding”. Illumina1.9

S - Sanger Phred+33, raw reads typically (0, 40) X - Solexa Solexa+64, raw reads typically (-5, 40) I - Illumina 1.3+ Phred+64, raw reads typically (0, 40) J - Illumina 1.5+ Phred+64, raw reads typically (3, 40) with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold) (Note: See discussion above). L - Illumina 1.8+ Phred+33, raw reads typically (0, 41)

  • Cut bases off the start of a read if quality below 20
  • Cut bases off the end of a read if quality below 20
  • Trim reads using a sliding window approach, with window size 10 and average quality within the window 20.
  • Drop the read if it is below length 20.

-threads How many processors do you want Trimmomatic to run with? SE or PE Single End or Paired End reads? -phred33 or -phred64 Which quality score do your reads have? SLIDINGWINDOW Perform sliding window trimming, cutting once the average quality within the window falls below a threshold. LEADING Cut bases off the start of a read, if below a threshold quality. TRAILING Cut bases off the end of a read, if below a threshold quality. CROP Cut the read to a specified length. HEADCROP Cut the specified number of bases from the start of the read. MINLEN Drop an entire read if it is below a specified length. TOPHRED33 Convert quality scores to Phred-33. TOPHRED64 Convert quality scores to Phred-64.

TrimmomaticPE [-threads ] [-phred33] [-trimlog ] [-basein | ] [-baseout | ] ...

java -jar trimmomatic-0.32+dfsg-1.jar PE -phred33 TrimmomaticPE -phred33 amp_res_1.fastq amp_res_2.fastq _1P.fq _1U.fq _2P.fq _2U.fq LEADING:20 TRAILING:20 SLIDINGWINDOW:10:20 MINLEN:20

TrimmomaticPE: Started with arguments: amp_res_1.fastq amp_res_2.fastq _1P.fq _1U.fq _2P.fq _2U.fq LEADING:20 TRAILING:20 SLIDINGWINDOW:10:20 MINLEN:20 Multiple cores found: Using 4 threads Quality encoding detected as phred33 Input Read Pairs: 455876 Both Surviving: 446259 (97,89%) Forward Only Surviving: 9216 (2,02%) Reverse Only Surviving: 273 (0,06%) Dropped: 128 (0,03%) TrimmomaticPE: Completed successfully

java -jar trimmomatic-0.38.jar PE -phred33 amp_res_1.fastq amp_res_2.fastq _1P.fq _1U.fq _2P.fq _2U.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:10:20 MINLEN:20

  • ☐ Input Read Pairs: 455876 Both Surviving: 446259 (97,89%) Forward Only Surviving: 9216 (2,02%) Reverse Only Surviving: 273 (0,06%) Dropped: 128 (0,03%)

  • how many paired reads were kept?

    • 455876 (four times less than lines)
  • we cheched it manually

wc -l

					* 1P 1785036 lines
					* 2P 1785036 lines
  • fastqc analysis on the _1P.fq and _2P.fq files.
    • What happens if we increase the quality score at all steps to 30 at trimmomatic?
    • TrimmomaticPE

amp_res_2.fastq _1P.fq _1U.fq _2P.fq _2U.fq LEADING:30 TRAILING:30 SLIDINGWINDOW:10:30 MINLEN:20

  • Multiple cores found: Using 4 threads

    Quality encoding detected as phred33 Input Read Pairs: 446259 Both Surviving: 354123 (79,35%) Forward Only Surviving: 35710 (8,00%) Reverse Only Surviving: 50171 (11,24%) Dropped: 6255 (1,40%) TrimmomaticPE: Completed successfully

1P ./pasted_image013.png?width=500./pasted_image014.png?width=500 ./pasted_image015.png?width=500./pasted_image016.png?width=500 ./pasted_image017.png?width=500 ./pasted_image018.png?width=500

./pasted_image019.png?width=500./pasted_image020.png?width=500 ./pasted_image021.png?width=500./pasted_image022.png?width=500

2P ./pasted_image023.png?width=500./pasted_image024.png?width=500 ./pasted_image025.png?width=500./pasted_image026.png?width=500 ./pasted_image027.png?width=500./pasted_image028.png?width=500 ./pasted_image029.png?width=500./pasted_image030.png?width=500 ./pasted_image031.png?width=500./pasted_image032.png?width=500

  • Alingnment to reference

sudo apt-get install bwa

  • index the reference file

bwa index -a is GCA_000005845.2_ASM584v2_genomic.fna

* we observe a bunch of new files in directotry
  • Align your reads

    • We aligned trimmed, paired sequences
      • bwa mem GCA_000005845.2_ASM584v2_genomic.fna _1P.fq _2P.fq > alignment.sam
    • compressed and sorted the sam file samtools view -S -b alignment.sam > alignment.bam
  • samtools flagstat alignment.bam

892776 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 891655 + 0 mapped (99.87%:-nan%) 892776 + 0 paired in sequencing 446402 + 0 read1 446374 + 0 read2 888879 + 0 properly paired (99.56%:-nan%) 890679 + 0 with itself and mate mapped 976 + 0 singletons (0.11%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)l

  • ☐ If we use quality treshold at Trimmomatic stage equals 30

71438 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 69996 + 0 mapped (97.98%:-nan%) 71438 + 0 paired in sequencing 35719 + 0 read1 35719 + 0 read2 856 + 0 properly paired (1.20%:-nan%) 68564 + 0 with itself and mate mapped 1432 + 0 singletons (2.00%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

  • Sort bam file by sequence coordinate on reference:

samtools sort alignment.bam alignment_sorte‏d

  • Indexing

samtools index alignment_sorted.bam

	* Viewing the BAM file in IGV browser
  • Variant calling

    • We created mpileup file

      samtools mpileup -f GCA_000005845.2_ASM584v2_genomic.fna alignment_sorted.bam > my.mpileup

  • Scanned possible variants

      java -jar VarScan.v2.3.9.jar mpileup2snp -h
    

Only SNPs will be reported Warning: No p-value threshold provided, so p-values will not be calculated Min coverage: 8 Min reads2: 2 Min var freq: 0.2 Min avg qual: 15 P-value thresh: 0.01 USAGE: java -jar VarScan.jar mpileup2cns [pileup file] OPTIONS mpileup file - The SAMtools mpileup file

OPTIONS: --min-coverage Minimum read depth at a position to make a call [8] --min-reads2 Minimum supporting reads at a position to call variants [2] --min-avg-qual Minimum base quality at a position to count a read [15] --min-var-freq Minimum variant allele frequency threshold [0.01] --min-freq-for-hom Minimum frequency to call homozygote [0.75] --p-value Default p-value threshold for calling variants [99e-02] --strand-filter Ignore variants with >90% support on one strand [1] --output-vcf If set to 1, outputs in VCF format --vcf-sample-list For VCF output, a list of sample names in order, one per line --variants Report only variant (SNP/indel) positions [0]

* ☐ If we use at Trimmomatic stage quality threshold equals 30, we obtain similar results
  • java -jar VarScan.v2.3.9.jar mpileup2snp my.mpileup OPTIONS --min-var-freq 0.1 --variants --output-vcf 1 > VarScan_results.vcf
    • ☒ --min-var-freq 0.1 --- too low value of the parameter

java -jar VarScan.v2.3.9.jar mpileup2snp my.mpileup OPTIONS --min-var-freq 0.1 --variants --output-vcf 1 > VarScan_results.vcf Only SNPs will be reported Warning: No p-value threshold provided, so p-values will not be calculated Min coverage: 8 Min reads2: 2 Min var freq: 0.1 Min avg qual: 15 P-value thresh: 0.01 Reading input from my.mpileup 4641448 bases in pileup file 10 variant positions (7 SNP, 3 indel) 0 were failed by the strand-filter 7 variant positions reported (7 SNP, 0 indel)

  • ☒ --min-var-freq 0.01 --- too low value of the parameter

java -jar VarScan.v2.3.9.jar mpileup2snp my.mpileup OPTIONS --min-var-freq 0.01 --variants --output-vcf 1 > VarScan_results_0.01trsh.vcf Only SNPs will be reported Warning: No p-value threshold provided, so p-values will not be calculated Min coverage: 8 Min reads2: 2 Min var freq: 0.01 Min avg qual: 15 P-value thresh: 0.01 Reading input from my.mpileup 4641448 bases in pileup file 10 variant positions (7 SNP, 3 indel) 0 were failed by the strand-filter 7 variant positions reported (7 SNP, 0 indel)

  • --min-var-freq 0.5 --- optimal value of the parameter

java -jar VarScan.v2.3.9.jar mpileup2snp my.mpileup OPTIONS --min-var-freq 0.5 --variants --output-vcf 1 > VarScan_results_0.5trsh.vcf Only SNPs will be reported Warning: No p-value threshold provided, so p-values will not be calculated Min coverage: 8 Min reads2: 2 Min var freq: 0.5 Min avg qual: 15 P-value thresh: 0.01 Reading input from my.mpileup 4641448 bases in pileup file 9 variant positions (6 SNP, 3 indel) 0 were failed by the strand-filter 6 variant positions reported (6 SNP, 0 indel)

						* --min-var-freq 0.75  --- best value of the parameter according to 

java -jar VarScan.v2.3.9.jar mpileup2snp my.mpileup OPTIONS --strand-filter 1 --min-var-freq 0.75 --variants --output-vcf 1 > VarScan_results_075tsh_lt.vcf Only SNPs will be reported Warning: No p-value threshold provided, so p-values will not be calculated Min coverage: 8 Min reads2: 2 Min var freq: 0.75 Min avg qual: 15 P-value thresh: 0.01 Reading input from my.mpileup 4641448 bases in pileup file 9 variant positions (6 SNP, 3 indel) 0 were failed by the strand-filter 6 variant positions reported (6 SNP, 0 indel)

  • SNP analysis in IGV browser
Clone this wiki locally