Skip to content

RNA-Seq pipelines that use HISAT2, Kallisto, Salmon, DESeq and Sleuth.

Notifications You must be signed in to change notification settings

ahishsujay/rna_seq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 

Repository files navigation

rna_seq

Overview:

RNA-Seq pipelines that uses HISAT2 and Kallisto for alignment (pseudo-alignment and abundance calculations in case of the latter).

Download Data:

The dataset downloaded was GSE120534. Once the accession list is downloaded, download the SRA toolkit (https://github.com/ncbi/sra-tools) and run the following:

prefetch --option-file SRR_Acc_List.txt

Once the files are download, retrieve the FASTQ files the following way:

for file in *; do fastq-dump --split-files "$file"; done

Requirements:

  1. Kallisto. Alternatively, conda install kallisto if conda is installed.
  2. HISAT2.
  3. Reference index:
  • Transcriptome index for Homo sapiens if Kallisto is used. Alternatively, index file can also be built using kallisto index.
  • Genome Index for Homo sapiens if HISAT2 is used. Alternatively, index file can also be built using hisat2-build.

Arguments:

-a | --aligner-to-use: Specify 1 if you want to use Kallisto or 2 if you want to use HISAT2. DEFAULT: Kallisto
-i | --input-files-directory: Enter the path of the directory containing FASTQ files.
-r | --reference-index: Enter the path of the reference index. 1. If Kallisto is selected, then enter the path of indexed reference transcriptome (Ends with .idx); 2. If HISAT2 is selected, then enter the path of indexed reference genome with the prefix
-o | --output-directory: Enter name of directory which will contain output of respective aligner.

Script execution:

1. Alignment / Pseudoalignment and quantify:

1. Kallisto:

  • Run ./aligner_wrapper.py -a 1 -i <FASTQ files directory> -r <reference index directory> -o <aligner output directory> to use Kallisto for pseudo-alginment and generate the abundance files.
    Alternatively, the following bash script can be run in the directory where the input files are present:
for f in `ls *.fastq | sed 's/_[12].fastq//g' | sort -u`
do
kallisto quant -i ../reference/homo_sapiens/transcriptome.idx -o kallisto_output/${f} ${f}_1.fastq ${f}_2.fastq
done

2. HISAT2:

  • Run ./aligner_wrapper.py -a 2 -i <FASTQ files directory> -r <reference index directory>genome -o <aligner output directory> to use HISAT2 for alignment.
    Alternatively, the following bash script can be run in the directory where the input files are present:
for f in `ls *.fastq | sed 's/_[12].fastq//g' | sort -u`
do
hisat2 -x ../reference/grch38/genome -1 ${f}_1.fastq -2 ${f}_2.fastq -S ${f}.sam
done

2. Differential Expression (DE):

1. DESeq2:

  • Run ./differential_expression.R -de DESeq -o <kallisto_output_directory> -m <meta_data_file> -p <pvalue_cutoff> to import transcript abundance files from Kallisto and perform DE analysis with DESeq2.

2. Sleuth:

  • Run ./differential_expression.R -de Sleuth -o <kallisto_output_directory> -m <meta_data_file> -p <pvalue_cutoff> to use Sleuth after quantifying with Kallisto.

About

RNA-Seq pipelines that use HISAT2, Kallisto, Salmon, DESeq and Sleuth.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published