GitHub - tallulandrews/scRNASeqPipeline

ERCC_Controls
software
0.1_kallisto_extract_transcripts.sh
0.2_bowtie_build_genome.sh
0.3_Salmon_build_index.sh
00_Add_to_Reference.readme
00_Generate_FastQs.readme
00_Kallisto_For_SmartSeq.readme
00_LIST_OF_BSUB_COMMANDS.sh
00_Steps
0_Anno_Extract_Transcriptome.pl
0_BAM2FastQ.sh
0_BuildGenome.sh
0_CRAM2BAM.sh
0_Check_Barcodes.pl
0_Convert_CRAM_to_BAM.sh
0_Determine_Barcodes.pl
0_Download_Files_from_Dropbox.pl
0_Extract_Metadata_from_Bam.sh
0_FASTQC.sh
0_FASTQC_Streaming.sh
0_FASTQC_limits.txt
0_Flexible_Convert_Dir_CRAM_to_BAM.sh
0_GBK2FASTA.pl
0_Gather_Summary_Statistics.pl
0_Get_Data_from_iRODS.sh
0_Make_ERCC_fasta_and_gtf.pl
0_Merge_FASTQs.sh
0_My_Extract_Transcriptome.pl
0_Process_GBK.pl
0_custom_undo_demultiplexing.pl
0_make_transcriptome.sh
1.5_DO_Trim_Reads.sh
1.5_Trim_Reads_Paired.sh
1.5_Trim_UMI.pl
1.6_Summarizing_Trimming.pl
1_BreakDown_Files_wrapper.sh
1_BreakDown_PairedEnds.pl
1_BreakDown_PairedEnds_Custom_Wafergen.pl
1_Breakdown_UMI_read_pairs.pl
1_DO_BreakDown_Files.sh
1_Flexible_FullTranscript_Demultiplexing.pl
1_Flexible_UMI_Demultiplexing.pl
2-5.1_DO_kallisto_quant.sh
2-5.1_kallisto_quant.sh
2-5.2_DO_Salmon_quant.sh
2-5.2_Salmon_quant.sh
2-5_DO_RSEM.sh
2-5_STAR-RSEM.sh
2-5_bowtie2-RSEM.sh
2.2_DO_MapReads_Tophat.sh
2.2_MapReads_Tophat.sh
2_DO_MapReadsFile.sh
2_DO_MapReadsFile_singleend.sh
2_MapReadsFile.sh
2_MapReadsFile_Transcriptome.sh
2_MapReadsFile_singleend.sh
2_STAR_Parameters.txt
3_CLEANUP_MapReadFiles.sh
3_Compile_Mapping_Statistics.pl
3_Compile_UMI_Statistics.pl
3_DO_UmiDedup.sh
3_SAMtools_sort_wrapper.sh
3_SortBAMs.pl
3_UmiDedup.sh
3_merge_dedup_MappedReads.sh
4_Convert_GTF2BED_customized_for_Ensembl.pl
4_DO_RSeQC_Multiple.sh
4_MergeBAMs.pl
4_Process_RSEQC_output.pl
4_RSeQC_Multiple.sh
5.0_Summarize_Known_Transcriptome.pl
5_Cufflinks_wrapper.sh
5_Cuffmerge_wrapper.sh
5_DO_Cufflinks.sh
5_DO_Cufflinks_denovo_Transcripts.sh
5_DO_Cuffmerge.sh
5_DO_Quantification_X2.sh
5_DO_featureCounts.sh
5_DO_featureCounts_locally.sh
5_Fix_Transcriptome_for_featureCounts.pl
5_RSEM.sh
5_RSEM_build_refrence.sh
5_Summarize_Filter_Merged_Transcriptome.pl
5_TidyCufflinks.pl
5_featureCounts_wrapper.sh
6.1_Get_Expression_Kallisto.pl
6_Get_Construct_Expression_Cufflinks.pl
6_Get_Cufflinks_Gene_Level_Expression.pl
6_Get_Expression_featureCounts.pl
6_Get_Kallisto.pl
6_Get_RSEM_Expression.pl
6_Get_Salmon_Expression.pl
99_Check_Barcodes.pl
99_Check_RSEM_Output.pl
99_Check_Results.pl
99_NotesForImprovement
99_get_order_chr_in_SAM.pl
Extract_PlateID_and_WellID_from_headers.pl
Kallisto_Build_Index.sh


This is a collection of scripts I use (or have used in the past) to process scRNASeq data. They are free for anyone to use for any purpose, but come with no guarantee of correctness or functionality. The general workflow is as follows:

0 : Create the appropriate genome for the dataset, obtain the read files, and run initial QC
	- Building mapping indexes generally requires ~30 GB of memory for a mouse-sized genome
1 : Split the files by well (cell), and trim reads as appropriate based on QC
2 : Map the reads to the genome
3 : Clean up mapping output & remove duplicates
4 : Mapping QC
5 : Quantify expression
6 : Assemble expression matrix
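
The step numbers above appear to correspond to the numeric prefix on each script name (for example 2_MapReadsFile.sh for step 2, with prefixes such as 2-5.1 presumably marking scripts that cover steps 2 through 5). That reading is an assumption drawn from the file names, not something stated elsewhere here. Under that assumption, a minimal sketch for pulling the step label off a script name could look like this; `step_of` is a hypothetical helper and is not part of the repository:

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): print the leading step label
# of a script name, e.g. "2-5.1" for "2-5.1_kallisto_quant.sh".
# Returns non-zero for names that do not start with a numeric label.
step_of() {
    name=${1##*/}        # drop any directory part
    prefix=${name%%_*}   # text before the first underscore
    case $prefix in
        ''|*[!0-9.-]*) return 1 ;;        # contains something other than digits, "." or "-"
        *) printf '%s\n' "$prefix" ;;
    esac
}

# Label a few names taken from the file listing; unlabelled names get "-".
for f in 0_FASTQC.sh 2_MapReadsFile.sh 2-5.1_kallisto_quant.sh Kallisto_Build_Index.sh; do
    if step=$(step_of "$f"); then
        printf '%s\t%s\n' "$step" "$f"
    else
        printf '%s\t%s\n' "-" "$f"
    fi
done
```

Running the same loop over `*` in a checkout of the scripts gives a quick view of which scripts belong to which step.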

Finished Pipelines:
00_Kallisto_For_SmartSeq.readme = Smart-seq2 + kallisto (no UMIs)


Brief descriptions of useful files:
0_Extract_barcodes_from_BAM.sh : reads the first line of each BAM file and extracts the barcode (the field tagged with BC:), which is used to match files up with their metadata.
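
As a rough illustration of the barcode lookup described above (not the script itself), the sketch below takes the first line of SAM-format text on standard input and prints the value of the first tab-separated field that starts with BC:. It assumes the barcode appears either as a read tag (BC:Z:...) or as a header-style field (BC:...); which of these the real data uses is not stated here. `extract_bc` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the BC value found on the first line of stdin.
# Prints nothing if that line has no field beginning with "BC:".
extract_bc() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^BC:/) {
                    value = $i
                    sub(/^BC:(Z:)?/, "", value)   # strip the tag name and optional type code
                    print value
                    exit
                }
            }
            exit                                  # first line had no BC field
        }'
}

# Possible usage, assuming samtools is installed and BAM files are in the
# current directory (both are assumptions):
# for bam in *.bam; do
#     printf '%s\t%s\n' "$bam" "$(samtools view "$bam" | head -n 1 | extract_bc)"
# done
```

The resulting two-column output (file name, barcode) can then be joined against a metadata table keyed by barcode.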