Benchmarking pipeline for BLUEPRINT data

Prerequisites:

- virtualenv

- GitHub account

- Reference genome GTF and FASTA files. This study used the Mus musculus Ensembl release 89 genome with ERCC sequences appended (see https://tools.thermofisher.com/content/sfs/manuals/cms_095048.txt).

- Java version 1.8

- R version 3.4.4
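
Before starting, it can help to confirm that the command-line prerequisites are on your PATH. The sketch below is not part of the pipeline. The executable names (`virtualenv`, `java`, `Rscript`) are assumptions about how the tools are installed on your machine, and the check does not verify version numbers.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["virtualenv", "java", "Rscript"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```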

To run the pipeline:

Execute ./wrapper.sh path/to/java path/to/ref/fasta path/to/ref/gtf
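
If you prefer to launch the wrapper from a script, the call above could be wrapped as in the following sketch. The helper names `build_wrapper_command` and `run_wrapper` are hypothetical and not part of the repository; only the argument order (Java path, reference FASTA, reference GTF) comes from the command shown above.

```python
import os
import subprocess


def build_wrapper_command(java_path, fasta_path, gtf_path, script="./wrapper.sh"):
    """Return the argument list for wrapper.sh in the documented order:
    Java path, reference FASTA, reference GTF."""
    return [script, java_path, fasta_path, gtf_path]


def run_wrapper(java_path, fasta_path, gtf_path):
    """Check that each input path exists, then launch the wrapper script."""
    for path in (java_path, fasta_path, gtf_path):
        if not os.path.exists(path):
            raise FileNotFoundError(path)
    # check=True raises CalledProcessError if wrapper.sh exits with a non-zero status.
    subprocess.run(build_wrapper_command(java_path, fasta_path, gtf_path), check=True)
```

Checking the paths first means a mistyped reference file is reported immediately rather than partway through a long run.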

In practice, it is unlikely that your machine will have the resources to run the entire pipeline in one go, so you will probably need to split the wrapper script into parts and run them separately.

The pipeline automatically downloads the required data. In addition, a list of SRR accession codes can be found in SRR_Acc_List.txt.
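
If you want to inspect the accession list yourself, a minimal sketch for reading it is shown below. It assumes that SRR_Acc_List.txt holds one accession per line, which is not stated in this README; the function names are hypothetical.

```python
def read_accessions(lines):
    """Return the non-empty, whitespace-stripped entries from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def load_accession_file(path="SRR_Acc_List.txt"):
    """Read accession codes from a file, assuming one accession per line."""
    with open(path) as handle:
        return read_accessions(handle)
```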

As part of the pipeline, quality control steps are automatically carried out. For reference, these are the statistics used to filter the raw data:

| Statistic | Name of statistic in table | Threshold |
|---|---|---|
| No. of uniquely mapping reads | Unique | >8,000,000 |
| No. of non-uniquely mapping reads | NonUnique | >350,000 |
| No. of alignments | NumAlign | >8,200,000 |
| No. of reads | NumReads | >4,000,000 |
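
As an illustration of how these thresholds could be applied to one cell's statistics, here is a minimal sketch. It is not the pipeline's own quality control code, and the function name `exceeded` is hypothetical. The table does not say whether cells exceeding a threshold are kept or discarded, so the sketch only reports which statistics are above their threshold and leaves that decision to the caller.

```python
# Thresholds copied from the table above, keyed by the name of each statistic.
RAW_THRESHOLDS = {
    "Unique": 8_000_000,
    "NonUnique": 350_000,
    "NumAlign": 8_200_000,
    "NumReads": 4_000_000,
}


def exceeded(stats, thresholds=RAW_THRESHOLDS):
    """Return the names of the statistics in `stats` that are strictly
    greater than their threshold, in the order the thresholds are listed."""
    return [name for name, limit in thresholds.items() if stats[name] > limit]
```

For example, with made-up values, `exceeded({"Unique": 9_000_000, "NonUnique": 100_000, "NumAlign": 8_300_000, "NumReads": 3_000_000})` returns `["Unique", "NumAlign"]`. The same function could be called with `thresholds={"NonUnique": 250_000}` for the Polyester simulated data.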

These are the statistics used to filter the Polyester simulated data:

| Statistic | Name of statistic in table | Threshold |
|---|---|---|
| No. of non-uniquely mapping reads | NonUnique | >250,000 |

In addition, for both the raw and simulated data, the scater package was used to filter out cells in which more than 10% of reads mapped to mitochondrial genes.
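
The arithmetic behind the 10% mitochondrial cutoff can be sketched in plain Python as follows. The study used the scater R package for this step; the code below does not use or reproduce scater's API, and the function names and per-cell read counts are hypothetical.

```python
def mito_fraction(mito_reads, total_reads):
    """Fraction of a cell's reads that map to mitochondrial genes."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mito_reads / total_reads


def exceeds_mito_cutoff(mito_reads, total_reads, cutoff=0.10):
    """True when more than `cutoff` of the cell's reads are mitochondrial."""
    return mito_fraction(mito_reads, total_reads) > cutoff
```

The comparison is strict, so a cell with exactly 10% mitochondrial reads is not flagged.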
