GitHub - jstapleton/synthetic_reads: Scripts for assembling and analyzing synthetic long sequencing reads.

This repository contains the scripts used to construct synthetic long reads and the data to recreate the plots from Stapleton et al. 2015, "Haplotype-phased synthetic long reads from short-read sequencing."

Each of the samples described in the paper has its own directory:

Chicken/ Gelsemium2/ HepG2/ MG1655/ multiplex/ Env/ Gelsemium/ HCT116/ Potato/

Each directory contains a makefile that constructs synthetic reads from the raw data, which can be downloaded from the Sequence Read Archive. In practice, most of these assemblies are best run on a cluster.
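As a rough illustration of the per-sample layout, the sketch below lists the subdirectories that contain a makefile and builds a `make` command for one of them. The helper names `find_sample_dirs` and `make_command` are hypothetical and not part of this repository. The makefile targets are not documented here, so the sketch only invokes the default target, and defaults to a dry run (`make -n`), which prints the commands without executing them.

```python
from pathlib import Path

# File names GNU make searches for by default, in this order.
MAKEFILE_NAMES = ("GNUmakefile", "makefile", "Makefile")


def find_sample_dirs(repo_root):
    """Return the sorted names of subdirectories of repo_root that hold a makefile."""
    root = Path(repo_root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and any((d / name).is_file() for name in MAKEFILE_NAMES)
    )


def make_command(repo_root, sample, dry_run=True):
    """Build a make invocation for one sample directory (default target only).

    With dry_run=True the command uses `make -n`, which prints the
    recipe lines instead of running them.
    """
    cmd = ["make", "-C", str(Path(repo_root) / sample)]
    if dry_run:
        cmd.append("-n")
    return cmd
```

From the repository root, something like `subprocess.run(make_command(".", "MG1655"))` would then show what the MG1655 makefile intends to run, assuming `make` is installed and the raw data are in the location the makefile expects.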

The Python scripts called by the makefiles are in Scripts/.

Figures.ipynb is an IPython notebook that recreates most of the figures in the paper.

HPCC_Hasher.sub and HPCC_Spades.sub are qsub files for running the hashing and assembly steps on a cluster.

The repository root also contains JAStrim.fa and pairedTrim.txt, which are adapter files for Trimmomatic.
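For readers unfamiliar with how an adapter file is passed to Trimmomatic, here is a minimal sketch that assembles a paired-end command line with an `ILLUMINACLIP` step. It is not taken from the makefiles in this repository, which may call Trimmomatic differently. The function name `trimmomatic_pe_command`, the jar path, the output naming scheme, and the clipping thresholds `2:30:10` (the values used in Trimmomatic's own documentation examples) are all assumptions, and which of the two adapter files belongs to which step is not stated here.

```python
def trimmomatic_pe_command(jar, fwd, rev, out_prefix, adapters="JAStrim.fa",
                           seed_mismatches=2, palindrome_threshold=30,
                           simple_threshold=10):
    """Build (but do not run) a Trimmomatic paired-end command as an argument list.

    All defaults are illustrative assumptions, not settings from the paper.
    """
    # Trimmomatic PE expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [
        f"{out_prefix}_{mate}{kind}.fastq.gz"
        for mate in ("1", "2")
        for kind in ("P", "U")
    ]
    # ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
    clip = (
        f"ILLUMINACLIP:{adapters}:{seed_mismatches}:"
        f"{palindrome_threshold}:{simple_threshold}"
    )
    return ["java", "-jar", jar, "PE", fwd, rev, *outputs, clip]
```

Returning a list rather than a single string means the command can be handed to `subprocess.run` without shell quoting concerns, or simply printed for inspection.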
