LarracuenteLab/Khost_Eickbush_Larracuente2017

#Khost, Eickbush and Larracuente. Single molecule long read sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster

Pipelines, scripts and data files

#Links to raw reads

Raw PacBio reads from: Kim K, Peluso P, Babayan P, Yeadon PJ, Yu C, Fisher WW, et al. Long-read, whole-genome shotgun sequence data for five model organisms. Scientific data. 2014;1(140045). Epub 11/25/2014. doi: 10.1038/sdata.2014.45. Data for accession SRX499318 were downloaded from:

https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro1_24NOV2013_398.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro2_25NOV2013_399.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro3_26NOV2013_400.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro4_28NOV2013_401.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro5_29NOV2013_402.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro6_1DEC2013_403.tgz
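For convenience, fetching the six raw-read archives listed above could be scripted roughly as follows. This is only a sketch, not part of the original pipeline: the function names and the `raw_reads` output directory are assumptions, and the archives are large, so check available disk space first.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL and archive names are copied from the links listed above.
BASE_URL = "https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/"
ARCHIVES = [
    "Dro1_24NOV2013_398.tgz",
    "Dro2_25NOV2013_399.tgz",
    "Dro3_26NOV2013_400.tgz",
    "Dro4_28NOV2013_401.tgz",
    "Dro5_29NOV2013_402.tgz",
    "Dro6_1DEC2013_403.tgz",
]


def archive_urls():
    """Return the full download URL for each raw-read archive."""
    return [BASE_URL + name for name in ARCHIVES]


def fetch_and_extract(url, dest_dir="raw_reads"):
    """Download one archive (if not already present) and unpack it.

    dest_dir is a hypothetical output directory, not one used by the authors.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return archive
```

Calling `fetch_and_extract(url)` for each entry of `archive_urls()` would download and unpack all six archives in turn.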

PBcR-BLASR (Celera 8.1) Error corrected reads: http://bergmanlab.ls.manchester.ac.uk/?p=2151

Raw Illumina reads: ENA Accession ERX645969

Miller, D.E., C.B. Smith, R.S. Hawley and C.M. Bergman (2013) PacBio Whole Genome Shotgun Sequences for the D. melanogaster Reference Strain. http://bergmanlab.ls.manchester.ac.uk/?p=1971

#Links to assemblies

Falcon assembly: https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/reads/dmel_FALCON_diploid_assembly.tgz

Celera 8.1 PBcR haploid Blasr quivered assembly: https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/reads/dmel_haploid_assembly.tgz

Celera 8.2 MHAP assembly from: Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015;33(6):623-30. doi: 10.1038/nbt.3238. PubMed PMID: 26006009.

Accession GCA_000778455

#Description of supplementary files

File S1: Sample Celera 8.3 MHAP specification file, with default small/haploid genome parameters. Parameters altered in this study were merSize, -k, --num-hashes, and assembleCoverage.

File S2: Sample Celera 8.3 MHAP specification file, with large/diploid genome parameters. Parameters altered in this study were merSize, -k, --num-hashes, and assembleCoverage.

File S3: Sample specification file used to run Canu assembler with 4% error rate

File S4: Sample specification file used to run FALCON

File S5: Spec file used to construct BLASR-corr Cel8.3 assembly

File S6: SLURM script used to construct BLASR-corr Cel8.3 assembly. Note: this script did not appear to allocate resources properly on our cluster, resulting in a long (~17 days) assembly time. With resources configured properly, assembly should be much faster.

File S7: SLURM file used to run Canu 1.2 using BLASR corrected reads (Canu-corr assembly). This uses the default Canu settings but skips read correction.

File S8: SLURM job handler file used to run Celera 8.3 assembler

File S9: Custom Repbase repeat library used to annotate assemblies.

File S10: Perl script used to annotate assembly from BLAST output

File S11: GFF annotation file for the major Rsp locus in the PBcR-BLASR assembly, constructed using custom scripts.

File S12: GFF annotation file for the 1.688 loci in the PBcR-BLASR assembly, constructed using custom scripts.

File S13: GFF annotation file for the minor Rsp locus in the PBcR-BLASR assembly, constructed using custom scripts.
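To illustrate the kind of conversion that File S10 and the GFF files above involve, here is a minimal Python sketch that turns BLAST tabular hits (`-outfmt 6`) into GFF lines. It is not the authors' Perl script, whose logic is not described here. It assumes the repeat library sequences are the queries and the assembly contigs are the subjects; the `blast` source label, the `repeat_region` feature type, and the attribute layout are illustrative choices.

```python
def blast_hit_to_gff(line, source="blast", feature_type="repeat_region"):
    """Convert one BLAST -outfmt 6 line into one GFF line.

    Assumes query = repeat library entry and subject = assembly contig.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BLAST columns")
    qseqid, sseqid = fields[0], fields[1]
    pident = float(fields[2])
    sstart, send = int(fields[8]), int(fields[9])
    evalue = fields[10]
    # BLAST reports minus-strand hits with subject start > subject end.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    attributes = f"Name={qseqid};identity={pident:.2f}"
    return "\t".join(
        [sseqid, source, feature_type, str(start), str(end),
         evalue, strand, ".", attributes]
    )


def blast_to_gff(lines):
    """Convert an iterable of BLAST lines, skipping blanks and comments."""
    return [
        blast_hit_to_gff(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
```

For example, a hit of a hypothetical repeat `Rsp_left` on a hypothetical contig `contig1` with subject coordinates 500 to 381 would be written as a minus-strand feature spanning 381 to 500.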
