Unicycler sample data

I've put together a few small read sets so users can test that Unicycler works.

The synthetic Shigella plasmid reads are the smallest and are included in the Unicycler repo – try these if you're in a hurry.

The other three are real read sets from small bacterial genomes sequenced by the FDA-ARGOS project, and they are available to download via figshare. The Helicobacter pylori and Streptococcus pyogenes genomes are relatively simple and easy to assemble. The Neisseria gonorrhoeae genome is more complex and tougher to assemble. I subsampled each Illumina read set to create smaller files. The PacBio read sets were subsampled based on quality (i.e. they are a high-quality subset of the original reads).

I'd recommend looking at the resulting assembly graphs in Bandage to get an idea of how well the assemblies completed – especially useful for comparing hybrid assemblies made with low-depth vs high-depth long reads.
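As a rough complement to viewing the graph in Bandage, the sketch below counts segments and links in a GFA file with awk. It assumes the graph is at `output_dir/assembly.gfa` (Unicycler's final graph file in the output directory given with `-o`); the `gfa_summary` function name is made up for this example. A completed circular replicon usually shows up as a single segment with one link back to itself, so fewer segments generally means a more complete assembly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Unicycler): summarise a GFA assembly graph.
# Counts segment (S) lines, link (L) lines and the total bases in all segments.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++; bases += length($3) }
        $1 == "L" { links++ }
        END { printf "segments=%d links=%d bases=%d\n", segments, links, bases }
    ' "$1"
}

# Assumed location: the directory passed to unicycler with -o.
if [ -f output_dir/assembly.gfa ]; then
    gfa_summary output_dir/assembly.gfa
fi
```

Running this on the low-depth and high-depth hybrid assemblies gives a quick numeric comparison, but it does not replace looking at the graph itself.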

Shigella sonnei plasmids (synthetic reads)

These are synthetic reads from plasmids A, B and E from the Shigella sonnei 53G genome assembly:

Download reads from the figshare page or via these direct links:

These plasmids are small compared to a bacterial genome, but insertion sequences create many repeats. Only the smallest plasmid assembles completely with short reads alone. Hybrid assemblies with low-depth long reads manage to complete the medium-sized plasmid, and it takes high-depth long reads to complete all three.

Helicobacter pylori

These are real Illumina and PacBio reads from Helicobacter pylori sample FDAARGOS_300:

Download reads from the figshare page or via these direct links:

The Helicobacter pylori genome is small and simple. It has only two copies of the rRNA operon and no other large repeats, making it very easy to assemble compared to most bacterial genomes. A hybrid assembly with the high-depth long reads should produce a nice completed chromosome. A hybrid assembly with the low-depth long reads comes very close to completion, with just a couple of slightly ambiguous spots remaining.

Streptococcus pyogenes

These are real Illumina and PacBio reads from Streptococcus pyogenes sample FDAARGOS_190:

Download reads from the figshare page or via these direct links:

The Streptococcus pyogenes genome is also small and simple, and it is relatively easy to assemble with Illumina reads. It does have a few repetitive elements, however, including five copies of the rRNA operon and six copies of IS1548. A hybrid assembly with the high-depth long reads should produce a nice completed chromosome. A hybrid assembly with the low-depth long reads will not quite complete, leaving a bit of ambiguity around some of the rRNA operons.

Neisseria gonorrhoeae

These are real Illumina and PacBio reads from Neisseria gonorrhoeae sample FDAARGOS_204:

Download reads from the figshare page or via these direct links:

While the Neisseria gonorrhoeae genome is small, it is a difficult one to assemble, with many copies of IS1016, ISNgo2 and other repeats. A hybrid assembly with the high-depth long reads should produce a nice completed chromosome. A hybrid assembly with the low-depth long reads, while still a large improvement over the Illumina-only assembly, fails to resolve a number of regions. This demonstrates that more complex genomes require higher long-read depth to achieve complete assemblies.

Assembly commands

Illumina-only assembly:
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -o output_dir

Long-read-only assembly:
unicycler -l long_reads_high_depth.fastq.gz -o output_dir

Hybrid assembly (low-depth long reads):
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads_low_depth.fastq.gz -o output_dir

Hybrid assembly (high-depth long reads):
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads_high_depth.fastq.gz -o output_dir
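
The four commands above all write to `output_dir`, so running them back to back would target the same directory. A minimal sketch of one way to run them side by side follows. The output directory names, the `DRY_RUN` variable and the two function names are assumptions made for this example; only the `-1`, `-2`, `-l` and `-o` options come from the commands above. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each sample assembly into its own output directory.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
run_assembly() {
    local out_dir="$1"; shift
    local cmd=(unicycler "$@" -o "$out_dir")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

run_all() {
    local short=(-1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz)
    run_assembly illumina_only "${short[@]}"
    run_assembly long_read_only -l long_reads_high_depth.fastq.gz
    run_assembly hybrid_low_depth "${short[@]}" -l long_reads_low_depth.fastq.gz
    run_assembly hybrid_high_depth "${short[@]}" -l long_reads_high_depth.fastq.gz
}

run_all
```

To actually run the assemblies, call it with `DRY_RUN=0` from the directory that holds the read files.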
