Canu assembly notes

Installation

First, I cloned and installed canu.

$ git clone git@github.com:marbl/canu.git
Cloning into 'canu'...
...
$ cd canu/src/
$ make -j 2
...

See here for install logs.

I also added alias canu="/Users/ksb/computation/science/canu/Darwin-amd64/bin/canu" to my ~/.bash_profile to make it easier to run the script.

Data

I downloaded the following files from google drive and moved them to a raw_reads/ subfolder.

Copy of JB418_ACTGAT.R1.fastq.gz (renamed to r1.fastq.gz)
Copy of JB418_ACTGAT.R2.fastq.gz (renamed to r2.fastq.gz)
filtered_subreads_pacbiojb418.fastq (renamed to pb.fastq)
- I compressed this file with gzip, creating pb.fastq.gz

$ mv raw_reads/Copy\ of\ JB418_ACTGAT.R1.fastq.gz raw_reads/r1.fastq.gz
$ mv raw_reads/Copy\ of\ JB418_ACTGAT.R2.fastq.gz raw_reads/r2.fastq.gz
$ mv raw_reads/filtered_subreads_pacbiojb418.fastq raw_reads/pb.fastq
$ gzip raw_reads/pb.fastq

Run Canu

I didn't have java installed, and java and gnuplot are required to make plots. I installed using homebrew.

$ brew cask install java
...
$ brew install gnuplot
...

I made a subfolder runs/, moved there, and ran the assembler based on this command in the "quick start" section of the canu docs. I used 4.7 for the genomeSize parameter based on the median genome length for Hafnia alvei in genbank.

$ mkdir runs
$ cd runs/
$ canu -p jb418 -d jb418-auto genomeSize=4.7m -pacbio-raw ../raw_reads/pb.fastq.gz

This step took about 2.75 hours on my ~3 year old MacBook Pro. For complete run log, see here.

The assembly ends up here.

Polishing

We can use the illumina paired reads for polishing/error correction. I used Pilon, which in turn requires Bam alignment files which can be generated by Bowtie. I used Bowtie2. More precisely, bowtie makes Sam files, which can be converted to Bam using SamTools.

All of these can also be installed with homebrew from the homebrew/science cask.

$ brew tap homebrew/science
...
$ brew install pilon
...
$ brew install bowtie2
...
$ brew install samtools

Bowtie2 maps illumina reads onto the draft assembly produced by Canu, generating a Sam (sequence alignment map) file. Using samtools, we can convert this to a bam (binary alignment map) file that pilon can use, and then pilon uses that map file to polish the assembly - correcting errors and filling in gaps (if possible).

Running Bowtie2

See here for bowtie2 example.

First, we need a bowtie2 index. I backed out to the root of the project, and then:

$ mkdir polishing
$ cd polishing/
$ mkdir jb418
$ cd jb418
$ bowtie2-build ../../runs/jb418-auto/jb418.contigs.fasta jb418

This generates 6 .bt2 files in the jb418 directory. Next, we want to generate the alignments. Bowtie doesn't work with compressed fastq files so first we've got to unzip them.

$ gunzip ../../raw_reads/r*

Next, run the aligner to generate sam files.

$ bowtie2 -x jb418 -1 ../../raw_reads/r1.fastq -2 ../../raw_reads/r2.fastq -S jb418.sam
16051550 reads; of these:
  16051550 (100.00%) were paired; of these:
    1755602 (10.94%) aligned concordantly 0 times
    13645476 (85.01%) aligned concordantly exactly 1 time
    650472 (4.05%) aligned concordantly >1 times
    ----
    1755602 pairs aligned concordantly 0 times; of these:
      1032651 (58.82%) aligned discordantly 1 time
    ----
    722951 pairs aligned 0 times concordantly or discordantly; of these:
      1445902 mates make up the pairs; of these:
        985662 (68.17%) aligned 0 times
        335313 (23.19%) aligned exactly 1 time
        124927 (8.64%) aligned >1 times
96.93% overall alignment rate

This took ~ 1 hour to run on my macbook. There was no output until the very end, be patient!

Now we need to convert this to Bam using samtools.

$ samtools view -bS jb418.sam > jb418.bam

This step took ~ 20 min.

Running Pilon

Now that we've got our Bam alignment file, we can run Pilon. Turns out, pilon needs a sorted and indexed bam file, so first we need to do some more with samtools.

$ samtools sort jb418.bam -o sorted_jb418
$ samtools index sorted_jb418.bam sorted_jb418.bai

I then tried to run pilon, but got an error that I don't have enough memory or something.

$ pilon --genome ../../runs/jb418-auto/jb418.contigs.fasta --bam sorted_jb418.bam --outdir pilon/
Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
Genome: ../../runs/jb418-auto/jb418.contigs.fasta
Fixing snps, indels, gaps, local
Input genome size: 5959283
Scanning BAMs
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.simontuffs.onejar.Boot.run(Boot.java:340)
	at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at htsjdk.samtools.DefaultSAMRecordFactory.createBAMRecord(DefaultSAMRecordFactory.java:36)
	at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:660)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:634)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:544)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:518)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at org.broadinstitute.pilon.BamFile.scan(BamFile.scala:285)
	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$3.apply(GenomeFile.scala:92)
	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$3.apply(GenomeFile.scala:92)
	at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
	at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
	at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
	at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
	at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
	at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
	at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)

This is the same issue noted in Pilon's issue tracker.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
logs		logs
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

logs

logs

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Canu assembly notes

Installation

Data

Run Canu

Polishing

Running Bowtie2

Running Pilon

About

Releases

Packages

DuttonLab/jb418_assembly

Folders and files

Latest commit

History

Repository files navigation

Canu assembly notes

Installation

Data

Run Canu

Polishing

Running Bowtie2

Running Pilon

About

Resources

Stars

Watchers

Forks