Skip to content

DuttonLab/jb418_assembly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Canu assembly notes

Installation

First, I cloned and installed canu.

$ git clone git@github.com:marbl/canu.git
Cloning into 'canu'...
...
$ cd canu/src/
$ make -j 2
...

See here for install logs.

I also added alias canu="/Users/ksb/computation/science/canu/Darwin-amd64/bin/canu" to my ~/.bash_profile to make it easier to run the script.

Data

I downloaded the following files from google drive and moved them to a raw_reads/ subfolder.

  • Copy of JB418_ACTGAT.R1.fastq.gz (renamed to r1.fastq.gz)
  • Copy of JB418_ACTGAT.R2.fastq.gz (renamed to r2.fastq.gz)
  • filtered_subreads_pacbiojb418.fastq (renamed to pb.fastq)
    • I compressed this file with gzip, creating pb.fastq.gz
$ mv raw_reads/Copy\ of\ JB418_ACTGAT.R1.fastq.gz raw_reads/r1.fastq.gz
$ mv raw_reads/Copy\ of\ JB418_ACTGAT.R2.fastq.gz raw_reads/r2.fastq.gz
$ mv raw_reads/filtered_subreads_pacbiojb418.fastq raw_reads/pb.fastq
$ gzip raw_reads/pb.fastq

Run Canu

I didn't have java installed, and java and gnuplot are required to make plots. I installed using homebrew.

$ brew cask install java
...
$ brew install gnuplot
...

I made a subfolder runs/, moved there, and ran the assembler based on this command in the "quick start" section of the canu docs. I used 4.7 for the genomeSize parameter based on the median genome length for Hafnia alvei in genbank.

$ mkdir runs
$ cd runs/
$ canu -p jb418 -d jb418-auto genomeSize=4.7m -pacbio-raw ../raw_reads/pb.fastq.gz

This step took about 2.75 hours on my ~3 year old MacBook Pro. For complete run log, see here.

The assembly ends up here.

Polishing

We can use the illumina paired reads for polishing/error correction. I used Pilon, which in turn requires Bam alignment files which can be generated by Bowtie. I used Bowtie2. More precisely, bowtie makes Sam files, which can be converted to Bam using SamTools.

All of these can also be installed with homebrew from the homebrew/science cask.

$ brew tap homebrew/science
...
$ brew install pilon
...
$ brew install bowtie2
...
$ brew install samtools

Bowtie2 maps illumina reads onto the draft assembly produced by Canu, generating a Sam (sequence alignment map) file. Using samtools, we can convert this to a bam (binary alignment map) file that pilon can use, and then pilon uses that map file to polish the assembly - correcting errors and filling in gaps (if possible).

Running Bowtie2

See here for bowtie2 example.

First, we need a bowtie2 index. I backed out to the root of the project, and then:

$ mkdir polishing
$ cd polishing/
$ mkdir jb418
$ cd jb418
$ bowtie2-build ../../runs/jb418-auto/jb418.contigs.fasta jb418

This generates 6 .bt2 files in the jb418 directory. Next, we want to generate the alignments. Bowtie doesn't work with compressed fastq files so first we've got to unzip them.

$ gunzip ../../raw_reads/r*

Next, run the aligner to generate sam files.

$ bowtie2 -x jb418 -1 ../../raw_reads/r1.fastq -2 ../../raw_reads/r2.fastq -S jb418.sam
16051550 reads; of these:
  16051550 (100.00%) were paired; of these:
    1755602 (10.94%) aligned concordantly 0 times
    13645476 (85.01%) aligned concordantly exactly 1 time
    650472 (4.05%) aligned concordantly >1 times
    ----
    1755602 pairs aligned concordantly 0 times; of these:
      1032651 (58.82%) aligned discordantly 1 time
    ----
    722951 pairs aligned 0 times concordantly or discordantly; of these:
      1445902 mates make up the pairs; of these:
        985662 (68.17%) aligned 0 times
        335313 (23.19%) aligned exactly 1 time
        124927 (8.64%) aligned >1 times
96.93% overall alignment rate

This took ~ 1 hour to run on my macbook. There was no output until the very end, be patient!

Now we need to convert this to Bam using samtools.

$ samtools view -bS jb418.sam > jb418.bam

This step took ~ 20 min.

Running Pilon

Now that we've got our Bam alignment file, we can run Pilon. Turns out, pilon needs a sorted and indexed bam file, so first we need to do some more with samtools.

$ samtools sort jb418.bam -o sorted_jb418
$ samtools index sorted_jb418.bam sorted_jb418.bai

I then tried to run pilon, but got an error that I don't have enough memory or something.

$ pilon --genome ../../runs/jb418-auto/jb418.contigs.fasta --bam sorted_jb418.bam --outdir pilon/
Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
Genome: ../../runs/jb418-auto/jb418.contigs.fasta
Fixing snps, indels, gaps, local
Input genome size: 5959283
Scanning BAMs
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.simontuffs.onejar.Boot.run(Boot.java:340)
	at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at htsjdk.samtools.DefaultSAMRecordFactory.createBAMRecord(DefaultSAMRecordFactory.java:36)
	at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:660)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:634)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:544)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:518)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at org.broadinstitute.pilon.BamFile.scan(BamFile.scala:285)
	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$3.apply(GenomeFile.scala:92)
	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$3.apply(GenomeFile.scala:92)
	at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
	at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
	at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
	at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
	at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
	at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
	at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)

This is the same issue noted in Pilon's issue tracker.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published