daligner executable

Mark Lakata edited this page Jun 20, 2016 · 2 revisions

This is an example taken from a FALCON workflow.

The input to this flow is a raw_reads.db file in the current directory.

$ daligner -v -t16 -H15000 -e0.7 -s1000 raw_reads.1 raw_reads.1

The options are:

-v         Verbose output
-t16       "Tuple supression [sic] frequency."
           A k-mer that occurs more than 16 times is not counted
           as a seed hit.  This suppresses highly repetitive k-mers such as
           homopolymer runs, which don't help in alignment.
-H15000    "HGAP threshold (in bp.s)"
           ?
-e0.7      "Average error [sic]"
           The alignment correlation must be greater than 70% (error less than 30%).
-s1000     "Trace spacing"
           ?
raw_reads.1   raw_reads is shorthand for raw_reads.db. The 1 means use partition 1.
raw_reads.1   By repeating the same file name, this compares the reads to themselves.
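
When this call is scripted, the command line can be assembled from named values so that each option is visible by name. The following is only a sketch: the helper `build_daligner_cmd` and its parameter names are hypothetical, and it uses only the flags shown in the example above.

```python
import shlex


def build_daligner_cmd(block_a, block_b, *, t=16, H=15000, e=0.7, s=1000, verbose=True):
    """Return the argv list for the daligner call shown above.

    Hypothetical helper: the defaults are the values from this page's example,
    not daligner's own defaults.
    """
    cmd = ["daligner"]
    if verbose:
        cmd.append("-v")
    # daligner takes each value attached directly to its flag, e.g. -t16.
    cmd += [f"-t{t}", f"-H{H}", f"-e{e}", f"-s{s}", block_a, block_b]
    return cmd


# Naming the same block twice compares the reads in that block to themselves.
cmd = build_daligner_cmd("raw_reads.1", "raw_reads.1")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where daligner is installed and raw_reads.db is in the current directory.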

Here is the log output:

Building index for raw_reads.1

 Kshift=28
 BSHIFT=8
 TooFrequent=16
 (Kshift-1)/BSHIFT + (TooFrequent < INT32_MAX)=4
 sizeof(KmerPos)=16
 nreads=23595
 Kmer=14
 block->reads[nreads].boff=400033006
 kmers=399702676
 sizeof(KmerPos)*(kmers+1)=6395242832
 Allocated 399702677 of 16 (6395242832 bytes) at 0x7f3f56627010
   Kmer count = 399,702,676
   Using 11.91Gb of space
   Revised kmer count = 294,457,040
   Index occupies 4.39Gb

Comparing raw_reads.1 to raw_reads.1

   Capping mutual k-mer matches over 10000 (effectively -t100)
   Hit count = 682,284,336
   Highwater of 24.72Gb space

     682,284,336 14-mers (4.264076e-09 of matrix)
       1,051,303 seed hits (6.570335e-12 of matrix)
         377,595 confirmed hits (2.359858e-12 of matrix)

Building index for c(raw_reads.1)

 Kshift=28
 BSHIFT=8
 TooFrequent=16
 (Kshift-1)/BSHIFT + (TooFrequent < INT32_MAX)=4
 sizeof(KmerPos)=16
 nreads=23595
 Kmer=14
 block->reads[nreads].boff=400033006
 kmers=399702676
 sizeof(KmerPos)*(kmers+1)=6395242832
 Allocated 399702677 of 16 (6395242832 bytes) at 0x7f3c5a02d010
   Kmer count = 399,702,676
   Using 11.91Gb of space
   Revised kmer count = 294,457,040
   Index occupies 4.39Gb

Comparing raw_reads.1 to c(raw_reads.1)

   Capping mutual k-mer matches over 10000 (effectively -t100)
   Hit count = 643,810,060
   Highwater of 23.57Gb space

     643,810,060 14-mers (4.023624e-09 of matrix)
     960,715 seed hits (6.004186e-12 of matrix)
     346,001 confirmed hits (2.162404e-12 of matrix)
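
The sizes reported in the log can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the log. Three interpretations are assumptions inferred from those numbers rather than taken from the daligner source: that "Using 11.91Gb" covers two KmerPos arrays, that boff counts one extra position per read, and that "matrix" means the square of the total base count.

```python
# Values copied from the "Building index" and first "Comparing" log sections.
SIZEOF_KMERPOS = 16
NREADS = 23_595
KMER = 14
BOFF = 400_033_006
KMERS = 399_702_676
REVISED_KMERS = 294_457_040
HIT_COUNT = 682_284_336

GIB = 2 ** 30  # the log's "Gb" figures match 2**30 bytes

# A read of length L yields L - KMER + 1 k-mers. Assumption: boff counts
# L + 1 positions per read, so the k-mer total is boff - KMER * nreads.
kmers_from_boff = BOFF - KMER * NREADS

# One array of KmerPos records, as printed on the "Allocated" line.
alloc_bytes = SIZEOF_KMERPOS * (KMERS + 1)

# Assumption: the working space is two such arrays.
working_gib = 2 * alloc_bytes / GIB

# The index keeps only the k-mers that survive the -t filter.
index_gib = SIZEOF_KMERPOS * REVISED_KMERS / GIB

# Assumption: the "matrix" is total bases squared, with total bases = boff - nreads.
total_bases = BOFF - NREADS
hit_fraction = HIT_COUNT / total_bases ** 2

print(f"k-mers from boff : {kmers_from_boff:,}")
print(f"allocation bytes : {alloc_bytes:,}")
print(f"working space    : {working_gib:.2f} Gb")
print(f"index size       : {index_gib:.2f} Gb")
print(f"hit fraction     : {hit_fraction:.6e}")
```

Under these assumptions the computed values agree with the log's k-mer count, the 6395242832-byte allocation, 11.91Gb, 4.39Gb and 4.264076e-09.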

The output is a collection of *.las (local alignment) files:

$ ls -l *.las
-rw-r--r-- 1 mlakata Domain Users 4837180 Jun 20 11:18 raw_reads.1.raw_reads.1.C0.las
-rw-r--r-- 1 mlakata Domain Users 5034968 Jun 20 11:18 raw_reads.1.raw_reads.1.C1.las
-rw-r--r-- 1 mlakata Domain Users 4960928 Jun 20 11:18 raw_reads.1.raw_reads.1.C2.las
-rw-r--r-- 1 mlakata Domain Users 5061924 Jun 20 11:18 raw_reads.1.raw_reads.1.C3.las
-rw-r--r-- 1 mlakata Domain Users 5222656 Jun 20 11:17 raw_reads.1.raw_reads.1.N0.las
-rw-r--r-- 1 mlakata Domain Users 5482888 Jun 20 11:17 raw_reads.1.raw_reads.1.N1.las
-rw-r--r-- 1 mlakata Domain Users 5495064 Jun 20 11:17 raw_reads.1.raw_reads.1.N2.las
-rw-r--r-- 1 mlakata Domain Users 5596832 Jun 20 11:17 raw_reads.1.raw_reads.1.N3.las

There are multiple files because each thread writes its own file (daligner is hard-coded to use 4 threads, numbered 0 to 3 in the file names), and each thread runs twice: once in the normal-normal direction (N) and once in the normal-reverse complement direction (C).
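
A script that consumes these files may want to know their names in advance. The sketch below reconstructs the naming pattern from the listing above. The function `expected_las_files` is hypothetical, and the default of 4 threads is taken from this example rather than from the daligner documentation.

```python
def expected_las_files(block_a, block_b, nthreads=4):
    """List the .las names expected for one daligner call.

    The pattern <a>.<b>.<C|N><thread>.las is inferred from the listing above.
    """
    return [
        f"{block_a}.{block_b}.{orientation}{thread}.las"
        for orientation in ("C", "N")   # complement pass, then normal pass
        for thread in range(nthreads)   # one file per thread
    ]


names = expected_las_files("raw_reads.1", "raw_reads.1")
for name in names:
    print(name)
```

The eight names are produced in the same order as the `ls` listing, so they can be compared against the directory contents to confirm that a run finished.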
