This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

Contrib

dgordon562 edited this page Jun 22, 2015 · 5 revisions

Contributions from users

Clarification on "job" directories for DALIGNER jobs

from dgordon

Each job_ directory is associated with a number N. That job_ directory contains all of the las files that pair N with any x < N, in either order. For example, if N is 62 for this job_ directory, it will contain:

raw_reads.5.raw_reads.62.N1.las
raw_reads.62.raw_reads.5.N1.las

However, it will not contain:

raw_reads.67.raw_reads.62.N1.las

That las file will be in the job_ directory associated with N=67.

You can tell what N is for a particular job_ directory by looking at the daligner command in the rj_*.sh script in that directory. The first raw_reads argument on that line gives the value of N. For example,

daligner -v -t16 -H6000 -e0.7 -s1000 raw_reads.62 raw_reads.1 raw_reads.2 ...

shows you that N = 62 for this job_ directory.
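
The lookup described above can be sketched in Python. This is only an illustration: the function name job_block_number and its db_name parameter are hypothetical, and it assumes the daligner line looks like the example, with block arguments of the form raw_reads.<number>.

```python
import re
import shlex


def job_block_number(command_line, db_name="raw_reads"):
    """Return N from the first '<db_name>.<N>' argument, or None if absent."""
    pattern = re.compile(r"^" + re.escape(db_name) + r"\.(\d+)$")
    for token in shlex.split(command_line):
        match = pattern.match(token)
        if match:
            # The first matching argument names the block for this job.
            return int(match.group(1))
    return None


line = "daligner -v -t16 -H6000 -e0.7 -s1000 raw_reads.62 raw_reads.1 raw_reads.2"
print(job_block_number(line))  # prints 62
```

To apply it to a real job_ directory, pass in the daligner line read from that directory's rj_*.sh script.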

The las files in a job_ directory are the targets of symbolic links in the m_ directories. Each las file name contains two numbers, for example:

raw_reads.7.raw_reads.62.N3.las

The first number in the las file name (in this case 7) tells you which m_ directory links to this las file; in this case it is m_00007.

The larger number (which could be the first or the second number) tells you which job_ directory this las file will be in. In this example, raw_reads.7.raw_reads.62.N3.las will be in the job_ directory with N = 62.
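
The two naming rules above can be sketched as a small Python helper. The name locate_las and the regular expression are assumptions made for illustration. They cover only file names shaped like the examples on this page, and they assume the m_ directory number is zero-padded to five digits, as in m_00007.

```python
import re

# Matches names such as raw_reads.7.raw_reads.62.N3.las
LAS_NAME = re.compile(
    r"^(?P<db>[^.]+)\.(?P<a>\d+)\.(?P=db)\.(?P<b>\d+)\.[A-Z]\d+\.las$"
)


def locate_las(file_name):
    """Return the linking m_ directory and the job N for a las file name."""
    match = LAS_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized las file name: {file_name}")
    first, second = int(match["a"]), int(match["b"])
    return {
        "m_dir": f"m_{first:05d}",     # first number picks the m_ directory
        "job_n": max(first, second),   # larger number picks the job_ directory
    }


print(locate_las("raw_reads.7.raw_reads.62.N3.las"))
# prints {'m_dir': 'm_00007', 'job_n': 62}
```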

There can be multiple job_ directories for the same N. In my experience a single job_ directory won't have files for more than about 104 values of x, where x < N. So if N = 300, it will put x = 1 to 104 in one job_ directory, x = 105 to 209 in the next, and the remaining values from x = 210 upward in a third.

Disk quotas

from dgordon

If the disk fills up to the level of the quota, the entire Falcon assembly may become corrupted and you will need to delete everything and start the assembly over from the beginning.

The reason is that in many cases daligner will not crash when no more files can be written. It simply writes 0-length or truncated files and blithely continues on, and the done flag will be set, so fc_run.py will not know anything is wrong. The Falcon assembly will then start crashing at the LAsort stage. Restarting Falcon will not work. It becomes difficult to determine which job_ directories are corrupted and which are not.
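
As a partial check, the 0-length files mentioned above can be listed with a short Python sketch. The function name find_empty_las is hypothetical. It skips symbolic links so that the links from the m_ directories are not reported a second time, and it cannot detect files that are truncated but not empty, so a clean result does not prove the run is intact.

```python
from pathlib import Path


def find_empty_las(root):
    """Return a sorted list of zero-length *.las files under root."""
    empty = []
    for path in Path(root).rglob("*.las"):
        # Skip symlinks (e.g. from m_ directories) and anything not a file.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size == 0:
            empty.append(path)
    return sorted(empty)
```

Calling find_empty_las("0-rawreads") from the run directory would list the empty las files under the job_ directories there.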

(Editor's note: We'll have to address this issue some day.)

General help with FALCON

from David Gordon: I didn't find this in the regular documentation and had to learn it the hard way so I'm trying to save you all some struggle.

  • sge_option_da controls the options for running the daligner jobs that run out of the subdirectory 0-rawreads. Note that daligner will make 4 threads. For human, daligner uses about 30GB for this process and it uses 4 slots so I use the following:

      sge_option_da = -pe serial 4 -l mfree=7.5G
    
  • sge_option_la controls the sge options for, I believe, the LAsort/merge and LA4Falcon jobs that run out of the subdirectory 0-rawreads. This step will require about 6GB for human. I use:

      sge_option_la = -pe orte 6 -l mfree=6G
    
  • sge_option_pda is used for the daligner jobs that run out of the subdirectory 1-preads_ovl. Again, daligner will make 4 threads. For human, this stage of daligner uses more than 30G so I use the following:

      sge_option_pda = -pe serial 4 -l mfree=12G
    
  • sge_option_pla must be for the LAsort/merge jobs that run out of the 1-preads_ovl directory:

      sge_option_pla = -pe orte 2 -l mfree=6G
    
  • sge_option_fc is used for the final 2-asm-falcon stage, including running fc_graph_to_contig.py. In my experience, 6GB is not sufficient.

      sge_option_fc = -pe orte 6 -l mfree=6G
    
  • sge_option_cns is used for the ct_* tasks:

      sge_option_cns = -pe orte 6 -l mfree=6G -l ssd=FALSE
    
  • pa_concurrent_jobs specifies the maximum number of daligner jobs from 0-rawreads that run concurrently.
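
The daligner settings above (about 30GB across 4 slots, requested as mfree=7.5G) suggest that mfree is counted per slot on the author's cluster. That is an inference from the numbers, not something stated here, so check how your own scheduler accounts for memory. Under that assumption, the arithmetic can be sketched in Python with a hypothetical helper named sge_option:

```python
def sge_option(pe_name, slots, total_gb):
    """Build an SGE option string, splitting total memory evenly per slot.

    Assumes mfree is a per-slot request; verify this for your cluster.
    """
    per_slot = total_gb / slots
    return f"-pe {pe_name} {slots} -l mfree={per_slot:g}G"


print(sge_option("serial", 4, 30))  # prints -pe serial 4 -l mfree=7.5G
print(sge_option("serial", 4, 48))  # prints -pe serial 4 -l mfree=12G
```

The first call reproduces the sge_option_da value and the second reproduces the sge_option_pda value, which corresponds to 48GB in total for the stage that uses more than 30G.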