
Configuration

Falcon/pypeflow job-submission configuration

First, you can use either .cfg or .json for configuration. Keys and section-names are case-sensitive. (Before April 2018, they were case-insensitive.)
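
For example, a .cfg with [General] and [job.defaults] sections could be written as .json roughly like this (a sketch only; we assume the JSON simply mirrors the cfg layout, one object per section, with values kept as strings):

{
  "General": {
    "pwatcher_type": "blocking"
  },
  "job.defaults": {
    "pwatcher_type": "blocking",
    "njobs": "32"
  }
}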

New-style config

pypeflow-2.0.0 offers a new, more flexible way to configure job-submission via pypeflow.

You should be able to quit, alter any of these, and resume to see the new values take effect. (This was a long-standing request from David Gordon.)

The job.defaults section should have the basics, and any defaults. You have several choices.

Concurrency

[job.defaults]
njobs = 32

[job.step.cns]
njobs = 8

That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.

Blocking calls

This is simplest, and the first thing you should ever try:

[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}"

[General]
pwatcher_type = blocking
# Because of a bug, this is needed in the "General" section, but soon
# it will work from the "job.defaults" too, which is preferred.

If you want to separate stderr/stdout into each task-dir, for isolated debugging:

[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"

[General]
pwatcher_type = blocking

Note that there is no &; these are foreground processes. Each blocks the thread that calls it.

It is easy to construct such a string for your own job-submission system, as long as the system provides a way to do "blocking" calls. E.g. SGE uses -sync y (and -V to pass the shell environment):

[job.defaults]
pwatcher_type = blocking
submit = qsub -S /bin/bash -sync y -V  \
  -q ${JOB_QUEUE}     \
  -N ${JOB_NAME}        \
  -o "${JOB_STDOUT}" \
  -e "${JOB_STDERR}" \
  -pe smp ${NPROC}    \
  "${JOB_SCRIPT}"

JOB_QUEUE = myqueue
MB = 4000
NPROC = 4
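
The same pattern works for other schedulers whose submit command can block until the job finishes. For instance, a sketch for SLURM (assuming srun is available and blocks until completion; the flags shown are standard srun options, and the memory/CPU mapping is only illustrative):

[job.defaults]
pwatcher_type = blocking
submit = srun --wait=0       \
  -p ${JOB_QUEUE}            \
  -J ${JOB_NAME}             \
  -o ${JOB_STDOUT}           \
  -e ${JOB_STDERR}           \
  --mem-per-cpu=${MB}        \
  --cpus-per-task=${NPROC}   \
  ${JOB_SCRIPT}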

By convention, we use JOB_* for most variables. However, NPROC and MB are special: they limit resources, so the process itself is informed of them. Aside from those, we generate the following automatically:

  • JOB_STDOUT
  • JOB_STDERR
  • JOB_SCRIPT
  • JOB_NAME
  • (Some older aliases are also supported.)

You can provide default values for any of the substitution variables. (You can even define your own, but please use all upper-case.) And you can override these in the step-specific sections.

(Btw, we have had trouble with -l h_vmem=${MB}M.)

[job.step.cns]
NPROC = 24
MB = 2000

Currently, the falcon "steps" are:

  • job.step.dust
  • job.step.da
  • job.step.la
  • job.step.cns
  • job.step.pda
  • job.step.pla
  • job.step.asm (aka job.step.fc)
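
Any of these steps can carry its own overrides on top of [job.defaults]. For example, a sketch (the numbers are illustrative, not recommendations):

[job.defaults]
NPROC = 4
MB = 4000

[job.step.da]
NPROC = 8
MB = 32000

[job.step.la]
NPROC = 2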

For other examples, see pypeFLOW configuration.

File-system polling

This is fairly normal. We submit jobs somehow, and we poll the filesystem to learn when each job is done.

This is a bit more convenient because we provide useful defaults for various job-submission systems. (We cannot do this generically because each system has a different way of "killing" a job early.)

[job.defaults]
pwatcher_type = fs_based
job_type = sge # choices: local/sge/lsf/pbs/slurm/torque/etc?
JOB_QUEUE = myqueue

File-system polling on your local machine

[job.defaults]
pwatcher_type = fs_based
job_type = local

Use this before trying sge etc., since it tests your workflow independent of any job-submission problems. It uses & to put simple processes into the background.

File-system polling with flexible calls

If you do not like our submit and kill strings, you can provide your own in [job.defaults]. Variable substitutions are the same as for the blocking pwatcher (above).

[job.defaults]
submit = qsub -S /bin/bash --special-flags -q myqueue -N ${JOB_NAME} "${JOB_SCRIPT}" 
kill = qdel -j ${JOB_NAME}

It's tricky. And we don't yet have a dry-run mode. But it lets you do whatever you want.

Note: We do not yet have a way to learn the job-number from the submission command, so job-killing is subject to name-collisions. This is one reason why the "blocking" calls are easier to support.
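
As another sketch, for SLURM (assuming sbatch/scancel; the flags shown are standard, and killing by job-name is subject to the same name-collision caveat):

[job.defaults]
pwatcher_type = fs_based
submit = sbatch -p ${JOB_QUEUE} -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR} ${JOB_SCRIPT}
kill = scancel -n ${JOB_NAME}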

Old-style config

In the past, you would specify per-step overrides in the [General] section.

Concurrency

[General]
default_concurrent_jobs = 32
cns_concurrent_jobs = 8

That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.

Job-submission

[General]
job_queue = mydefaultqueue
sge_option_da = -pe smp 8 -q queueA
sge_option_la = -pe smp 2 -q queueA
sge_option_cns = -pe smp 8 -q queueA
sge_option_pda = -pe smp 8 -q queueB
sge_option_pla = -pe smp 2 -q queueB
sge_option_fc = -pe smp 24 -q queueB

Because we use Python ConfigParser, you could also do this:

[General]
job_queue = myqueue
sge_option_da = -pe smp 8 -q %(job_queue)s

Those still work. They are substituted into your "submit" string as ${JOB_OPTS} if you do not provide JOB_OPTS yourself. But we recommend using the system above.
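
For example, a submit string that drops the old-style options in via ${JOB_OPTS} could look like this (a sketch; ${JOB_OPTS} is filled from the per-step sge_option_* value when you do not define JOB_OPTS yourself):

[job.defaults]
pwatcher_type = blocking
submit = qsub -S /bin/bash -sync y -V \
  -N ${JOB_NAME}     \
  -o "${JOB_STDOUT}" \
  -e "${JOB_STDERR}" \
  ${JOB_OPTS}        \
  "${JOB_SCRIPT}"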

Why do we recommend the new style? For one thing, the job needs to know how many processors were actually reserved for it; otherwise, it could use whatever it wants, so hard-coded numbers are not helpful.

Also, it is far more flexible. You can set your own submission string, and you can pass-along whatever extra variables you need.

See also: