High RAM usage on simulated samples #9

morgantaschuk · 2015-09-22T22:30:49Z

Hi,

I have 50x 1000g data simulated with ART (http://www.niehs.nih.gov/research/resources/software/biostatistics/art/). I'm trying to run fermikit on this data, and our cluster kills the job when it passes 170G of RAM. Do you have any suggestions for decreasing memory usage?

fermi.kit/fermi2.pl unitig -s3g -t16 -l126 -p art_50x_fermikit "fermi.kit/seqtk mergepe NA12877_50x_1.fq.gz NA12877_50x_2.fq.gz" > art_50x_fermikit.mak
make -f art_50x_fermikit.mak
fermi.kit/run-calling -t16 reference/hg19_random.fa art_50x_fermikit.mag.gz | sh

The only thing in the log is the following:

bash -c '/u/mtaschuk/git/fermikit/fermi.kit/bfc -s 3g -t 16 <(~/git/fermikit/fermi.kit/seqtk mergepe NA12877_50x_1.fq.gz NA12877_50x_2.fq.gz) <(~/git/fermikit/fermi.kit/seqtk mergepe NA12877_50x_1.fq.gz NA12877_50x_2.fq.gz) 2> art_50x_fermikit.ec.fq.gz.log | gzip -1 > art_50x_fermikit.ec.fq.gz'

I'm using the NA12877 VCF from GiaB, converted to a FASTA reference, and then simulating reads with ART using the following characteristics:

read length: 126
mean fragment length: 500
standard deviation: 120
average coverage: 50x
paired end
error model: HiSeq 2500
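
For a sense of scale, the parameters above imply roughly the following input size. This is a back-of-the-envelope sketch only: the 3 Gbp genome size is an assumption taken from the `-s3g` option in the command, not a measured value, and the variable names are illustrative.

```python
# Rough input size implied by the simulation parameters.
# genome_size is an assumption based on the -s3g option, not a measured value.
genome_size = 3_000_000_000   # bases (assumed)
coverage = 50                 # average coverage
read_len = 126                # read length

total_bases = coverage * genome_size
n_reads = total_bases // read_len
n_pairs = n_reads // 2

print(f"total bases: {total_bases:,}")
print(f"reads:       {n_reads:,}")
print(f"read pairs:  {n_pairs:,}")
```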


lh3 · 2015-09-22T23:41:34Z

Simulators usually generate reads with a much higher error rate than real data. The peak memory of the error corrector is sensitive to the error rate. This is an issue with fermikit.
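
One way to see why the error rate matters: every substitution error can introduce up to k new k-mers that do not occur in the genome, so the number of distinct k-mers a k-mer-based tool has to track grows with the error rate. The toy simulation below illustrates that effect only. It is not how bfc works internally, and the genome length, coverage, k, error rates, and the `distinct_kmers` helper are all illustrative assumptions (forward strand only, uniform substitution errors).

```python
import random

def distinct_kmers(genome, read_len, coverage, error_rate, k, rng):
    """Count distinct k-mers over simulated reads with substitution errors."""
    n_reads = coverage * len(genome) // read_len
    kmers = set()
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        # Introduce uniform substitution errors at the given per-base rate.
        for i in range(read_len):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        read = "".join(read)
        # Collect every k-mer of the (possibly erroneous) read.
        for i in range(read_len - k + 1):
            kmers.add(read[i:i + k])
    return len(kmers)

# Toy random "genome"; real genomes are far larger and repetitive.
genome = "".join(random.Random(0).choice("ACGT") for _ in range(20000))

counts = {}
for rate in (0.0, 0.001, 0.01):
    counts[rate] = distinct_kmers(genome, 100, 20, rate, 21, random.Random(1))
    print(f"error rate {rate}: {counts[rate]} distinct 21-mers")
```

With no errors, the count is bounded by the number of k-mers in the genome; as the error rate rises, the count climbs well past that bound, which is consistent with higher memory use on noisier reads.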
