LINE search takes much longer compared to other steps #421

foriin · 2024-01-22T20:50:04Z

Hi Shujun,

This is not a bug report but a question. I've noticed that when I run EDTA on a Drosophila genome, the LINE search takes an extraordinary amount of time. The Drosophila genome is populated mostly by LTRs, yet EDTA takes 5-10 times longer to look for LINEs. Is there a way to speed up this step? If it is purely RepeatMasker/RepeatModeler or BLAST, could it be run in parallel? I can't understand how running RepeatModeler on a 150 Mb genome with 16 cores in parallel could take 10 hours...

Cheers,
Artem

oushujun · 2024-01-23T16:51:41Z

Hi Artem,

Unfortunately, this is the case. The LINE search is carried out by RepeatModeler, which is slow even on small genomes. Because RepeatModeler's search is based on copy number and multiple alignments, splitting the genome into small subsets may lose families that are already low-copy. You can run EDTA on an SSD, which will significantly speed up your RepeatModeler/RepeatMasker runs because they are I/O intensive.

Shujun
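To illustrate why splitting can lose low-copy families, here is a toy sketch. It is not how RepeatModeler works internally: it assumes a hypothetical rule that a family is reported only if it has at least `MIN_COPIES` copies in the sequence being analyzed, and it models a "genome" as an ordered list of TE insertion labels (`LTR_A`, `LINE_B` are made-up names). A family that passes the threshold genome-wide can fall below it in every chunk.

```python
from collections import Counter
from math import ceil

# Toy model only, NOT RepeatModeler's real algorithm. Hypothetical rule:
# a family is reported only if it has at least MIN_COPIES copies in the
# sequence being analyzed.
MIN_COPIES = 3

def detect(insertions, min_copies=MIN_COPIES):
    """Return families with at least `min_copies` copies in `insertions`."""
    counts = Counter(insertions)
    return {fam for fam, n in counts.items() if n >= min_copies}

def split(seq, n_chunks):
    """Split a list into contiguous pieces of near-equal size."""
    size = ceil(len(seq) / n_chunks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def detect_per_chunk(insertions, n_chunks, min_copies=MIN_COPIES):
    """Apply the threshold to each chunk independently, then pool the hits."""
    found = set()
    for chunk in split(insertions, n_chunks):
        found |= detect(chunk, min_copies)
    return found

# Toy "genome": 6 copies of an LTR family, 3 copies of a low-copy LINE family.
genome = ["LTR_A", "LINE_B", "LTR_A", "LTR_A", "LINE_B",
          "LTR_A", "LTR_A", "LINE_B", "LTR_A"]

print(sorted(detect(genome)))               # ['LINE_B', 'LTR_A']
print(sorted(detect_per_chunk(genome, 2)))  # ['LTR_A']
```

Each half contains only one or two copies of `LINE_B`, so the per-chunk runs miss it even though the whole-genome run finds it.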

foriin · 2024-01-24T09:31:17Z

Thanks, Shujun,
The cluster I ran EDTA on has only SSDs, I think :) I see the problem now: we would need to parallelize RM, but then it would have to establish communication between all the parallel jobs. Could you please tell me which specific part of RM is responsible for the LINE search?
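The "communication between jobs" idea can be sketched as a map-reduce pattern, again on a toy model with a hypothetical copy-number threshold: each parallel job only counts copies in its own chunk, the partial counts are merged, and the threshold is applied once to the merged totals, so low-copy families are not lost. Whether RM2's alignment-heavy stages can actually be decomposed like this is the open question here; the sketch only shows the pattern. `ThreadPoolExecutor` keeps it simple; real jobs would be separate processes or cluster tasks.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import ceil

MIN_COPIES = 3  # hypothetical detection threshold (toy model, not RM2)

def split(seq, n_chunks):
    """Split a list into contiguous pieces of near-equal size."""
    size = ceil(len(seq) / n_chunks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def count_chunk(chunk):
    """Map step: each job counts copies in its own chunk only."""
    return Counter(chunk)

def detect_merged(insertions, n_chunks, min_copies=MIN_COPIES):
    """Count chunks in parallel, merge the counts, then threshold once."""
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Reduce step: summing Counters is the "communication" between jobs.
        total = sum(pool.map(count_chunk, split(insertions, n_chunks)), Counter())
    return {fam for fam, n in total.items() if n >= min_copies}

genome = ["LTR_A", "LINE_B", "LTR_A", "LTR_A", "LINE_B",
          "LTR_A", "LTR_A", "LINE_B", "LTR_A"]

print(sorted(detect_merged(genome, 2)))  # ['LINE_B', 'LTR_A']
```

Because the threshold is applied after merging, the result is the same for any number of chunks.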

oushujun · 2024-01-25T15:58:36Z

RM2 is described here: https://www.pnas.org/doi/10.1073/pnas.1921046117. Fig. 1 shows the workflow. Currently, the whole RM2 workflow is executed, and SINE/LINE elements are harvested from the final RM2 output. If a particular module could be separated out, or RM2 could be further accelerated, that would be great!

Shujun