[No LINE, EDTA 2.2.0] Empty LINE file after RM2 #455

Open
Isoris opened this issue Apr 17, 2024 · 0 comments


Isoris commented Apr 17, 2024

Hi, I have tried running the latest version of EDTA in several different ways on my university's HPC. In every run the LINE folder is not empty, but the LINE FASTA output is:

genome.fa.mod.RM2.fa is 0 KB
genome.fa.mod.LINE.raw.fa is also 0 KB

The LINE folder itself is populated, and RM2 runs normally: it proceeds to BLAST against Dfam, and the successive rounds complete without errors. In the end, though, the output files are empty. I suspect a problem with how the $genome argument is passed to RM2, but I am not sure what the actual cause is. This is what I get in the RepeatModeler working directory:

    (EDTA2.2) [qandres@tara-frontend-1 RM_758883.WedApr171556382024]$ ll
    total 121
    drwxr-sr-x 2 qandres proj5057 8192 2024-04-17 15:57:25 round-1
    -rw-r--r-- 1 qandres proj5057 3288 2024-04-17 15:57:47 consensi.fa
    -rw-r--r-- 1 qandres proj5057 102587 2024-04-17 15:57:47 families.stk
    drwxr-sr-x 4 qandres proj5057 8192 2024-04-17 15:57:47 round-2
    (EDTA2.2) [qandres@tara-frontend-1 RM_758883.WedApr171556382024]$

I also get this:

    (EDTA2.2) [qandres@tara-frontend-1 RM_758883.WedApr171556382024]$ ls -l ./*/*.log
    -rw-r--r-- 1 qandres proj5057 13605 Apr 17 15:57 ./round-1/filter-stage-1.log
    -rw-r--r-- 1 qandres proj5057 19347 Apr 17 15:57 ./round-1/makeblastdb.log
    -rw-r--r-- 1 qandres proj5057 0 Apr 17 15:56 ./round-1/repeatscout.log
    -rw-r--r-- 1 qandres proj5057 0 Apr 17 15:57 ./round-2/blastdbcmd.log
    -rw-r--r-- 1 qandres proj5057 12979 Apr 17 15:57 ./round-2/makeblastdb.log

When I open the round-2 folder of the RM2 run on the rice test genome (inside the LINE folder), I see this:

    (EDTA2.2) [qandres@tara-frontend-1 round-2]$ ll
    total 2854
    -rw-r--r-- 1 qandres proj5057 466 2024-04-17 15:57:25 sampleDB-2.fa.entry_batch
    -rw-r--r-- 1 qandres proj5057 1020493 2024-04-17 15:57:25 sampleDB-2.fa
    -rw-r--r-- 1 qandres proj5057 250594 2024-04-17 15:57:27 tmpMaskDB-1.nsq
    -rw-r--r-- 1 qandres proj5057 520 2024-04-17 15:57:27 tmpMaskDB-1.nin
    -rw-r--r-- 1 qandres proj5057 1341 2024-04-17 15:57:27 tmpMaskDB-1.nhr
    -rw-r--r-- 1 qandres proj5057 52 2024-04-17 15:57:27 tmpMaskDB-1.nni
    -rw-r--r-- 1 qandres proj5057 200 2024-04-17 15:57:27 tmpMaskDB-1.nnd
    -rw-r--r-- 1 qandres proj5057 132 2024-04-17 15:57:27 tmpMaskDB-1.nog
    -rw-r--r-- 1 qandres proj5057 600 2024-04-17 15:57:27 tmpMaskDB-1.njs
    -rw-r--r-- 1 qandres proj5057 30 2024-04-17 15:57:28 tmpMaskDB-1-gilist.txt
    -rw-r--r-- 1 qandres proj5057 48 2024-04-17 15:57:28 tmpMaskDB-1-gilist
    -rw-r--r-- 1 qandres proj5057 1020493 2024-04-17 15:57:28 sampleDB-2.fa.masked
    -rw-r--r-- 1 qandres proj5057 528 2024-04-17 15:57:28 sampleDB-2.fa.masked.nin
    -rw-r--r-- 1 qandres proj5057 1341 2024-04-17 15:57:28 sampleDB-2.fa.masked.nhr
    -rw-r--r-- 1 qandres proj5057 251206 2024-04-17 15:57:28 sampleDB-2.fa.masked.nsq
    -rw-r--r-- 1 qandres proj5057 52 2024-04-17 15:57:28 sampleDB-2.fa.masked.nni
    -rw-r--r-- 1 qandres proj5057 200 2024-04-17 15:57:28 sampleDB-2.fa.masked.nnd
    -rw-r--r-- 1 qandres proj5057 132 2024-04-17 15:57:28 sampleDB-2.fa.masked.nog
    -rw-r--r-- 1 qandres proj5057 671 2024-04-17 15:57:28 sampleDB-2.fa.masked.njs
    -rw-r--r-- 1 qandres proj5057 0 2024-04-17 15:57:28 blastdbcmd.log
    -rw-r--r-- 1 qandres proj5057 72090 2024-04-17 15:57:31 msps.out
    -rw-r--r-- 1 qandres proj5057 144 2024-04-17 15:57:31 seqnames
    drwxr-sr-x 2 qandres proj5057 4096 2024-04-17 15:57:31 images
    drwxr-sr-x 2 qandres proj5057 4096 2024-04-17 15:57:36 summary
    -rw-r--r-- 1 qandres proj5057 4520 2024-04-17 15:57:37 family-49.fa
    -rw-r--r-- 1 qandres proj5057 529 2024-04-17 15:57:37 family-49.fa.njs
    -rw-r--r-- 1 qandres proj5057 30633 2024-04-17 15:57:38 family-49-cons.malign
    -rw-r--r-- 1 qandres proj5057 7293 2024-04-17 15:57:39 family-49.fa.refiner.stk
    -rw-r--r-- 1 qandres proj5057 220 2024-04-17 15:57:39 family-49.fa.refiner_cons
    -rw-r--r-- 1 qandres proj5057 4412 2024-04-17 15:57:39 family-36.fa
    -rw-r--r-- 1 qandres proj5057 529 2024-04-17 15:57:39 family-36.fa.njs
    -rw-r--r-- 1 qandres proj5057 29308 2024-04-17 15:57:39 family-36-cons.malign
    -rw-r--r-- 1 qandres proj5057 6148 2024-04-17 15:57:41 family-36.fa.refiner.stk
    -rw-r--r-- 1 qandres proj5057 310 2024-04-17 15:57:41 family-36.fa.refiner_cons
    -rw-r--r-- 1 qandres proj5057 4161 2024-04-17 15:57:41 family-107.fa
    -rw-r--r-- 1 qandres proj5057 534 2024-04-17 15:57:41 family-107.fa.njs
    -rw-r--r-- 1 qandres proj5057 23632 2024-04-17 15:57:41 family-107-cons.malign
    -rw-r--r-- 1 qandres proj5057 5679 2024-04-17 15:57:42 family-107.fa.refiner.stk
    -rw-r--r-- 1 qandres proj5057 318 2024-04-17 15:57:42 family-107.fa.refiner_cons
    -rw-r--r-- 1 qandres proj5057 16117 2024-04-17 15:57:42 family-67.fa
    -rw-r--r-- 1 qandres proj5057 530 2024-04-17 15:57:43 family-67.fa.njs
    -rw-r--r-- 1 qandres proj5057 19041 2024-04-17 15:57:43 family-67-cons.malign
    -rw-r--r-- 1 qandres proj5057 12979 2024-04-17 15:57:46 makeblastdb.log
    -rw-r--r-- 1 qandres proj5057 27476 2024-04-17 15:57:47 family-67.fa.refiner.stk
    -rw-r--r-- 1 qandres proj5057 2090 2024-04-17 15:57:47 family-67.fa.refiner_cons
    -rw-r--r-- 1 qandres proj5057 46890 2024-04-17 15:57:47 families.stk
    -rw-r--r-- 1 qandres proj5057 445 2024-04-17 15:57:47 index.html
    -rw-r--r-- 1 qandres proj5057 2967 2024-04-17 15:57:47 consensi.fa
    (EDTA2.2) [qandres@tara-frontend-1 round-2]$

So, as far as I understand, RM2 itself runs fine, but EDTA does not seem to pick up the resulting consensi.fa and families.stk?
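One way to confirm this is to check, in the same directory, which RM2 output files are empty and which RepeatModeler rounds did write a non-empty consensi.fa. Below is a minimal bash sketch. The function name check_rm2_outputs is hypothetical and not part of EDTA. It also assumes that the RM_* working directory sits somewhere below the directory you pass in.

```bash
# Hypothetical helper (not part of EDTA): report empty RM2 outputs and
# list every non-empty consensi.fa written by RepeatModeler.
# Assumption: the RM_* working directory lives somewhere below "$1".
check_rm2_outputs() {
    local dir="${1:-.}"
    local f
    # The two files reported as 0 KB in this issue.
    for f in "$dir"/*.mod.RM2.fa "$dir"/*.mod.LINE.raw.fa; do
        [ -e "$f" ] || continue        # glob matched nothing, skip it
        if [ -s "$f" ]; then           # -s: exists and size > 0
            echo "non-empty: $f"
        else
            echo "EMPTY: $f"
        fi
    done
    # Non-empty consensus files from the RepeatModeler run (final and per round).
    find "$dir" -path '*RM_*' -name consensi.fa -size +0c -print \
        | sort \
        | sed 's/^/consensi: /'
}
```

Run it from the directory that holds genome.fa.mod.RM2.fa, for example `check_rm2_outputs .`. If genome.fa.mod.RM2.fa is reported as EMPTY while several consensi.fa files are listed, that would support the suspicion that RepeatModeler produces a library that is then lost in the hand-off back to EDTA.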

Here is how I ran EDTA with the Singularity container (latest, version 2.2.0--hdfd78af_1):

    conda deactivate
    mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter
    mamba activate EDTA2.2
    export LANGUAGE=en_US.UTF-8
    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    module load Singularity/3.4.2
    cd /tarafs/data/project/proj5057-AGBKUB/TE_catfish_p2/work_space/
    SINGULARITY_CACHEDIR=./
    export SINGULARITY_CACHEDIR
    export PYTHONNOUSERSITE=1
    export BLASTDB_LMDB_MAP_SIZE=100000000
    git clone https://github.com/oushujun/EDTA.git
    cd ./EDTA/test
    sbatch -N 1 --ntasks-per-node=20 -t 120:00:00 -p memory -A proj5034 -J TE_EDTA_test_on_rice_genome --wrap="singularity exec --bind /work_space:/work_space --bind /work_space/EDTA/test:/work_space/EDTA/test /work_space/EDTA.sif EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 10"

This may be related to https://github.com/Dfam-consortium/RepeatModeler/issues/192

When I run EDTA v2.1.0, LINEs are detected, both in my genome and in the rice example.
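To put a number on the v2.1.0 vs 2.2.0 difference, you could count the LINE-classified entries in each version's library. Below is a minimal bash sketch. The function name count_line_families is hypothetical. It also assumes RepeatModeler/EDTA-style FASTA headers such as `>rnd-1_family-49#LINE/L1`. If your library uses a different header convention, adjust the pattern.

```bash
# Hypothetical helper: count FASTA headers whose classification contains "#LINE".
# Assumption: headers look like ">name#Class/Subclass" (e.g. ">x#LINE/L1").
count_line_families() {
    local fa="$1"
    # grep -c prints the match count; it exits 1 on zero matches, so "|| true"
    # keeps the exit status clean while "0" is still printed.
    grep -c '^>.*#LINE' "$fa" || true
}
```

For example, run `count_line_families genome.fa.mod.EDTA.TElib.fa` in the v2.1.0 and the 2.2.0 output directories, assuming the default final library name. A count of 0 for 2.2.0 alongside a non-zero count for v2.1.0 would match the behaviour described above.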

Thank you for your time and any help is appreciated :)
Best regards,
Quen
