RepeatModeler ran successfully but did not create the *.classified and *-families.fa files, which stopped earlGrey. #108
Comments
Hi, in this case it looks like RepeatModeler failed. Is this being run on a queuing system? Where is RepeatModeler installed (conda environment or manual install)? It looks like RepeatModeler2 is trying to write a log to the filesystem root (/rmod.log).
Yes, it is run on a queuing system (Slurm). I installed Miniconda in my home directory and then installed earlGrey with that conda, so the earlgrey environment sits inside the Miniconda envs directory in my home directory. RepeatModeler is also in the earlgrey environment.
Jae
Exit code 11 usually indicates a segmentation fault on Unix systems (signal 11 is SIGSEGV). On a Slurm system, potential causes include the job using too much memory or not being given enough cores. Generally, repeat annotation on larger genomes requires a high-memory node to avoid being killed by the queuing system. I would recommend trying a fresh run. Alternatively, the Docker container may work better, depending on the architecture of your HPC and queuing system.
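As a small illustration of the exit-code reading above, here is a Python sketch that maps a numeric status to a signal name. It assumes the 11 printed by eleredef is a signal number (on Linux, signal 11 is SIGSEGV); the helper name `describe_status` is made up for this example and is not part of earlGrey or RepeatModeler.

```python
import signal


def describe_status(code: int) -> str:
    """Return a best-guess description of a numeric exit status."""
    # Shells report "killed by signal N" as 128 + N; some tools print N directly.
    signum = code - 128 if code > 128 else code
    try:
        name = signal.Signals(signum).name
    except ValueError:
        return f"status {code}: not a known signal number"
    return f"status {code}: matches signal {signum} ({name})"


if __name__ == "__main__":
    print(describe_status(11))
    print(describe_status(139))
```

This only names the signal. Whether the crash came from memory limits on the node has to be checked in the Slurm accounting for the job.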
Thanks for the comment. Maybe I should try containers (Docker or Singularity). In addition, when I ran earlGrey with a small genome (about 200 Mbp), the expected final outputs were not created in the *_summaryFiles directory.
I attached the log file here (I cut out some parts because of the size limit).
The error occurred in the post-filtering step:
Have you got spaces or strange characters in your FASTA header names in the input file? If so, this will cause some methods to fail. I recommend checking line 940 in
Hmm. I tried to figure out the errors but I could not.
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 954, saw 10
Then I checked line 954 in ${species}_EarlGrey/${species}_mergedRepeats/looseMerge/*.rmerge.gff.filtered:
ctg_1016 RepeatMasker LTR/Gypsy 439913 443130 23901 - NA Tstart=1582;Tend=4584;ID=RND-1_FAMILY-240;shortTE=F;LTRgroup=ctg_1016_g6;TEgroup=ctg_1016|RND-1_FAMILY-240|4
Below is part of the log file:
mv: cannot stat ‘/projectsp/f_cee53_1/ellison_lab/JaeHakSon/repeats/earlgrey/Z.indianus_4595_EarlGrey/Z.indianus_4595_mergedRepeats/looseMerge/Z.indianus_4595.rmerge.gff.filtered.2’: No such file or directory
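One way to narrow down a "Expected 9 fields ... saw 10" error like this is to count the fields on every row of the file. The Python sketch below does that, assuming the file is tab-delimited and that 9 columns are expected (the number comes from the pandas message). The function name `find_bad_lines` is hypothetical, and the line numbers it reports may not match the one pandas prints if the parser skipped header or comment lines.

```python
from typing import Iterable, List, Tuple

EXPECTED_FIELDS = 9  # taken from the pandas message: "Expected 9 fields"


def find_bad_lines(
    lines: Iterable[str], expected: int = EXPECTED_FIELDS
) -> List[Tuple[int, int, str]]:
    """Return (line_number, field_count, line) for rows with the wrong width."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment lines
        count = len(line.split("\t"))  # assumption: fields are tab-separated
        if count != expected:
            bad.append((number, count, line))
    return bad


if __name__ == "__main__":
    # Tiny in-memory demo: the second row has one extra tab-separated field.
    demo = ["\t".join(["x"] * 9) + "\n", "\t".join(["x"] * 10) + "\n"]
    for number, count, line in find_bad_lines(demo):
        print(f"line {number}: {count} fields")
```

To run it on a real file, pass an open file handle instead of the demo list, for example `find_bad_lines(open(path))`, where `path` points at the *.rmerge.gff.filtered file.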
Update on the previous comment: I figured out the issue and solved it. Could this be a typo in the code?
This is odd - I haven't been able to reproduce this bug on any of the machines here (multiple Linux and Mac systems). If this works for you, then I am happy it is a good solution!
Hi Toby,
I am trying to run earlGrey (installed via conda) on two genomes. One genome is small (200 Mb) and the other is big (1.3 Gb).
When running it on the big genome, I got errors. I guess the error is caused by the RepeatModeler output.
My earlGrey run stopped at the RepeatModeler stage because RepeatModeler did not create the *.classified file in the RepeatModeler directory and did not copy *-families.fa, *-families.stk, and *-rmod.log into the Database directory.
When I re-ran RepeatModeler with the -recoverDir option, it reported that it ran successfully. However, it still did not create and copy the files needed for the downstream steps, so I am stuck at this step with the big genome. With the small genome, there is no problem.
I think I can manually create the *.classified file using RepeatClassifier and then copy the appropriate files into the Database directory. Then I would rerun the same earlGrey command on the big genome.
I wonder whether this approach works without issues and produces the same earlGrey outputs.
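Before rerunning, it may help to confirm which of the files listed above are actually missing. The Python sketch below is one hedged way to check, assuming the files are named with the database name as a prefix (for example `housefly_aabys-families.fa`). The function `missing_outputs` and its arguments are hypothetical and not part of earlGrey.

```python
from pathlib import Path
from typing import List

# Suffixes named in the post above; assumed to be prefixed with the database name.
EXPECTED_SUFFIXES = ("-families.fa", "-families.stk", "-rmod.log")


def missing_outputs(database_dir: str, db_name: str) -> List[str]:
    """List expected RepeatModeler output files that are absent or empty."""
    base = Path(database_dir)
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        candidate = base / f"{db_name}{suffix}"
        # Treat a zero-byte file as missing, since it cannot be used downstream.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(candidate.name)
    return missing
```

Calling `missing_outputs(database_dir, "housefly_aabys")` with the path to the Database directory returns an empty list only when all three files exist and are non-empty.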
Below is the log file for the big genome.
Building database housefly_aabys:
Reading /scratch/js3054/housefly/ragtag_option/scaff_hifi_hic/3d-dna/post_review/base_HiC.fasta.prep...
Number of sequences (bp) added to database: 502 ( 1357786862 bp )
RepeatModeler Version 2.0.5
Using output directory = /projectsp/f_cee53_1/ellison_lab/JaeHakSon/repeats/earlgrey/housefly_aabys_EarlGrey/housefly_aabys_RepeatModeler/RM_64325.SatMay112212482024
Search Engine = rmblast 2.14.1+
Threads = 32
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.5
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1715479967
Database = /projectsp/f_cee53_1/ellison_lab/JaeHakSon/repeats/earlgrey/housefly_aabys_EarlGrey/housefly_aabys_Database/housefly_aabys .
Size(bp) Count
230076028-246509918 | [ 2 ]
213642138-230076027 | [ ]
197208248-213642137 | [ 2 ]
180774358-197208247 | [ ]
164340469-180774358 | [ 1 ]
147906579-164340468 | [ ]
131472689-147906578 | [ ]
115038799-131472688 | [ ]
98604909-115038798 | [ 1 ]
82171020-98604909 | [ 1 ]
65737130-82171019 | [ ]
49303240-65737129 | [ 1 ]
32869350-49303239 | [ ]
16435460-32869349 | [ ]
1571-16435460 |************************************************* [ 494 ]
Storage Throughput = excellent ( 1828.62 MB/s )
Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
and the repetitive content of the sequences. It is not imperative
that RepeatModeler completes all rounds in order to obtain useful
results. At the completion of each round, the files ( consensi.fa, and
families.stk ) found in:
/projectsp/f_cee53_1/ellison_lab/JaeHakSon/repeats/earlgrey/housefly_aabys_EarlGrey/housefly_aabys_RepeatModeler/RM_64325.SatMay112212482024/
will contain all results produced thus far. These files may be
manually copied and run through RepeatClassifier should the program
be terminated early.
RepeatModeler Round # 1
.
.
.
Comparison Time: 06:42:39 (hh:mm:ss) Elapsed Time, 564088 HSPs Collected
RECON Elapsed: 00:00:00 (hh:mm:ss) Elapsed Time
RECON Elapsed: 00:00:38 (hh:mm:ss) Elapsed Time
eleredef failed. Exit code 11
ERROR: RepeatModeler Failed, Retrying with limit set as Round 5
Could not open up /rmod.log for writing!
ERROR: RepeatModeler Failed, Retrying with limit set as Round 4
Could not open up /rmod.log for writing!
ERROR: RepeatModeler Failed