LTR retriever is not compatible with RepeatModeler2 since v2.9.8? #169

Open
BitaoQiu opened this issue Apr 11, 2024 · 5 comments

@BitaoQiu

BitaoQiu commented Apr 11, 2024

Dear LTR retriever developers,

I was using RepeatModeler and found that LTR_retriever produces no output (v2.9.8 and v2.9.9, installed from either GitHub or Conda). This seems to have been reported before by other users. The log file from v2.9.8 reports:


Thu Mar 28 23:13:07 CET 2024 Dependency checking: All passed!
Thu Mar 28 23:13:16 CET 2024 LTR_retriever is starting from the Init step.
Thu Mar 28 23:13:17 CET 2024 Start to convert inputs...
Total candidates: 35905
Total uniq candidates: 35905

Thu Mar 28 23:13:22 CET 2024 Module 1: Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.

    Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa
    Options:
            -misschar       n       Define the letter representing unknown sequences; case insensitive; default: n
            -Nscreen        [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
            -nc             [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
            -nr             [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
            -minlen         [int]   Minimum sequence length filter after clean up; default: 100 (bp)
            -cleanN         [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
            -trf            [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
            -trf_path       path    Path to the trf program

Thu Mar 28 23:13:22 CET 2024 0 clean candidates remained


Out of curiosity, I downgraded LTR retriever to v2.9.5 from conda, and this time it passed Module 1:


Thu Apr 11 21:37:41 CEST 2024 Dependency checking: All passed!
Thu Apr 11 21:37:43 CEST 2024 LTR_retriever is starting from the Init step.
Thu Apr 11 21:37:45 CEST 2024 Start to convert inputs...
Total candidates: 35905
Total uniq candidates: 35905

Thu Apr 11 21:37:49 CEST 2024 Module 1: Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.

Thu Apr 11 21:37:49 CEST 2024 35905 clean candidates remained

Thu Apr 11 21:37:49 CEST 2024 Modules 2-5: Start to analyze the structure of candidates...
The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.


It seems something is wrong with get_range.pl since v2.9.8, which leaves LTR_retriever unable to read the LTR_harvest output. Do you have any suggestions?
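For anyone checking whether their own run hit the same problem, here is a minimal sketch that scans an LTR_retriever log for the pattern seen in the logs in this thread: candidates were found, the cleanup.pl usage text was printed instead of the script running, and 0 clean candidates remained. The function name and the returned keys are made up for illustration. It only matches the log lines quoted in this thread and does not identify the underlying cause.

```python
import re


def summarize_ltr_retriever_log(log_text):
    """Summarize Module 1 results from an LTR_retriever log (hypothetical helper)."""
    # "Total candidates: N" is printed during input conversion.
    total = re.search(r"Total candidates:\s*(\d+)", log_text)
    # "N clean candidates remained" is printed at the end of Module 1.
    clean = re.search(r"(\d+) clean candidates remained", log_text)
    # The usage banner appears when cleanup.pl exits without processing input.
    usage_printed = "Usage: perl cleanup.pl" in log_text

    total_n = int(total.group(1)) if total else None
    clean_n = int(clean.group(1)) if clean else None

    # Pattern reported in this thread: candidates exist, the usage banner
    # is printed, and nothing survives the cleanup step.
    suspicious = bool(usage_printed and total_n and clean_n == 0)
    return {
        "total": total_n,
        "clean": clean_n,
        "usage_printed": usage_printed,
        "suspicious": suspicious,
    }


# Excerpt of the v2.9.8 log quoted above.
failing = """Total candidates: 35905
Total uniq candidates: 35905
    Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa
Thu Mar 28 23:13:22 CET 2024 0 clean candidates remained
"""

print(summarize_ltr_retriever_log(failing))
```

To check a real run, read the log file into a string and pass it to the function. A result with `suspicious` set to true matches the logs reported here.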

@BitaoQiu BitaoQiu changed the title Clean up does not work since v2.9.8 LTR retriever is not compatible with RepeatModeler2 since v2.9.8? Apr 11, 2024
@oushujun
Owner

Hello,

I could not reproduce the issue using the test data. Can you provide one example?

Shujun

@juanjo255

juanjo255 commented Apr 24, 2024

Hello! @oushujun

I've been struggling with the same problem :( using the test genome.fa that ships with EDTA:

Tue Apr 23 21:51:11 -05 2024	Dependency checking: All passed!
Tue Apr 23 21:51:18 -05 2024	LTR_retriever is starting from the Init step.
Tue Apr 23 21:51:18 -05 2024	Start to convert inputs...
				Total candidates: 14
				Total uniq candidates: 14

Tue Apr 23 21:51:18 -05 2024	Module 1: Start to clean up candidates...
				Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
				Sequences containing tandem repeats will be discarded.


        Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa 
	Options:
		-misschar	n	Define the letter representing unknown sequences; case insensitive; default: n
		-Nscreen	[0|1]	Enable (1) or disable (0) the -nc parameter; default: 1
		-nc		[int]	Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
		-nr		[0-1]	Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
		-minlen		[int]	Minimum sequence length filter after clean up; default: 100 (bp)
		-cleanN		[0|1]	Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
		-trf		[0|1]	Enable (1) or disable (0) tandem repeat finder (trf); default: 1
		-trf_path	path	Path to the trf program
        
Tue Apr 23 21:51:18 -05 2024	0 clean candidates remained

cp: cannot stat 'seq.fa.retriever.scn.adj': No such file or directory
Tue Apr 23 21:51:18 -05 2024	No LTR-RT was found in your data.

Tue Apr 23 21:51:18 -05 2024	All analyses were finished!

@oushujun
Owner

@juanjo255 can you please provide your commands? Thanks!

Shujun

@juanjo255

Hello @oushujun,

thanks for the help.

I am using RepeatModeler. I also had to downgrade LTR_RETRIEVER to the 2.5 tag for it to work. So, after building the database with BuildDatabase, it was just this simple command:

RepeatModeler -LTRStruct -threads 32 -database ~/path/to/database

I hope this helps,

Juan

@BitaoQiu
Author

BitaoQiu commented May 8, 2024

@oushujun Sorry for my late reply... Yes, just using RepeatModeler and LTR_retriever 2.9.8 will produce the error (as @juanjo255 wrote)... I am now only using 2.9.5 ...

Best,
Bitao
