cleanup.pl Bug? #162

Open
asgray opened this issue Feb 13, 2024 · 3 comments
asgray commented Feb 13, 2024

Hi, I'm updating from 2.9.0 to 2.9.9 and I'm seeing some odd output:

~/projects/LTR_retriever$ ./LTR_retriever -genome dmel-smaller.fa -inharvest raw-struct-results.txt

############################
### LTR_retriever v2.9.9 ###
############################

Contributors: Shujun Ou, Ning Jiang

For LTR_retriever, please cite:

        Ou S and Jiang N (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422.

For LAI, please cite:

        Ou S, Chen J, Jiang N (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46(21):e126.

Parameters: -genome dmel-smaller.fa -inharvest raw-struct-results.txt


Mon Feb 12 03:56:29 PM PST 2024 Dependency checking: All passed!
Mon Feb 12 03:56:34 PM PST 2024 LTR_retriever is starting from the Init step.
Mon Feb 12 03:56:34 PM PST 2024 The longest sequence ID in the genome contains 68 characters, which is longer than the limit (13)
                                Trying to reformat seq IDs...
                                Attempt 1...
Mon Feb 12 03:56:34 PM PST 2024 Seq ID conversion successful!

Mon Feb 12 03:56:34 PM PST 2024 Start to convert inputs...
                                Total candidates: 42
                                Total uniq candidates: 42

Mon Feb 12 03:56:34 PM PST 2024 Module 1: Start to clean up candidates...
                                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                                Sequences containing tandem repeats will be discarded.

        Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa 
        Options:
                -misschar       n       Define the letter representing unknown sequences; case insensitive; default: n
                -Nscreen        [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
                -nc             [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
                -nr             [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
                -minlen         [int]   Minimum sequence length filter after clean up; default: 100 (bp)
                -cleanN         [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
                -trf            [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
                -trf_path       path    Path to the trf program
        
Mon Feb 12 03:56:34 PM PST 2024 0 clean candidates remained

cp: cannot stat 'dmel-smaller.fa.mod.retriever.scn.adj': No such file or directory
Mon Feb 12 03:56:34 PM PST 2024 No LTR-RT was found in your data.

Mon Feb 12 03:56:34 PM PST 2024 All analyses were finished!

I believe the command that calls cleanup.pl is:
perl ./bin/cleanup.pl -trf 1 -trf_path /usr/local/bin/trf -misschar N -nc 10 -nr 0.8 -minlen 100 -minscore 1000 -f dmel-smaller.fa.mod.ltrTE.fa > dmel-smaller.fa.mod.ltrTE.stg1

What is the expected behavior here?

@oushujun
Owner

You have very few candidates to begin with, and the cleanup step may have determined that none of them are valid.

Shujun

CSU-KangHu commented Mar 18, 2024

Hi @oushujun,

Should the line $trf=0 if /^-trf$/i and $ARGV[$k+1]!~/^-/; be changed to $trf=$ARGV[$k+1] if /^-trf$/i and $ARGV[$k+1]!~/^-/; in cleanup.pl?

I noticed that when specifying perl ./bin/cleanup.pl -trf 1, the trf program is not executed.
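
The difference between the two lines can be sketched in isolation. This is only a minimal illustration, not the actual code of cleanup.pl: the helper names `parse_trf_old` and `parse_trf_new` are hypothetical, and the surrounding loop over the arguments with a `$k` index is an assumption inferred from the `$ARGV[$k+1]` expression quoted above. The default of 1 is taken from the usage text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper mimicking the current line: any value after -trf sets $trf to 0.
sub parse_trf_old {
    my @args = @_;
    my $trf  = 1;    # default per the usage text: trf enabled
    my $k    = 0;
    foreach (@args) {
        $trf = 0 if /^-trf$/i and defined $args[$k+1] and $args[$k+1] !~ /^-/;
        $k++;
    }
    return $trf;
}

# Hypothetical helper mimicking the proposed line: the value after -trf is used as given.
sub parse_trf_new {
    my @args = @_;
    my $trf  = 1;    # same default as above
    my $k    = 0;
    foreach (@args) {
        $trf = $args[$k+1] if /^-trf$/i and defined $args[$k+1] and $args[$k+1] !~ /^-/;
        $k++;
    }
    return $trf;
}

print parse_trf_old(qw(-trf 1 -nc 10)), "\n";    # prints 0: trf is switched off despite "-trf 1"
print parse_trf_new(qw(-trf 1 -nc 10)), "\n";    # prints 1: trf stays enabled
```

The `defined` check is an addition for the sketch so that a trailing `-trf` with no value does not trigger a warning under `use warnings`; the quoted lines do not include it.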

@oushujun
Owner

Good catch! You are correct. I have updated the code, and the fix will be pushed to GitHub in the next update. I don't think this explains the problem in the initial post, though.

Shujun
