ERROR in TE annotation stats #439

Open
shiyi-pan opened this issue Feb 29, 2024 · 9 comments

@shiyi-pan

Hi oushujun, thank you for developing this great tool for genome repeat annotation. I want to use EDTA to annotate my genome and ran into an error.

I installed EDTA v2.1.3 with mamba using the following command (I can't install the latest version because of the server configuration):
mamba env create -f EDTA.yml -p /gss1/home/ruanjian/EDTA
Here is the command I used to annotate my genome:
perl /gss1/home//c.annotation/a.TEs_annotation/EDTA/EDTA.pl --genome long.fa --species others --step all --overwrite 1 --threads 16 --sensitive 1 --anno 1 --evaluate 1

Here is the error I got:

Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of
the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo
assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species.
Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed
using PromethION nanopore sequencing data and two genetic maps and was then error-corrected using Illumina short-reads and PacBio
SMRT reads. The 933.12Mb assembly was annotated as containing 79,870 transcripts for 58,550 genes using RNA-Seq data and the public
soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence
(Wm82.a2.v1) revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants.
However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel subfamilies, we found that
soybean contains four clearly separated subfamilies of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of
anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of
Hwangkeum is a valuable additional reference genome resource for characterizing traits for the

GFF> line 7.
Use of uninitialized value $extra in substitution (s///) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 101, line 7.
Use of uninitialized value $extra in pattern match (m//) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 102, line 7.
Use of uninitialized value $element_end in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $TE_class in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $method in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $score in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $strand in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $phase in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $type in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Argument "Binary:matches.." isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Argument "matches" isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Warning: LOC list - is empty.

Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum

Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum

Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum
ERROR: TE annotation stats results not found in long.fa.mod.EDTA.TE.fa.stat!

Could you help me fix this problem? Thank you very much.

@oushujun
Owner

Hi,

Sorry for the delay. Your error message seems truncated. Please double check your genome file or provide a more complete program output.

Thanks!
Shujun

@shiyi-pan
Author

Thank you for your reply, oushujun. Can you tell me which aspects of the genome file I should examine? My genome contains some short contigs; does that affect how EDTA works?

@oushujun
Owner

Your error message seems to contain an abstract, which should not happen if your genome file is what it is meant to be. You may want to check if it's the correct file or if the sequence names are simple.

Shujun

@shiyi-pan
Author

I'm sorry to bother you again, Shujun. I'm not sure what "abstract" means here. The sequence names in my genome file look like this: RagTag_0001, RagTag_0002, ..., RagTag_1695. The file looks like a normal FASTA file; the sequences consist of the four bases A, T, C, and G plus the ambiguous base N.

Thank you again, Shujun.
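
For readers who want to run the same sanity checks on their own genome file, here is a minimal sketch in Python. It is not part of EDTA. It assumes that "simple" names means letters, digits, `_`, `.`, and `-` with no spaces, and the 13-character limit is an assumed default that should be adjusted to whatever the EDTA documentation for your version states. The function name `check_fasta` is hypothetical.

```python
import re

# Assumption: a "simple" name uses only letters, digits, '_', '.', and '-'.
SIMPLE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")
# Bases described in this thread: A, T, C, G and the ambiguous base N.
ALLOWED_BASES = set("ACGTNacgtn")


def check_fasta(lines, max_name_len=13):
    """Return a list of problems found in an iterable of FASTA text lines.

    max_name_len is an assumed limit, not something confirmed in this thread.
    """
    problems = []
    seen = set()
    have_header = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            have_header = True
            name = line[1:]
            if not SIMPLE_NAME.match(name):
                problems.append(f"line {lineno}: name {name!r} is not simple")
            if len(name) > max_name_len:
                problems.append(
                    f"line {lineno}: name {name!r} is longer than "
                    f"{max_name_len} characters"
                )
            if name in seen:
                problems.append(f"line {lineno}: duplicate name {name!r}")
            seen.add(name)
        elif not have_header:
            problems.append(f"line {lineno}: sequence data before any '>' header")
        else:
            bad = set(line) - ALLOWED_BASES
            if bad:
                problems.append(
                    f"line {lineno}: unexpected characters {sorted(bad)}"
                )
    return problems
```

Passing an open file handle, for example `check_fasta(open("long.fa"))`, returns an empty list when nothing suspicious is found. Stray prose pasted into the file would show up as unexpected characters.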

@oushujun
Owner

This is the error message in your initial post:

image

I don't understand why EDTA would spit out an abstract-like paragraph in its error message...

From your last reply, it seems that your genome file is ok. Please update your EDTA to 2.2.1 and try again.

Shujun

@shiyi-pan
Author

Thank you for your reply, oushujun. I'm sorry for my carelessness. I tried to copy some normal log content from before the error but somehow copied text from the paper I was reading.

I updated EDTA and ran into another problem. Here are the error messages:

Species: others
find: ‘./TIR-Learner-+-TIRvish.gff3’: No such file or directory

unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/U not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Sun Mar 17 11:43:02 CST 2024 Homology-based annotation of TEs using formated.ragtag.scaffold.fasta.mod.EDTA.TElib.fa from scratch.

Warning: SINE/U not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: SINE/U not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

The final TEanno.sum file doesn't have the SINE class.

image

Thank you again.

@oushujun
Owner

What command did you use? Thanks!

@shiyi-pan
Author

Thank you, Shujun. Here is my command:

mamba activate edta

perl EDTA.pl --genome formated.ragtag.scaffold.fasta --species others --step all --overwrite 1 --threads 8 --sensitive 1 --anno 1 --evaluate 1

By the way, I found that there are two TE_Sequence_Ontology.txt files in EDTA with different file sizes:

image

Do I need to unify the contents of the two files? If so, which one is better? Thank you again.
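
To see how two copies of a text file like this actually differ, a small comparison such as the sketch below may help. It is a generic line-set comparison, not an EDTA utility. The function name `compare_lines` is hypothetical, and the two paths in the usage note are placeholders, because the thread does not state where the two copies are located.

```python
def compare_lines(lines_a, lines_b):
    """Return (only_in_a, only_in_b) as sorted lists of stripped, non-empty lines.

    Line order and duplicate lines are ignored; only the set of distinct
    lines in each input is compared.
    """
    set_a = {line.strip() for line in lines_a if line.strip()}
    set_b = {line.strip() for line in lines_b if line.strip()}
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

With placeholder paths, `compare_lines(open("copy_one/TE_Sequence_Ontology.txt"), open("copy_two/TE_Sequence_Ontology.txt"))` lists the entries present in only one of the two files. If entries such as `SINE/U` or `unknown/NA` appear in neither list and in neither file, that would be consistent with the "not found in the TE_SO database" messages above, though the thread itself does not confirm the cause.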

@oushujun
Owner

The EDTA version should be fine. You need to update EDTA to the latest version.

Shujun
