ERROR in TE annotation stats #439

shiyi-pan · 2024-02-29T13:29:30Z

Hi, oushujun, thank you for developing this great tool for genome repeat annotation. I want to use EDTA to annotate my genome, but I ran into an error.

I installed EDTA v2.1.3 with mamba using the following command (I can't install the latest version because of the server configuration):
mamba env create -f EDTA.yml -p /gss1/home/ruanjian/EDTA
Here is the command I used to annotate my genome:
perl /gss1/home//c.annotation/a.TEs_annotation/EDTA/EDTA.pl --genome long.fa --species others --step all --overwrite 1 --threads 16 --sensitive 1 --anno 1 --evaluate 1

Here is the error I got:

Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of
the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo
assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species.
Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed
using PromethION nanopore sequencing data and two genetic maps and was then error-corrected using Illumina short-reads and PacBio
SMRT reads. The 933.12Mb assembly was annotated as containing 79,870 transcripts for 58,550 genes using RNA-Seq data and the public
soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence
(Wm82.a2.v1) revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants.
However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel subfamilies, we found that
soybean contains four clearly separated subfamilies of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of
anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of
Hwangkeum is a valuable additional reference genome resource for characterizing traits for the

GFF> line 7.
Use of uninitialized value $extra in substitution (s///) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 101, line 7.
Use of uninitialized value $extra in pattern match (m//) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 102, line 7.
Use of uninitialized value $element_end in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $TE_class in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $method in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $score in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $strand in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $phase in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $type in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Argument "Binary:matches.." isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Argument "matches" isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Warning: LOC list - is empty.

Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum

Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum
ERROR: TE annotation stats results not found in long.fa.mod.EDTA.TE.fa.stat!

Could you help me fix this problem? Thank you very much.

oushujun · 2024-03-14T17:43:09Z

Hi,

Sorry for the delay. Your error message seems truncated. Please double-check your genome file or provide a more complete program output.

Thanks!
Shujun

shiyi-pan · 2024-03-15T14:23:55Z

Thank you for your reply, oushujun. Which aspects of the genome file should I examine? My genome has some short contigs; does that affect how EDTA works?

oushujun · 2024-03-15T22:04:17Z

Your error message seems to contain an abstract, which should not happen if your genome file is what it is meant to be. You may want to check if it's the correct file or if the sequence names are simple.

Shujun
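A minimal Python sketch of the check suggested above, assuming that "simple" names means letters, digits and underscores only, and that names should be short (the 13-character limit below is an assumed value, not something stated in this thread). It also flags text before the first header, duplicate names, and characters other than A, C, G, T and N, so a wrong file or an accidentally pasted paragraph would show up. The function name `check_fasta` is hypothetical and not part of EDTA.

```python
import re
import sys
from typing import Iterable, List, Optional

# Assumption: "simple" = letters, digits and underscores only, and no longer
# than MAX_NAME_LEN characters. The 13-character default is an assumed value;
# confirm the actual limit in the EDTA documentation for your version.
MAX_NAME_LEN = 13
SIMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")
ALLOWED_BASES = set("ACGTNacgtn")  # lowercase allowed for soft-masked bases


def check_fasta(lines: Iterable[str], max_len: int = MAX_NAME_LEN) -> List[str]:
    """Return a list of human-readable problems found in FASTA text."""
    problems: List[str] = []
    seen = set()
    name: Optional[str] = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are harmless
        if line.startswith(">"):
            tokens = line[1:].split()
            # The sequence name is the first whitespace-delimited token of the header.
            name = tokens[0] if tokens else ""
            if not SIMPLE_NAME.fullmatch(name):
                problems.append(f"line {lineno}: name {name!r} is not letters/digits/underscores")
            elif len(name) > max_len:
                problems.append(f"line {lineno}: name {name!r} is longer than {max_len} characters")
            if name in seen:
                problems.append(f"line {lineno}: duplicate name {name!r}")
            seen.add(name)
        elif name is None:
            # Text before the first header (e.g. a pasted paragraph) is not FASTA.
            problems.append(f"line {lineno}: sequence data before the first '>' header")
        else:
            bad = set(line) - ALLOWED_BASES
            if bad:
                problems.append(f"line {lineno}: unexpected characters {''.join(sorted(bad))!r} in {name}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fasta:
        issues = check_fasta(fasta)
    for issue in issues:
        print(issue)
    print(f"{len(issues)} problem(s) found")
```

Usage would be something like `python check_fasta.py long.fa` (the script name is arbitrary). IUPAC ambiguity codes other than N are reported as unexpected; widen `ALLOWED_BASES` if the assembly legitimately contains them.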

shiyi-pan · 2024-03-16T13:48:32Z

I'm sorry to bother you again, Shujun. I'm not sure what you mean by "abstract". The sequence names in my genome file look like this: RagTag_0001, RagTag_0002, ..., RagTag_1695. My genome file looks like a normal FASTA file; the sequences consist of the four bases A, T, C, G and the ambiguous base N.

Thank you again, Shujun.

oushujun · 2024-03-18T19:07:06Z

This is the error message in your initial post:

I don't understand why EDTA would spit out an abstract-like paragraph in its error message...

From your last reply, it seems that your genome file is ok. Please update your EDTA to 2.2.1 and try again.

Shujun

shiyi-pan · 2024-03-20T08:38:58Z

Thank you for your reply, oushujun. I'm sorry for my carelessness. I meant to copy the normal log content preceding the error, but somehow copied the paper I was reading instead.

I updated EDTA and ran into another problem. Here are the error messages:

Species: others
find: ‘./TIR-Learner-+-TIRvish.gff3’: No such file or directory

unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/U not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Sun Mar 17 11:43:02 CST 2024 Homology-based annotation of TEs using formated.ragtag.scaffold.fasta.mod.EDTA.TElib.fa from scratch.

Warning: SINE/U not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

The final TEanno.sum file doesn't have the SINE class.

Thank you again.
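To see whether the library contains SINE entries that were not recognized (such as the SINE/U label in the log above), one option is to count the classification labels in the TE library headers and list those outside a set of accepted labels. This sketch assumes RepeatMasker-style headers of the form `>name#Class/Superfamily`; the `KNOWN` set is a hypothetical placeholder, not the contents of TE_Sequence_Ontology.txt, and both function names are illustrative.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, Set


def classification_counts(lines: Iterable[str]) -> Counter:
    """Count 'Class/Superfamily' labels taken from '>name#label' FASTA headers."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if line.startswith(">") and "#" in line:
            # Assumed header layout: the label is the text after the first '#',
            # up to the first whitespace. An empty label is counted as "".
            rest = line.split("#", 1)[1].split()
            counts[rest[0] if rest else ""] += 1
    return counts


def unrecognized(counts: Counter, known: Set[str]) -> Dict[str, int]:
    """Labels absent from `known`, with the number of sequences carrying each."""
    return {label: n for label, n in counts.items() if label not in known}


# Hypothetical placeholder: replace with the labels your EDTA installation accepts.
KNOWN = {"LTR/Copia", "LTR/Gypsy"}

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as lib:
        counts = classification_counts(lib)
    for label, n in counts.most_common():
        print(f"{label}\t{n}")
    for label, n in sorted(unrecognized(counts, KNOWN).items()):
        print(f"not in KNOWN: {label} ({n} sequences)")
```

Running it on the `*.mod.EDTA.TElib.fa` file would show whether any SINE-labelled sequences are present at all before annotation.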

oushujun · 2024-03-20T20:48:17Z

What command did you use? Thanks!

shiyi-pan · 2024-03-21T06:59:51Z

Thank you, Shujun. Here is my command:

mamba activate edta

perl EDTA.pl --genome formated.ragtag.scaffold.fasta --species others --step all --overwrite 1 --threads 8 --sensitive 1 --anno 1 --evaluate 1

By the way, I found two TE_Sequence_Ontology.txt files in EDTA with different file sizes:

Do I need to make the contents of the two files identical? If so, which one should I use? Thank you again.
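Before deciding whether the two TE_Sequence_Ontology.txt copies need to be unified, it may help to see exactly how they differ. A minimal sketch using Python's standard `filecmp` and `difflib` modules; the file locations were not preserved in this thread, so the script takes both paths as arguments. It only reports differences and does not tell you which copy EDTA actually reads.

```python
import difflib
import filecmp
import sys
from typing import List


def diff_text(old: str, new: str, old_name: str = "a", new_name: str = "b") -> List[str]:
    """Unified-diff lines between two texts; an empty list means no differences."""
    return list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


def compare_files(path_a: str, path_b: str) -> List[str]:
    """Byte-for-byte comparison first; fall back to a line diff if the files differ."""
    if filecmp.cmp(path_a, path_b, shallow=False):
        return []
    with open(path_a) as fa, open(path_b) as fb:
        return diff_text(fa.read(), fb.read(), path_a, path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = sys.argv[1], sys.argv[2]
    if filecmp.cmp(a, b, shallow=False):
        print("files are identical")
    else:
        diff = compare_files(a, b)
        # Text-mode reading normalizes line endings, so a byte-level difference
        # can produce an empty line diff.
        sys.stdout.writelines(diff or ["byte-level differences only (e.g. line endings)\n"])
```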

oushujun · 2024-03-27T01:39:40Z

The EDTA version should be fine. You need to update EDTA to the latest version.

Shujun
