ERROR in TE annotation stats #439
Hi, sorry for the delay. Your error message seems truncated. Please double-check your genome file or provide a more complete program output. Thanks!
Thank you for your reply, oushujun. Which aspects of the genome file should I check? My genome has some short contigs; could that affect how EDTA works?
Your error message seems to contain an abstract, which should not happen if your genome file is what it is meant to be. You may want to check whether it's the correct file or whether the sequence names are simple. Shujun
I'm sorry to bother you again, Shujun. I'm not sure what exactly you mean by "abstract". The sequence names in my genome file look like this: RagTag_0001, RagTag_0002, ..., RagTag_1695. My genome file looks like a normal FASTA file; the sequences consist of the four bases A, T, C, G and the ambiguous base N. Thank you again, Shujun.
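The points raised so far (is it the correct file, are the sequence names simple, do the sequences contain only A/T/C/G/N) can be checked before rerunning EDTA. Below is a minimal Python sketch, not part of EDTA. It assumes that a "simple" ID uses only letters, digits, underscore, dot, or hyphen and is at most 15 characters long; both the character set and the 15-character limit are illustrative assumptions, so check the EDTA README for the actual requirements. It also flags stray text before the first header, which is one way pasted prose (such as an abstract) could end up in a genome file.

```python
import re
import sys
from typing import Iterable, List

# Assumptions (not EDTA's documented rules): a "simple" ID uses only letters,
# digits, underscore, dot, or hyphen, and is at most MAX_ID_LEN characters.
MAX_ID_LEN = 15
ID_PATTERN = re.compile(r"^[A-Za-z0-9_.\-]+$")
# Only A, C, G, T and N (either case) are accepted as sequence characters.
SEQ_PATTERN = re.compile(r"^[ACGTNacgtn]+$")


def check_fasta(lines: Iterable[str], max_id_len: int = MAX_ID_LEN) -> List[str]:
    """Return human-readable problems found in FASTA lines (empty list if none)."""
    problems: List[str] = []
    seen_ids = set()
    current_id = None  # None until the first '>' header is seen
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            if not current_id:
                problems.append(f"line {lineno}: empty sequence ID")
            elif not ID_PATTERN.match(current_id):
                problems.append(f"line {lineno}: ID {current_id!r} contains unusual characters")
            elif len(current_id) > max_id_len:
                problems.append(f"line {lineno}: ID {current_id!r} is longer than {max_id_len} characters")
            if len(fields) > 1:
                # Flagged as a precaution: descriptions after the ID are not "simple".
                problems.append(f"line {lineno}: header has text after the ID")
            if current_id in seen_ids:
                problems.append(f"line {lineno}: duplicate ID {current_id!r}")
            seen_ids.add(current_id)
        elif current_id is None:
            problems.append(f"line {lineno}: text before the first '>' header")
        elif not SEQ_PATTERN.match(line):
            problems.append(f"line {lineno}: characters other than ACGTN in {current_id!r}")
    return problems


if __name__ == "__main__":
    # Usage: python check_fasta.py genome.fa [more.fa ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            for problem in check_fasta(handle):
                print(f"{path}: {problem}")
```

An empty report means the file passes these particular checks; it does not guarantee EDTA will accept it.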
Thank you for your reply, oushujun. Sorry for my carelessness: I meant to copy the normal log output before the error, but somehow copied the paper I was reading instead. I have since updated EDTA and ran into another problem. Here are the error messages:
Species: others unknown/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Warning: SINE/U not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: SINE/U not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
The final TEanno.sum file doesn't contain the SINE class. Thank you again.
What command did you use? Thanks!
Thank you, Shujun. Here are my commands:
mamba activate edta
perl EDTA.pl --genome formated.ragtag.scaffold.fasta --species others --step all --overwrite 1 --threads 8 --sensitive 1 --anno 1 --evaluate 1
By the way, I found two TE_Sequence_Ontology.txt files in EDTA with different file sizes. Do I need to make the two files identical? If so, which one should I use? Thank you again.
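To see how the two TE_Sequence_Ontology.txt files actually differ, and whether either one lists the SINE/U class from the warning above, the files can be compared line by line. This is a minimal Python sketch, not an EDTA tool; it assumes terms on each line are separated by tabs, spaces, or commas, which is an assumption about the file layout rather than a documented format. Pass the two file paths on the command line (their locations inside the EDTA installation are not shown in this thread).

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Set, Tuple

# Assumption: terms in TE_Sequence_Ontology.txt are separated by tabs,
# spaces, or commas; this split is only used to look up whole terms.
TOKEN_SPLIT = re.compile(r"[\s,]+")


def load_entries(lines: Iterable[str]) -> Set[str]:
    """Collect non-blank lines with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}


def compare(a: Set[str], b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lines only in a, lines only in b)."""
    return a - b, b - a


def mentions(entries: Set[str], term: str) -> bool:
    """True if `term` appears as a whole token on any line."""
    return any(term in TOKEN_SPLIT.split(line) for line in entries)


if __name__ == "__main__":
    # Usage: python compare_te_so.py <first TE_Sequence_Ontology.txt> <second TE_Sequence_Ontology.txt>
    if len(sys.argv) == 3:
        first, second = (load_entries(Path(p).read_text().splitlines()) for p in sys.argv[1:])
        only_first, only_second = compare(first, second)
        print(f"{len(only_first)} lines only in {sys.argv[1]}")
        print(f"{len(only_second)} lines only in {sys.argv[2]}")
        for name, entries in ((sys.argv[1], first), (sys.argv[2], second)):
            print(f"{name}: mentions SINE/U? {mentions(entries, 'SINE/U')}")
```

Whole-token matching avoids counting a longer term such as "SINE/Unknown" as a match for "SINE/U".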
The EDTA version should be fine. You need to update EDTA to the latest version. Shujun |
Hi, oushujun, thank you for developing this great tool for genome repeat annotation. I want to use EDTA to annotate my genome but ran into an error.
I installed EDTA v2.1.3 with mamba using the following command (I can't install the latest version because of our server configuration):
mamba env create -f EDTA.yml -p /gss1/home/ruanjian/EDTA
Here is the command I used to annotate my genome:
perl /gss1/home//c.annotation/a.TEs_annotation/EDTA/EDTA.pl --genome long.fa --species others --step all --overwrite 1 --threads 16 --sensitive 1 --anno 1 --evaluate 1
Here is the error I got:
Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species. Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed using PromethION nanopore sequencing data and two genetic maps and was then error-corrected using Illumina short-reads and PacBio SMRT reads. The 933.12 Mb assembly was annotated as containing 79,870 transcripts for 58,550 genes using RNA-Seq data and the public soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence (Wm82.a2.v1) revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants. However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel subfamilies, we found that soybean contains four clearly separated subfamilies of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of Hwangkeum is a valuable additional reference genome resource for characterizing traits for the
GFF> line 7.
Use of uninitialized value $extra in substitution (s///) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 101, line 7.
Use of uninitialized value $extra in pattern match (m//) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 102, line 7.
Use of uninitialized value $element_end in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $TE_class in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $method in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $score in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $strand in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $phase in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
Use of uninitialized value $type in concatenation (.) or string at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/gff2bed.pl line 110, line 7.
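These "uninitialized value" warnings all point at line 7 of the input read by gff2bed.pl, which may mean that line has fewer fields than the script expects (consistent with stray text, such as the abstract above, ending up in the annotation). A GFF3 data line should have 9 tab-separated columns. The following minimal Python sketch, not part of EDTA, reports data lines with a different field count; the GFF file gff2bed.pl actually read is not named in the log, so which file to check is left to you.

```python
import sys
from typing import Iterable, List, Tuple

# GFF3 data lines have 9 tab-separated columns:
# seqid, source, type, start, end, score, strand, phase, attributes
GFF3_COLUMNS = 9


def malformed_gff_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line number, field count) for data lines that lack 9 columns."""
    bad: List[Tuple[int, int]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # in GFF3, everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        n_fields = len(line.split("\t"))
        if n_fields != GFF3_COLUMNS:
            bad.append((lineno, n_fields))
    return bad


if __name__ == "__main__":
    # Usage: python check_gff.py annotation.gff3 [more.gff3 ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            for lineno, n_fields in malformed_gff_lines(handle):
                print(f"{path}: line {lineno} has {n_fields} fields, expected {GFF3_COLUMNS}")
```

A line of prose with no tabs shows up as having 1 field, which makes pasted text easy to spot.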
Argument "Binary:matches.." isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Argument "matches" isn't numeric in numeric gt (>) at /gss1/home//c.annotation/a.TEs_annotation/EDTA/util/split_overlap.pl line 26, line 1.
Warning: LOC list - is empty.
Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum
Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum
Count all-versus-all misclassifications using the cleanup_nested.pl .stat file
perl count_nested.pl -in sequence.fa.stat -cat [redun|nested|all] > sequence.fa.stat.sum
ERROR: TE annotation stats results not found in long.fa.mod.EDTA.TE.fa.stat!
Could you help me fix this problem? Thank you very much.