LINE and SINE result files have 0 bp! #456

Open
XuanZhang-Black opened this issue Apr 17, 2024 · 0 comments

Dr. Shujun,

Hi! I installed EDTA v2.2.1 by running the commands "git clone https://github.com/oushujun/EDTA.git" and "mamba env create -f EDTA_2.2.x.yml".

I then tested it with the following command: "perl... /EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10". However, the output log contained the following warnings and errors:
"Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.",
"Warning: The SINE result file has 0 bp!",
"Warning: The LINE result file has 0 bp!",
"Error encountered: [Errno 2] No such file or directory: 'bedtools'
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory",
"cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory".

I don't know whether a dependency failed to install or whether the data simply contain no new LINEs/SINEs. My full log file follows. Could you tell me whether the installation was successful?

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.1
Shujun Ou (shujun.ou.1@gmail.com)

#########################################################

Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10

2024年 04月 17日 星期三 22:57:04 CST Dependency checking:
All passed!

A custom library rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.

2024年 04月 17日 星期三 22:57:08 CST Obtain raw TE libraries using various structure-based programs:
2024年 04月 17日 星期三 22:57:08 CST EDTA_raw: Check dependencies, prepare working directories.

2024年 04月 17日 星期三 22:57:09 CST Start to find LTR candidates.

2024年 04月 17日 星期三 22:57:09 CST Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
2024年 04月 17日 星期三 22:57:33 CST Finish finding LTR candidates.

2024年 04月 17日 星期三 22:57:33 CST Start to find SINE candidates.

2024年 04月 17日 星期三 22:58:14 CST Warning: The SINE result file has 0 bp!

2024年 04月 17日 星期三 22:58:14 CST Start to find LINE candidates.

2024年 04月 17日 星期三 22:58:14 CST Identify LINE retrotransposon candidates from scratch.

2024年 04月 17日 星期三 22:59:56 CST Warning: The LINE result file has 0 bp!

2024年 04月 17日 星期三 22:59:56 CST Start to find TIR candidates.

2024年 04月 17日 星期三 22:59:56 CST Identify TIR candidates from scratch.

Species: others
2024年 04月 17日 星期三 23:00:47 CST Finish finding TIR candidates.

2024年 04月 17日 星期三 23:00:47 CST Start to find Helitron candidates.

2024年 04月 17日 星期三 23:00:47 CST Identify Helitron candidates from scratch.

2024年 04月 17日 星期三 23:01:22 CST Finish finding Helitron candidates.

2024年 04月 17日 星期三 23:01:22 CST Execution of EDTA_raw.pl is finished!

2024年 04月 17日 星期三 23:01:22 CST Obtain raw TE libraries finished.
All intact TEs found by EDTA:
genome.fa.mod.EDTA.intact.raw.fa
genome.fa.mod.EDTA.intact.raw.gff3

2024年 04月 17日 星期三 23:01:22 CST Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:

Warning: No sequences were masked
2024年 04月 17日 星期三 23:01:40 CST EDTA advance filtering finished.

2024年 04月 17日 星期三 23:01:40 CST Perform EDTA final steps to generate a non-redundant comprehensive TE library.

			Filter RepeatModeler results that are ignored in the raw step.

2024年 04月 17日 星期三 23:01:45 CST Clean up TE-related sequences in the CDS file with TEsorter.

			Remove CDS-related sequences in the EDTA library.

			Remove CDS-related sequences in intact TEs.

2024年 04月 17日 星期三 23:01:52 CST Combine the high-quality TE library rice7.0.0.liban with the EDTA library:

2024年 04月 17日 星期三 23:01:59 CST EDTA final stage finished! You may check out:
The final EDTA TE library: genome.fa.mod.EDTA.TElib.fa
Family names of intact TEs have been updated by rice7.0.0.liban: genome.fa.mod.EDTA.intact.gff3
Comparing to the provided library, EDTA found these novel TEs: genome.fa.mod.EDTA.TElib.novel.fa
The provided library has been incorporated into the final library: genome.fa.mod.EDTA.TElib.fa

2024年 04月 17日 星期三 23:01:59 CST Perform post-EDTA analysis for whole-genome annotation:

2024年 04月 17日 星期三 23:01:59 CST Homology-based annotation of TEs using genome.fa.mod.EDTA.TElib.fa from scratch.

Error encountered: [Errno 2] No such file or directory: 'bedtools'
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory
2024年 04月 17日 星期三 23:02:10 CST TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 34.61%): genome.fa.mod.EDTA.TEanno.gff3
Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
Whole-genome TE divergence plot: genome.fa.mod_divergence_plot.pdf
Whole-genome TE density plot: genome.fa.mod.EDTA.TEanno.density_plots.pdf
Low-threshold TE masking for MAKER gene annotation (masked: 17.27%): genome.fa.mod.MAKER.masked

cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory
2024年 04月 17日 星期三 23:02:10 CST Evaluate the level of inconsistency for whole-genome TE annotation:

2024年 04月 17日 星期三 23:02:12 CST Evaluation of TE annotation finished! Check out these files:

			Overall: genome.fa.mod.EDTA.TE.fa.stat.all.sum
			Nested: genome.fa.mod.EDTA.TE.fa.stat.nested.sum
			Non-nested: genome.fa.mod.EDTA.TE.fa.stat.redun.sum

			If you want to learn more about the formatting and information of these files, please visit:
				https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
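
The `[Errno 2] No such file or directory: 'bedtools'` line in the log looks like a step that tried to launch `bedtools` and could not find it on `PATH`, even though the dependency check printed "All passed!". A minimal sketch for checking this from inside the activated environment is below. The tool list is an assumption for illustration only, not EDTA's official dependency list, and `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Hypothetical list for illustration; extend it with whatever the log complains about.
    wanted = ["bedtools", "perl"]
    missing = missing_tools(wanted)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If `bedtools` is reported as missing, installing it into the same environment (it is available from bioconda) and rerunning the annotation step may be enough to produce the density plot, although the log alone does not confirm that this is the only problem.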

The file "genome.fa.mod.EDTA.TEanno.sum" is as follows. Did the run complete successfully?

$ cat genome.fa.mod.EDTA.TEanno.sum
Repeat Classes

Total Sequences: 1
Total Length: 1000000 bp
Class                  Count    bpMasked    %masked
=====                  =====    ========    =======
LINE                   --       --          --
    unknown            39       13979       1.40%
LTR                    --       --          --
    Copia              11       18647       1.86%
    Gypsy              48       108654      10.87%
    TRIM               1        129         0.01%
    unknown            1        248         0.02%
SINE                   --       --          --
    unknown            11       1775        0.18%
TIR                    --       --          --
    CACTA              23       22722       2.27%
    Mutator            115      47072       4.71%
    PIF_Harbinger      110      28045       2.80%
    PILE               4        1033        0.10%
    POLE               2        506         0.05%
    Tc1_Mariner        124      48718       4.87%
    hAT                35       13953       1.40%
    unknown            9        1433        0.14%
nonTIR                 --       --          --
    helitron           56       39164       3.92%
---------------------------------
total interspersed     589      346078      34.61%


Total                  589      346078      34.61%
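
As a sanity check on the summary, the per-family rows can be added up and compared with the reported total. The sketch below copies the rows from the table and assumes whitespace-separated columns in the order Class, Count, bpMasked, %masked. `sum_families` is a hypothetical helper, not part of EDTA.

```python
# Rows copied from genome.fa.mod.EDTA.TEanno.sum as posted above.
SUMMARY = """\
LINE -- -- --
unknown 39 13979 1.40%
LTR -- -- --
Copia 11 18647 1.86%
Gypsy 48 108654 10.87%
TRIM 1 129 0.01%
unknown 1 248 0.02%
SINE -- -- --
unknown 11 1775 0.18%
TIR -- -- --
CACTA 23 22722 2.27%
Mutator 115 47072 4.71%
PIF_Harbinger 110 28045 2.80%
PILE 4 1033 0.10%
POLE 2 506 0.05%
Tc1_Mariner 124 48718 4.87%
hAT 35 13953 1.40%
unknown 9 1433 0.14%
nonTIR -- -- --
helitron 56 39164 3.92%
"""


def sum_families(text):
    """Sum the Count and bpMasked columns over the family rows.

    Class header rows carry '--' placeholders instead of numbers and are skipped.
    """
    total_count = 0
    total_bp = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[1] == "--":
            continue
        total_count += int(fields[1])
        total_bp += int(fields[2])
    return total_count, total_bp


count, bp = sum_families(SUMMARY)
genome_length = 1_000_000  # "Total Length" from the summary
print(count, bp, f"{100 * bp / genome_length:.2f}%")  # prints: 589 346078 34.61%
```

The sums agree with the `total interspersed` row. The LINE and SINE rows are non-zero even though the structure-based search reported 0 bp, which would be consistent with those annotations coming from homology to the supplied library rather than from newly discovered elements, though the log alone does not confirm this.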
