Releases · oushujun/EDTA

12 Jan 17:20

oushujun

v2.2.0

8980f49

Big update to v2.2.0

Pre-release

replace local AnnoSINEv2 with the conda version

11 Oct 04:36

oushujun

v2.1.0

9d7f12a

panEDTA for consistent pan-genome TE annotation

Latest

Release note and usage

This is the serial version of panEDTA. Each genome is annotated sequentially, and the annotations are then combined by the panEDTA functionality. Existing EDTA annotations of genomes (--anno 1) will be recognized and reused. One way to accelerate the pan-genome annotation is to run the EDTA annotation of each genome separately and in parallel, then run panEDTA to finish the remaining steps. You may want to save the GFF files and the sum file of the EDTA results because panEDTA will overwrite them. The toy example in the ./test folder is a good way to get familiar with the workflow.

sh panEDTA.sh -g genome_list.txt -c cds.fasta -t 10
    -g	A list of genome files with paths accessible from the working directory.
                Required: You can provide only a list of genomes in this file (one column, one genome each row).
                Optional: You can also provide both genomes and CDS files in this file (two columns, one genome and one CDS each row).
                    Missing CDS files (e.g., for some or all genomes) are allowed.
    -c	Optional. Coding sequence files in fasta format.
                The CDS file provided via this parameter will fill in the missing CDS files in the genome list.
                If no CDS files are provided in the genome list, then this CDS file will be used on all genomes.
    -l	Optional. A manually curated, non-redundant library following the RepeatMasker naming format.
    -f	Minimum number of full-length TE copies in individual genomes to be kept as candidate TEs for the pangenome.
                Lower is more inclusive, and will ↑ library size, ↑ sensitivity, and ↑ inconsistency.
                Higher is more stringent, and will ↓ library size, ↓ sensitivity, and ↓ inconsistency.
                Default: 3.
    -t	Number of CPUs to run panEDTA. Default: 10.
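
The way the -g list and the -c file interact can be sketched in a few lines of shell. This is only an illustration of the fallback rule described above, not panEDTA's own code. It assumes the list columns are separated by whitespace, that file paths contain no spaces, and that EDTA.pl accepts --genome and --cds (flag names not stated in this release note). The helper names resolve_cds and print_edta_commands are hypothetical. The second function only prints one EDTA command per genome, so the commands can be reviewed or handed to a job scheduler for the parallel pre-annotation mentioned earlier.

```shell
#!/bin/sh
# Sketch only: mimics the documented CDS fallback, not panEDTA internals.

# resolve_cds LIST [FALLBACK_CDS]
# Prints "genome<TAB>cds" for every genome in LIST.
# A genome without its own CDS column gets FALLBACK_CDS (possibly empty).
resolve_cds() (
    list=$1
    fallback=${2:-}
    while read -r genome cds rest || [ -n "$genome" ]; do
        if [ -z "$genome" ]; then continue; fi        # skip blank lines
        if [ -z "$cds" ]; then cds=$fallback; fi      # fill in the missing CDS
        printf '%s\t%s\n' "$genome" "$cds"
    done < "$list"
)

# print_edta_commands LIST [FALLBACK_CDS] [THREADS]
# Prints (does not run) one EDTA command per genome.
# --genome and --cds are assumed flag names; --anno 1 is from the note above.
print_edta_commands() (
    list=$1
    fallback=${2:-}
    threads=${3:-10}
    tab=$(printf '\t')
    resolve_cds "$list" "$fallback" | while IFS=$tab read -r genome cds; do
        if [ -n "$cds" ]; then
            echo "EDTA.pl --genome $genome --cds $cds --anno 1 --threads $threads"
        else
            echo "EDTA.pl --genome $genome --anno 1 --threads $threads"
        fi
    done
)

# Example (hypothetical file names):
#   print_edta_commands genome_list.txt cds.fasta 10 > edta_jobs.txt
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and leaves the choice of parallel runner to the user.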

Reference:

Ou S., Collins T., Qiu Y., Seetharam A., Menard C., Manchanda N., Gent J., Schatz M., Anderson S., Hufford M.✉, Hirsch C.✉ (2022). Differences in activity and stability drive transposable element variation in tropical and temperate maize. bioRxiv

23 Jun 00:33

oushujun

v2.0.1

001834d

New features and bug fix

New features

added the --u parameter to allow a user-specified mutation rate (#271)
allow users to use the count_base.pl genome stats to replace the -genome_size and -seq_count parameters in util/buildSummary.pl.

Bug fix and enhancements

check RepeatMasker results in intermediate steps to accommodate situations where no repeats are found.
add more aliases to the Sequence Ontology list, partially solving #151 and #178.
resolve the "Illegal division by zero" error when the flanking sequences of candidate TEs are all N/X (#259).

26 Nov 02:22

oushujun

v2.0.0

1d39d19

EDTA v2.0.0 - faster, better, and nicer!

Performance improvements

Use the original LTRharvest and LTR_FINDER when --threads 1 is set. This is much faster for highly fragmented genomes (> 5,000 sequences) because it reduces the number of files created (#225). Users may run EDTA_raw.pl for each TE type with --threads 1, then run EDTA.pl with multiple threads and --overwrite 0.
Improve the filtering scheme for TE flanking sequences that are highly repetitive. If both flanking sequences are repetitive, filter out those with copy number > 50k on either side (based on feedback from Zhigui Bao @baozg). This avoids program suspension caused by the long stretches of tandem repeats that exist in high-quality genomes.
Improve and polish the filtering scheme suggested by Sergei Ryazansky @DrHogart (#136).
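
The two-stage run suggested for highly fragmented genomes could be laid out as follows. This is a rough sketch that only prints the commands. The --genome flag and the --type values (ltr, tir, helitron) are assumptions not stated in this release note and should be checked against the help text of EDTA_raw.pl and EDTA.pl. The function name print_two_stage_plan is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints a two-stage plan, does not run EDTA.

# print_two_stage_plan GENOME [THREADS]
print_two_stage_plan() (
    genome=$1
    threads=${2:-10}
    # Stage 1: one single-threaded raw scan per TE type
    # (type names are assumed, see the note above).
    for type in ltr tir helitron; do
        echo "EDTA_raw.pl --genome $genome --type $type --threads 1"
    done
    # Stage 2: multi-threaded run that keeps the existing raw results.
    echo "EDTA.pl --genome $genome --threads $threads --overwrite 0"
)

# Example: print_two_stage_plan genome.fa 16
```

Because each stage-1 command is single-threaded, the three printed lines can be started side by side before the final multi-threaded step.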

New features

change the longest sequence ID limit from 15 to 13 characters to allow sequences > 100 Mb (#239).
support renaming LTR sequences that RepeatModeler reports via --sensitive 1 (#184).
support renaming TEsorter libraries (#184).
cleanup_nested.pl: added the -clean option to allow for cleaning or not cleaning nested sequences.
get_consistent_TE.pl: a new script that helps find TEs that are consistently annotated in a genome.
add more specific guides for EDTA usage installed via conda (#208).
rename and save the existing.EDTA.intact.fa.out file when using the parameter --overwrite 0.
Updated EDTA_processI.pl and TE_purifier.pl: redirect RepeatMasker error messages to STDERR, as suggested by Nathalie de Vries.
make_panTElib.pl: a mature script that helps create a pan-genome TE library for pan-genome TE annotations. A detailed, documented usage example can be found here: https://github.com/HuffordLab/NAM-genomes/tree/master/te-annotation

Issues fixed

Resolve classification inconsistency when --curatedlib is provided
1. Added new entries and aliases to the TE SO database (#219).
2. Format sequence IDs for library files provided via --curatedlib to use the TE SO system (#220).
3. Check for TIR classification discrepancies between the candidate sequence and the library sequence using TE_SO name conversion.
Resolve singularity warnings by adding "LC_ALL=C" and author info to the Dockerfile (#122).
Fix #150 when flanking sequence is empty.
Fixed typos in EDTA.pl and EDTA_processI.pl reported by Nathalie de Vries.

Note

If your run was successful with version 1.9.4+ and you didn't notice any particular errors, you may not need to rerun it with 2.0.0. The core filtering algorithms are not very different between these versions.

Contributors

DrHogart and baozg

14 Jan 17:13

oushujun

v1.9.6

1b2360a

More (easy) ways to install EDTA

Make installation easier and quicker

Installation of EDTA has been troublesome for some users (#137, #140, #146, etc.). Here I add a couple more ways to install it across all platforms.

The default and recommended way is changed to use the EDTA.yml file, which freezes all dependency versions. If it works for me, it should also work for you.
Provide new docker/singularity containers that work for the current version (v1.9.x) and hopefully future versions.
Provide the docker container for users to build their own container.

Other improvements

Tidy up the output of --evaluation.
Detect and remove short tandem repeats when removing redundancies. Contributed by Sergei Ryazansky (#136).
Other small improvements that make EDTA better and better!

04 Dec 15:08

oushujun

v1.9.5

00decc8

New Docker image

As suggested by @eburgueno (#122, #125), the Docker version of EDTA has been switched to the Biocontainers Quay.io version, with a couple of fixes contributed by @Juke34 and @philippbayer (#121, #122). I think this version of the Docker image should run OK. This release will help me confirm that.

29 Oct 15:28

oushujun

v1.9.4

7f9f72f

Faster and Better

Major updates

parallelize LTRharvest. The code was adapted from LTR_FINDER_parallel and provided by @wild-joker as LTR_HARVEST_parallel. I made some slight modifications to it, and the modified version is also available.
fix a number of bugs for processing input CDS files.
add a 1-MB toy genome for testing purposes.

28 Aug 23:41

oushujun

v1.9.0

8c6975a

Formatting standard GFF3 output and more.

Major updates

Format the GFF3 output following the standard specifications.
1.1. Add common TEs to the Sequence Ontology database.
1.2. Create an alias file to convert different TE naming systems to the Sequence Ontology names.
Improve the TE summary (*.mod.EDTA.TEanno.sum) by splitting overlapping TEs and forcing each bp to be annotated only once. Splitting rule (retaining preference): 1. Structural > homology; 2. Longer > shorter; 3. Nested inner > outer (see #98).
The split GFF3 file is located here if you want to replace the default one: *mod.EDTA.anno/*.mod.EDTA.TEanno.split.gff3.
Add a script (make_panTElib.pl) to construct a pan-genome TE library from a list of TE libraries. This is a beta function.
Usage: perl make_panTElib.pl -liblist TElib.list [options]
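
For readers who want to replace the default GFF3 with the split one mentioned above, a minimal sketch follows. It assumes the default annotation sits in the working directory as <genome>.mod.EDTA.TEanno.gff3 and that the annotation folder is named <genome>.mod.EDTA.anno. Both names are inferred from the wildcard pattern given above and are not confirmed by this release note. The function name use_split_gff is hypothetical.

```shell
#!/bin/sh
# Sketch only: file locations are assumptions, see the text above.

# use_split_gff GENOME
# Keeps a .bak copy of the default GFF3, then copies the split GFF3 over it.
use_split_gff() (
    genome=$1
    split="$genome.mod.EDTA.anno/$genome.mod.EDTA.TEanno.split.gff3"
    default="$genome.mod.EDTA.TEanno.gff3"
    # Refuse to continue if the split file is not where we expect it.
    [ -f "$split" ] || { echo "missing: $split" >&2; exit 1; }
    # Preserve the original default file before replacing it.
    if [ -f "$default" ]; then cp -p "$default" "$default.bak"; fi
    cp -p "$split" "$default"
)

# Example: use_split_gff genome.fa
```

Keeping the .bak copy means the swap can be undone by copying it back.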

Minor updates

Detect SSRs in flanking sequences and label such candidates as false. This can significantly accelerate TIR and Helitron identification when the genome is rich in SSRs (see #93, #96).
Recover structurally intact Helitrons from the negative strand.
Allow users to provide the path to dependencies.

How to

How to update old annotations to the current version?

Back up old results, because the update will overwrite existing results (.gff3, .sum, etc.).
Navigate to the root of the working directory that contains the EDTA working folders (i.e., .raw, combine, final, anno).
Execute the patch script by providing the genome name (e.g., genome.fa):
perl ..../EDTA/util/patch_1.8.3_to_1.9.0.pl genome.fa [threads]
Check out the updated gff3 and summary results in the working directory.
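
The backup in the first step can be as simple as copying every .gff3 and .sum file from the working directory into a separate folder before running the patch script. The sketch below is one way to do that. The function name backup_edta_results and the default folder name backup_pre_patch are hypothetical, and only top-level files are copied.

```shell
#!/bin/sh
# Sketch only: copies top-level .gff3 and .sum files into a backup folder.

# backup_edta_results [WORKDIR] [DEST]
backup_edta_results() (
    workdir=${1:-.}
    dest=${2:-"$workdir/backup_pre_patch"}
    mkdir -p "$dest"
    for f in "$workdir"/*.gff3 "$workdir"/*.sum; do
        [ -e "$f" ] || continue      # the pattern matched nothing
        cp -p "$f" "$dest/"          # -p keeps timestamps for later comparison
    done
    ls "$dest"                       # list what was saved
)

# Example: backup_edta_results . ./backup_pre_patch
```

The same helper can be used before a panEDTA run, which also overwrites the per-genome GFF and sum files.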

04 Apr 23:01

oushujun

v1.8.3

07c90c2

Many updates

Bugfix

Correct genome sequence number in the TEanno.sum file #73
Replace RepeatClassifier with TEsorter for RepeatModeler result classification #72 #58

Improvement

Remove excessive TE fragments in intact TEs #76
Add identity info for homology-based annotation
Improve --rmout functionality
Update README for installations and usages #64

Reporting status

Report finishing time for raw/TIR #77
Add warnings for the lack of certain TE classes #75

28 Feb 22:44

oushujun

v1.8.2

5a75b1c

v1.8.2

Update usage and installation instructions; fix a couple of minor bugs.
