Combining RM rows #443
Comments
Hi, The directions of these entries are different and the physical distances between them are too large. The last two entries are close enough, but their TE coordinates overlap substantially (4910-7166 vs 6988-8240), so they cannot be considered a single element. Thanks!
Hey Shujun, Thanks for the clarification! So if a substantial overlap is detected, then they cannot be considered a single element.
Where at least in two cases the overlap is not substantial and the direction is the same. Many thanks for your support, Shujun! :)
The GFF rows you pasted seem to contain extra information compared to the RM out rows. To combine rows, the physical coordinates, the direction, the TE coordinates, and the divergence all need to be considered. If the physical coordinates, direction, and divergence meet the criteria but the TE coordinates overlap substantially, the rows are still considered two elements. If the TE coordinates have a large distance in between and are in agreeable directions (the first piece has the smaller 5' coordinates), the rows are still considered a single element. In such a case, the annotated TE has a large deletion. Shujun
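The criteria described above can be sketched as a small Python check. This is only an illustration of the reasoning in this thread, not EDTA's actual code: the `Fragment` class, the function names, and all three thresholds (`max_gap`, `max_div_diff`, `max_te_overlap`) are hypothetical, and comparing divergence by absolute difference is an assumption. The rule for the complement strand is also an assumption; the thread only states the rule that the first piece has the smaller 5' TE coordinate.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One RepeatMasker .out row, reduced to the fields discussed above."""
    seq: str       # query sequence name, e.g. a chromosome
    q_begin: int   # physical (genomic) start
    q_end: int     # physical (genomic) end
    strand: str    # "+" or "C"
    repeat: str    # matching repeat name
    r_begin: int   # start on the TE consensus
    r_end: int     # end on the TE consensus
    div: float     # percent divergence


def te_overlap(a: Fragment, b: Fragment) -> int:
    """Number of consensus positions shared by both fragments (0 if none)."""
    return max(0, min(a.r_end, b.r_end) - max(a.r_begin, b.r_begin) + 1)


def may_combine(a: Fragment, b: Fragment, *, max_gap: int,
                max_div_diff: float, max_te_overlap: int) -> bool:
    """Decide whether a and b look like one element; a is upstream of b.

    All thresholds are hypothetical placeholders, not EDTA defaults.
    """
    # Same sequence, same repeat, same direction.
    if (a.seq, a.repeat, a.strand) != (b.seq, b.repeat, b.strand):
        return False
    # Physical distance between the two pieces must be small enough.
    if b.q_begin - a.q_end > max_gap:
        return False
    # Divergence must be similar (assumed: absolute difference).
    if abs(a.div - b.div) > max_div_diff:
        return False
    # A substantial TE-coordinate overlap means two separate elements.
    if te_overlap(a, b) > max_te_overlap:
        return False
    # Agreeable order: on "+", the first piece has the smaller 5' TE
    # coordinate. The reversed rule for "C" is an assumption.
    if a.strand == "+":
        return a.r_begin < b.r_begin
    return a.r_begin > b.r_begin


# Made-up fragments for illustration only.
first = Fragment("Chr1", 1000, 2000, "+", "TE1", 1, 1000, 2.0)
gapped = Fragment("Chr1", 2100, 3000, "+", "TE1", 3000, 3900, 2.5)
overlapping = Fragment("Chr1", 2100, 3000, "+", "TE1", 500, 1400, 2.5)

limits = dict(max_gap=500, max_div_diff=3.5, max_te_overlap=50)
print(may_combine(first, gapped, **limits))       # large TE gap, same order
print(may_combine(first, overlapping, **limits))  # substantial TE overlap
```

In this sketch a large gap in TE coordinates does not block merging, which matches the "large deletion" case described above, while a substantial TE overlap does.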
Hi Shujun, Sorry for jumping into this conversation. What we don't understand is why some rows still do not join even though they meet all the criteria in the script. Here is the code and small working example I used: But looking at these three rows:
# before joining
SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
30291 4.5 0.2 0.4 Chr3 17485555 17489789 (8669366) + VANDAL12 DNA/Mutator 1 4200 (9966) 64678 *
38777 2.6 0.5 0.2 Chr3 17489775 17494536 (8664619) + VANDAL12 DNA/Mutator 3442 7944 (4030) 64679
26487 1.4 0.2 0.0 Chr3 17494533 17497540 (8661615) + VANDAL12 DNA/Mutator 8849 11860 (114) 64680 *
# after joining
SW_score perc_div. perc_del. perc_ins. query_sequence query_begin query_end query_remain strand matching_repeat repeat_class/family repeat_begin repeat_end repeat_remain ID
30291 4.5 0.2 0.4 Chr3 17485555 17489789 8669366 + VANDAL12 DNA/Mutator 1 4200 (9966) 64678
34020 2.1 0.4 0.1 Chr3 17489775 17497540 8661615 + VANDAL12 DNA/Mutator 3442 11860 (114) 64679_64680
So the
For anyone interested in this merging: the rows I pasted here that didn't merge were kept apart because of the overlap in their repeat consensus coordinates (in the last four columns).
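To make the overlap concrete, here is a small Python sketch that computes the genomic and consensus overlaps for the three VANDAL12 rows quoted above. The coordinates are copied from those rows; the `overlap` helper is a hypothetical name and simply counts shared positions of two closed intervals. It says nothing about what threshold the script actually applies.

```python
def overlap(a_begin: int, a_end: int, b_begin: int, b_end: int) -> int:
    """Shared positions of two closed intervals (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_begin, b_begin) + 1)


# (ID, query begin, query end, repeat begin, repeat end) from the rows above.
rows = [
    ("64678", 17485555, 17489789, 1, 4200),
    ("64679", 17489775, 17494536, 3442, 7944),
    ("64680", 17494533, 17497540, 8849, 11860),
]

# Compare each row with the next one downstream.
results = []
for (id1, qb1, qe1, rb1, re1), (id2, qb2, qe2, rb2, re2) in zip(rows, rows[1:]):
    genome_ov = overlap(qb1, qe1, qb2, qe2)
    consensus_ov = overlap(rb1, re1, rb2, re2)
    results.append((id1, id2, genome_ov, consensus_ov))
    print(f"{id1} vs {id2}: genome overlap {genome_ov} bp, "
          f"consensus overlap {consensus_ov} bp")
```

Rows 64678 and 64679 share 759 bp of the consensus (3442-4200), while 64679 and 64680 share none, which is consistent with only the last two rows being joined.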
Hi Shujun,
Thanks again for developing this amazing package!
I am running the newest v.2.2. I manually increased the max divergence for fragments to be combined from 3.5 to 4.5 at https://github.com/oushujun/EDTA/blob/v2.2.0/EDTA.pl#L694
The fragments below should be combined into two distinct elements. However, this does not seem to happen even though they overlap. This is what the annotation looks like:
The first three and the last two fragments should be merged. The gap in between is 200 bp.
From my $genome.out.new:
Do you have any idea why this is happening?
Thanks!!