Rules for considering splicing variants as identical in TALON #126

Open
mihshimada opened this issue Feb 15, 2023 · 2 comments

Comments

@mihshimada

Hi,
I would like to know the detailed rules TALON uses to decide that two splice variants are identical. If multiple isoforms have exons whose start and end positions differ slightly, how many bp of difference are tolerated before they are no longer treated as the same exon?
Also, if the 5' and 3' ends of an isoform fall within the same exons as those of the isoform being compared, are the two merged into the same isoform even though the end positions differ? Or is there a limit on how large the difference can be?

@fairliereese
Member

Hi there,
For splice sites, TALON uses the exact coordinates to define novel isoforms.
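
To illustrate what "exact coordinates" means here, below is a minimal sketch, not TALON's actual code. It assumes each transcript is given as a list of 1-based, inclusive exon `(start, end)` tuples, and the helper names `intron_chain` and `same_splice_sites` are hypothetical. Two transcripts share splice sites only if every intron boundary matches to the base pair; the outermost transcript ends are not part of the comparison.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive (assumed convention)


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the introns implied by consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1)
        for i in range(len(ordered) - 1)
    )


def same_splice_sites(a: List[Exon], b: List[Exon]) -> bool:
    """True only if every splice site matches exactly; outer ends are ignored."""
    return intron_chain(a) == intron_chain(b)


reference = [(100, 200), (300, 400), (500, 600)]
ends_differ = [(90, 200), (300, 400), (500, 650)]     # only outer ends differ
donor_shifted = [(100, 201), (300, 400), (500, 600)]  # one splice site off by 1 bp

print(same_splice_sites(reference, ends_differ))    # same intron chain
print(same_splice_sites(reference, donor_shifted))  # different intron chain
```

In this sketch a 1 bp shift at any internal exon boundary yields a different intron chain, while differences at the transcript ends are left to the separate 5'/3' handling.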

For 5'/3' ends, TALON assigns an isoform the reference 5' or 3' end associated with the same intron chain, provided the read's ends fall within the user-specified distance of the annotated ends (the --5p and --3p arguments to the talon_initialize_database command). A read whose ends exceed these boundaries is assigned to a new transcript model. If you're interested in refining your transcriptome post-TALON to get better 5'/3' end calls, I recommend another tool developed in the lab called Lapa.
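
As a rough illustration of the distance check described above, here is a simplified sketch. It is not TALON's implementation: it assumes a symmetric absolute distance on each end, the function name `within_end_tolerance` is hypothetical, and the 500 bp / 300 bp values are placeholder arguments standing in for whatever was passed as --5p and --3p.

```python
def within_end_tolerance(
    read_start: int,
    read_end: int,
    annot_start: int,
    annot_end: int,
    strand: str,
    dist_5p: int = 500,  # placeholder for the --5p value (assumption)
    dist_3p: int = 300,  # placeholder for the --3p value (assumption)
) -> bool:
    """Check whether both read ends lie within the allowed distance of the
    annotated ends. Coordinates are genomic, with start <= end."""
    if strand == "+":
        diff_5p = abs(read_start - annot_start)
        diff_3p = abs(read_end - annot_end)
    else:
        # On the minus strand the 5' end is the larger genomic coordinate.
        diff_5p = abs(read_end - annot_end)
        diff_3p = abs(read_start - annot_start)
    return diff_5p <= dist_5p and diff_3p <= dist_3p


# 5' end is 200 bp away, 3' end is 100 bp away: inside both windows.
print(within_end_tolerance(1000, 5000, 1200, 5100, "+"))
# 3' end is 400 bp away: outside a 300 bp window.
print(within_end_tolerance(1000, 5000, 1200, 5400, "+"))
```

The strand has to be taken into account because the two windows can differ in size, so swapping the 5' and 3' ends would change the result.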

@mihshimada
Author

Thank you very much for taking the time to answer my question!
Your answer was exactly what I wanted to hear.
However, I did not explicitly set --3p and --5p when I ran talon_initialize_database.
In that case, I assumed the defaults would apply: 300 bp for --3p and 500 bp for --5p.
But when I analyzed the data, I found that isoforms whose ends differ by considerably more than that are still treated as identical, as shown below.
Why is this?
image

@fairliereese fairliereese reopened this Mar 22, 2023