What is the meaning of ISM None #148

Open
kathryncrouch opened this issue Feb 16, 2024 · 2 comments

Comments

@kathryncrouch

Hi,

I see quite a few models in my TALON output where the transcript novelty assignment is ISM, and the incomplete splice match type is "None".

I understand what the Prefix, Suffix and Both subtypes mean - but under what conditions is a model assigned to ISM (as opposed to NIC/NNC) but not assigned to one of the subcategories? Is this best just thought of as "Other"?

Many of the models I see like this are partial transcripts that match only one exon in the reference. Thus, they match part of the model and have no splice junctions to compare with the reference, yet they aren't considered Genomic. However, I wondered if you had a more formal definition of how ISM None arises.
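One way to check the single-exon observation is to tabulate the ISM rows whose subtype is "None" by exon count. This is only a minimal sketch: the column names `transcript_novelty`, `ISM_subtype` and `n_exons` are assumptions about the tab-separated abundance table, and the rows in `example` are made up for illustration.

```python
import csv
import io
from collections import Counter


def ism_none_exon_counts(tsv_text):
    """Count ISM transcripts with subtype "None", grouped by exon count.

    Column names are assumed, not confirmed against TALON's output.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["transcript_novelty"] == "ISM" and row["ISM_subtype"] == "None":
            counts[int(row["n_exons"])] += 1
    return counts


# Made-up rows standing in for a real table.
example = "\n".join([
    "transcript_ID\ttranscript_novelty\tISM_subtype\tn_exons",
    "1\tKnown\tNone\t5",
    "2\tISM\tNone\t1",
    "3\tISM\tPrefix\t4",
    "4\tISM\tNone\t1",
    "5\tISM\tNone\t3",
    "6\tNIC\tNone\t6",
])

counts = ism_none_exon_counts(example)
for n_exons, n_transcripts in sorted(counts.items()):
    print(n_exons, n_transcripts)
```

If nearly all of the ISM None rows have a single exon, that would support the "no junctions to compare" reading; multi-exon rows would need a different explanation.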

@kathryncrouch

To add to this, I am also having some trouble understanding why some of the other models are characterised the way they are.

In these screenshots, the darker gene models at the top are the reference. The lighter models lower down are TALON output.

image
The middle model is labelled NIC. Why? The intron in this model is not represented in the reference, how is this "in catalog"?

image
The third model down is labelled NIC. Again, I don't understand why. The two models above it are labelled NNC, which makes more sense.

image
The lower model has a truncated exon. I would expect this to be NNC, but it's annotated NIC.

image
This model is annotated NIC, but has a completely novel intron.

image
The top three TALON models make sense (NNC, known, genomic, working from the top down). The two below that are more confusing. The one with the blue arrow is annotated ISM prefix and the one with the green arrow is annotated ISM suffix. I don't fully understand the logic for either of these. I feel like both should be NNC because of the completely novel introns in the 5' UTR.

Are these annotations that I don't understand something to do with introns in UTRs rather than coding regions? Or are these models actually given multiple annotations and I'm only seeing one of them (these labels are derived from the count table produced by transcript_count)? Sorry if I'm missing something obvious, but I'm really stumped by these, and I can't see the answer by looking at the definitions either in your paper or the SQANTI paper.

@jschroaderUAlbany

jschroaderUAlbany commented Apr 2, 2024

I believe that if the transcript is not classified as an Incomplete Splice Match, then it is assigned "None" in the ISM_subtype column. An ISM is a transcript that contains a subsection of an annotated transcript but does not extend all the way to the annotated 3′ or 5′ end.

What I don't understand is what prefix, suffix, or both mean in the context of the ISM_subtype.

I was also having trouble finding this in the paper or elsewhere.
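For what it's worth, one possible reading of the subtypes can be sketched on splice-junction chains. This is an assumption for discussion, not TALON's actual implementation and not confirmed by the maintainers: "Prefix" is taken to mean the chain matches the 5′ start of a longer annotated chain, "Suffix" the 3′ end, "Both" a prefix of one annotated transcript and a suffix of another, and "None" a match that touches neither end. The function name `ism_subtype` and the coordinates are hypothetical, and chains are assumed to be listed 5′ to 3′.

```python
def ism_subtype(query, references):
    """Classify a junction chain against annotated chains.

    query: list of (donor, acceptor) pairs, ordered 5' to 3'.
    references: list of such chains.
    Returns "Prefix", "Suffix", "Both", "None", or "not ISM".
    Illustrative reading only; single-exon transcripts (empty chains)
    are not handled and come back as "not ISM".
    """
    query = tuple(query)
    n = len(query)
    if n == 0:
        return "not ISM"
    is_prefix = is_suffix = is_internal = False
    for ref in references:
        ref = tuple(ref)
        if len(ref) <= n:
            continue  # an equal-length match would be a full splice match
        for start in range(len(ref) - n + 1):
            if ref[start:start + n] == query:
                if start == 0:
                    is_prefix = True
                elif start + n == len(ref):
                    is_suffix = True
                else:
                    is_internal = True
    if is_prefix and is_suffix:
        return "Both"
    if is_prefix:
        return "Prefix"
    if is_suffix:
        return "Suffix"
    if is_internal:
        return "None"
    return "not ISM"


# Hypothetical annotated chains.
ref_a = [(100, 200), (300, 400), (500, 600), (700, 800)]
ref_b = [(50, 80), (100, 200)]

print(ism_subtype([(100, 200), (300, 400)], [ref_a]))
print(ism_subtype([(500, 600), (700, 800)], [ref_a]))
print(ism_subtype([(300, 400), (500, 600)], [ref_a]))
print(ism_subtype([(100, 200)], [ref_a, ref_b]))
```

Under this reading, a transcript with no junctions at all would fall outside every branch, which is why the single-exon cases mentioned earlier in the thread would still need a separate rule.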
