CSQ annotations are 'haplotype-aware' in the sense that they can incorporate multiple variants when determining the predicted effect. For example, imagine you observe a SNP that gives rise to a CAG --> TAG change. This looks like a premature stop codon if you only consider one variant at a time. But there may also be an adjacent SNP on the same haplotype, one that on its own would give a CAG --> CCG change (Gln --> Pro). Together the two make this an MNP (multi-nucleotide polymorphism), and more importantly the codon becomes TCG, so the result is only a Gln --> Ser change (still potentially bad, but not as bad as a stop codon).
CSQ annotations, in contrast to something like SnpEff, are able to pick up this critical difference.
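To make the codon example concrete, here is a minimal Python sketch. It is not how bcftools csq works internally; it only applies the two hypothetical substitutions to the CAG codon separately and together. The lookup table covers just the four codons involved, and the names `apply_snps`, `snp_a` and `snp_b` are made up for illustration.

```python
# Standard genetic code, restricted to the codons used in this example.
CODON_TABLE = {"CAG": "Gln", "TAG": "Stop", "CCG": "Pro", "TCG": "Ser"}


def apply_snps(codon, snps):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snps:
        bases[offset] = alt
    return "".join(bases)


ref = "CAG"
snp_a = (0, "T")  # hypothetical C>T at the first codon position
snp_b = (1, "C")  # hypothetical A>C at the second codon position

cases = [
    ("A alone", [snp_a]),
    ("B alone", [snp_b]),
    ("A+B on one haplotype", [snp_a, snp_b]),
]
for label, snps in cases:
    alt = apply_snps(ref, snps)
    print(f"{label}: {ref}>{alt} ({CODON_TABLE[ref]}>{CODON_TABLE[alt]})")
```

Annotating each variant on its own reports a stop for A, while considering both variants on the same haplotype gives a missense change (TCG, Ser) instead.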
There are a couple of issues with CSQ annotations.
1.) The PD1074 genome isn't quite ready for them. They require a good-quality GFF file like the one you see here. In fact, the GFFs on Ensembl are the only ones I was previously able to get working with the bcftools csq command. The easiest option I see will probably be to lift those over to PD1074, unless we can get the WormBase one to work.
2.) While the annotations are haplotype-aware, the full annotation for a compound variant can sit on either of its records, with the other record only pointing to it. See this example from the bcftools manual:
# Two separate VCF records at positions 2:122106101 and 2:122106102
# change the same codon. This UV-induced C>T dinucleotide mutation
# has been annotated fully at the position 2:122106101 with
# - consequence type
# - gene name
# - ensembl transcript ID
# - coding strand (+ fwd, - rev)
# - amino acid position (in the coding strand orientation)
# - list of corresponding VCF variants
# The annotation at the second position gives the position of the full
# annotation
BCSQ=missense|CLASP1|ENST00000545861|-|1174P>1174L|122106101G>A+122106102G>A
BCSQ=@122106101
# A frame-restoring combination of two frameshift insertions C>CG and T>TGG
BCSQ=@46115084
BCSQ=inframe_insertion|COPZ2|ENST00000006101|-|18AGRGP>18AQAGGP|46115072C>CG+46115084T>TGG
# Stop gained variant
BCSQ=stop_gained|C2orf83|ENST00000264387|-|141W>141*|228476140C>T
# The consequence type of a variant downstream from a stop are prefixed with *
BCSQ=*missense|PER3|ENST00000361923|+|1028M>1028T|7890117T>C
Note that in the first example the @ reference points back to an upstream position, whereas in the second it points forward to a downstream one.
There are actually two issues here:
How do we represent these types of variants? The genome browser currently lists every variant and its predicted consequences. My thinking is we should probably link the @1234... notation to the variant of interest and color it as a 'reference' row. Maybe when you highlight the actual variant we can somehow trigger both rows to light up.
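As a rough sketch of how the linking could work on the data side, the Python below parses BCSQ values and follows an @ pointer to the record holding the full annotation. It assumes the six-field layout shown in the manual excerpt above (real output may carry extra fields, such as a biotype, or several comma-separated consequences), and the names `Bcsq`, `parse_bcsq` and `resolve` are hypothetical, not part of bcftools.

```python
from typing import NamedTuple


class Bcsq(NamedTuple):
    consequence: str
    gene: str
    transcript: str
    strand: str
    amino_acid_change: str
    dna_change: str


def parse_bcsq(value):
    """Parse one BCSQ value; an '@POS' pointer is returned as an int."""
    if value.startswith("@"):
        return int(value[1:])
    fields = value.split("|")
    if len(fields) < 6:
        raise ValueError(f"unexpected BCSQ layout: {value!r}")
    # Assumption: the six-field layout from the manual excerpt.
    return Bcsq(*fields[:6])


def resolve(records, pos):
    """Return (source_pos, annotation), following an '@' pointer once."""
    parsed = parse_bcsq(records[pos])
    if isinstance(parsed, int):
        return parsed, parse_bcsq(records[parsed])
    return pos, parsed


# Position -> BCSQ value, taken from the manual example above.
records = {
    122106101: "missense|CLASP1|ENST00000545861|-|1174P>1174L|122106101G>A+122106102G>A",
    122106102: "@122106101",
    46115072: "@46115084",
    46115084: "inframe_insertion|COPZ2|ENST00000006101|-|18AGRGP>18AQAGGP|46115072C>CG+46115084T>TGG",
}

for pos in sorted(records):
    source, ann = resolve(records, pos)
    kind = "reference row" if source != pos else "full annotation"
    print(pos, kind, "->", source, ann.consequence, ann.gene)
```

A browser could use the returned source position to style the pointing row as a 'reference' row and to highlight both rows together, regardless of whether the pointer goes upstream or downstream.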