Help with interpretation #62

Open
pjvandehaar opened this issue Mar 21, 2017 · 0 comments

Things to tell users:

  • whether to trust a phenotype
    • partially via the QQ plot, especially the genomic-control lambda at the 50th and 10th percentiles (gc-lambda-50%ile and gc-lambda-10%ile)
  • whether to trust a peak
    • associations: top p-value; other p-values in peak (i.e., a vertical line of variants vs. a lone variant); top effect; other effects in peak (i.e., do they agree in direction); MAF; width of peak; nearby associations (both in terms of distance and LD)
    • annotations: nonsynonymous vs synonymous vs intronic vs UTR vs near gene; coding gene vs lincRNA vs pseudogene; known eQTL; known TFBS
    • context: the number of variants; variant density in that region; QQ; QQ in similar MAF; number of cases/controls/samples
    • variant quality: imputation quality, read depth, allele balance, HWE (Hardy-Weinberg equilibrium), call rate? something related to recessive/dominant/additive models?
    • later, do conditional analysis to judge how many signals are in a peak
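
A rough sketch of how the gc-lambda values mentioned above could be computed. It assumes lambda at quantile q is the 1-df chi-square statistic of the observed p-value at that quantile, divided by the value expected under the null. The function names are made up for illustration, and only the standard library is used.

```python
from statistics import NormalDist


def chi2_1df_upper_quantile(p):
    """Chi-square (1 df) value whose upper-tail probability is p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # For 1 df, chi-square is a squared standard normal, so use the normal tail.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def gc_lambda(pvals, quantile=0.5):
    """Genomic-control lambda at the given quantile of the p-value list.

    quantile=0.5 uses the median p-value; quantile=0.1 uses the p-value
    10% of the way into the list when sorted from most to least significant.
    """
    ordered = sorted(pvals)
    idx = min(int(len(ordered) * quantile), len(ordered) - 1)
    observed = chi2_1df_upper_quantile(ordered[idx])
    expected = chi2_1df_upper_quantile(quantile)
    return observed / expected


# Evenly spaced p-values mimic a well-behaved null distribution.
null_pvals = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring them makes every p-value smaller, mimicking inflation.
inflated_pvals = [p * p for p in null_pvals]

print(gc_lambda(null_pvals, 0.5), gc_lambda(null_pvals, 0.1))
print(gc_lambda(inflated_pvals, 0.5), gc_lambda(inflated_pvals, 0.1))
```

Values near 1 at both quantiles suggest a well-calibrated phenotype. Values well above 1 suggest inflation, though where to draw the line is still an open question.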

Maybe label our MGI and Sardinia data with heuristics, adjust the labels by hand, and then train an ML model?
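
A minimal sketch of what a heuristic first-pass label could look like, to be corrected by hand afterwards. The `Peak` fields, the thresholds, and the three label names are all placeholders invented for illustration, not values anyone has validated.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    n_supporting: int          # other variants in the peak that are also associated
    effects_agree: bool        # do nearby effects point the same way as the top hit?
    maf: float                 # minor allele frequency of the top variant
    imputation_quality: float  # e.g. an r-squared-like score in [0, 1]


def heuristic_label(peak, min_supporting=3, min_maf=0.01, min_quality=0.8):
    """Return a provisional 'trust', 'doubt', or 'unsure' label for a peak."""
    checks = [
        peak.n_supporting >= min_supporting,
        peak.effects_agree,
        peak.maf >= min_maf,
        peak.imputation_quality >= min_quality,
    ]
    passed = sum(checks)
    if passed == len(checks):
        return "trust"
    if passed <= 1:
        return "doubt"
    return "unsure"


print(heuristic_label(Peak(10, True, 0.20, 0.95)))
print(heuristic_label(Peak(0, False, 0.001, 0.99)))
```

The "unsure" bucket is the one that would most need adjusting by hand before training.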

  • If we went for standard image-based approach for peak interpretation:
    • would we feed in x-axis as variants or positions? If LD will be a feature, then it might as well be by variant, right?
    • input: 1000 variants on each side of peak, a row for each feature, also global info as constant-rows?
      • how to scale to 0-255?
    • output: peak confidence
    • training:
      • use known associations to train? But we haven't mapped traits, and it'll be biased if we use peaks to map traits and then use those traits to annotate peaks
      • use SardiNIA p-values as labels for training on MGI peaks, and vice versa? Is this a thing that people do? Is it okay that one is a founder population, or does that not make a big difference? Should MAF difference be factored into label confidence?
    • method: column-wise convolution, a local-max layer (20 variants each?), one fully-connected layer (I don't understand deep learning)
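
To make the image-based idea above concrete, here is a small pure-Python sketch, assuming one row per feature, one column per variant, and global info as constant rows. It scales by clipping to a fixed range per feature, which is only one possible answer to the 0-255 question. The feature choice, the ranges, and all function names are assumptions. A real attempt would use a deep-learning framework and learned weights instead of the hand-set ones here.

```python
import math


def scale_row(values, lo, hi):
    """Clip each value to [lo, hi] and map it linearly onto 0..255."""
    span = hi - lo
    return [round(255 * (min(max(v, lo), hi) - lo) / span) for v in values]


def build_image(neglog10_pvals, mafs, n_samples,
                max_neglog10=50.0, max_samples=1_000_000):
    """One row per feature, one column per variant."""
    width = len(neglog10_pvals)
    sample_level = scale_row([math.log10(n_samples)],
                             0.0, math.log10(max_samples))[0]
    return [
        scale_row(neglog10_pvals, 0.0, max_neglog10),
        scale_row(mafs, 0.0, 0.5),
        [sample_level] * width,  # global info repeated as a constant row
    ]


def column_conv(image, kernel):
    """Slide a kernel that spans every feature row along the variant axis."""
    rows, k = len(kernel), len(kernel[0])
    out = []
    for start in range(len(image[0]) - k + 1):
        total = sum(kernel[r][j] * image[r][start + j]
                    for r in range(rows) for j in range(k))
        out.append(max(0.0, total))  # ReLU
    return out


def local_max(signal, window):
    """Keep the maximum of each non-overlapping window of positions."""
    return [max(signal[i:i + window]) for i in range(0, len(signal), window)]


def peak_confidence(image, kernel, weights, bias, window=20):
    """Convolution, then local max, then one fully-connected unit and a sigmoid."""
    pooled = local_max(column_conv(image, kernel), window)
    if len(weights) != len(pooled):
        raise ValueError("need one weight per pooled value")
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Toy input: 40 variants with a single strong hit in the middle.
pvals = [1.0] * 40
pvals[20] = 30.0
image = build_image(pvals, [0.1] * 40, n_samples=20_000)
kernel = [[0.001] * 5 for _ in range(3)]  # untrained placeholder weights
print(peak_confidence(image, kernel, weights=[0.5, 0.5], bias=-1.0))
```

Indexing columns by variant rather than by position, as here, means that LD between neighbouring columns could later be added as just another row.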