Seeking combined effects of variants on a continuous phenotype with pyseer elastic nets #261
Comments
Thanks for the very detailed description of your experimental setup. One perhaps easier thing you could do that goes in the direction of what you describe would be to run a lasso regression, by setting the elastic net with |
Thank you! I think the actual effect on the stress resilience phenotype is caused by larger networks of genes, e.g. regulons and the like. Won't I lose a lot of these low-effect variants when using lasso regression?
You could try fitting a number of models varying the |
Thanks, I hope I grasped the general idea! So I'll try to fit a model with the largest alpha value at which the predictions are still accurate enough.
Is this even close to correct usage? Another question that just popped into my mind is why is |
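The "largest alpha that is still accurate enough" idea can be sketched as below. This is only an illustration: the function name `pick_largest_alpha`, the tolerance and the R² values are all made up, and in practice each score would come from cross-validation or a held-out test set for the model fitted at that alpha.

```python
def pick_largest_alpha(r2_by_alpha, tolerance=0.02):
    """Return the largest alpha whose R2 is within `tolerance` of the best R2.

    r2_by_alpha: dict mapping each tried alpha to the prediction R2 it achieved.
    """
    best = max(r2_by_alpha.values())
    good_enough = [a for a, r2 in r2_by_alpha.items() if r2 >= best - tolerance]
    return max(good_enough)


# Hypothetical scores, NOT results from this data set.
scores = {0.01: 0.80, 0.1: 0.79, 0.5: 0.785, 1.0: 0.70}
chosen = pick_largest_alpha(scores)
print(chosen)
```

A larger alpha (closer to lasso) generally leaves fewer variants with non-zero weight, so the tolerance is a trade-off between sparsity and prediction accuracy.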
I would use all unitigs in your subsequent analysis, although many will not be used for the predictions anyway because they will have a weight of 0. I fear that providing a smaller variant set might throw some errors (I'm not actually sure though; I would need to check the code and tests again). I would also follow the advice of not dropping correlated variants, unless you run out of memory or the models take too long to train.
I tried to train the model with my data in both ways. This works fine:
This throws an error:
Error message:
but when I remove the Now I have two models
The R² values are 0.7405 and 0.8494, respectively. So the model works better with fewer unitigs, but I'm not sure about the logic behind it. Am I predicting the right thing with the right model?
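For reference when comparing the two numbers, a common definition of R² for a continuous phenotype is one minus the ratio of the residual to the total sum of squares. The sketch below assumes that definition; I have not checked that it is exactly what pyseer reports, and the phenotype values are invented for the example.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical phenotypes and predictions, NOT taken from this thread.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
r2 = r_squared(observed, predicted)
print(r2)
```

An R² computed on samples that were also used to train the model tends to be optimistic, so the comparison is most meaningful when both models are scored on the same held-out samples.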
I don't think there's any particular error in your workflow, apart from the fact that when you use the |
Why do elastic net and LMM select different variants?
Well, the two processes work quite differently. The main difference is that the LMM tests one feature at a time, while the elastic net uses all variants at the same time. That by itself will lead to different variants being flagged as associated with the phenotype.
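A toy sketch of that difference, under heavy simplifying assumptions: the "one feature at a time" side is reduced to a plain Pearson correlation (a real LMM also corrects for population structure), and the joint side is a small hand-written coordinate-descent elastic net, not pyseer's own code. The penalty follows the usual convention where `alpha` is the lasso/ridge mixing proportion (1 is pure lasso) and `lam` is the overall strength. The data and all names are made up: `x1` drives the phenotype and `x2` is only correlated with `x1`.

```python
from math import sqrt


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def pearson(a, b):
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    return num / sqrt(sum(p * p for p in a) * sum(q * q for q in b))


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(columns, y, lam, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + lam*(alpha*|w|_1 + (1-alpha)/2*|w|_2^2)."""
    n = len(y)
    cols = [center(c) for c in columns]
    yc = center(y)
    w = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Residual with every feature except j explained away.
            resid = [
                yc[i] - sum(w[k] * cols[k][i] for k in range(len(cols)) if k != j)
                for i in range(n)
            ]
            rho = sum(xj[i] * resid[i] for i in range(n)) / n
            denom = sum(v * v for v in xj) / n + lam * (1 - alpha)
            w[j] = soft_threshold(rho, lam * alpha) / denom
    return w


# Hypothetical presence/absence patterns for two variants in 8 samples.
x1 = [1, 1, 1, 1, 0, 0, 0, 0]      # causal variant
x2 = [1, 1, 1, 0, 0, 0, 0, 1]      # correlated with x1, no effect of its own
y = [2.0 * a for a in x1]          # phenotype depends on x1 only

marginal = [pearson(x1, y), pearson(x2, y)]            # one variant at a time
weights = elastic_net([x1, x2], y, lam=0.05, alpha=0.9)  # all variants jointly

print("marginal correlations:", marginal)
print("elastic net weights:  ", weights)
```

Tested on its own, `x2` still looks associated with the phenotype, while the joint fit gives it a weight of zero because `x1` already explains the signal. With a small alpha (closer to ridge) the joint fit would instead tend to spread weight across correlated variants.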
Hello!
I got to analysing after clearing the startup hiccups of #252.
A succinct summary of the original data set:
I anticipate that these stress resilience phenotypes are under much weaker selection and I expect numerous smaller effects rather than a few big ones. I thought that elastic nets could be used to find these smaller combinations or networks of variants that most adequately predict the phenotype of a test group.
My thought process so far:
The problem is that this is slow and the number of possible combinations and subcombinations is astronomical. One alternative would be to first detect biologically relevant combinations, but then I fear I would lose genes of yet unknown function in the process.
Can pyseer elastic nets be used like this? Does this make any sense?