Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions needed for the whole-genome model output #243

Open
AmayAgrawal opened this issue Aug 2, 2023 · 1 comment
Open

Suggestions needed for the whole-genome model output #243

AmayAgrawal opened this issue Aug 2, 2023 · 1 comment

Comments

@AmayAgrawal
Copy link

AmayAgrawal commented Aug 2, 2023

Hi,

This is not an issue but more of a question from my side.

I used the LMM model implemented in Pyseer to perform my GWAS analysis and figured out some significant variants that plays a role in resistance. Next, I read few review papers where they benchmarked different GWAS methods and they showed that for moderate to high LD (which is the case with the bacteria I am working), whole-genome models implemented in Pyseer performs better as compared to LMM's. So I wanted to try those models also on my data. Now, I have two questions regarding this:

  1. I saw that whole-genome models are usually used for prediction of phenotype, but is it okay to use them for performing GWAS and getting the significant variants?

  2. Anyways, I tried to use whole-genome models to perform GWAS on my data and used --distances parameter to calculate an adjusted p-value. After performing the GWAS, a part of my output sorted on lrt-pvalue looks like below:

Variant af filter-pvalue lrt-pvalue beta notes
1_7570_C_T 0.161 3.67E-48 2.93E-47 4.24 matrix-inversion-error
1_7585_G_C 0.764 6.97E-31 4.94E-36 4.12 matrix-inversion-error
1_1473246_A_G 0.253 5.51E-33 3.47E-27 0.287 matrix-inversion-error
1_4269271_A_G 0.0792 3.01E-24 1.21E-23 0.871 bad-chisq
1_2155168_C_G 0.606 2.84E-20 1.32E-17 -0.463 matrix-inversion-error
1_761155_C_T 0.384 2.95E-07 2.14E-09 0.0511 matrix-inversion-error
1_2157481_C_T 0.0453 1.11E-06 2.61E-08 -0.64 bad-chisq
1_2157569_C_A 0.0453 1.11E-06 2.61E-08 -4.97E-15 bad-chisq

In this output, if I look at the 'notes' column, I can see 'matrix-inversion-error' and 'bad-chisq' for all printed variants. So I am not sure whether I should consider this as significant variant or not (Sorry if I this information is already present in documentation and I missed it)? To correct for multiple testing in the LMM model, I used the --output-patterns parameter and then used the count_patterns.py script to determine Bonferroni corrected p-value threshold. But when I use the whole-genome model, I cannot use the --output-patterns parameter here. So what strategy can I use here to correct for multiple testing?

Thanks a lot in advance

@johnlees
Copy link
Collaborator

When you get the p-values from the whole genome model, it is actually just running GWAS with the fixed effects model and appending these to the table. Here it looks like that model might not be working too well, given those error messages. Maybe try the LMM to get your p-values?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants