How does PrediXcan with mashr model deal with missing genotype? #164

xiaoyangli2934 · 2022-09-26T17:58:22Z

Hi,

I recently used PrediXcan with the MASHR models to impute gene expression.

The VCF file was generated by merging the VCFs of two samples that contain different variants. The VCF of sample 1 includes variants that have an effect on gene A, whereas the VCF of sample 2 does not include those variants, so the merged VCF contains a large number of missing genotypes. Because gene A is very important for our downstream analysis, we decided to keep the variants with missing genotypes when predicting gene expression, rather than filtering them out with a maximum missing-genotype rate. The run did produce imputed expression values for gene A, but they look odd. I attach a bar plot of the imputed gene expression for reference.

In this figure, the x-axis represents the expression level of gene A and the y-axis represents the number of individuals. Each row of the grid corresponds to a different sample, and each column corresponds to a different phenotype highly related to gene A. The bar plot for sample 1 makes sense, but sample 2 clearly shows a different pattern.
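For anyone who wants to check the same missingness pattern in their own merged VCF, here is a minimal sketch that computes the missing-genotype rate of one variant separately for each sample. The function name, the group labels, and the toy genotypes are made up for illustration. It assumes the GT strings have already been extracted from the VCF.

```python
def missing_rate_by_group(genotypes, groups):
    """Return the fraction of missing genotype calls per group for one variant.

    genotypes: GT strings, one per individual (e.g. "0/1", "1|1", "./.").
    groups:    group label for each individual, in the same order.
    """
    totals, missing = {}, {}
    for gt, grp in zip(genotypes, groups):
        totals[grp] = totals.get(grp, 0) + 1
        # A call counts as missing if any allele is "." (VCF missing value).
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing[grp] = missing.get(grp, 0) + 1
    return {g: missing.get(g, 0) / totals[g] for g in totals}


# Toy data: four individuals from sample 1, two from sample 2.
gts = ["0/1", "1/1", "0/0", "0/1", "./.", "./."]
groups = ["sample1"] * 4 + ["sample2"] * 2

rates = missing_rate_by_group(gts, groups)
print(rates)
```

In this toy case one third of all genotypes are missing overall, but every missing call comes from sample 2, which is the situation in our merged file.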

Based on this, I have two questions:

How does the mashr model handle this situation, i.e. when about 1/3 of the genotypes are missing for some specific variants?
Do you have any suggestions for handling missing genotypes in gene expression prediction?
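To make the first question concrete, here is a minimal sketch of why the handling of missing genotypes matters. It assumes the predicted expression is a weighted sum of variant dosages. The variant IDs, weights, and the two fill strategies (zero dosage and a fixed mean dosage) are hypothetical. They are not meant to describe what PrediXcan actually does.

```python
def predict_expression(dosages, weights, fill):
    """Weighted sum of dosages; a missing dosage (None) is replaced by fill[variant]."""
    total = 0.0
    for variant, weight in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            dosage = fill[variant]  # assumption: one fill value per variant
        total += weight * dosage
    return total


# Hypothetical model weights for gene A.
weights = {"rs_a": 0.5, "rs_b": -0.25}

# One individual with complete genotypes, one with rs_a missing.
person_complete = {"rs_a": 2, "rs_b": 1}
person_missing = {"rs_a": None, "rs_b": 1}

zero_fill = {"rs_a": 0.0, "rs_b": 0.0}  # treat missing as zero dosage
mean_fill = {"rs_a": 1.0, "rs_b": 1.0}  # treat missing as an assumed mean dosage

complete = predict_expression(person_complete, weights, zero_fill)
with_zero = predict_expression(person_missing, weights, zero_fill)
with_mean = predict_expression(person_missing, weights, mean_fill)
print(complete, with_zero, with_mean)
```

If all the individuals with missing genotypes come from sample 2, either fill strategy gives them a constant contribution from those variants, which might explain the different pattern we see for sample 2.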

Thank you in advance! I would appreciate any suggestions or comments.

Xiaoyang
