Recently, I used PrediXcan to impute gene expression data with the MASHR-based models.
The VCF file was generated by merging the VCFs of two samples that contain different sets of variants. The VCF of sample 1 includes variants that affect gene A, whereas the VCF of sample 2 does not include those variants, so the merged VCF has a large number of missing genotypes. Because gene A is very important for our downstream analysis, we decided to keep these variants, despite their missing genotypes, when predicting gene expression, rather than filtering them out with a maximum missing-genotype-rate threshold. The prediction did produce imputed expression for gene A, but the results look odd. I have attached a barplot of the imputed gene expression for reference.
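To make the setup concrete, here is a minimal sketch of why a missing-rate filter would drop the gene A variants. It assumes a toy dosage table (variants × individuals) in which `None` marks a genotype that became missing after the merge; the variant IDs, dosages, and the 0.1 threshold are all hypothetical and only illustrate the idea.

```python
# Toy dosage table: one entry per variant, one value per individual.
# None marks a genotype that is missing after merging the two VCFs.
# Variant IDs, dosages, and the threshold are hypothetical.
dosages = {
    "var_geneA_1": [0, 1, 2, 1, None, None],  # only typed in sample 1
    "var_geneA_2": [1, 1, 0, 2, None, None],  # only typed in sample 1
    "var_shared":  [0, 2, 1, 1, 2, 0],        # typed in both samples
}


def missing_rate(calls):
    """Fraction of individuals with a missing call at one variant."""
    return sum(c is None for c in calls) / len(calls)


def passes_filter(calls, max_missing=0.1):
    """Keep a variant only if its missing rate is at or below max_missing."""
    return missing_rate(calls) <= max_missing


for vid, calls in dosages.items():
    print(vid, round(missing_rate(calls), 3), passes_filter(calls))
```

With one third of the calls missing, both gene A variants fail the filter, which is exactly the situation that led us to keep them instead.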
In this figure, the x-axis shows the predicted expression level of gene A and the y-axis shows the number of individuals. Each row of the grid corresponds to one sample, and each column to a phenotype closely related to gene A. The barplot for sample 1 looks reasonable, but sample 2 clearly shows a different pattern.
Based on this, I have two questions:

1. How does the MASHR model handle this situation, i.e., when one third of the genotypes are missing for some specific variants?
2. Do you have any suggestions for dealing with missing genotypes in gene expression prediction?
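For context on the first question, here is a minimal sketch of how the choice of missing-genotype handling could shift predictions. It assumes that predicted expression is a weighted sum of dosages over the model's variants, and it compares two hypothetical strategies: filling missing calls with 0 and filling them with the per-variant mean of the observed calls. We do not know which strategy, if either, PrediXcan actually uses; the weights, variant IDs, and dosages below are made up for illustration.

```python
# Hypothetical per-variant weights for gene A (not real MASHR weights).
weights = {"var_geneA_1": 0.5, "var_geneA_2": -0.3, "var_shared": 0.2}

# Toy dosages; the last two individuals (sample 2) lack the gene A variants.
dosages = {
    "var_geneA_1": [0, 1, 2, 1, None, None],
    "var_geneA_2": [1, 1, 0, 2, None, None],
    "var_shared":  [0, 2, 1, 1, 2, 0],
}


def fill(calls, strategy):
    """Replace missing calls with 0 ("zero") or the observed mean ("mean")."""
    observed = [c for c in calls if c is not None]
    if strategy == "zero":
        fill_value = 0.0
    elif strategy == "mean":
        fill_value = sum(observed) / len(observed) if observed else 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill_value if c is None else float(c) for c in calls]


def predict(dosages, weights, strategy):
    """Predicted expression per individual as sum_j w_j * x_ij."""
    n = len(next(iter(dosages.values())))
    scores = [0.0] * n
    for vid, w in weights.items():
        if vid not in dosages:
            continue  # variant absent from the genotypes: contributes nothing
        for i, x in enumerate(fill(dosages[vid], strategy)):
            scores[i] += w * x
    return scores


print("zero-fill:", predict(dosages, weights, "zero"))
print("mean-fill:", predict(dosages, weights, "mean"))
```

In this toy example the first four individuals get identical scores under both strategies, while the two individuals with missing calls are shifted (0.4 and 0.0 with zero-fill versus 0.6 and 0.2 with mean-fill). A systematic shift of this kind for one sample could, in principle, produce a pattern like the one we see for sample 2.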
Thank you in advance! I would appreciate any suggestions or comments.
Xiaoyang