New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Largest model coefficients for barely expressed genes? #86
Comments
Gabriela, I don't see your plots attached to the post. Can you paste the again, and also include a code snippet showing how you pre-filter (if at all) and fit the model? |
Sorry about the plots. They should be attached this time. As for the code, I extract the data from a Seurat object and then submit that to MAST. Creation of the Seurat object (excludes genes expressed in less than 10 cells): Running of MAST (condition corresponds to treatment, orig.ident corresponds to sample of origin): seur@meta.data$ngeneson = scale(seur@meta.data$nGene)[,1] seur@meta.data$condition = as.factor(substr(as.character(seur@meta.data$orig.ident), 1, nchar(as.character(seur@meta.data$orig.ident))-1)) mast.simple.raw = FromMatrix(as.matrix(seur@data), droplevels(seur@meta.data)) mast.simple.zlmFit <- zlm( I have run MAST on subsets of this data without re-performing the filtering, so in certain instances you would have genes expressed in less than 10 cells. But we've seen this in a few different cases so far, which makes me think the exact threshold for including a gene (at least if that low) does not have much of an impact. Thanks! |
I guess I'd be curious about 1. the standard errors around those estimates, 2. What is the expression across conditions, and 3. Whether the effect is still observed if you fit a model with empirical Bayes shrinkage. |
Hi there!
I had a question regarding the interpretation of the hurdle model coefficients, in particular with respect to genes expressed at low levels. Specifically: after running MAST to obtain differentially expressed genes on single-cell RNA-Seq data, we have noticed that the genes with the largest continuous model coefficients correspond to genes that barely/don't change in expression between conditions, and are also barely expressed. We're unsure how to interpret this: intuitively we would interpret that large coefficients would mean these genes play a role in explaining the differences between groups, but this doesn't make sense if they don't change/almost aren't expressed. Is this an artifact of how the model fitting occurs in MAST, and should one pre-filter the data to only include genes expressed in a larger fraction of cells?
I've attached 2 plots to illustrate what I'm talking about. The first plots the coefficient for the continuous component vs the logFC, and the 2nd plots it vs the number of cells each gene is expressed in.
Any guidance would be appreciated, thanks!
Gabriela
The text was updated successfully, but these errors were encountered: