Largest model coefficients for barely expressed genes? #86

gabprg · 2017-12-05T18:32:43Z

Hi there!

I had a question regarding the interpretation of the hurdle model coefficients, in particular with respect to genes expressed at low levels. Specifically: after running MAST to obtain differentially expressed genes on single-cell RNA-Seq data, we have noticed that the genes with the largest continuous model coefficients correspond to genes that barely/don't change in expression between conditions, and are also barely expressed. We're unsure how to interpret this: intuitively we would interpret that large coefficients would mean these genes play a role in explaining the differences between groups, but this doesn't make sense if they don't change/almost aren't expressed. Is this an artifact of how the model fitting occurs in MAST, and should one pre-filter the data to only include genes expressed in a larger fraction of cells?

I've attached 2 plots to illustrate what I'm talking about. The first plots the coefficient for the continuous component vs the logFC, and the 2nd plots it vs the number of cells each gene is expressed in.

Any guidance would be appreciated, thanks!

Gabriela

gfinak · 2017-12-05T19:20:16Z

Gabriela, I don't see your plots attached to the post. Can you paste the again, and also include a code snippet showing how you pre-filter (if at all) and fit the model?
Thanks
Greg

gabprg · 2017-12-05T19:37:45Z

Sorry about the plots. They should be attached this time. As for the code, I extract the data from a Seurat object and then submit that to MAST.

Creation of the Seurat object (excludes genes expressed in less than 10 cells):
"
seur = CreateSeuratObject(raw.data = merged.10x, project = "S35", min.cells = 10, min.genes = 100, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE, names.field = 2, names.delim = "_")
"

Running of MAST (condition corresponds to treatment, orig.ident corresponds to sample of origin):
"

seur@meta.data$ngeneson = scale(seur@meta.data$nGene)[,1]

seur@meta.data$condition = as.factor(substr(as.character(seur@meta.data$orig.ident), 1, nchar(as.character(seur@meta.data$orig.ident))-1))

mast.simple.raw = FromMatrix(as.matrix(seur@data), droplevels(seur@meta.data))

mast.simple.zlmFit <- zlm(
formula = formula(~condition + (1|orig.ident) + ngeneson),
sca = mast.simple.raw,
parallel = TRUE,
method = 'glmer',
ebayes = FALSE
)
print("Summarizing results...")
sum.lrt <- MAST::summary(mast.simple.zlmFit, doLRT = T)
"

I have run MAST on subsets of this data without re-performing the filtering, so in certain instances you would have genes expressed in less than 10 cells. But we've seen this in a few different cases so far, which makes me think the exact threshold for including a gene (at least if that low) does not have much of an impact.

Thanks!
Gabriela

gfinak · 2017-12-05T19:47:19Z

I guess I'd be curious about 1. the standard errors around those estimates, 2. What is the expression across conditions, and 3. Whether the effect is still observed if you fit a model with empirical Bayes shrinkage.
Can you show the expression across conditions for some of these?
@amcdavid might have some insights as well.

gabprg · 2017-12-05T20:25:27Z

Yep! Here's a plot with what the expression looks like for the top 9 genes with the largest absolute value for the continuous component coefficient. As for the standard error, do you mean just as in the width of the confidence interval which is also one of the MAST outputs? If so, I also plotted the coefficient vs the CI width for that coefficient.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Largest model coefficients for barely expressed genes? #86

Largest model coefficients for barely expressed genes? #86

gabprg commented Dec 5, 2017

gfinak commented Dec 5, 2017

gabprg commented Dec 5, 2017 •

edited

gfinak commented Dec 5, 2017

gabprg commented Dec 5, 2017

Largest model coefficients for barely expressed genes? #86

Largest model coefficients for barely expressed genes? #86

Comments

gabprg commented Dec 5, 2017

gfinak commented Dec 5, 2017

gabprg commented Dec 5, 2017 • edited

gfinak commented Dec 5, 2017

gabprg commented Dec 5, 2017

gabprg commented Dec 5, 2017 •

edited