Construction of model and regularization when batch_var is used #78

HillJamie opened this issue Oct 26, 2020 · 0 comments
HillJamie commented Oct 26, 2020

Hi Christoph,

I was hoping you could provide more detail on how the `batch_var` option works.

In the paper you state "we first learn regularized models using only the sequencing depth covariate, as described above. We next perform a second round of NB regression, including both the depth covariate and additional nuisance parameters as model predictors. In this round, the depth-dependent parameters are fixed to their previously regularized values, while the additional parameters are unconstrained and fit during the regression"

I understand this to mean that, for a batch effect with two levels "batch_1" and "batch_2", in the first round:

`log(expression) ~ (Intercept) + log_umi`, with dispersion theta. All three parameters are regularized.
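
To make that concrete, this is roughly the per-gene fit I have in mind for the first round (just a sketch using `MASS::glm.nb` with placeholder objects, not a claim about how `vst` actually implements it):

```r
library(MASS)

# Round one, per gene, as I understand it: intercept + log_umi, with an NB
# dispersion theta estimated per gene. `y` and `cell_attr` are placeholders
# for one gene's counts and the per-cell attributes (incl. log_umi).
fit1   <- glm.nb(y ~ log_umi, data = cell_attr)
b0_reg <- coef(fit1)["(Intercept)"]  # later replaced by its regularized value
b1_reg <- coef(fit1)["log_umi"]      # later replaced by its regularized value
theta1 <- fit1$theta                 # likewise regularized across genes
```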

And in the second round:

`log(expression) ~ batch_1 + batch_2 + (regularized log_umi coefficient from the first round) * batch_1 + (regularized log_umi coefficient from the first round) * batch_2`, with a new dispersion. No parameters are regularized.
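
Since the same regularized slope would multiply `log_umi` in both batches, I think this collapses to holding that term fixed as an offset, i.e. something like (again only a sketch, with the same placeholders as above):

```r
# Round two, per gene, under my interpretation: per-batch intercepts are free,
# the regularized log_umi slope (b1_reg) is held fixed via an offset, and a
# new theta is estimated. `batch` is a factor with levels batch_1 and batch_2.
fit2   <- glm.nb(y ~ 0 + batch + offset(b1_reg * log_umi), data = cell_attr)
theta2 <- fit2$theta
```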

Is this correct, such that, effectively, a model of the same form as the first round is fit to each batch, and the two batches share information only through the new dispersion? I suspect I must be missing something, because when I run an example through Seurat's `SCTransform` wrapper, all of the parameter estimates in `model_pars` differ from those in `model_pars_fit`, which would seem to suggest that all terms are regularized.
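
For reference, this is how I was comparing the two tables. I'm assuming a direct `sctransform::vst` call exposes the same `model_pars` / `model_pars_fit` components that the Seurat wrapper stores; `umi` and `meta` are placeholders for my count matrix and cell metadata:

```r
# `umi` is a gene x cell count matrix, `meta` a data.frame with a "batch" column
vst_out <- sctransform::vst(umi, cell_attr = meta, batch_var = "batch")
head(vst_out$model_pars)      # per-gene estimates
head(vst_out$model_pars_fit)  # values used downstream; every column differs for me
```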

Finally, I wondered about the advantages of this approach over something like the following, which would involve estimating one fewer coefficient:

`log(expression) ~ (new intercept, including the contribution from batch_1) + log_umi (re-estimated, potentially without regularization) + batch_2`, with a new dispersion.
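
In the same sketch notation, with batch_1 as the reference level, that would be:

```r
# Alternative I had in mind: standard treatment coding, batch_1 as the
# reference level, so the intercept absorbs batch_1 and batch_2 enters as an
# additive shift; log_umi is re-estimated, and one fewer coefficient is needed.
fit_alt <- glm.nb(y ~ log_umi + batch, data = cell_attr)
```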

Thank you for your help, and for making a great tool,
Jamie
