
Make size factors trainable #77

Open
Hoeze opened this issue Jul 11, 2019 · 6 comments

Comments

@Hoeze
Contributor

Hoeze commented Jul 11, 2019

I need to calculate some size factors, but I can no longer find the tracking issue:
Are size factors trainable yet?

Having this should give better results than DESeq2's geometric-mean-based (median-of-ratios) size factors...
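For reference, DESeq2's default size factor estimator is median-of-ratios: each gene's geometric mean across samples acts as a pseudo-reference, and a sample's size factor is the median of its count-to-reference ratios. Below is a minimal NumPy sketch of that idea, not DESeq2's code. It assumes a cells × genes count matrix and simply drops genes with a zero in any cell, because their geometric mean is zero. The function name is hypothetical.

```python
import numpy as np

def median_of_ratios(counts):
    """Median-of-ratios size factors for a (cells x genes) count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes observed in every cell have a non-zero geometric mean.
    expressed = (counts > 0).all(axis=0)
    if not expressed.any():
        raise ValueError("no gene has non-zero counts in every cell")
    log_counts = np.log(counts[:, expressed])
    # Log of each gene's geometric mean across cells (the pseudo-reference).
    log_geo_mean = log_counts.mean(axis=0)
    # Size factor of a cell = median ratio of its counts to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=1))

# Cell 2 has exactly twice the counts of cell 1 for every shared gene.
counts = np.array([[10, 20, 30, 0],
                   [20, 40, 60, 5]])
print(median_of_ratios(counts))  # approximately [0.7071, 1.4142]
```

Because these factors come from a fixed summary statistic rather than a likelihood, they ignore the model's dispersion and design. That is the gap a likelihood-based ("trainable") estimate would try to close.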

@davidsebfischer
Contributor

Size factors should not be trainable, though I'm not sure I understand you correctly. You can supply the size factors as a covariate if you want to regress them out, and you can also feed them into spline transforms if you want a more flexible regression.
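A fixed offset gives log μ_ij = log sf_i + x_i·β_j. As an alternative, the log size factor can be appended as a design-matrix column, so that each gene gets its own coefficient for it. Expanding it with a spline basis gives a smoother, more flexible dependence. The NumPy sketch below only builds such design matrices. The helper names and the truncated-power cubic spline basis are illustrative assumptions, not diffxpy's API or its spline implementation.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Truncated-power spline basis of x (no intercept column)."""
    x = np.asarray(x, dtype=float)
    polys = [x ** d for d in range(1, degree + 1)]
    hinges = [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(polys + hinges)

def design_with_size_factor(condition, size_factors, knots=None):
    """Design matrix: intercept, condition indicator, log size factor term(s)."""
    log_sf = np.log(np.asarray(size_factors, dtype=float))
    intercept = np.ones_like(log_sf)
    cond = np.asarray(condition, dtype=float)
    if knots is None:
        # Linear covariate: one extra coefficient per gene.
        sf_terms = log_sf[:, None]
    else:
        # Spline-expanded covariate: a smooth, gene-specific trend in log sf.
        sf_terms = truncated_power_basis(log_sf, knots)
    return np.column_stack([intercept, cond, sf_terms])

sf = np.array([0.5, 0.8, 1.0, 1.3, 2.0, 2.5])
condition = np.array([0, 0, 0, 1, 1, 1])
print(design_with_size_factor(condition, sf).shape)               # (6, 3)
print(design_with_size_factor(condition, sf, knots=[0.0]).shape)  # (6, 6)
```

A raw truncated-power basis is poorly conditioned. A B-spline or natural-spline basis is usually preferred in practice, and the basis choice here is only for brevity.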

@Hoeze
Contributor Author

Hoeze commented Jul 11, 2019

Quoting DESeq2:

The "iterate" estimator iterates between estimating the dispersion with a design of ~1,
and finding a size factor vector by numerically optimizing the likelihood of the ~1 model.
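The sketch below shows the alternation described in the quote, under loudly simplifying assumptions. This is not DESeq2's implementation.

- The input is a cells × genes count matrix.
- The likelihood is negative binomial (NB2).
- Per-gene dispersion comes from a method-of-moments estimate, not DESeq2's dispersion fit.
- Per-gene means of the ~1 model use the Poisson closed form given the current size factors.
- Size factors are normalised to geometric mean 1.

All function names are hypothetical. The code requires NumPy and SciPy, and assumes every cell has a non-zero library size.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def nb_nll(counts, mu, r):
    """Negative NB2 log-likelihood; r = 1/dispersion, variance = mu + mu**2 / r."""
    ll = (gammaln(counts + r) - gammaln(r) - gammaln(counts + 1)
          - r * np.log1p(mu / r) + counts * (np.log(mu) - np.log(r + mu)))
    return -ll.sum()

def iterate_size_factors(counts, n_iter=5):
    """Alternate ~1 dispersion estimates and NB-likelihood size factor fits."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[:, counts.sum(axis=0) > 0]        # drop all-zero genes
    lib = counts.sum(axis=1)
    log_sf = np.log(lib) - np.log(lib).mean()         # start from library sizes
    for _ in range(n_iter):
        sf = np.exp(log_sf)
        # ~1 model: per-gene mean given sf (Poisson MLE used as a plug-in).
        gene_mean = counts.sum(axis=0) / sf.sum()
        mu = sf[:, None] * gene_mean[None, :]
        # Method-of-moments dispersion per gene, floored to keep r finite.
        alpha = ((counts - mu) ** 2 - mu).mean(axis=0) / (mu ** 2).mean(axis=0)
        r = 1.0 / np.maximum(alpha, 1e-6)

        def objective(x):
            x = x - x.mean()                          # geometric mean of sf = 1
            mu = np.exp(x)[:, None] * gene_mean[None, :]
            # d(-loglik)/d(log sf_i), projected onto the mean-zero subspace.
            grad = -(r * (counts - mu) / (r + mu)).sum(axis=1)
            return nb_nll(counts, mu, r), grad - grad.mean()

        res = minimize(objective, log_sf, jac=True, method="L-BFGS-B")
        log_sf = res.x - res.x.mean()
    return np.exp(log_sf)
```

Because the gene means are a plug-in rather than the exact NB maximum, the sketch is not guaranteed to increase the likelihood at every step. It is only meant to show the structure of the alternation.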

I'd like to optimize:

sf_i = 1 + sfvar_i - mean_x(sfvar_x) // shift mean to 1
mu_ij = sf_i * exp(...)
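The parametrization above can be written out directly. In this NumPy sketch, `sfvar` is the free per-cell parameter vector, and `eta` is a hypothetical cells × genes array standing in for the elided `(...)` linear predictor. The sketch shows that the centering makes `sf` invariant to adding a constant to `sfvar`. It also shows that this additive form does not by itself keep `sf` positive. A log-scale variant such as `sf_i = exp(lsf_i - mean(lsf))` would keep it positive, at the cost of fixing the geometric rather than the arithmetic mean.

```python
import numpy as np

def size_factors_from_sfvar(sfvar):
    """sf_i = 1 + sfvar_i - mean(sfvar): the arithmetic mean of sf is fixed at 1."""
    sfvar = np.asarray(sfvar, dtype=float)
    return 1.0 + sfvar - sfvar.mean()

def mean_model(sfvar, eta):
    """mu_ij = sf_i * exp(eta_ij); eta is a placeholder for the linear predictor."""
    sf = size_factors_from_sfvar(sfvar)
    return sf[:, None] * np.exp(eta)

sfvar = np.array([0.2, -0.1, 0.4])
sf = size_factors_from_sfvar(sfvar)
print(sf.mean())                                              # ~1.0
print(np.allclose(sf, size_factors_from_sfvar(sfvar + 3.0)))  # True: shift-invariant
print(size_factors_from_sfvar([-1.5, 0.5, 1.0]))              # [-0.5, 1.5, 2.0]
```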

@davidsebfischer
Contributor

As far as I understand it, this is done as a size factor estimation step before the GLM fit: https://rdrr.io/bioc/DESeq2/man/estimateSizeFactors.html. Do you also want this as a distinct step? My first impression of "trainable" was that you wanted to do this during GLM fitting.

@Hoeze
Contributor Author

Hoeze commented Jul 12, 2019

This could be done as a separate step, but I think it could also be included as an option in the GLM optimizer, for example by first fitting the GLM and then alternating between GLM-step and sf-step optimization.
This way, the size factors should be driven mainly by low-variance genes rather than high-variance ones, while also accounting for batch effects and other confounders.

What do you think about this? Will the size factors diverge?
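The "fit the GLM first, then alternate GLM-step and sf-step" scheme could look roughly like the sketch below. It makes several simplifying assumptions:

- Poisson replaces NB for brevity.
- Each gene is fitted with a few vectorised Newton steps.
- The sf-step uses the closed-form Poisson update of each cell's size factor given the coefficients.
- Column 0 of the design `X` is an intercept, and every gene has a non-zero total.

All names are hypothetical; this is not batchglm's optimizer.

```python
import numpy as np

def fit_poisson_glm(counts, X, offset, beta, n_newton=10):
    """Per-gene Poisson GLM, log mu_ij = offset_i + X_i @ beta_j, via Newton steps."""
    for _ in range(n_newton):
        mu = np.exp(offset[:, None] + X @ beta)          # cells x genes
        score = X.T @ (counts - mu)                       # params x genes
        # Per-gene Fisher information X^T diag(mu_j) X: genes x params x params.
        info = np.einsum("ip,ig,iq->gpq", X, mu, X)
        beta = beta + np.linalg.solve(info, score.T[:, :, None])[:, :, 0].T
    return beta

def alternate_glm_and_size_factors(counts, X, n_outer=10):
    """Alternate per-gene GLM fits and per-cell size factor updates."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1)
    log_sf = np.log(lib) - np.log(lib).mean()            # start from library sizes
    beta = np.zeros((X.shape[1], counts.shape[1]))
    beta[0] = np.log(counts.mean(axis=0))                # assumes X[:, 0] is all ones
    for _ in range(n_outer):
        # GLM-step: coefficients with size factors as a fixed offset.
        beta = fit_poisson_glm(counts, X, log_sf, beta)
        # sf-step: Poisson MLE of each cell's size factor given beta.
        log_sf = np.log(lib) - np.log(np.exp(X @ beta).sum(axis=1))
        # Move the part of log sf explained by the design into beta; mu is unchanged.
        gamma, *_ = np.linalg.lstsq(X, log_sf, rcond=None)
        log_sf = log_sf - X @ gamma
        beta = beta + gamma[:, None]
    return np.exp(log_sf), beta
```

Any cell-level covariate in the design (intercept, batch, ...) trades off exactly against log sf: adding `X @ g` to log sf and subtracting `g` from every gene's coefficients leaves every μ_ij unchanged. Size factors are therefore not identified along those directions. The sketch pins them down by projecting the design's column space out of log sf after each sf-step, which generalises "shift mean to 1" to the batch levels.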

@davidsebfischer
Contributor

I would tend not to support this functionality. We would have to benchmark separately that it is stable, and it would increase run time by an order of magnitude, which is too much given the current run times. We could consider offering it as a prior step, but I am not sure how good these size factors are for single-cell RNA-seq, so that would require benchmarking, too.

@davidsebfischer
Contributor

I think estimating size factors as a parametric function of fixed effects during regression would be an easier alternative to "refitting" size factors: it allows for some trend correction without drastically increasing the complexity of the algorithm. This could be done across genes, i.e. as a cell-specific parameter that is constant across genes, since we vectorise anyway. This requires some work, though.
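One way to read this proposal is to add a term Z_i·γ to every gene's linear predictor, with γ shared across all genes. Then log sf_i = Z_i·γ is a cell-specific, gene-constant trend in cell-level covariates Z. Below is a minimal sketch of the γ-update alone, under these assumptions:

- Poisson stands in for NB.
- The per-gene coefficients β are held fixed; in a full fit this update could alternate with the per-gene updates.
- Z must not repeat columns already in X (such as a second intercept), or γ and β are not identified.

Names are hypothetical.

```python
import numpy as np

def fit_shared_size_factor_coefs(counts, X, beta, Z, n_newton=20):
    """Newton updates for gamma in log mu_ij = X_i @ beta_j + Z_i @ gamma (Poisson).

    gamma is shared by all genes, so log sf_i = Z_i @ gamma is a cell-specific,
    gene-constant term modelled as a parametric function of the covariates Z.
    """
    gamma = np.zeros(Z.shape[1])
    base = X @ beta                                   # cells x genes, held fixed
    for _ in range(n_newton):
        mu = np.exp(base + (Z @ gamma)[:, None])
        resid = (counts - mu).sum(axis=1)             # per-cell score, summed over genes
        weight = mu.sum(axis=1)                       # per-cell Fisher weight
        score = Z.T @ resid
        info = Z.T @ (weight[:, None] * Z)
        gamma = gamma + np.linalg.solve(info, score)
    return gamma
```

Since the score and information are sums over genes, the update costs one extra small `(k x k)` solve per iteration, with `k` the number of columns of Z. This matches the point that it does not drastically increase the complexity of the vectorised fit.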
