Make size factors trainable #77

Hoeze · 2019-07-11T13:22:13Z

I need to calculate some size factors, but I can no longer find the tracking issue:
Are size factors trainable yet?

Having this should give better results than DESeq2's geometric mean...

davidsebfischer · 2019-07-11T13:50:47Z

Size factors should not be trainable, though I am not sure I understand you correctly. You can supply the size factor as a covariate if you want to regress it out. You can also feed it into spline transforms if you want a more flexible regression.
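
To illustrate what "supplying the size factor as a covariate" means in a log-link GLM, here is a minimal NumPy sketch. It assumes a Poisson model for simplicity (not the negative binomial models this project fits) and does not use any batchglm or diffxpy API. `fit_poisson_irls` and the toy data are hypothetical names made up for the example.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=25):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares."""
    eta = np.log(y + 0.5)              # crude but stable starting point
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * mu                 # Poisson working weights equal mu
        coef = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ coef
    return coef

# Hypothetical toy data: one gene, four cells, counts exactly proportional to sf.
sf = np.array([0.5, 1.0, 2.0, 4.0])
y = 5.0 * sf

# Design matrix with an intercept and log(sf) as an ordinary covariate.
X = np.column_stack([np.ones_like(sf), np.log(sf)])
coef = fit_poisson_irls(X, y)          # [intercept, slope on log(sf)]
```

Using log(sf) as a fixed offset corresponds to pinning the slope at 1. Entering it as a covariate lets the fit choose the slope, which is what "regressing it out" amounts to in this sketch.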

Hoeze · 2019-07-11T14:04:49Z

Citing from DESeq2:

The "iterate" estimator iterates between estimating the dispersion with a design of ~1,
and finding a size factor vector by numerically optimizing the likelihood of the ~1 model.

I'd like to optimize:

sf_i = 1 + sfvar_i - mean_x(sfvar_x) // shift mean to 1
mu_ij = sf_i * exp(...)
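
A minimal NumPy sketch of this parameterization, assuming cells are indexed by i and genes by j. The function names and the `log_mu` argument, which stands in for whatever goes inside `exp(...)`, are hypothetical.

```python
import numpy as np

def centered_size_factors(sfvar):
    """sf_i = 1 + sfvar_i - mean_x(sfvar_x), so the size factors average to 1."""
    sfvar = np.asarray(sfvar, dtype=float)
    return 1.0 + sfvar - sfvar.mean()

def mean_model(sf, log_mu):
    """mu_ij = sf_i * exp(log_mu_ij); sf has shape (cells,), log_mu (cells, genes)."""
    return sf[:, None] * np.exp(log_mu)

sfvar = np.array([0.2, -0.1, 0.5])     # free (trainable) variables, one per cell
sf = centered_size_factors(sfvar)
mu = mean_model(sf, np.zeros((3, 2)))  # placeholder linear predictor of zeros
```

Note that the additive shift fixes the mean at 1 but does not keep every sf_i positive: a cell whose sfvar_i lies more than 1 below the average gets a non-positive size factor, so an optimizer would need a constraint or a different parameterization.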

davidsebfischer · 2019-07-12T07:28:47Z

As far as I understand it, this is done as a pre-GLM size factor determination step: https://rdrr.io/bioc/DESeq2/man/estimateSizeFactors.html. Do you also want this as a distinct step? My first impression of "trainable" was that you wanted to do this during GLM fitting.

Hoeze · 2019-07-12T09:17:36Z

This could be done as a separate step, but I think it could also be included as an option in the GLM optimizer, e.g. by first optimizing the GLM and then iterating between a GLM step and a size-factor step.
This way, the size factors should rely more on low-variance genes than on high-variance genes, while also taking batch correction and confounders into account.

What do you think about this? Will the size-factors diverge?
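
The alternating scheme could be sketched roughly as follows. This is only an illustration under strong assumptions: an intercept-only Poisson model per gene instead of a negative binomial GLM, so both steps have closed-form updates, and every gene has a nonzero total count. `alternate_fit` is a hypothetical name, not an existing function.

```python
import numpy as np

def alternate_fit(y, n_iter=10):
    """Alternate between a GLM step and a size-factor step.

    Model (assumed for illustration): y_ij ~ Poisson(sf_i * exp(beta_j)).
    """
    y = np.asarray(y, dtype=float)
    sf = np.ones(y.shape[0])
    for _ in range(n_iter):
        # GLM step: maximum-likelihood gene intercepts given the size factors.
        beta = np.log(y.sum(axis=0) / sf.sum())
        # sf step: maximum-likelihood size factors given the gene intercepts.
        sf = y.sum(axis=1) / np.exp(beta).sum()
        # Rescale to mean 1; the overall scale is otherwise not identifiable.
        sf = sf / sf.mean()
    return sf, beta

# Hypothetical noise-free toy data: 3 cells x 4 genes.
sf_true = np.array([0.5, 1.0, 1.5])
mu_true = np.array([10.0, 20.0, 5.0, 1.0])
sf_hat, beta_hat = alternate_fit(np.outer(sf_true, mu_true))
```

Regarding divergence: multiplying all size factors by a constant and subtracting its log from every intercept leaves the likelihood unchanged, so without the rescaling step the scale can drift freely between the two parameter sets. In this Poisson toy case the size factors simply end up proportional to the total counts per cell; preferring low-variance genes would require a model with gene-wise dispersion.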

davidsebfischer · 2019-07-12T10:21:51Z

I would tend not to support this functionality. We would have to benchmark separately that it is stable, and it would increase run time by an order of magnitude, which is too much given the current run times. We could think about having it as a prior step, but I am not sure how good this size factor is for single-cell RNA-seq, so that requires benchmarking, too.

davidsebfischer · 2019-07-12T15:09:48Z

I think estimating size factors as a parametric function of fixed effects during regression would be an easier alternative to "refitting" size factors. This allows for some trend correction and does not drastically increase the complexity of the algorithm. It could be done across genes, i.e. as a cell-specific parameter that is constant across genes, since we vectorise anyway. This requires some work, though.
