fix : enable vst be used "blindly" and to fit its own dispersion #268

laudmt · 2024-04-09T12:08:05Z

Reference Issue or PRs

What does your PR implement? Be specific.

Proposed solution:

A self.fit_type attribute is defined at class initialisation and, by default, sets the fit type for both DEA and VST.
If needed, a fit type can be passed to deseq2() and vst() to run the two analyses with separate fit types. This sets self.fit_type to the user-provided value.
self.vst_fit() and self.vst_transform() use self.fit_type internally.
vst_transform() should always be called after vst_fit() or deseq2(). An exception is raised if the required trend coefficients have not been computed by deseq2() or vst_fit().
Add unit tests.
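A minimal sketch of the proposed guard follows. It uses a hypothetical ToyDataSet class, not the PyDESeq2 DeseqDataSet: fit_type is set at initialisation, deseq2() and vst_fit() may override it, and vst_transform() refuses to run until trend coefficients exist. The uns["trend_coeffs"] key mirrors the field name used later in this thread; the helper _fit_trend() and the string "coefficients" are illustrative stubs, not real fitting code.

```python
class ToyDataSet:
    """Hypothetical stand-in for a DESeq2-like dataset (illustration only)."""

    def __init__(self, fit_type="parametric"):
        self.fit_type = fit_type  # default fit type shared by DEA and VST
        self.uns = {}  # unstructured results, e.g. trend coefficients

    def _fit_trend(self):
        # Stub: a real implementation would fit the dispersion trend curve here.
        self.uns["trend_coeffs"] = f"coeffs({self.fit_type})"

    def deseq2(self, fit_type=None):
        if fit_type is not None:
            self.fit_type = fit_type  # the proposal overwrites the shared attribute
        self._fit_trend()

    def vst_fit(self, fit_type=None):
        if fit_type is not None:
            self.fit_type = fit_type
        self._fit_trend()

    def vst_transform(self):
        # Guard: VST needs trend coefficients from vst_fit() or deseq2().
        if "trend_coeffs" not in self.uns:
            raise RuntimeError(
                "Trend coefficients missing: call vst_fit() or deseq2() first."
            )
        return f"vst using {self.uns['trend_coeffs']}"


dds = ToyDataSet()
try:
    dds.vst_transform()
except RuntimeError as err:
    print(err)  # Trend coefficients missing: call vst_fit() or deseq2() first.
dds.vst_fit(fit_type="mean")
print(dds.vst_transform())  # vst using coeffs(mean)
```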

BorisMuzellec · 2024-04-10T08:27:38Z

Thanks for the PR @laudmt!

I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.

Here's an example of problematic / ambiguous behavior that could occur:

dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2()  # which trend_fit_type is used to fit the curve?

In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst()), but the user probably intended to fit VST with a mean curve and then run DEA with a parametric curve.

I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.
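The ambiguity can be reproduced with a stripped-down, hypothetical class (SharedFitType is illustrative, not PyDESeq2 code) in which vst() writes its fit_type argument into the same attribute that deseq2() later reads:

```python
class SharedFitType:
    """Hypothetical class where VST and DEA share one fit-type attribute."""

    def __init__(self, trend_fit_type="parametric"):
        self.trend_fit_type = trend_fit_type

    def vst(self, fit_type=None):
        if fit_type is not None:
            self.trend_fit_type = fit_type  # side effect leaks into deseq2()
        return self.trend_fit_type

    def deseq2(self):
        return self.trend_fit_type  # the fit type DEA would actually use


dds = SharedFitType(trend_fit_type="parametric")
dds.vst(fit_type="mean")
print(dds.deseq2())  # mean (not "parametric", as the user probably intended)
```

The side effect is invisible at the call site of deseq2(), which is why the choice of fit type ends up depending on call order.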

laudmt · 2024-04-10T09:13:13Z


Indeed, thank you, I was missing this use case.
I will then store two different fit types, one for DEA and one for VST, and let the self.fit_X methods take a fit_type argument.
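A sketch of that plan, assuming a hypothetical SeparateFitTypes class: the attribute names dea_fit_type and vst_fit_type follow the naming floated later in this thread, and fit_dispersion_trend() is an illustrative stand-in for the self.fit_X methods. The key point is that the fitting helper receives the fit type as an argument instead of reading a shared attribute.

```python
class SeparateFitTypes:
    """Hypothetical class keeping independent fit types for DEA and VST."""

    def __init__(self, fit_type="parametric", vst_fit_type="parametric"):
        self.dea_fit_type = fit_type
        self.vst_fit_type = vst_fit_type

    def fit_dispersion_trend(self, fit_type):
        # The fit type is passed explicitly, so callers cannot
        # accidentally change each other's settings.
        return f"trend fitted with {fit_type}"

    def vst(self, fit_type=None):
        if fit_type is not None:
            self.vst_fit_type = fit_type  # only touches the VST setting
        return self.fit_dispersion_trend(self.vst_fit_type)

    def deseq2(self, fit_type=None):
        if fit_type is not None:
            self.dea_fit_type = fit_type  # only touches the DEA setting
        return self.fit_dispersion_trend(self.dea_fit_type)


dds = SeparateFitTypes(fit_type="parametric")
print(dds.vst(fit_type="mean"))  # trend fitted with mean
print(dds.deseq2())              # trend fitted with parametric
```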

BorisMuzellec · 2024-04-25T08:43:18Z

pydeseq2/dds.py

+            self.current_fit_type = fit_type
+            print(f"Using {self.current_fit_type} fit type.")


self.current_fit_type doesn't seem to be used anywhere else, so the fit_type argument of deseq2() has no effect.

laudmt · 2024-04-26

Good catch, I forgot to set it to fit_type.

BorisMuzellec · 2024-04-25T08:45:14Z

pydeseq2/dds.py

+            self.fit_type = fit_type
+            print(f"fit type used : {self.fit_type}")


If we set a new fit_type in vst(), then I think we should either set it back to the one provided at initialization or make it clear that it will also change the fit type used in deseq2().
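The first option (restoring the initial fit type once VST is done) can be sketched with a context manager that swaps the attribute temporarily. RestoringFitType and _temporary_fit_type() are hypothetical names; contextlib.contextmanager is the standard-library decorator, and the try/finally guarantees restoration even if the VST body raises.

```python
from contextlib import contextmanager


class RestoringFitType:
    """Hypothetical class that restores the initial fit type after VST."""

    def __init__(self, fit_type="parametric"):
        self.fit_type = fit_type

    @contextmanager
    def _temporary_fit_type(self, fit_type):
        # Swap in a temporary fit type and always restore the previous one.
        previous = self.fit_type
        if fit_type is not None:
            self.fit_type = fit_type
        try:
            yield
        finally:
            self.fit_type = previous

    def vst(self, fit_type=None):
        with self._temporary_fit_type(fit_type):
            return f"vst with {self.fit_type}"

    def deseq2(self):
        return f"deseq2 with {self.fit_type}"


dds = RestoringFitType(fit_type="parametric")
print(dds.vst(fit_type="mean"))  # vst with mean
print(dds.deseq2())              # deseq2 with parametric
```

This keeps a single attribute, at the cost of making the VST fit type non-persistent; storing two attributes is the alternative when the VST setting must be remembered.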

laudmt · 2024-04-26

If I understand correctly, after launching vst() with its own fit_type, the user should be able to launch deseq2() and expect the fit_type they provided at initialisation?

If deseq2() and vst() each have their own fit_type (let's say self.vst_fit_type and self.dea_fit_type) while sharing all the other attributes populated during fitting, is there a risk that vst() computes values that deseq2() can access but is not supposed to use?

In that case, should they share none of the attributes (varm, obs, design_factors, etc.)?

BorisMuzellec · 2024-05-02

Sorry for the late answer @laudmt!

> If I understand correctly, after launching vst() with its own fit_type, the user should be able to launch deseq2() and expect the fit_type they provided at initialisation?

I think that this is indeed what should happen by default, i.e. if no other fit_type is provided when calling deseq2().

> If deseq2() and vst() each have their own fit_type (let's say self.vst_fit_type and self.dea_fit_type) while sharing all the other attributes populated during fitting, is there a risk that vst() computes values that deseq2() can access but is not supposed to use?
>
> In that case, should they share none of the attributes (varm, obs, design_factors, etc.)?

Currently, they do share them: vst() and deseq2() populate the same fields (varm["genewise_dispersions"] and uns["trend_coeffs"] in particular), but they recompute and overwrite them every time. The exception is when vst(use_design=True) is called after deseq2() with the same fit type, in which case the computation can be reused.

In principle, the only issue I see is when a user calls, e.g., vst(use_design=False) after deseq2(). In that case, varm["genewise_dispersions"] is overwritten, so if the user then analyses the gene-wise dispersions (e.g. by calling plot_dispersions()), they will access the wrong values (not the ones used in the DEA).
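Both behaviours described above (reuse when the settings match, silent overwrite when they differ) can be illustrated with a hypothetical ReuseTrend class. The uns["trend_key"] entry and the n_fits counter are assumptions made for illustration; the actual fitting is stubbed out.

```python
class ReuseTrend:
    """Hypothetical sketch of when VST can reuse trend coefficients from DEA."""

    def __init__(self, fit_type="parametric"):
        self.fit_type = fit_type
        self.uns = {}
        self.n_fits = 0  # number of (stubbed) trend fits actually performed

    def _fit_trend(self, fit_type, use_design):
        key = (fit_type, use_design)
        if self.uns.get("trend_key") == key:
            return  # same settings as the last fit: reuse stored coefficients
        self.n_fits += 1
        self.uns["trend_key"] = key  # overwrites whatever was stored before
        self.uns["trend_coeffs"] = f"coeffs{key}"

    def deseq2(self):
        # DEA always accounts for the design.
        self._fit_trend(self.fit_type, use_design=True)

    def vst(self, use_design=False, fit_type=None):
        self._fit_trend(fit_type or self.fit_type, use_design)


dds = ReuseTrend()
dds.deseq2()
dds.vst(use_design=True)     # same fit type and design: coefficients reused
print(dds.n_fits)            # 1
dds.vst(use_design=False)    # different settings: DEA coefficients overwritten
print(dds.n_fits)            # 2
print(dds.uns["trend_key"])  # ('parametric', False)
```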

Perhaps a cleaner approach would be to prefix every parameter stored by vst() with "vst_" (varm["vst_genewise_dispersions"], uns["vst_trend_coeffs"], etc.). This would add some memory overhead but would prevent any confusion. What do you think?
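A sketch of the prefixing idea, assuming plain dicts in place of the real varm and uns containers: PrefixedStore and _store() are hypothetical names, and the numbers are placeholders rather than real dispersion estimates. Both pipelines go through the same storage helper, so the only difference is the key prefix.

```python
class PrefixedStore:
    """Hypothetical sketch: VST results live under "vst_"-prefixed keys."""

    def __init__(self):
        self.varm = {}  # stand-in for per-gene arrays
        self.uns = {}   # stand-in for unstructured results

    def _store(self, prefix, dispersions, trend_coeffs):
        self.varm[f"{prefix}genewise_dispersions"] = dispersions
        self.uns[f"{prefix}trend_coeffs"] = trend_coeffs

    def deseq2(self):
        # Placeholder values standing in for the DEA fit.
        self._store("", dispersions=[0.1, 0.2], trend_coeffs=(0.5, 1.0))

    def vst(self):
        # Same storage path, separate keys: DEA fields are never overwritten.
        self._store("vst_", dispersions=[0.3, 0.4], trend_coeffs=(0.7, 2.0))


dds = PrefixedStore()
dds.deseq2()
dds.vst()
print(dds.varm["genewise_dispersions"])      # [0.1, 0.2]
print(dds.varm["vst_genewise_dispersions"])  # [0.3, 0.4]
```

Downstream tools such as a dispersion plot can then keep reading the unprefixed keys and always see the values used in the DEA.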


BorisMuzellec · 2024-05-23T14:23:54Z

Hi @laudmt, I've pushed a commit implementing the solution I suggested earlier, i.e. creating two separate attributes, fit_type and vst_fit_type, and duplicating fields (with or without a "vst_" prefix) to avoid data leakage.

Could you have a look when you have time and tell me whether that works for you? :)


laudmt self-assigned this Apr 9, 2024

laudmt requested review from BorisMuzellec, maikia, arthurPignetOwkin and mandreux-owkin as code owners April 9, 2024 12:08

BorisMuzellec reviewed Apr 25, 2024

pydeseq2/dds.py (outdated)

BorisMuzellec reviewed Apr 25, 2024

pydeseq2/dds.py (outdated)

laudmt and others added 8 commits May 27, 2024 16:24

enable vst to fit its own dispersion (e5cde86)
docs: make list display in sphinx (34caa0c)
update fit_type handling (0692a8e)
fix doc (faa051e)
chore: fix docstring format (62de26d)
fix typo and doc (a4dd488)
fix doc (4e451c1)
refactor: create a vst_fit_type attribute and store parameters fit within the VST pipeline with a "vst_" prefix (30cf585)

BorisMuzellec force-pushed the fix/fit_type branch from 9873f8a to 30cf585 May 27, 2024 14:30

[pre-commit.ci] auto fixes from pre-commit.com hooks (7bb719d)
for more information, see https://pre-commit.ci


laudmt commented Apr 9, 2024 (edited)
BorisMuzellec commented Apr 10, 2024
laudmt commented Apr 10, 2024
BorisMuzellec Apr 25, 2024
laudmt Apr 26, 2024
BorisMuzellec Apr 25, 2024
laudmt Apr 26, 2024
BorisMuzellec May 2, 2024
BorisMuzellec commented May 23, 2024
