Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] The vst() function does not work when the algorithm switches to a mean-based dispersion trend. #263

Open
dguin opened this issue Mar 28, 2024 · 2 comments · May be fixed by #268
Open
Labels
bug Something isn't working

Comments

@dguin
Copy link

dguin commented Mar 28, 2024

Describe the bug
Calling the vst() leads to the fit_type variable being hard-coded to the chosen fit_type. Consequently when the vst_transform is called it breaks with:
File ".venv/lib/python3.10/site-packages/pydeseq2/dds.py", line 411, in vst_transform a0, a1 = self.uns["trend_coeffs"] KeyError: 'trend_coeffs'

Expected behavior
When switching to mean based, update fit_type to 'mean' in line 918.
self.fit_type = 'mean'

@dguin dguin added the bug Something isn't working label Mar 28, 2024
@BorisMuzellec
Copy link
Collaborator

Hi @dguin, good catch, indeed with the recent versions there now are two attributes redundantly encoding the trend curve fit type, self.fit_type (set in vst) and self.trend_fit_type. We should harmonize them.

Before I make any changes, one thing I'm wondering about is whether using one fit type for VST and another in the DEA (e.g. "mean" and then "parametric", or vice-versa) is a common thing, or if we can consider that the fit types should be the same in both?

@dguin
Copy link
Author

dguin commented Apr 3, 2024

Hi again @BorisMuzellec, I looked at the R documentation and code and it appears that the R version of DESeq2 re-fits dispersion if fit-type is set to "parametric" and dispersions have not been calculated (https://rdrr.io/bioc/DESeq2/man/varianceStabilizingTransformation.html):

in case dispersions have not yet been estimated for object, this parameter is passed on to estimateDispersions (options described there).
So it does appear that the R code at least allows for using one fit type in the VST and another in DEA. Additionally the readme mentions:

Note that neither rlog transformation nor the VST are used by the differential expression estimation in DESeq, which always occurs on the raw count data, through generalized linear modeling which incorporates knowledge of the variance-mean dependence. The rlog transformation and VST are offered as separate functionality which can be used for visualization, clustering or other machine learning tasks. See the transformation section of the vignette for more details.
So it appears that they are meant to be slightly decoupled.

Happy to make the changes and open a PR if you'd like!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants