Different results with the same dependencies #216

Open
bernheder opened this issue Dec 1, 2023 · 4 comments

@bernheder

I can't reproduce the results of a single-factor analysis because the DeseqStats function gives different results: the LFCs are slightly different in two separate conda environments.

Both environments run on the same machine and have the following dependencies:

Python: 3.10
pydeseq2: 0.4.1

and the dependencies used by pydeseq2 are identical in both environments:
anndata: 0.8.0
statsmodels: 0.13.5
numpy: 1.24.4
pandas: 2.0.0
scikit-learn: 1.2.2
scipy: 1.10.1

Do you have an idea where this difference could come from?

Thanks

@BorisMuzellec
Collaborator

Hi @bernheder, not sure where this could come from... Probably some implicit dependency with different installed versions.

Since LFCs are already different, the issue comes from a method called in DeseqDataSet.

What is the order of magnitude of the LFC differences between the two environments?

Can you share the result of saving the list of packages installed in both conda environments (e.g. conda list > env1.txt) and comparing them with diff (diff env1.txt env2.txt)?
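A raw `diff` of two `conda list` dumps is noisy, because every package whose build string changed shows up as a removed line plus an added line. A minimal sketch that parses both dumps and reports only packages whose version, build or channel differs; it assumes the usual whitespace-separated `Name Version Build Channel` layout of `conda list`, and `parse_conda_list`, `compare_envs` and `compare_files` are hypothetical helpers, not part of conda:

```python
from pathlib import Path


def parse_conda_list(text: str) -> dict[str, tuple[str, str, str]]:
    """Map package name -> (version, build, channel) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        channel = fields[3] if len(fields) > 3 else ""  # defaults channel is often blank
        packages[fields[0]] = (fields[1], fields[2], channel)
    return packages


def compare_envs(text1: str, text2: str) -> list[str]:
    """Return one line per package that differs between the two dumps."""
    env1, env2 = parse_conda_list(text1), parse_conda_list(text2)
    report = []
    for name in sorted(env1.keys() | env2.keys()):
        a, b = env1.get(name), env2.get(name)
        if a == b:
            continue  # identical version, build and channel
        if a is None:
            report.append(f"{name}: only in env2 {b}")
        elif b is None:
            report.append(f"{name}: only in env1 {a}")
        elif a[0] != b[0]:
            report.append(f"{name}: version {a[0]} vs {b[0]}")
        elif a[1] != b[1]:
            report.append(f"{name}: build {a[1]} vs {b[1]}")
        else:
            report.append(f"{name}: channel {a[2] or '(none)'} vs {b[2] or '(none)'}")
    return report


def compare_files(path1: str, path2: str) -> None:
    """Print the differences between two saved `conda list` outputs."""
    for line in compare_envs(Path(path1).read_text(), Path(path2).read_text()):
        print(line)
```

Usage: `compare_files("env1.txt", "env2.txt")`. Packages that appear in only one environment (for example a BLAS implementation such as MKL or OpenBLAS) are usually the first thing to look at.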

@bernheder
Author

bernheder commented Dec 5, 2023

Hi @BorisMuzellec,
you are right. I checked again, and the differences already appear in the DeseqDataSet (or at least after running deseq2() on the dds).

The LFC differences for each gene are below 0.01.

Here is the diff (it is quite big and unwieldy). One difference I notice now is that the two environments do not come from the same distribution (miniforge3 vs. Anaconda), which changes the build numbers of the packages and may also change the system libraries.

diff.txt

Thanks!
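With per-gene differences below 0.01, it helps to locate the first intermediate quantity at which the two environments disagree. DESeq2 estimates size factors first, then dispersions, then LFCs, so comparing those in order narrows down which step introduces the drift. A minimal sketch, assuming each environment saves those arrays (for example with `np.savez`) and that both dicts list the stages in pipeline order; `first_divergence` and the stage keys in the test are hypothetical:

```python
import numpy as np


def first_divergence(stages1: dict[str, np.ndarray],
                     stages2: dict[str, np.ndarray],
                     rtol: float = 0.0, atol: float = 0.0):
    """Walk the stages in insertion order.

    Returns (name of the first stage that differs beyond rtol/atol, or None,
    dict of per-stage maximum absolute differences).
    """
    max_diffs = {}
    first = None
    for name, a in stages1.items():
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(stages2[name], dtype=np.float64)  # KeyError if a stage is missing
        max_diffs[name] = float(np.nanmax(np.abs(a - b)))
        # With rtol=atol=0 this flags any difference at all; NaNs in the same place count as equal.
        if first is None and not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
            first = name
    return first, max_diffs
```

With the default tolerances every bit-level difference is flagged; passing, for example, `atol=1e-6` separates rounding noise from a genuine divergence.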

@BorisMuzellec
Collaborator

Thanks for checking! It's hard to tell from the diff what could be causing those differences... I think you're probably right, it could come from different system / linear algebra libraries.
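Why different linear algebra libraries can shift results slightly: BLAS/LAPACK builds (Anaconda's defaults channel typically links NumPy against MKL, while conda-forge, which miniforge uses, typically defaults to OpenBLAS) may reduce sums in different orders, and floating-point addition is not associative. A tiny plain-Python illustration; `sum_in_order` is a hypothetical helper. Running `numpy.show_config()` in each environment prints which BLAS/LAPACK NumPy is linked against.

```python
import math


def sum_in_order(values: list[float], order: list[int]) -> float:
    """Accumulate values left to right in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1
exact = math.fsum(values)                   # correctly rounded sum

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False
```

A single reordering only changes the last bit, but iterative fits that stop on a convergence tolerance can carry such differences forward into visibly different estimates.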

@bernheder
Author

@BorisMuzellec I checked again and I found another possible cause.

When creating the DeseqDataSet:
dds = DeseqDataSet(counts=count_df, metadata=clinical_df, design_factors="condition")

where count_df and clinical_df are both DataFrames with dtype int64, the resulting dds.X has a different dtype in each environment:

env1: dds.X.dtype -> float32;
env2: dds.X.dtype -> int64;

Do you have an idea why the dtype is float32 in one case and int64 in the other?

Thanks
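One hypothesis worth checking (not confirmed in this thread): anndata releases before 0.9 converted `X` to float32 by default and emitted a FutureWarning about it, while later releases keep the input dtype. Printing `anndata.__version__` from inside each environment shows which version is actually imported. Either way, a float32 `X` means downstream arithmetic starts from roughly 7 significant digits. A minimal sketch that measures what a float32 cast costs for a given count matrix; `float32_report` is a hypothetical helper, and the `log(x + 0.5)` transform is only a stand-in for the real downstream computations:

```python
import numpy as np
import pandas as pd


def float32_report(counts: pd.DataFrame) -> dict:
    """Summarize how much precision is lost if integer counts are cast to float32."""
    as_int = counts.to_numpy(dtype=np.int64)
    as_f32 = as_int.astype(np.float32)
    # Every integer up to 2**24 is exact in float32 (24-bit significand); larger ones may not be.
    lossless = bool(np.all(as_f32.astype(np.int64) == as_int))
    # Compare a simple transform computed at float64 vs. float32 precision.
    log64 = np.log(as_int.astype(np.float64) + 0.5)
    log32 = np.log(as_f32 + np.float32(0.5)).astype(np.float64)
    return {
        "input_dtype": str(counts.dtypes.iloc[0]),
        "lossless_cast": lossless,
        "max_log_diff": float(np.max(np.abs(log64 - log32))),
    }
```

For typical RNA-seq counts the cast itself is lossless, so any difference comes from computing at float32 precision rather than from the counts changing.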
