Number of significant genes different with several runs of Pyseer #264

Samriddhi0906 · 2024-02-29T19:57:01Z

After running Pyseer using

pyseer --phenotypes phenotypes.tsv --pres gene_presence_absence.Rtab --similarity phylogeny_similarity.tsv --lmm --covariates covariates.tsv --use-covariates 2 --cpu 8 > $1

and then filtering for significant genes using lrt-pvalue < 0.05, the number of significant genes varies between pyseer runs even though none of the input files have changed.

In total, I performed 7 runs with covariates. Across these, the lowest number of significant genes is 1245 and the highest is 1395, and every run gives a different number.

I would expect each run to give the same number of significant genes. When filtering on filter-pvalue < 0.05 instead, the number of significant genes is constant.

Additionally, the number of significant genes with covariates is about twice the number without covariates (based on lrt-pvalue; the counts are the same when filtering on filter-pvalue).

Could you help me understand whether this behaviour is expected when running pyseer? Thanks in advance.

mgalardini · 2024-02-29T21:10:01Z

That comes as a bit of a surprise, and it is not what we see in our unit tests, which return the same results every time. One thing I can think of is some stochasticity introduced when using multiple cores. Do you see the same variability when using a single core?

As an aside, a p-value threshold of 0.05 is likely too high; please refer to the docs for suggestions on setting such a threshold.

Samriddhi0906 · 2024-03-04T11:31:29Z

Thanks for your response. I did run it three times with 1 CPU and I still get variable results.
wicovariates_cpu1_1.tsv: 6268
wicovariates_cpu1_2.tsv: 6345
wicovariates_cpu1_3.tsv: 6357
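
To see which genes differ between two runs, rather than only how many, the comparison can be scripted. The following is a minimal sketch, not part of pyseer. It assumes each output file is tab-separated with a header row containing the `lrt-pvalue` and `filter-pvalue` columns mentioned above, plus a `variant` column holding the gene name; that last column name is an assumption about the output layout. The helper names `significant_variants` and `compare_runs` are hypothetical.

```python
import csv


def significant_variants(path, column="lrt-pvalue", threshold=0.05):
    """Return the set of variant names whose p-value in `column` is below `threshold`.

    Assumes a tab-separated file with a header row that includes a
    `variant` column (assumed name) and the requested p-value column.
    """
    hits = set()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                pvalue = float(row.get(column, ""))
            except (TypeError, ValueError):
                # Skip rows where the p-value is missing or not numeric.
                continue
            if pvalue < threshold:
                hits.add(row["variant"])
    return hits


def compare_runs(path_a, path_b, **kwargs):
    """Split the significant variants of two runs into shared and run-specific sets."""
    a = significant_variants(path_a, **kwargs)
    b = significant_variants(path_b, **kwargs)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}
```

For example, `compare_runs("wicovariates_cpu1_1.tsv", "wicovariates_cpu1_2.tsv")` would list the genes that cross the threshold in only one of the two single-core runs, assuming those files are the raw pyseer outputs. Passing `column="filter-pvalue"` repeats the check on the column that was reported to be stable.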

As for the p-value threshold, 0.05 is used here only for filtering and for checking whether the results vary between runs. For my actual analysis, I correct for multiple testing before taking any further steps.
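
As an illustration of what such a correction can look like, here is a minimal sketch of a Bonferroni-adjusted threshold. The thread does not say which correction method is actually used, so Bonferroni is an assumption chosen for simplicity, and `bonferroni_threshold` is a hypothetical helper, not a pyseer function. The number of tests must be supplied by the user.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the per-test p-value cutoff that keeps the family-wise error rate at `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes_bonferroni(pvalues, n_tests, alpha=0.05):
    """Return one boolean per p-value: True if it is below the corrected cutoff."""
    cutoff = bonferroni_threshold(n_tests, alpha)
    return [p < cutoff for p in pvalues]
```

With 1000 tests and `alpha=0.05`, the cutoff is 0.05 / 1000, far stricter than the unadjusted 0.05 used above for the run-to-run comparison.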

mgalardini self-assigned this Feb 29, 2024
